Re: w3m-filter-delete-regions feature enhancements (patch included)
On 2017-05-30 19:22, Katsumi Yamaoka wrote:
> In [emacs-w3m : No.12664]
> . . .
>
> Ok, I'm not so perversity not to implement it. ;)
> Do you have a good idea to toggle those filters using
> `w3m-toggle-filtering' (with prefix arg)?
>
> I made it omit the filters that use `w3m-filter-delete-regions'
> even if they are listed in `w3m-filter-configuration' because
> I have no idea for a way for a user to identify such a filter.
> Instead, I think it's better to remake every those filter so as
> to have a function name, e.g.:
Ooh, that's right, I had overlooked that. Your solution sounds good.
> (nil
> ("Remove garbage in http://www.geocities.co.jp/*"
> "http://www.geocities.co.jp/* でゴミを取り除きます")
> "\\`http://www\\.geocities\\.co\\.jp/"
> w3m-filter-geocities)
>
> In that case `w3m-filter-delete-regions' will get useless, or it
> will not necessarily need to be more flexible.
My idea was that the functions `w3m-filter-delete-regions' and
`w3m-filter-replace-regexp' would be generic support functions to help
users build their own site-specific filters.
> In addition, could you show some filter examples by which the
> revised `w3m-filter-delete-regions' is used conveniently?
Sure, I have a bunch handy already, two of which are for sites popular
enough to be considered for the package. The following code contains:
1] My latest version of `w3m-filter-delete-regions', which adds
optional bounds and count to limit the scope of the function.
2] A modified version of `w3m-filter-replace-regexp'
3 & 4] Filters for slashdot.org and rt.com
=====================================================================
(defun w3m-filter-delete-regions (url start end
                                  &optional without-start without-end use-regex
                                  start-pos end-pos count)
  "Delete regions surrounded with a START pattern and an END pattern.
If argument WITHOUT-START is non-nil, do not delete the START strings.
If argument WITHOUT-END is non-nil, do not delete the END strings.
If argument USE-REGEX is non-nil, treat START and END as regexps.
Argument START-POS is a position from which to begin deletions.
Argument END-POS is a position at which to stop deletions.
Argument COUNT is the maximum number of deletions to make.
Return non-nil if at least one region was deleted."
  (when (not start-pos) (setq start-pos (point-min)))
  (when (not end-pos) (setq end-pos (point-max)))
  (goto-char start-pos)
  (let (p (i 0))
    (while
        (and
         (if count (< i count) t)
         (if use-regex
             (re-search-forward start end-pos t)
           (search-forward start end-pos t))
         ;; Remember where the deletion starts before searching for END.
         (setq p (if without-start (match-end 0) (match-beginning 0)))
         (if use-regex
             (re-search-forward end end-pos t)
           (search-forward end end-pos t)))
      (delete-region p (if without-end (match-beginning 0) (match-end 0)))
      (setq i (1+ i)))
    (> i 0)))
(defun w3m-filter-replace-regexp (url regexp to-string
                                  &optional start-pos end-pos count)
  "Replace all occurrences of REGEXP with TO-STRING.
Optional args START-POS, END-POS, and COUNT limit the scope
of the replacements.
Return non-nil if at least one replacement was made."
  (when (not start-pos) (setq start-pos (point-min)))
  (when (not end-pos) (setq end-pos (point-max)))
  (goto-char start-pos)
  (let ((i 0))
    (while
        (and
         (if count (< i count) t)
         (re-search-forward regexp end-pos t))
      (replace-match to-string nil nil)
      (setq i (1+ i)))
    (> i 0)))
(defun w3m-filter-rt (url)
  "Filter top and bottom cruft for rt.com."
  (w3m-filter-delete-regions url
                             "<body.*>" "<h1.*>" t t t nil nil 1)
  (w3m-filter-delete-regions url
                             "<div class=\"layout__footer\"" "</body>"))
(add-to-list 'w3m-filter-configuration
             '(t "filter for rt.com" "\\`https://www\\.rt\\.com/" w3m-filter-rt))
(defun w3m-filter-slashdot (url)
  "Filter JS dead links, top and bottom cruft for slashdot."
  (w3m-filter-delete-regions url
                             "<body.*>" "<h2.*>" t t t nil nil 1)
  (when
      (search-forward "<aside class=\"grid_24 view_mode\">" nil t)
    (w3m-filter-replace-regexp url "<i>" "" nil (match-beginning 0)))
  (w3m-filter-delete-regions url
                             "<aside class=\"grid_24 view_mode\">"
                             "<span class=\"totalcommentcnt\">" nil nil t (point) nil 1)
  (insert "<h2>")
  (w3m-filter-delete-regions url
                             "</a>"
                             "<ul id=\"commentlisting\" class=\"d2\">" nil nil t (point) nil 1)
  (insert "</h2><ul>")
  (w3m-filter-delete-regions url
                             "<div class=\"grid_10 d1or2\""
                             "<section id=\"besttabs.*>" nil nil t (point) nil 1)
  (w3m-filter-delete-regions url
                             "<div class=\"commentSub\""
                             "<div id=\"replyto_[0-9]+\">" nil nil t)
  (w3m-filter-delete-regions url
                             "<a id=\"reply_link_"
                             "Flag as Inappropriate</a>")
  (w3m-filter-delete-regions url
                             "<noscript><p><b>There may be more comments"
                             "</body>"))
(add-to-list 'w3m-filter-configuration
             '(t "filter for slashdot"
                 "\\`http[s]?://\\([a-z]+\\.\\)?slashdot\\.org/"
                 w3m-filter-slashdot))
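
For anyone who wants to see the idea in isolation, here is a minimal
sketch of the search-then-delete pattern that `w3m-filter-delete-regions'
is built on, using only built-in Emacs functions. It is not part of
emacs-w3m: the names `my-demo-delete-between' and `my-demo-run' and the
sample HTML are made up for illustration, and it only covers the
"keep both START and END, delete once" case used at the top of the two
filters above.

```elisp
;; Hypothetical helper, not part of emacs-w3m: delete the text between the
;; first match of START-RE and the next match of END-RE, keeping both matches.
(defun my-demo-delete-between (start-re end-re)
  "Delete text between the first START-RE and the following END-RE.
Both matches themselves are kept.  Return non-nil if text was deleted."
  (goto-char (point-min))
  (let (p)
    (when (and (re-search-forward start-re nil t)
               ;; Remember where the START match ends.
               (setq p (match-end 0))
               (re-search-forward end-re nil t))
      ;; The match data now describes the END match.
      (delete-region p (match-beginning 0))
      t)))

;; Try it on made-up HTML in a temporary buffer.
(defun my-demo-run ()
  "Return sample HTML after stripping everything between <body> and <h1>."
  (with-temp-buffer
    (insert "<body class=\"x\">\n<nav>menu</nav>\n<h1>Title</h1>\n<p>text</p>")
    (my-demo-delete-between "<body.*>" "<h1.*>")
    (buffer-string)))
```

Evaluating `(my-demo-run)' should show the <nav> line gone while the
<body> and <h1> tags survive. Running a filter against a temporary
buffer like this is also a convenient way to test a new site filter
before adding it to `w3m-filter-configuration'.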
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0