
Re: w3m-filter-delete-regions feature enhancements (patch included)



On 2017-05-30 19:22, Katsumi Yamaoka wrote:
> In [emacs-w3m : No.12664]
> . . .
>
> Ok, I'm not so perverse as to refuse to implement it. ;)
> Do you have a good idea for how to toggle those filters using
> `w3m-toggle-filtering' (with a prefix arg)?
>
> I made it omit the filters that use `w3m-filter-delete-regions'
> even if they are listed in `w3m-filter-configuration', because
> I have no idea how a user could identify such a filter.
> Instead, I think it's better to remake every such filter so
> that it has a function name, e.g.:

Ooh, that's right, I had overlooked that. Your solution sounds good.

>     (nil
>      ("Remove garbage in http://www.geocities.co.jp/*"
>       "http://www.geocities.co.jp/* でゴミを取り除きます")
>      "\\`http://www\\.geocities\\.co\\.jp/"
>      w3m-filter-geocities)
>
> In that case `w3m-filter-delete-regions' will become useless, or at
> least it will not necessarily need to be more flexible.

My idea was that the functions `w3m-filter-delete-regions' and
`w3m-filter-replace-regexp' would be generic support functions to help
users build their own site-specific filters.
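For illustration, here is a minimal sketch of how a user might combine these helpers into a named filter of the kind suggested above. The site example.com, the function name `my-w3m-filter-example', and the <main>/<footer> markers it looks for are all hypothetical, and it assumes the helper definitions given later in this message have already been evaluated.

```elisp
;; Sketch only: `my-w3m-filter-example', example.com and the
;; <main>/<footer> markers are hypothetical.  Assumes the helper
;; `w3m-filter-delete-regions' from this message is already defined.
(defun my-w3m-filter-example (url)
  "Remove the header and footer of pages on example.com (hypothetical)."
  ;; Keep the <body ...> tag and the first <main ...> tag, but delete
  ;; everything between them (at most one region, regexp matching).
  (w3m-filter-delete-regions url "<body[^>]*>" "<main[^>]*>"
                             t t t nil nil 1)
  ;; Delete from "<footer" up to, but not including, "</body>".
  (w3m-filter-delete-regions url "<footer" "</body>" nil t))

;; Register the filter once w3m-filter.el is loaded.  Because it has
;; its own function name, it can be identified in the configuration.
(with-eval-after-load 'w3m-filter
  (add-to-list 'w3m-filter-configuration
               '(t "filter for example.com"
                   "\\`https?://\\(?:www\\.\\)?example\\.com/"
                   my-w3m-filter-example)))
```

The filter body stays a short list of declarative calls; the helpers carry the search-and-delete loop.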

> In addition, could you show some filter examples in which the
> revised `w3m-filter-delete-regions' is used to good effect?

Sure, I have a bunch handy already, two of which are for sites popular
enough to be considered for the package. The following code contains:

1] My latest version of `w3m-filter-delete-regions', which adds
optional bounds and count to limit the scope of the function.

2] A modified version of `w3m-filter-replace-regexp', with the same
optional bounds and count.

3 & 4] Filters for rt.com and slashdot.org
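To make item 1 concrete, here is a small usage sketch of the new optional arguments on a made-up HTML snippet. It assumes the definition of `w3m-filter-delete-regions' below has been evaluated; `my-demo-delete' is a hypothetical helper used only for this illustration.

```elisp
;; Usage sketch; assumes `w3m-filter-delete-regions' from this
;; message is defined.  The HTML snippet is made up.
(defun my-demo-delete (&rest args)
  "Hypothetical helper: run `w3m-filter-delete-regions' with ARGS
on a sample buffer and return the resulting text."
  (with-temp-buffer
    (insert "<p>keep</p><ad>one</ad><p>mid</p><ad>two</ad><p>end</p>")
    (apply #'w3m-filter-delete-regions nil args)
    (buffer-string)))

;; Default: delete every <ad>...</ad> region, delimiters included.
(my-demo-delete "<ad>" "</ad>")
;; => "<p>keep</p><p>mid</p><p>end</p>"

;; COUNT = 1: only the first region is deleted.
(my-demo-delete "<ad>" "</ad>" nil nil nil nil nil 1)
;; => "<p>keep</p><p>mid</p><ad>two</ad><p>end</p>"

;; WITHOUT-START and WITHOUT-END: keep the delimiters themselves.
(my-demo-delete "<ad>" "</ad>" t t)
;; => "<p>keep</p><ad></ad><p>mid</p><ad></ad><p>end</p>"

;; END-POS: only regions that end before "<p>mid" are touched.
(with-temp-buffer
  (insert "<p>keep</p><ad>one</ad><p>mid</p><ad>two</ad><p>end</p>")
  (goto-char (point-min))
  (search-forward "<p>mid")
  (w3m-filter-delete-regions nil "<ad>" "</ad>" nil nil nil
                             nil (match-beginning 0))
  (buffer-string))
;; => "<p>keep</p><p>mid</p><ad>two</ad><p>end</p>"
```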

=====================================================================

(defun w3m-filter-delete-regions (url start end &optional without-start
                                      without-end use-regex
                                      start-pos end-pos count)
  "Delete regions surrounded with a START pattern and an END pattern.
If WITHOUT-START is non-nil, do not delete the START strings.
If WITHOUT-END is non-nil, do not delete the END strings.
If USE-REGEX is non-nil, treat START and END as regexps.
START-POS is the position from which to begin deletions; it
defaults to `point-min'.  END-POS is the position at which to stop
deletions; it defaults to `point-max'.  COUNT is the maximum number
of deletions to make; nil means no limit.
Return non-nil if at least one region was deleted."
  (goto-char (or start-pos (point-min)))
  ;; Use a marker for the bound so that it stays on the same text
  ;; while regions before it are being deleted.
  (let ((bound (copy-marker (or end-pos (point-max))))
        (i 0)
        p)
    (while (and (or (null count) (< i count))
                (if use-regex
                    (re-search-forward start bound t)
                  (search-forward start bound t))
                (setq p (if without-start (match-end 0) (match-beginning 0)))
                (if use-regex
                    (re-search-forward end bound t)
                  (search-forward end bound t)))
      (delete-region p (if without-end (match-beginning 0) (match-end 0)))
      (setq i (1+ i)))
    (set-marker bound nil)
    (> i 0)))

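(defun w3m-filter-replace-regexp (url regexp to-string
                                      &optional start-pos end-pos count)
  "Replace all occurrences of REGEXP with TO-STRING.
Optional args START-POS, END-POS, and COUNT limit the scope of the
replacements: START-POS and END-POS default to `point-min' and
`point-max', and a nil COUNT means no limit.
Return non-nil if at least one replacement was made."
  (goto-char (or start-pos (point-min)))
  ;; A marker keeps the bound on the same text as replacements
  ;; change the length of the buffer before it.
  (let ((bound (copy-marker (or end-pos (point-max))))
        (i 0))
    (while (and (or (null count) (< i count))
                (re-search-forward regexp bound t))
      (replace-match to-string nil nil)
      (setq i (1+ i)))
    (set-marker bound nil)
    (> i 0)))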

(defun w3m-filter-rt (url)
  "Filter top and bottom cruft for rt.com."
  (w3m-filter-delete-regions url
    "<body[^>]*>" "<h1[^>]*>" t t t nil nil 1)
  (w3m-filter-delete-regions url
    "<div class=\"layout__footer\"" "</body>"))
(add-to-list 'w3m-filter-configuration
  '(t "filter for rt.com" "\\`https://www\\.rt\\.com/" w3m-filter-rt))

(defun w3m-filter-slashdot (url)
  "Filter dead JavaScript links and top and bottom cruft for slashdot.org."
  (w3m-filter-delete-regions url
    "<body[^>]*>" "<h2[^>]*>" t t t nil nil 1)
  (when (search-forward "<aside class=\"grid_24 view_mode\">" nil t)
    (w3m-filter-replace-regexp url "<i>" "" nil (match-beginning 0)))
  (w3m-filter-delete-regions url
    "<aside class=\"grid_24 view_mode\">"
    "<span class=\"totalcommentcnt\">" nil nil t (point) nil 1)
  (insert "<h2>")
  (w3m-filter-delete-regions url
    "</a>"
    "<ul id=\"commentlisting\" class=\"d2\">" nil nil t (point) nil 1)
  (insert "</h2><ul>")
  (w3m-filter-delete-regions url
    "<div class=\"grid_10 d1or2\""
    "<section id=\"besttabs[^>]*>" nil nil t (point) nil 1)
  (w3m-filter-delete-regions url
    "<div class=\"commentSub\""
    "<div id=\"replyto_[0-9]+\">" nil nil t)
  (w3m-filter-delete-regions url
    "<a id=\"reply_link_"
    "Flag as Inappropriate</a>")
  (w3m-filter-delete-regions url
    "<noscript><p><b>There may be more comments"
    "</body>"))
(add-to-list 'w3m-filter-configuration
  '(t "filter for slashdot"
      "\\`https?://\\([a-z]+\\.\\)?slashdot\\.org/"
      w3m-filter-slashdot))
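The slashdot filter uses `w3m-filter-replace-regexp' with an END-POS bound so that only the part of the page before a marker is touched. Here is a minimal sketch of that pattern, and of COUNT, on made-up HTML; it assumes the definitions above have been evaluated.

```elisp
;; Usage sketch; assumes `w3m-filter-replace-regexp' from this
;; message is defined.  The HTML snippets are made up.

;; Remove "<i>" only before the <hr> marker, the same way the
;; slashdot filter stops at its "<aside ...>" marker.
(with-temp-buffer
  (insert "<i>a</i> <i>b</i> <hr> <i>c</i>")
  (goto-char (point-min))
  (when (search-forward "<hr>" nil t)
    (w3m-filter-replace-regexp nil "<i>" "" nil (match-beginning 0)))
  (buffer-string))
;; => "a</i> b</i> <hr> <i>c</i>"

;; COUNT = 2: only the first two matches of "</?b>" are removed.
(with-temp-buffer
  (insert "<b>x</b> <b>y</b>")
  (w3m-filter-replace-regexp nil "</?b>" "" nil nil 2)
  (buffer-string))
;; => "x <b>y</b>"
```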

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0