
w3m-filter-delete-regions feature enhancements (patch included)

I've been finding it useful to work with a modified version of the
function `w3m-filter-delete-regions', so I'm offering it here as a
suggestion for inclusion in the package.

The first modification is to optionally search on regexes instead of
strings. I've found this useful because many HTML pages use, for
example, <DIV> tag identifiers that combine a page-generic part with a
page-specific one.

For example, pages from slashdot.org include tags such as

  #+BEGIN_SRC html
  <div class="commentSub" id="comment_sub_53828379">
  <div id="comment_body_53828379">
  #+END_SRC

where the numeric suffix of the id is not only page specific, but
specific to a particular comment on the page.
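
As a rough illustration of why a fixed string cannot match these tags
but a regex can, here is a minimal sketch. The variable name
`my-slashdot-comment-sub-regex' is made up for this example and is not
part of emacs-w3m; the pattern assumes the id suffix is always a run of
digits.

```elisp
;; Hypothetical pattern, for illustration only (not part of emacs-w3m).
;; Assumes the id suffix consists solely of decimal digits.
(defvar my-slashdot-comment-sub-regex
  "<div class=\"commentSub\" id=\"comment_sub_[0-9]+\">"
  "Regex matching a commentSub DIV regardless of its numeric id suffix.")

;; Returns the index of the match (non-nil) for any numeric suffix.
(string-match-p my-slashdot-comment-sub-regex
                "<div class=\"commentSub\" id=\"comment_sub_53828379\">")
```

A plain string search would need the full id, which differs for every
comment, so one filter entry could not cover the whole page.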

The second modification is to add optional arguments to set whether
the start and end strings/regexes are themselves deleted or not.
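
To make the effect of the two flags concrete, here is a small sketch
using a hypothetical single-pass helper, `my-demo-delete-one'. It is
separate from the patch at the end of this message and exists only to
show how the flags pick the deletion bounds; it handles plain strings
and only the first region.

```elisp
;; Hypothetical helper for illustration only (not part of emacs-w3m).
(defun my-demo-delete-one (start end with-start with-end)
  "Delete the first region between strings START and END in the current buffer.
Delete START itself only if WITH-START is non-nil, and END itself
only if WITH-END is non-nil.  Return t if a region was deleted."
  (goto-char (point-min))
  (let (p)
    (when (and (search-forward start nil t)
               ;; Begin deleting before START, or just after it.
               (setq p (if with-start (match-beginning 0) (match-end 0)))
               (search-forward end nil t))
      ;; Stop deleting after END, or just before it.
      (delete-region p (if with-end (match-end 0) (match-beginning 0)))
      t)))

;; Both flags nil: only the text between the delimiters is removed.
(with-temp-buffer
  (insert "keep<a>drop</a>keep")
  (my-demo-delete-one "<a>" "</a>" nil nil)
  (buffer-string))                      ; "keep<a></a>keep"

;; Both flags non-nil: the delimiters are removed as well.
(with-temp-buffer
  (insert "keep<a>drop</a>keep")
  (my-demo-delete-one "<a>" "</a>" t t)
  (buffer-string))                      ; "keepkeep"
```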

In the code below, three args are added at the end as optional so as
to maintain backwards-compatibility with the previous version of the
function.
If this change is accepted, a follow-up would be to tweak a few other
filter functions so all of them have a consistent format. I'd be happy
to supply that patch once this one is accepted.

Also, if there's interest, I'd be happy to share my slashdot filter.
As you probably know, without logging into a slashdot account
and setting a particular preference, one is served pages with a lot of
javascript. w3m strips the javascript but retains many dead links that
are javascript-operated. My filter removes those, as well as header
and footer cruft.

Note: In about 20 hours, I'll be offline for about 55 hours for the
upcoming Jewish holiday.


(defun w3m-filter-delete-regions (url start end &optional with-start with-end use-regex)
  "Delete regions surrounded with a START pattern and an END pattern.

If argument WITH-START is nil, do not delete the START strings.
If argument WITH-END is nil, do not delete the END strings.
If argument USE-REGEX is non-nil, treat START and END as regexes."
  (goto-char (point-min))
  (let (p (i 0))
    ;; Repeat for as long as a START match followed by an END match is found.
    (while (and
            (if use-regex
                (re-search-forward start nil t)
              (search-forward start nil t))
            ;; P is where deletion begins: before START, or just after it.
            (setq p (if with-start (match-beginning 0) (match-end 0)))
            (if use-regex
                (re-search-forward end nil t)
              (search-forward end nil t)))
      (delete-region p (if with-end (match-end 0) (match-beginning 0)))
      (setq i (1+ i)))
    ;; Return non-nil if at least one region was deleted.
    (> i 0)))

CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0