Re: Generic "read-mode" filter [SNIPPET INCLUDED] [UPDATE]
The snippet as originally posted fails for web pages whose titles
contain regexp special characters. The solution is to use the function
`regexp-quote', as follows (the changed line is marked by "; <<<<<<<").
(defun w3m-filter-generic-page-header (url)
  "generic page header filter"
  (let (p1 p2 p3 title found)
    ;; Trim the <head> down to its <title> and remember the title text.
    (w3m-filter-delete-regions url "<head>" "<title>" t t)
    (setq p1 (point))
    (search-forward "</title>" nil t)
    (setq title (buffer-substring-no-properties p1 (setq p2 (match-beginning 0))))
    (w3m-filter-delete-regions url "</title>" "</head>" t t nil p2)
    (setq p1 (point))
    ;; Look for a heading whose text also appears in the title.
    (while (and (not found) (re-search-forward "<h[^>]+>\\([^<]+\\)<" nil t))
      (setq p2 (match-beginning 0))
      (when (string-match (regexp-quote (match-string 1)) title) ; <<<<<<<
        (when (re-search-forward "<body" nil t)
          (delete-region (match-beginning 0) p2)
          (setq found t))))
    ;; Delete what lies between "<body" and an h1..h4 heading.
    (w3m-filter-delete-regions url "<body" "<h1" nil t)
    (w3m-filter-delete-regions url "<body" "<h2" nil t)
    (w3m-filter-delete-regions url "<body" "<h3" nil t)
    (w3m-filter-delete-regions url "<body" "<h4" nil t)))
On 2018-05-30 03:56, Boruch Baum wrote:
> Some weeks or months ago, someone on the list complained about emacs-w3m
> not having a 'reader-mode', such as exists in firefox (and other
> browsers?). The idea is an option to remove all distracting material
> from a page so the user can concentrate on reading content.
> At the time, I pointed out that emacs-w3m does have filters, which can
> do the same thing.
> Today, I coded a generic filter that aims to perform much of what
> 'reader-mode' purportedly does. It is limited in that it only deals with
> cruft *ABOVE* the main text of a web page, not cruft at the bottom. That
> should be sufficient for most people, I think.
> Instead of offering it as a patch, I decided to present it as a snippet,
> but IMO it should be added to the default filters for the project if the
> project deems it desirable.
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
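
P.S. For anyone wondering why the `regexp-quote' call in the updated
snippet matters, here is a minimal sketch of the failure mode. The two
helper functions and the sample heading/title strings are hypothetical,
made up only for this illustration; they are not part of emacs-w3m.
`string-match' treats its first argument as a regexp, so heading text
containing characters such as "[" either matches the wrong thing or is
rejected as an invalid regexp.

```elisp
;; Hypothetical helpers for illustration only; not part of emacs-w3m.

(defun my-w3m-heading-in-title-raw (heading title)
  "Search TITLE for HEADING, using HEADING as an unquoted regexp.
Return the match position, nil when there is no match, or the
symbol `invalid' when HEADING is not a valid regexp."
  (condition-case nil
      (string-match heading title)
    (invalid-regexp 'invalid)))

(defun my-w3m-heading-in-title-quoted (heading title)
  "Search TITLE for the literal text HEADING.
`regexp-quote' escapes the regexp special characters first."
  (string-match (regexp-quote heading) title))

;; "[2018]" is read as a character class matching a single character,
;; so the literal text "[2018]" in the title is not found.
(my-w3m-heading-in-title-raw "Prices [2018]" "Prices [2018] | Shop")
;; => nil

(my-w3m-heading-in-title-quoted "Prices [2018]" "Prices [2018] | Shop")
;; => 0

;; An unmatched "[" is not a valid regexp at all.
(my-w3m-heading-in-title-raw "Notes [draft" "Notes [draft | Blog")
;; => invalid

(my-w3m-heading-in-title-quoted "Notes [draft" "Notes [draft | Blog")
;; => 0
```

In the filter itself the unquoted form would either skip a heading that
really does appear in the title or signal an error partway through
filtering, which is the failure the update addresses.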