Generic "read-mode" filter [SNIPPET INCLUDED]
- From: Boruch Baum <boruch_baum@xxxxxxx>
- Date: Wed, 30 May 2018 03:56:48 -0400
- X-ml-name: emacs-w3m
- X-mail-count: 12996
Some weeks or months ago, someone on the list complained about emacs-w3m
not having a 'reader-mode', such as exists in Firefox (and other
browsers?). The idea is an option to remove all distracting material
from a page so the user can concentrate on reading content.

At the time, I pointed out that emacs-w3m does have filters, which can
do the same thing.

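For anyone who has not written one: a filter is an ordinary function that
emacs-w3m calls with the page URL while the current buffer holds the page's
HTML, before it is rendered. A minimal sketch of that shape follows. The
function `my-w3m-filter-strip-nav' is hypothetical, not part of emacs-w3m,
and uses only core Emacs functions so it can be tried on its own:

```elisp
;; Hypothetical example filter, NOT part of emacs-w3m.
;; Deletes every <nav>...</nav> block from the HTML in the current buffer.
;; The URL argument is unused here; it is accepted only because filter
;; functions are called with the page URL as their first argument.
(defun my-w3m-filter-strip-nav (_url)
  "Delete all <nav> elements from the current buffer."
  (goto-char (point-min))
  (let ((case-fold-search t))
    (while (search-forward "<nav" nil t)
      (let ((start (match-beginning 0)))
        (if (search-forward "</nav>" nil t)
            (delete-region start (point))
          ;; Unterminated element: stop instead of deleting to end of buffer.
          (goto-char (point-max)))))))

;; Try it on a sample string in a scratch buffer.
(with-temp-buffer
  (insert "<body><nav><a href=\"/\">Home</a></nav><h1>Title</h1><p>Text</p></body>")
  (my-w3m-filter-strip-nav "https://example.org/")
  (buffer-string))
;; => "<body><h1>Title</h1><p>Text</p></body>"
```

Nested <nav> elements are not handled; the sketch is only meant to show
what a filter function looks like.
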
Today, I coded a generic filter that aims to perform much of what
'reader-mode' purportedly does. It is limited in that it only deals with
cruft *ABOVE* the main text of a web page, not cruft at the bottom. That
should be sufficient for most people, I think.

Instead of offering it as a patch, I decided to present it as a snippet,
but IMO it should be added to the default filters for the project if the
project deems it desirable.

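Until it is part of the defaults, one way to try it locally might look like
the following. This is a sketch only: it assumes the
(FLAG DESCRIPTION REGEXP FUNCTION) entry layout of
`w3m-filter-configuration', and the catch-all URL regexp is an illustrative
choice, not a recommendation. The function named here is the one defined in
the snippet below.

```elisp
;; Sketch only: enable filtering and register the function for all
;; http(s) pages.  The URL regexp is an assumption made for illustration;
;; narrow it to the sites where the filter behaves well.
(setq w3m-use-filter t)
(with-eval-after-load 'w3m-filter
  (add-to-list 'w3m-filter-configuration
               '(t
                 "generic page header filter"
                 "\\`https?://"
                 w3m-filter-generic-page-header)))
```
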
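(defun w3m-filter-generic-page-header (url)
  "generic page header filter"
  (let (p1 p2 title found)
    ;; Empty <head> of everything except the <title> element.
    (w3m-filter-delete-regions url "<head>" "<title>" t t)
    ;; This relies on point being left just after "<title>".
    (setq p1 (point))
    (search-forward "</title>" nil t)
    (setq title (buffer-substring-no-properties p1 (setq p2 (match-beginning 0))))
    (w3m-filter-delete-regions url "</title>" "</head>" t t nil p2)
    (setq p1 (point))
    ;; Look for a heading whose text also appears in the title.  If one is
    ;; found, delete everything from "<body" up to that heading.
    (while (and (not found)
                (re-search-forward "<h[^>]+>\\([^<]+\\)<" nil t))
      (setq p2 (match-beginning 0))
      ;; Quote the heading text so it is matched literally, not as a regexp.
      (when (string-match (regexp-quote (match-string 1)) title)
        ;; "<body" precedes the heading, so search for it from P1 and
        ;; restore point afterwards so a failed search cannot loop forever.
        (when (save-excursion
                (goto-char p1)
                (re-search-forward "<body" p2 t))
          (delete-region (match-beginning 0) p2)
          (setq found t))))
    ;; Fallback: delete from "<body" up to the first heading of each level.
    (w3m-filter-delete-regions url "<body" "<h1" nil t)
    (w3m-filter-delete-regions url "<body" "<h2" nil t)
    (w3m-filter-delete-regions url "<body" "<h3" nil t)
    (w3m-filter-delete-regions url "<body" "<h4" nil t)))
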
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0