
Re: Generic "read-mode" filter [SNIPPET INCLUDED] [UPDATE]



The snippet as originally posted fails for web pages whose titles
contain regexp special characters. The solution is to use the function
`regexp-quote', as follows (the change is marked by "; <<<<<<<").

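(defun w3m-filter-generic-page-header (url)
  "Delete the clutter that precedes the main text of the page at URL.
The main text is taken to start at the first heading whose text
appears in the page title, else at the first h1..h4 heading."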
  (let (p1 p2 p3 title found)
   (and
     (w3m-filter-delete-regions url "<head>" "<title>" t t)
     (setq p1 (point))
     (search-forward "</title>" nil t)
     (setq title (buffer-substring-no-properties p1 (setq p2 (match-beginning 0))))
     (w3m-filter-delete-regions url "</title>" "</head>" t t nil p2)
     (setq p1 (point)
           p3 p1)
     (or
       (while (and (not found) (re-search-forward "<h[^>]+>\\([^<]+\\)<" nil t))
         (setq p2 (match-beginning 0))
         (when (string-match (regexp-quote (match-string 1)) title)  ; <<<<<<<
           (goto-char p1)
           (when (re-search-forward "<body" nil t)
             (delete-region (match-beginning 0) p2)
             (setq found t))))
       (w3m-filter-delete-regions url "<body" "<h1" nil t)
       (w3m-filter-delete-regions url "<body" "<h2" nil t)
       (w3m-filter-delete-regions url "<body" "<h3" nil t)
       (w3m-filter-delete-regions url "<body" "<h4" nil t))
     (goto-char p3)
     (insert "<body>"))))

(add-to-list 'w3m-filter-configuration
  '(t
    "generic page header filter"
    "\\`http[s]?://"
    w3m-filter-generic-page-header))
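
To illustrate the failure, here is a minimal sketch. The helper name
`my-heading-in-title-p' and the sample strings are hypothetical, made up
for this example only; they are not part of emacs-w3m. It shows how a
heading containing brackets is misread as a regexp unless it is first
passed through `regexp-quote'.

```elisp
;; Hypothetical helper and sample strings, for illustration only.
(defun my-heading-in-title-p (heading title &optional literal)
  "Return the match position if HEADING is found in TITLE, else nil.
With LITERAL non-nil, quote HEADING so it is matched as plain text."
  (string-match (if literal (regexp-quote heading) heading) title))

(let ((title "Price list [2018] - Example Shop")
      (heading "Price list [2018]"))
  (list
   ;; Unquoted: "[2018]" is read as a character class, so no match (nil).
   (my-heading-in-title-p heading title)
   ;; Quoted: the brackets are literal, so it matches at position 0.
   (my-heading-in-title-p heading title t)))
```

Evaluating the `let' form should return (nil 0): the unquoted heading
fails to match a title that contains it verbatim, which is the case the
marked line now handles.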


On 2018-05-30 03:56, Boruch Baum wrote:
> Some weeks or months ago, someone on the list complained about emacs-w3m
> not having a 'reader-mode', such as exists in firefox (and other
> browsers?). The idea is an option to remove all distracting material
> from a page so the user can concentrate on reading content.
>
> At the time, I pointed out that emacs-w3m does have filters, which can
> do the same thing.
>
> Today, I coded a generic filter that aims to perform much of what
> 'reader-mode' purportedly does. It is limited in that it only deals with
> cruft *ABOVE* the main text of a web page, not cruft at the bottom. That
> should be sufficient for most people, I think.
>
> Instead of offering it as a patch, I decided to present it as a snippet,
> but IMO it should be added to the default filters for the project if the
> project deems it desirable.
> ...

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0