
[UPDATE] Re: Filters for stackexchange and youtube [SNIPPET]

New version of the snippet attached, but see my notes first.

On 2018-06-07 16:03, Katsumi Yamaoka wrote:
> In [emacs-w3m:12999]
> On Fri, 01 Jun 2018 08:58:55 -0400, Boruch Baum wrote:
> > I have filter functions for stackexchange and for youtube, and since
> > those are very commonly used site, I thought to share it for
> > consideration to be added to w3m-filter.el
> Because
> (byte-compile (lambda () (replace-regexp "REGEXP" "TO-STRING")))
> => Warning: `replace-regexp' is for interactive use only;
>    use `re-search-forward' and `replace-match' instead.
> , I've temporarily replaced it with a function that uses
> `perform-replace' in `w3m-filter-stackexchange' locally.  Anyway,
> it's better to try ``make clean lisp'' before posting a patch.

1] My experience today is that your proposed solution broke the filter;
   function `perform-replace' is itself a part of `replace-regexp', and
   was prompting the user repeatedly for responses, something that wasn't
   happening to me with `replace-regexp'.

2] The solution re-coded per the recommendation in the compiler
   warning was even worse, due to some side-effect of how emacs-w3m is
   structured. For some reason, a subset of the search `while' loops
   would generate an "error in process sentinel: while: Search failed:".

3] The attached snippet thus includes a comment asking myself or someone else
   to track down whatever process sentinel is responsible, and fix the
   error. For the interim, I found myself doing what I was alarmed at
   seeing so often elsewhere in the code-base, and one of the things that
   I started removing in the refactoring: I wrapped the code in a
   `condition-case' statement to suppress the error!

   3.1] For all I know, this may be exactly how many / most / all those
        other `condition-case' statements leaked into the code!

4] It would have been simpler to just live with the compiler warnings,
   since they are just warnings, and keep the original version, but it is
   better to be strict / formal and not generate any warnings at all, so
   here attached is the version with the `condition-case', byte-compiled
   with no errors.
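
A possible explanation for the error in 2], offered as a sketch and not
as a confirmed diagnosis: `re-search-forward' signals a `search-failed'
error when its third argument (NOERROR) is nil and no further match
exists. A loop of the form (while (re-search-forward REGEXP BOUND nil) ...)
therefore always ends by signaling that error, and if the filter happens
to run under a process sentinel, Emacs would report it as "error in
process sentinel". Passing t as NOERROR makes the search return nil
instead, so the loop ends quietly. The helper name `my-strip-divs' below
is hypothetical and is not part of emacs-w3m.

```elisp
;; Sketch only: `my-strip-divs' is a hypothetical helper, not emacs-w3m code.
(defun my-strip-divs (start end)
  "Delete every <div ...> and </div> tag between START and END."
  (save-excursion
    (goto-char start)
    ;; A marker keeps the bound valid while replacements shrink the buffer.
    (let ((bound (copy-marker end)))
      ;; NOERROR is t: a failed search returns nil instead of signaling
      ;; `search-failed', so the `while' loop terminates normally.
      (while (re-search-forward "</?div[^>]*>" bound t)
        (replace-match ""))
      ;; Detach the marker so it no longer tracks the buffer.
      (set-marker bound nil))))

;; Usage: strip the tags from a scratch buffer and return what is left.
(with-temp-buffer
  (insert "<div class=\"a\">text</div>")
  (my-strip-divs (point-min) (point-max))
  (buffer-string))
```

If this is indeed the cause, the `condition-case' wrapper becomes
unnecessary once every such loop passes t as the third argument.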

CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0
(defun w3m-filter-stackexchange (url)
  "Filter top and bottom cruft for stackexchange.com."
  (w3m-filter-delete-regions url
    "<body.*>" "<h1.*>" t t t nil nil 1)
  (w3m-filter-delete-regions url
    "<h2 class=\"space\">Your Answer</h2>" "<h4 id=\"h-related\">Related</h4>"
    nil t nil nil nil 1)
  (w3m-filter-delete-regions url
    "<div id=\"hot-network-questions\" class=\"module tex2jax_ignore\">" "</body>"
    nil t nil nil nil 1)

  ;; (when (search-forward "<table>" nil t)
  ;;   (replace-match ""))

  (goto-char (point-min))
  (w3m-filter-delete-regions url
    "<a class=\"vote-[ud]"
    "</a>" nil nil t (point))
  (goto-char (point-min))
  (w3m-filter-delete-regions url
    "<a class=\"star-off"
  (w3m-filter-replace-regexp url
    "<span itemprop=\"upvoteCount[^>]+>"
    "Votes: ")
  (w3m-filter-replace-regexp url
    "<div class=\"post-text[^>]+>"
  (w3m-filter-replace-regexp url
    "<div class=\"post-taglist[^>]+>"
  (w3m-filter-delete-regions url
    "<a name='new-answer'>"
    "</form>" nil nil nil nil nil 1)

  (w3m-filter-replace-regexp url
    "<div class=\"spacer\">[^>]+>[^>]+>+?\\([0-9]+\\)</div></a>"
    "\\1 ")
  (w3m-filter-delete-regions url
    "<td class=\"vt\">"

  (goto-char (point-min))
  ;; TODO: FIXME: The following condition-case is a kludge because when
  ;; the `re-search-forward' statements were not finding anything, we
  ;; were getting "error in process sentinel: while: Search failed:". The
  ;; proper solution is likely in the process sentinel (whichever one
  ;; that turns out to be), not here.
  (condition-case nil
    (while (search-forward "<div class=\"user-info \">" nil t)
      (let ((p1 (match-end 0))
            (p2 (if (search-forward "<li" nil t)
                  (match-beginning 0)
        (w3m-filter-delete-regions url
          "<div class=\"user-details\">" "</a>" nil nil nil p1 p2)
        (goto-char p1)
        ;; NOERROR is t in the loops below, so a failed search returns
        ;; nil and ends the loop instead of signaling `search-failed'.
        (while (re-search-forward "</?div[^>]*>" p2 t)
          (replace-match ""))
        (goto-char p1)
        (while (re-search-forward "<span class=\"reputation-score[^>]*>" p2 t)
          (replace-match "[rep:"))
        (goto-char p1)
        (while (re-search-forward "<span class=\"badge1\">" p2 t)
          (replace-match  "] [gold:"))
        (goto-char p1)
        (while (re-search-forward "<span class=\"badge2\">" p2 t)
          (replace-match  "] [silver:"))
        (goto-char p1)
        (while (re-search-forward "<span class=\"badge3\">" p2 t)
          (replace-match  "] [bronze:"))
        (goto-char p1)
        (while (re-search-forward "</?span[^>]*>" p2 t)
          (replace-match  ""))))

  (w3m-filter-replace-regexp url
    "<td" "<td valign=top")

  (w3m-filter-delete-regions url
    "<div id=\"tabs\">"
    "<a name" nil t)

  (goto-char (point-min))
  (while (search-forward "<div id=\"answer-" nil t)
    (replace-match "</ul><hr>\\&"))

  (w3m-filter-delete-regions url
    "<div id=\"comments-link"

  (goto-char (point-min))
  (when (search-forward "<h4 id=\"h-linked\">Linked</h4>" nil t)
    (replace-match "<p><b>Linked</b><br>")
    (let ((p1 (match-end 0))
          (p2 (progn
                (search-forward "<h4" nil t)
                (match-beginning 0))))
      (goto-char p1)
      ;; NOERROR is t so each loop ends quietly when no match remains.
      (while (re-search-forward "^\t</a>" p2 t)
        (replace-match  ""))
      (goto-char p1)
      (while (re-search-forward "</a>" p2 t)
        (replace-match "</a><br>"))
      (goto-char p1)
      (while (re-search-forward "</div>" p2 t)
        (replace-match " "))
      (w3m-filter-delete-regions url
        "<div class=\"spacer\">"
        "<div class=\"answer-votes answered-accepted [^>]+>"
        nil nil t)))

  (goto-char (point-min))
  (when (search-forward "<table id=\"qinfo\">" nil t)
    (replace-match "")
    (let ((p1 (match-end 0))
          (p2 (progn
                (search-forward "</table>" nil t)
                (replace-match "")
                (match-end 0))))
      (w3m-filter-replace-regexp url "<tr>" "" p1 p2)
      (w3m-filter-replace-regexp url "</tr>" "<br>" p1 p2)
      (w3m-filter-replace-regexp url "</?td[^>]*>" "" p1 p2)
      (w3m-filter-replace-regexp url "<b>" "" p1 p2)
      (w3m-filter-replace-regexp url "<a[^>]+>" "" p1 p2)
      (w3m-filter-replace-regexp url "</?p[^>]*>" "" p1 p2))))