Re: Google broken

In [emacs-w3m : No.11835] Katsumi Yamaoka wrote:
> Still I have no idea.

I finally found the cause.  The culprit was the "rowspan" spec.
Try this new filter:

--8<---------------cut here---------------start------------->8---
(setq w3m-use-filter t)
(require 'w3m-filter)

;; Point the existing Google rule at the new filter defined below.
(when (rassoc '(w3m-filter-google) w3m-filter-rules)
  (setcdr (rassoc '(w3m-filter-google) w3m-filter-rules)
          '(w3m-filter-google-2)))

(defun w3m-filter-google-2 (url)
  "Align table columns vertically to shrink the table width."
  (let ((case-fold-search t)
        last)
    ;; Wrap each <td> of a row in its own <tr> so the cells stack up.
    (goto-char (point-min))
    (while (re-search-forward "<tr[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "tr")
        (save-restriction
          (narrow-to-region (goto-char (match-beginning 0))
                            (match-end 0))
          (setq last nil)
          (while (re-search-forward "<td[\t\n\r >]" nil t)
            (when (w3m-end-of-tag "td")
              (setq last (match-end 0))
              (replace-match "<tr>\\&</tr>")))
          (when last
            (goto-char (+ 4 last))
            (delete-char 4))
          (goto-char (point-max)))))
    ;; Remove rowspan and width specs, and <br>s.
    (goto-char (point-min))
    (while (re-search-forward "<table[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "table")
        (save-restriction
          (narrow-to-region (goto-char (match-beginning 0))
                            (match-end 0))
          (while (re-search-forward "\
\[\t\n\r ]*\\(?:\\(?:rowspan\\|width\\)=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
                                    nil t)
            ;; Preserve a space at the line-break point.
            (replace-match " "))
          ;; Insert a space between ASCII and non-ASCII characters
          ;; and after a comma.
          (goto-char (point-min))
          (while (re-search-forward "\
\\([!-~]\\)\\([^ -~]\\)\\|\\([^ -~]\\)\\([!-~]\\)\\|\\(,\\)\\([^ ]\\)"
                                    nil t)
            (forward-char -1)
            (insert " "))
          (goto-char (point-max)))))))
--8<---------------cut here---------------end--------------->8---
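
To see in isolation what the rowspan/width/<br> removal step does, here
is a minimal sketch that applies the same regexp to a plain string
instead of a w3m buffer.  The function name `my-strip-layout-attrs' and
the sample markup are made up for illustration; neither is part of
emacs-w3m, and the sample is not real Google output.

```elisp
(defun my-strip-layout-attrs (html)
  "Return HTML with rowspan/width specs and <br>s replaced by a space.
Hypothetical helper for illustration only; not part of emacs-w3m."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (let ((case-fold-search t))
      ;; Same regexp as in the filter above, written on one line.
      (while (re-search-forward
              "[\t\n\r ]*\\(?:\\(?:rowspan\\|width\\)=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
              nil t)
        (replace-match " ")))
    (buffer-string)))

;; Made-up sample cell: the attributes disappear and "foo<br>bar"
;; becomes "foo bar".
(my-strip-layout-attrs "<td rowspan=\"2\" width=\"100\">foo<br>bar</td>")
```

It uses only core Emacs functions, so it can be evaluated in *scratch*
without loading emacs-w3m.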