
Re: Google broken

In [emacs-w3m : No.11830] Samuel Wales wrote:
> On 5/24/12, Katsumi Yamaoka <yamaoka@xxxxxxx> wrote:
>> Could you let me know a search word that reproduces the problem?

> I think it is every search.

>> Or the url of the search result page?  Since Google appears to

> http://www.google.com/search%3Fq%3Dkatsumi%2Bjpl%26btnG%3DSearch%26oe%3Dutf-8

> With your filter and (setf w3m-fill-column 50):

Ah, thanks.  I realized it was overkill for the filter to remove
<br>s and trailing whitespace outright.  The new filter preserves
a space at each line-break point.  In addition, it makes the text
easier to read by separating ASCII and non-ASCII words with a
space and by inserting a space after a comma.

--8<---------------cut here---------------start------------->8---
(setq w3m-use-filter t)
(require 'w3m-filter)

(when (rassoc '(w3m-filter-google) w3m-filter-rules)
  (setcdr (rassoc '(w3m-filter-google) w3m-filter-rules)
	  '(w3m-filter-google-2)))

(defun w3m-filter-google-2 (url)
  "Align table columns vertically to shrink the table width."
  (let ((case-fold-search t)
	last)
    (goto-char (point-min))
    (while (re-search-forward "<tr[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "tr")
	(save-restriction
	  (narrow-to-region (goto-char (match-beginning 0))
			    (match-end 0))
	  (setq last nil)
	  (while (re-search-forward "<td[\t\n\r >]" nil t)
	    (when (w3m-end-of-tag "td")
	      (setq last (match-end 0))
	      (replace-match "<tr>\\&</tr>")))
	  (when last
	    (goto-char (+ 4 last))
	    (delete-char 4))
	  (goto-char (point-max)))))
    ;; Remove width spec and <br>s.
    (goto-char (point-min))
    (while (re-search-forward "<table[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "table")
	(save-restriction
	  (narrow-to-region (goto-char (match-beginning 0))
			    (match-end 0))
	  (while (re-search-forward
		  "[\t\n\r ]*\\(?:width=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
		  nil t)
	    ;; Preserve a space at the line-break point.
	    (replace-match " "))
	  ;; Insert a space between ASCII and non-ASCII characters
	  ;; and after a comma.
	  (goto-char (point-min))
	  (while (re-search-forward "\
\\([!-~]\\)\\([^ -~]\\)\\|\\([^ -~]\\)\\([!-~]\\)\\|\\(,\\)\\([^ ]\\)"
				    nil t)
	    (replace-match (cond ((match-beginning 1)
				  "\\1 \\2")
				 ((match-beginning 3)
				  "\\3 \\4")
				 (t
				  "\\5 \\6"))))
	  (goto-char (point-max)))))))
--8<---------------cut here---------------end--------------->8---
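
If you want to see what the two text rewrites do without opening a
Google page, something like the following might help.  It is only a
rough sketch: it copies the two regexps from the filter above and
applies them to a plain string in a temporary buffer.  The function
names my-demo-collapse-breaks and my-demo-insert-spaces are made up
for this illustration and are not part of emacs-w3m.

--8<---------------cut here---------------start------------->8---
;; Hypothetical helpers for illustration only; not part of emacs-w3m.

(defun my-demo-collapse-breaks (html)
  "Return HTML with width specs and <br>s replaced by one space each."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Same regexp as in the filter: a width="..." attribute or a
    ;; <br>, together with any whitespace around it.
    (while (re-search-forward
	    "[\t\n\r ]*\\(?:width=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
	    nil t)
      (replace-match " "))
    (buffer-string)))

(defun my-demo-insert-spaces (string)
  "Return STRING with the filter's spacing rule applied."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Groups 1-2: ASCII followed by non-ASCII.
    ;; Groups 3-4: non-ASCII followed by ASCII.
    ;; Groups 5-6: a comma followed by anything but a space.
    (while (re-search-forward "\
\\([!-~]\\)\\([^ -~]\\)\\|\\([^ -~]\\)\\([!-~]\\)\\|\\(,\\)\\([^ ]\\)"
			      nil t)
      (replace-match (cond ((match-beginning 1) "\\1 \\2")
			   ((match-beginning 3) "\\3 \\4")
			   (t "\\5 \\6"))))
    (buffer-string)))

;; (my-demo-collapse-breaks "foo<br>\nbar")  ; => "foo bar"
;; (my-demo-insert-spaces "Emacs,w3m")       ; => "Emacs, w3m"
;; (my-demo-insert-spaces "emacs-w3mの検索") ; => "emacs-w3m の検索"
;; (my-demo-insert-spaces "検索w3m")         ; => "検索 w3m"
--8<---------------cut here---------------end--------------->8---

Note that the comma itself counts as an ASCII character for the first
two alternatives, so a comma directly after a non-ASCII character gets
a space in front of it rather than behind it.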