 Re: Google broken
- From: Katsumi Yamaoka <yamaoka@xxxxxxx>
- Date: Mon, 28 May 2012 08:41:04 +0900
- X-ml-name: emacs-w3m
- X-mail-count: 11831
- References: <CAJcAo8smRS-FrmZXP5UmYQ9ym+kmsz_aCH1usEsSmVOSRYt0Zg@xxxxxxxxxxxxxx> <b4mobwm9n1g.fsf@xxxxxxx> <CAJcAo8vmSana6SSqEpwWuO8A_KTNTe_JpOMs3qyOZ_Zgtyx8OA@xxxxxxxxxxxxxx> <b4m1um83ket.fsf@xxxxxxx> <CAJcAo8vAAVuHDj5CJPUtkMJKTPNf_VfPtFojh9Pt80_QcK6HYA@xxxxxxxxxxxxxx>
In [emacs-w3m : No.11830] Samuel Wales wrote:
> On 5/24/12, Katsumi Yamaoka <yamaoka@xxxxxxx> wrote:
>> Could you let me know a search word that reproduces the problem?
> I think it is every search.
>> Or the url of the search result page?  Since Google appears to
> http://www.google.com/search%3Fq%3Dkatsumi%2Bjpl%26btnG%3DSearch%26oe%3Dutf-8
> With your filter and (setf w3m-fill-column 50):
Ah, thanks.  I realized that it is overkill for the filter to
remove <br>s and trailing whitespace outright.  The new filter
leaves a single space at each line-break point.  In addition, it
makes the text easier to read by inserting a space between ASCII
and non-ASCII words and after each comma.
--8<---------------cut here---------------start------------->8---
(setq w3m-use-filter t)
(require 'w3m-filter)
;; Swap the stock Google filter for the new one, if it is registered.
(let ((rule (rassoc '(w3m-filter-google) w3m-filter-rules)))
  (when rule
    (setcdr rule '(w3m-filter-google-2))))
(defun w3m-filter-google-2 (url)
  "Align table columns vertically to shrink the table width."
  (let ((case-fold-search t)
	last)
    (goto-char (point-min))
    (while (re-search-forward "<tr[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "tr")
	(save-restriction
	  (narrow-to-region (goto-char (match-beginning 0))
			    (match-end 0))
	  (setq last nil)
	  (while (re-search-forward "<td[\t\n\r >]" nil t)
	    (when (w3m-end-of-tag "td")
	      (setq last (match-end 0))
	      (replace-match "<tr>\\&</tr>")))
	  (when last
	    (goto-char (+ 4 last))
	    (delete-char 4))
	  (goto-char (point-max)))))
    ;; Replace width specs and <br>s with a single space.
    (goto-char (point-min))
    (while (re-search-forward "<table[\t\n\r >]" nil t)
      (when (w3m-end-of-tag "table")
	(save-restriction
	  (narrow-to-region (goto-char (match-beginning 0))
			    (match-end 0))
	  (while (re-search-forward
		  "[\t\n\r ]*\\(?:width=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
		  nil t)
	    ;; Preserve a space at the line-break point.
	    (replace-match " "))
	  ;; Insert a space between ASCII and non-ASCII characters
	  ;; and after a comma.
	  (goto-char (point-min))
	  (while (re-search-forward "\
\\([!-~]\\)\\([^ -~]\\)\\|\\([^ -~]\\)\\([!-~]\\)\\|\\(,\\)\\([^ ]\\)"
				    nil t)
	    (replace-match (cond ((match-beginning 1)
				  "\\1 \\2")
				 ((match-beginning 3)
				  "\\3 \\4")
				 (t
				  "\\5 \\6"))
			   ;; FIXEDCASE: keep the original letter case.
			   t))
	  (goto-char (point-max)))))))
--8<---------------cut here---------------end--------------->8---
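
To see what the two text-cleanup passes do without loading
emacs-w3m, here is a minimal sketch that applies the same two
regexps to a plain string.  The helper name
`my-demo-tidy-cell-text' is hypothetical and is not part of
emacs-w3m.  It skips the <tr>/<td> rearrangement and the <table>
narrowing, so it only approximates what the filter does inside a
table.

```elisp
;; Illustrative sketch only: `my-demo-tidy-cell-text' is a
;; hypothetical helper, not part of emacs-w3m.
(defun my-demo-tidy-cell-text (string)
  "Return STRING after the width/<br> pass and the spacing pass."
  (with-temp-buffer
    (insert string)
    (let ((case-fold-search t))
      ;; Pass 1: turn each width="..." or <br>, together with the
      ;; whitespace around it, into a single space.
      (goto-char (point-min))
      (while (re-search-forward
              "[\t\n\r ]*\\(?:width=\"[^\"]+\"\\|<br>\\)[\t\n\r ]*"
              nil t)
        (replace-match " "))
      ;; Pass 2: insert a space between ASCII and non-ASCII
      ;; characters, and after a comma not followed by a space.
      (goto-char (point-min))
      (while (re-search-forward
              (concat "\\([!-~]\\)\\([^ -~]\\)"    ; ASCII, non-ASCII
                      "\\|\\([^ -~]\\)\\([!-~]\\)" ; non-ASCII, ASCII
                      "\\|\\(,\\)\\([^ ]\\)")      ; comma, non-space
              nil t)
        ;; The last argument (FIXEDCASE) keeps letter case untouched.
        (replace-match (cond ((match-beginning 1) "\\1 \\2")
                             ((match-beginning 3) "\\3 \\4")
                             (t "\\5 \\6"))
                       t)))
    (buffer-string)))
```

With this sketch, (my-demo-tidy-cell-text "foo<br>bar") should
return "foo bar", and a cell such as "a,b" should come back as
"a, b".  Note that `[^ -~]' also matches control characters such
as newline, so text containing line breaks gets extra spaces too.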