Re: Extract real urls in Google search

In [emacs-w3m : No.11837] jidanni@xxxxxxxxxxx wrote:
> Even though I use the functions in
> http://jidanni.org/comp/configuration/.emacs-w3m
> still the links in
> http://www.google.com.tw/search?q=%E9%AB%98%E9%9B%84%E5%9C%96%E6%9B%B8%E9%A4%A8&ie=utf-8&oe=utf-8
> have
> http://www.google.com.tw/url?q=htt... attached.

OK.  The regexp needs to be improved.  Try this, or use the latest
emacs-w3m from CVS:

--8<---------------cut here---------------start------------->8---
(eval-after-load "w3m-filter"
  '(progn
     (nconc w3m-filter-rules
            '(("\\`https?://[a-z]+\\.google\\." w3m-filter-google)))
     (defun w3m-filter-google (url)
       "Extract real urls in Google search."
       (goto-char (point-min))
       (while (re-search-forward "\\(<a[\t\n ]+\\(?:[^\t\n >]+[\t\n ]+\\)*\
                                 nil t)
         (insert (w3m-url-decode-string
                  (prog1
                      (concat (match-string 1) (match-string 2) "\">")
                    (delete-region (match-beginning 0) (match-end 0)))))))))
--8<---------------cut here---------------end--------------->8---
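
To make the intent of the filter easier to follow, here is a minimal
sketch of the same idea applied to a single URL string rather than to
the links in a page buffer.  It is only an illustration, not the code
in emacs-w3m: the function name `my-google-real-url', the example.org
address and the trailing `&sa=U' parameter are all made up, and it uses
`url-unhex-string' from Emacs's url library where the filter uses
`w3m-url-decode-string'.

```elisp
(require 'url-util)

;; Hypothetical helper for illustration only; not part of emacs-w3m.
(defun my-google-real-url (url)
  "Return the target of a Google redirect URL.
If URL does not look like http://xxx.google.yyy/url?q=TARGET,
return URL unchanged."
  (if (string-match
       ;; Group 1 captures the value of the q= parameter.
       "\\`https?://[a-z]+\\.google\\.[^/]+/url\\?\\(?:[^#]*&\\)?q=\\([^&#]+\\)"
       url)
      ;; Percent-decode the value, then read the bytes as UTF-8.
      (decode-coding-string (url-unhex-string (match-string 1 url)) 'utf-8)
    url))

;; Assumed example input; the target and the sa=U parameter are invented.
(my-google-real-url
 "http://www.google.com.tw/url?q=http%3A%2F%2Fexample.org%2F&sa=U")
```

Under those assumptions the last form should evaluate to
"http://example.org/", and a URL that is not a Google redirect is
returned as is.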

> I also notice an interesting issue.
> If I browse
> httP://jidanni.org/comp/ instead of
> http://jidanni.org/comp/
> many of the link destinations in that page get messed up!

What differs between them?  I tried your .emacs-w3m and saw no