Re: Extract real urls in Google search
- From: Katsumi Yamaoka <yamaoka@xxxxxxx>
- Date: Mon, 04 Jun 2012 08:29:15 +0900
- X-ml-name: emacs-w3m
- X-mail-count: 11838
- References: <87ehpwzh1z.fsf@xxxxxxxxxxx>
In [emacs-w3m : No.11837] jidanni@xxxxxxxxxxx wrote:
> Even though I use the functions in
> http://jidanni.org/comp/configuration/.emacs-w3m
> still the links in
> http://www.google.com.tw/search?q=%E9%AB%98%E9%9B%84%E5%9C%96%E6%9B%B8%E9%A4%A8&ie=utf-8&oe=utf-8
> have
> http://www.google.com.tw/url?q=htt... attached.
OK. The regexp needs to be improved. Try this, or use the latest
emacs-w3m CVS:
--8<---------------cut here---------------start------------->8---
(eval-after-load "w3m-filter"
  '(progn
     ;; Run the filter on pages from any Google host.
     (nconc w3m-filter-rules
            '(("\\`https?://[a-z]+\\.google\\." w3m-filter-google)))
     (defun w3m-filter-google (url)
       "Extract real urls in Google search."
       (goto-char (point-min))
       ;; Group 1 is the tag up to href=\", group 2 is the real url.
       (while (re-search-forward "\\(<a[\t\n ]+\\(?:[^\t\n >]+[\t\n ]+\\)*\
href=\"\\)/\\(?:imgres\\?imgurl\\|url\\?q\\)=\\([^&]+\\)[^>]+>"
                                 nil t)
         ;; Replace the whole opening tag with one pointing at the real url.
         (insert (w3m-url-decode-string
                  (prog1
                      (concat (match-string 1) (match-string 2) "\">")
                    (delete-region (match-beginning 0) (match-end 0)))))))))
--8<---------------cut here---------------end--------------->8---
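To see what the regexp does without loading emacs-w3m, here is a rough stand-alone sketch that applies the same rewrite to a string in a temporary buffer. The function name `my-google-link-demo' and the sample anchor are made up for illustration, and `url-unhex-string' from url-util stands in for `w3m-url-decode-string', so the decoding of non-ASCII urls may differ from what the real filter does.

```elisp
;; Hypothetical stand-alone check; not part of emacs-w3m.
(require 'url-util) ; `url-unhex-string', used here instead of `w3m-url-decode-string'

(defun my-google-link-demo (html)
  "Return HTML with Google redirect anchors rewritten to their targets."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (while (re-search-forward "\\(<a[\t\n ]+\\(?:[^\t\n >]+[\t\n ]+\\)*\
href=\"\\)/\\(?:imgres\\?imgurl\\|url\\?q\\)=\\([^&]+\\)[^>]+>"
                              nil t)
      ;; Build the new tag first, then delete the old one and insert
      ;; the decoded replacement at the same position.
      (insert (url-unhex-string
               (prog1
                   (concat (match-string 1) (match-string 2) "\">")
                 (delete-region (match-beginning 0) (match-end 0))))))
    (buffer-string)))

;; Sample input modelled on a Google result link (made-up url):
(my-google-link-demo
 "<a href=\"/url?q=http://example.org/a%3Fb&amp;sa=U\">x</a>")
```

With that sample input the redirect prefix and the trailing parameters should be dropped, leaving an anchor that points straight at http://example.org/a?b.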
> I also notice an interesting issue.
> If I browse
> httP://jidanni.org/comp/ instead of
> http://jidanni.org/comp/
> many of the link destinations in that page get messed up!
What differs between them? I tried your .emacs-w3m and saw no
difference.