Re: UTF-8 links in big5 page
- From: Katsumi Yamaoka <yamaoka@xxxxxxx>
- Date: Thu, 13 Oct 2011 12:54:30 +0900
- X-ml-name: emacs-w3m
- X-mail-count: 11649
- References: <87fwix3c38.fsf@xxxxxxxxxxx>
In [emacs-w3m : No.11648] jidanni@xxxxxxxxxxx wrote:
> In http://sex.ncu.edu.tw/activities/recent.htm
> emacs-w3m thinks a link is
> 404
> http://sex.ncu.edu.tw/activities/documents/%B3%B7%A6%5A%A8%67%AD%B7(%C1%BF%AE%79).pdf
> Firefox thinks it is
> 400
> http://sex.ncu.edu.tw/activities/documents/%E9%9B%AA%E5%90%8E%E7%8B%82%E9%A2%A8(%E8%AC%9B%E5%BA%A7).pdf
> Maybe emacs-w3m is right (big5), but Firefox gets us the PDF.
AFAIK, sites differ in how they expect non-ASCII characters in a url
to be encoded. Some require a browser to use the same charset as the
page that contains the link, some accept either the page's charset
or utf-8, and some require utf-8 unconditionally. This site is the
last case, whereas emacs-w3m behaves as in the first one.
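
To illustrate the difference, here is a minimal sketch that
percent-encodes every byte of a string under a given coding system.
`my-percent-encode-all' is a hypothetical helper written for this
example, not an emacs-w3m function, and the character used is the
first one of the file name in the two urls quoted above:

```elisp
;; Illustration only: `my-percent-encode-all' is a hypothetical
;; helper, not part of emacs-w3m.
(defun my-percent-encode-all (string coding)
  "Encode STRING with CODING and percent-encode every resulting byte."
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             ;; `encode-coding-string' returns a unibyte string, so
             ;; each element seen by `mapconcat' is one raw byte.
             (encode-coding-string string coding)
             ""))

;; Encoding with the page's charset, as emacs-w3m does here:
(my-percent-encode-all "雪" 'big5)

;; Encoding with utf-8, as this server expects:
(my-percent-encode-all "雪" 'utf-8)
```

The two calls should give the leading "%B3%B7" and "%E9%9B%AA" of the
emacs-w3m and Firefox urls respectively, which is why the same link
resolves to two different requests.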
I don't know which is the majority, but I think we need an option
to change the behavior site by site anyway. I'll work on this.
Maybe always using utf-8 will become the default. Here is a
makeshift workaround:

(defadvice w3m-url-transfer-encode-string (before modify-charset
                                                  (url &optional coding)
                                                  activate)
  "Use `utf-8' to encode urls to retrieve for http://*.ncu.edu.tw/."
  (when (string-match
         "\\`https?://\\(?:[^./?#]+\\.\\)*ncu\\.edu\\.tw/"
         url)
    (setq coding 'utf-8)))

If you try the workaround, you should also add the following:

(add-to-list
 'w3m-show-decoded-url
 '("\\`http://\\(?:[^./?#]+\\.\\)*ncu\\.edu\\.tw/" . utf-8))
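
As for changing the behavior site by site, one rough way to sketch it
is to keep the regexps in an alist and look the charset up from the
advice. All `my-' names below are hypothetical and are not provided by
emacs-w3m; the real option may well look different. This would be used
instead of the single-site advice above, not together with it:

```elisp
;; Hypothetical names; emacs-w3m does not define any of these.
(defvar my-w3m-url-coding-alist
  '(("\\`https?://\\(?:[^./?#]+\\.\\)*ncu\\.edu\\.tw/" . utf-8))
  "Alist of (REGEXP . CODING).
A url matching REGEXP is encoded with CODING when it is retrieved.")

(defun my-w3m-url-coding-for (url)
  "Return the coding system configured for URL, or nil if none matches."
  ;; `assoc-default' calls (string-match REGEXP URL) for each element.
  (assoc-default url my-w3m-url-coding-alist #'string-match))

(defadvice w3m-url-transfer-encode-string (before my-per-site-charset
                                                  (url &optional coding)
                                                  activate)
  "Override CODING for urls listed in `my-w3m-url-coding-alist'."
  (let ((site-coding (my-w3m-url-coding-for url)))
    (when site-coding
      (setq coding site-coding))))
```

Supporting another site is then a matter of adding one more
(REGEXP . CODING) element to the alist.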