Re: Help: using chinese-gbk
>>>>> In [emacs-w3m : No.09366] Jielei Fan wrote:
> But some Chinese characters cannot be shown correctly in some web pages,
> for example, http://www.xinhuanet.com/newscenter/ldrbdzj/index_3.htm,
> because in this web page, '?F' is a character which is not in gb2312
> but in gbk.
This page uses the GB2312 charset, and the world-famous person's
name is encoded as "\326\354\351F\273\371". Firefox displays it
correctly, but I confirmed that emacs-w3m doesn't. If the
`chinese-gbk' coding system can decode it, you can add a rule to
`w3m-compatible-encoding-alist' as follows:
(add-to-list 'w3m-compatible-encoding-alist '(gb2312 . chinese-gbk))
;; Add this line to the ~/.emacs-w3m.el file or evaluate it by
;; typing the `C-x C-e' key at the end of the line.
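If the same init file is also read by an Emacs that lacks the
`chinese-gbk' coding system, a guarded variant may be safer. This is
only a sketch; it assumes that `w3m-compatible-encoding-alist' is
already defined by the time the file is loaded:

```elisp
;; Map gb2312 to chinese-gbk only when this Emacs actually knows the
;; `chinese-gbk' coding system (e.g. via mule-gbk, or natively in Emacs 23).
(when (coding-system-p 'chinese-gbk)
  (add-to-list 'w3m-compatible-encoding-alist '(gb2312 . chinese-gbk)))
```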
This mechanism was implemented because many European web pages
actually use the WINDOWS-1252 charset even though they specify the
ISO-8859-1 charset (WINDOWS-1252 is a superset of ISO-8859-1).
BTW, I've installed the mule-gbk-0.1.2004080701.tar.gz package
for Emacs 22. However, using it I see only boxes or question
marks for any Chinese text so far. With your Emacs 22, can you
see his name correctly by evaluating the following Lisp form?
(decode-coding-string "\326\354\351F\273\371" 'chinese-gbk)
;; Copy this line to the *scratch* buffer and type the `C-j' key
;; at the end of this line.
In Emacs 23, the `chinese-gbk' coding system is supported
natively, but it also shows a box for the data "\351F".
That might only mean that I don't have a suitable font for it,
though.
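To tell a decoding problem from a missing font, one could ask Emacs
whether each decoded character is displayable. A minimal sketch,
assuming the `chinese-gbk' coding system is available; a nil in the
second position would suggest that no available font covers that
character:

```elisp
;; Decode the name, then check whether each resulting character can
;; be displayed on the current frame or terminal.
(mapcar 'char-displayable-p
        (append (decode-coding-string "\326\354\351F\273\371" 'chinese-gbk)
                nil))
```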
One more thought: we might be unable to make emacs-w3m display
GBK text in Emacs 22 after all, because the `utf-8' coding system
(which is used when communicating with the external w3m command)
doesn't seem to handle GBK text, as the following shows:
(mapcar 'split-char
        (decode-coding-string
         (encode-coding-string
          (decode-coding-string "\326\354\351F\273\371"
                                'chinese-gbk)
          'utf-8)
         'utf-8))
 => ((mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61))
OTOH, this form returns the following in Emacs 23 under the
Chinese-GBK language environment:
 => ((chinese-gbk 214 236)
     (chinese-gbk 233 70)
     (chinese-gbk 187 249))
> As you guess, web pages that use the GBK charset are very rare,
> but I still found one,
> http://www.lai68.cn/top.php?id=%E9%A6%99%E8%95%89%E9%B2%8D%E9%B1%BC%E4%BF%B1%E4%B9%90%E9%83%A8,
> and it cannot be shown in w3m.
As far as I can see, the external w3m command breaks the HTML
contents. It converts
<html> <head> <title>TITLE_STRING_IN_CHINESE</title>...
into
TITLE_STRING_IN_CHINESE <html> ;<head>...
when the `w3m-rendering-half-dump' function runs, so the page is
not displayed correctly. That's quite strange, but it is probably
a bug in the external w3m command, so unfortunately there is
nothing I can do about it.
[...]
> I am very confused about it, because it seems that it does not deal
> with chinese.
I'm confused too. It may be that not only emacs-w3m but also w3m
and Emacs need to be improved.
Regards,