Re: Help: using chinese-gbk
>>>>> In [emacs-w3m : No.09366] Jielei Fan wrote:
> But some chinese character can not be showed correctly in some web page,
> for example, http://www.xinhuanet.com/newscenter/ldrbdzj/index_3.htm,
> because in this web page, '?F' is a character which is not in gb2313
> but gbk.

This page declares the GB2312 charset, and the world-famous person's
name is encoded as "\326\354\351F\273\371". Firefox displays
it correctly, but I confirmed that emacs-w3m doesn't. If it
can be decoded by the `chinese-gbk' coding system, you can
add a rule to `w3m-compatible-encoding-alist' as follows:

(add-to-list 'w3m-compatible-encoding-alist '(gb2312 . chinese-gbk))
;; Add this line to the ~/.emacs-w3m.el file, or evaluate it by
;; typing `C-x C-e' at the end of the line.
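
As a hedged variant of the rule above, the following sketch adds it only
when the running Emacs really provides the `chinese-gbk' coding system.
The guard is my own assumption about what is convenient, not something
emacs-w3m requires, and like the line above it assumes that
`w3m-compatible-encoding-alist' is already defined when it is evaluated
(as it is when the form lives in ~/.emacs-w3m.el).

```emacs-lisp
;; Sketch: register the gb2312 -> chinese-gbk rule only if this Emacs
;; knows the `chinese-gbk' coding system (for example via mule-gbk on
;; Emacs 22, or natively on Emacs 23).
(when (coding-system-p 'chinese-gbk)
  (add-to-list 'w3m-compatible-encoding-alist '(gb2312 . chinese-gbk)))
```

With this guard the same ~/.emacs-w3m.el file can be shared between an
Emacs that has `chinese-gbk' and one that doesn't, without signaling an
error about an unknown coding system later on.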

This alist was introduced because many European web pages actually
use the WINDOWS-1252 charset while declaring the ISO-8859-1
charset (WINDOWS-1252 is a superset of ISO-8859-1).

BTW, I've installed the mule-gbk-0.1.2004080701.tar.gz package
for Emacs 22. However, with it I have so far seen only boxes or
question marks for any Chinese text. With your Emacs 22, can you
see his name correctly by evaluating the following Lisp form?

(decode-coding-string "\326\354\351F\273\371" 'chinese-gbk)
;; Copy this line to the *scratch* buffer and type `C-j'
;; at the end of the line.
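
To tell a decoding failure apart from a missing font, a small check
like the one below may help. It is only a sketch: it decodes the same
bytes, then reports the length of the result and, for each decoded
character, whether Emacs thinks it can display it. The choice of
`char-displayable-p' for the font check is my assumption about what is
useful here.

```emacs-lisp
;; Sketch: decode the six bytes as GBK, then inspect the result.
;; Returns nil if `chinese-gbk' is not available in this Emacs.
(when (coding-system-p 'chinese-gbk)
  (let ((name (decode-coding-string "\326\354\351F\273\371" 'chinese-gbk)))
    ;; First element: number of characters after decoding.
    ;; Second element: one displayability flag per character
    ;; (nil means Emacs found no way to display that character).
    (list (length name)
          (mapcar #'char-displayable-p (string-to-list name)))))
```

A length of 3 would suggest that the bytes were decoded as three
two-byte characters, and a nil in the second list would point to a font
problem rather than a decoding problem.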

In Emacs 23, the `chinese-gbk' coding system is supported
natively; however, it also shows a box for the data "\351F":
This might mean only that I don't have a suitable font for it,

One more thought: we might be unable to make emacs-w3m display
GBK text in Emacs 22 after all, because the `utf-8' coding system
(which is used when communicating with the external w3m command)
does not seem to handle GBK text, as follows:

 (decode-coding-string "\326\354\351F\273\371"
 => ((mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61))

OTOH, this form returns the following in Emacs 23 under the
Chinese-GBK language environment:

 => ((chinese-gbk 214 236)
     (chinese-gbk 233 70)
     (chinese-gbk 187 249))

> As you guess, web page that uses the GBK charset is very rare,
> but I still find one,
> it can not be showed in w3m.

As far as I can see, the external w3m command breaks the HTML
contents. It converts

 <html> <head> <title>TITLE_STRING_IN_CHINESE</title>...

into

 TITLE_STRING_IN_CHINESE <html> ;<head>...

when the `w3m-rendering-half-dump' function runs, hence
the page is not displayed correctly. That's quite strange, but
it appears to be a bug in the external w3m command, so
unfortunately there is nothing I can do about it.

> I am very confused about it, because it seems that it does not deal
> with chinese.

I'm confused too. What needs to be improved might be not only
emacs-w3m but also w3m and Emacs.