Re: Help: using chinese-gbk



>>>>> In [emacs-w3m : No.09366] Jielei Fan wrote:

> But some Chinese characters cannot be displayed correctly in some web pages,
> for example, http://www.xinhuanet.com/newscenter/ldrbdzj/index_3.htm,
> because in this web page, '?F' is a character which is not in gb2312
> but in gbk.

This page declares the GB2312 charset, and the world-famous person's
name is encoded as "\326\354\351F\273\371".  Firefox displays
it correctly, but I confirmed that emacs-w3m doesn't.  If the
`chinese-gbk' coding system can decode this, you can add a rule
to the `w3m-compatible-encoding-alist' as follows:

(add-to-list 'w3m-compatible-encoding-alist '(gb2312 . chinese-gbk))

;; Add this line to the ~/.emacs-w3m.el file or evaluate it by
;; typing the `C-x C-e' key at the end of the line.
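
A guarded variant of the rule above, as a minimal sketch: it
registers the mapping only when the running Emacs actually defines
the `chinese-gbk' coding system, so the same file can be shared with
an Emacs that lacks it.  It assumes emacs-w3m is already loaded, as
is the case when ~/.emacs-w3m.el is read.

```elisp
;; Sketch only: skip the rule when `chinese-gbk' is not available,
;; instead of pointing gb2312 pages at an undefined coding system.
(when (coding-system-p 'chinese-gbk)
  (add-to-list 'w3m-compatible-encoding-alist
               '(gb2312 . chinese-gbk)))
```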

This mechanism was implemented because many European web pages
actually use the WINDOWS-1252 charset in spite of declaring the
ISO-8859-1 charset (WINDOWS-1252 is a superset of the printable
characters of ISO-8859-1).

BTW, I've installed the mule-gbk-0.1.2004080701.tar.gz package
for Emacs 22.  However, using it I see only boxes or question
marks for any Chinese text so far.  With your Emacs 22, can you
see his name correctly by evaluating the following Lisp form?

(decode-coding-string "\326\354\351F\273\371" 'chinese-gbk)

;; Copy this line to the *scratch* buffer and type the `C-j' key
;; at the end of this line.

In Emacs 23, the `chinese-gbk' coding system is supported
natively, but it also shows a box for the data "\351F":

[PNG image]

This might mean only that I don't have a suitable font for it,
though.
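
One way to tell a missing font from a failed decode might be to look
at the decoded characters as code points instead of glyphs.  The
sketch below assumes Emacs 23 or later, where characters are Unicode
code points; `my-code-points' is a hypothetical helper name, not
part of Emacs or emacs-w3m.

```elisp
;; `my-code-points' is a hypothetical helper for illustration only.
(defun my-code-points (string)
  "Return the characters of STRING as a list of \"U+XXXX\" strings."
  (mapcar (lambda (ch) (format "U+%04X" ch)) string))

;; Evaluate with `C-j' in the *scratch* buffer.
(my-code-points
 (decode-coding-string "\326\354\351F\273\371" 'chinese-gbk))
```

If the second element is an ordinary CJK code point rather than a
replacement character, the decoding itself probably worked and the
box is a font issue.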

One more thought: we might be unable to make emacs-w3m display
GBK text in Emacs 22 after all, because the `utf-8' coding system
(which is used when communicating with the external w3m command)
doesn't seem to handle GBK text, as the following shows:

(mapcar 'split-char
	(decode-coding-string
	 (encode-coding-string
	  (decode-coding-string "\326\354\351F\273\371"
				'chinese-gbk)
	  'utf-8)
	 'utf-8))
 => ((mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61)
     (mule-unicode-e000-ffff 117 61))

OTOH, this form returns the following in Emacs 23 under the
Chinese-GBK language environment:

 => ((chinese-gbk 214 236)
     (chinese-gbk 233 70)
     (chinese-gbk 187 249))
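
The same comparison can be reduced to a yes/no check, as a minimal
sketch: decode the bytes, pass the text through `utf-8' and back,
and compare.  `my-utf-8-round-trip-p' is a hypothetical name used
only for illustration.

```elisp
;; `my-utf-8-round-trip-p' is a hypothetical helper, not an
;; emacs-w3m function.
(defun my-utf-8-round-trip-p (bytes coding-system)
  "Return non-nil if BYTES decoded with CODING-SYSTEM survive `utf-8'."
  (let ((text (decode-coding-string bytes coding-system)))
    (string= text
             (decode-coding-string
              (encode-coding-string text 'utf-8)
              'utf-8))))

(my-utf-8-round-trip-p "\326\354\351F\273\371" 'chinese-gbk)
```

Going by the two results above, this would presumably return nil in
Emacs 22 and non-nil in Emacs 23, but I haven't confirmed that.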

> As you guessed, web pages that use the GBK charset are very rare,
> but I still found one,
> http://www.lai68.cn/top.php?id=%E9%A6%99%E8%95%89%E9%B2%8D%E9%B1%BC%E4%BF%B1%E4%B9%90%E9%83%A8,
> it cannot be displayed in w3m.

As far as I can see, the external w3m command breaks the html
contents.  It converts

 <html> <head> <title>TITLE_STRING_IN_CHINESE</title>...

into

 TITLE_STRING_IN_CHINESE &lt;html&gt;&lt;head&gt;...

when the `w3m-rendering-half-dump' function runs, hence
the page is not displayed correctly.  That's quite strange, but
it appears to be a bug in the external w3m command.  So,
unfortunately, there is nothing I can do about it.

[...]

> I am very confused about it, because it seems that it does not deal
> with chinese.

I'm confused too.  What needs to be improved might be not only
emacs-w3m but also w3m and Emacs.

Regards,