Re: Help: using chinese-gbk

Quoting Katsumi Yamaoka <yamaoka@xxxxxxx>:


In [emacs-w3m : No.09362] Jielei Fan wrote:

I browse the web using emacs-w3m, which you wrote, but I have run
into a problem. I have already installed the mule-gbk package on my
Emacs 22 (I use it on Windows XP), and it works well. However, it
does not work in w3m-mode, where iso-8859-1-dos, gb2312-dos, or some
other coding system is selected automatically. But if I put the
following code

;; (setq w3m-bookmark-file-coding-system 'chinese-gbk)
;; (setq w3m-coding-system 'chinese-gbk)
;; (setq w3m-default-coding-system 'chinese-gbk)
;; (setq w3m-file-coding-system 'chinese-gbk)
;; (setq w3m-file-name-coding-system 'chinese-gbk)
;; (setq w3m-terminal-coding-system 'chinese-gbk)
;; (setq w3m-input-coding-system 'chinese-gbk)
;; (setq w3m-output-coding-system 'chinese-gbk)

in my .emacs, web pages are displayed as garbled text.

First of all, you should never need to modify at least
`w3m-input-coding-system' and `w3m-output-coding-system'.  The
values of those variables must be supported by the external w3m
command, and `utf-8' is a good choice.  If I understand correctly,
GBK is a superset of GB2312 and all of its characters can be
expressed in Unicode.

Emacs-w3m fetches an html page as binary data, decodes it
according to the charset that the page specifies[1], encodes it
with a certain coding system[2], and passes it to the external
w3m command.  The external w3m then processes it, encodes it
with a certain coding system[3], and returns it to emacs-w3m, and
finally emacs-w3m decodes it with that same coding system[3].

[1] The charset is specified in the page header or in the meta
tag which looks like:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

The `=' command shows the page header, and the `\' command shows
raw (but charset-decoded) html contents.

[2] The value of `w3m-input-coding-system'.

[3] The value of `w3m-output-coding-system'.
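
The round trip described above can be sketched in Emacs Lisp.  This
is only an illustration, not emacs-w3m's actual code: the function
name `my-w3m-round-trip' is hypothetical, its arguments merely stand
in for the variables in [2] and [3], and the external w3m command is
replaced by a stand-in that only re-encodes the text.

```elisp
;; Hypothetical illustration only; not emacs-w3m's real implementation.
(defun my-w3m-round-trip (raw-bytes page-cs input-cs output-cs)
  "Return the text that results from passing RAW-BYTES through the steps.
PAGE-CS is the coding system chosen for the page's charset [1].
INPUT-CS and OUTPUT-CS stand in for `w3m-input-coding-system' [2]
and `w3m-output-coding-system' [3]."
  (let* (;; [1] Decode the fetched bytes by the page's charset.
         (text (decode-coding-string raw-bytes page-cs))
         ;; [2] Encode the text for the external w3m command.
         (to-w3m (encode-coding-string text input-cs))
         ;; Stand-in for the external w3m command: it reads INPUT-CS
         ;; and writes OUTPUT-CS [3].  The real command also renders.
         (from-w3m (encode-coding-string
                    (decode-coding-string to-w3m input-cs)
                    output-cs)))
    ;; [3] Decode what the external command returned.
    (decode-coding-string from-w3m output-cs)))

;; Example call, assuming a `chinese-gbk' coding system is available.
(my-w3m-round-trip (encode-coding-string "中文" 'chinese-gbk)
                   'chinese-gbk 'utf-8 'utf-8)
```

The text survives only if every step can represent the page's
characters.  If the wrong coding system is chosen at step [1], the
text is already garbled before the external w3m command sees it.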

I think the cause of your problem is that emacs-w3m doesn't know
how to find a suitable coding system for the GBK charset, and it
might be solved by adding a proper rule to the
`w3m-charset-coding-system-alist'.  Could you let me know a
typical web page that uses the GBK charset?
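
Such a rule might look like the following.  This is an untested
sketch resting on several assumptions: that the page declares its
charset as GBK, that the alist maps a lower-case charset symbol to a
coding-system symbol, that the variable is defined once w3m is
loaded, and that the `chinese-gbk' coding system is available (from
mule-gbk on Emacs 22).

```elisp
;; Assumption: pages declaring charset=GBK should be decoded with the
;; `chinese-gbk' coding system.  Do nothing if it is not defined.
(when (coding-system-p 'chinese-gbk)
  (eval-after-load "w3m"
    '(add-to-list 'w3m-charset-coding-system-alist
                  '(gbk . chinese-gbk))))
```

The `coding-system-p' guard keeps the setting harmless on a machine
where mule-gbk is not installed.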

Another problem is that when I use the w3m-search command (with the
Google engine) and enter Chinese characters, the query reaches
Google as ???????. Could you please tell me how to solve it?

Well, it will probably be solved if all GBK pages are displayed correctly.


But some Chinese characters cannot be displayed correctly in some web
pages, for example http://www.xinhuanet.com/newscenter/ldrbdzj/index_3.htm,
because in this web page '?F' is a character which is not in gb2312
but is in gbk.

As you guessed, web pages that use the GBK charset are very rare,
but I did find one, http://www.lai68.cn/top.php?id=%E9%A6%99%E8%95%89%E9%B2%8D%E9%B1%BC%E4%BF%B1%E4%B9%90%E9%83%A8,
and it cannot be displayed in w3m.

This problem is still unsolved: when I use the w3m-search command
(with the Google engine) and enter Chinese characters, the query
reaches Google as ???????. This problem is very important to me.
Maybe it is related to the code

(set-w32-system-coding-system 'chinese-gbk)
(set-selection-coding-system 'chinese-gbk)
(set-keyboard-coding-system 'chinese-gbk)
(set-language-environment 'chinese-gbk)
(setq locale-coding-system 'chinese-gbk)
(setq current-language-environment "Chinese-GBK")

in .emacs.

As for the variable `w3m-charset-coding-system-alist' that you
mentioned, evaluating it with C-j on my computer gives:

((x-sjis . shift_jis) (x-shift_jis . shift_jis)
 (x-shift-jis . shift_jis) (x-euc-jp . euc-japan)
 (shift-jis . shift_jis) (x-unknown . undecided)
 (unknown . undecided) (windows-874 . tis-620)
 (iso-2022-jp-3 . iso-2022-7bit-ss2) (us_ascii . raw-text))

I am confused by this, because it does not seem to contain any entry
for Chinese.

I hope you can help me solve this.
I look forward to your reply.

Best regards,
a faithful emacs-w3m user.