
Re: Help: using chinese-gbk

From: Jielei Fan <jielei.fan@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 11 Apr 2007 14:46:24 +0200
X-ml-name: emacs-w3m
X-mail-count: 09366
References: <20070410225956.a91bndk8q04c88gg@mail.tu-chemnitz.de> <b4mwt0jloc7.fsf@jpl.org>

Quoting Katsumi Yamaoka <yamaoka@xxxxxxx>:

Hi,

In [emacs-w3m : No.09362] Jielei Fan wrote:

I browse the web using emacs-w3m, written by you, but I have run
into a problem. I have already installed the mule-gbk package on my
Emacs 22 (I use it on Windows XP), and it works well. However, it
does not work in w3m-mode, where iso-8859-1-dos, gb2312-dos, or some
other coding system is selected automatically. But if I put this code

;; (setq w3m-bookmark-file-coding-system 'chinese-gbk)
;; (setq w3m-coding-system 'chinese-gbk)
;; (setq w3m-default-coding-system 'chinese-gbk)
;; (setq w3m-file-coding-system 'chinese-gbk)
;; (setq w3m-file-name-coding-system 'chinese-gbk)
;; (setq w3m-terminal-coding-system 'chinese-gbk)
;; (setq w3m-input-coding-system 'chinese-gbk)
;; (setq w3m-output-coding-system 'chinese-gbk)

in my .emacs,

web pages are displayed as garbled text.


First of all, you should never need to modify at least
`w3m-input-coding-system' and `w3m-output-coding-system'.  The
values of those variables must be supported by the external
w3m command, and `utf-8' is a good choice.  If I understand
correctly, GBK is a superset of GB2312, and all of its characters
can be expressed in Unicode.

Emacs-w3m fetches an html page as binary data, decodes it
according to the charset that the page specifies[1], encodes it
with a certain coding system[2], and passes it to the external
w3m command.  Then the external w3m processes it, encodes it
with a certain coding system[3], and returns it to emacs-w3m,
which finally decodes it with the same coding system[3].

[1] The charset is specified in the page header or in the meta
tag which looks like:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

The `=' command shows the page header, and the `\' command shows
raw (but charset-decoded) html contents.

[2] The value of `w3m-input-coding-system'.

[3] The value of `w3m-output-coding-system'.
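
The round trip described above can be sketched in plain Emacs Lisp.
This is only an illustration, not emacs-w3m's actual code: the sample
string, the gb2312 page charset, and the utf-8 values for steps [2]
and [3] are assumptions, and the external w3m command is skipped
entirely.

```elisp
;; Illustration only; not the emacs-w3m implementation.
;; Assumed: the page declares charset gb2312, and both
;; `w3m-input-coding-system' and `w3m-output-coding-system' are utf-8.
(let* ((page-bytes (encode-coding-string "中文" 'gb2312)) ; raw bytes as fetched
       (page-text  (decode-coding-string page-bytes 'gb2312)) ; [1] page charset
       (to-w3m     (encode-coding-string page-text 'utf-8))   ; [2] sent to w3m
       ;; The external w3m command would render the page here and
       ;; answer in the output coding system; this sketch skips it.
       (from-w3m   to-w3m)
       (shown      (decode-coding-string from-w3m 'utf-8)))   ; [3] decoded reply
  ;; Non-nil when the round trip preserves the text.
  (string= shown "中文"))
```

If step [1] picks the wrong coding system for the page charset, the
text is already damaged before w3m sees it, whatever [2] and [3] are
set to.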

I think the cause of your problem is that emacs-w3m doesn't know
how to find a suitable coding system for the GBK charset, and it
might be solved by adding a proper rule to the
`w3m-charset-coding-system-alist'.  Could you let me know a
typical web page that uses the GBK charset?
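
A rule of that kind might look like the sketch below. It assumes that
mule-gbk defines a coding system named `chinese-gbk' (the name used in
the settings quoted earlier) and that pages announce the charset as
"gbk"; neither has been confirmed in this thread.

```elisp
;; Sketch only: map the charset name gbk to the coding system that
;; mule-gbk is assumed to provide.  Does nothing if `chinese-gbk'
;; is not defined.
(eval-after-load "w3m"
  '(when (coding-system-p 'chinese-gbk)
     (add-to-list 'w3m-charset-coding-system-alist
                  '(gbk . chinese-gbk))))
```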

Another problem is that when I use the w3m-search command (Google
engine) and enter Chinese characters, the query shows up as ???????
on the Google page. Could you please tell me how to solve this?


Well, it will probably be solved if all GBK pages are displayed
correctly.

Regards,

But some Chinese characters cannot be displayed correctly on some web
pages, for example http://www.xinhuanet.com/newscenter/ldrbdzj/index_3.htm,
because on this page '?F' is a character that is not in GB2312
but is in GBK.

As you guessed, web pages that use the GBK charset are very rare, but I did find one: http://www.lai68.cn/top.php?id=%E9%A6%99%E8%95%89%E9%B2%8D%E9%B1%BC%E4%BF%B1%E4%B9%90%E9%83%A8. It cannot be displayed in w3m.

This problem is still not solved: when I use the w3m-search command (Google engine) and enter Chinese characters, the query shows up as ??????? on the Google page. This problem is very important to me. Maybe it is related to this code

(set-w32-system-coding-system 'chinese-gbk)
(set-selection-coding-system 'chinese-gbk)
(set-keyboard-coding-system 'chinese-gbk)
(set-language-environment 'chinese-gbk)
(setq locale-coding-system 'chinese-gbk)
(setq current-language-environment "Chinese-GBK")

in .emacs.

As for the variable `w3m-charset-coding-system-alist' that you mentioned, evaluating it with C-j gives ((x-sjis . shift_jis) (x-shift_jis . shift_jis) (x-shift-jis . shift_jis) (x-euc-jp . euc-japan) (shift-jis . shift_jis) (x-unknown . undecided) (unknown . undecided) (windows-874 . tis-620) (iso-2022-jp-3 . iso-2022-7bit-ss2) (us_ascii . raw-text)) on my computer.

I am very confused by this, because it does not seem to handle Chinese at all.

I hope you can help me solve this.
I look forward to your reply.

Best regards,
a faithful emacs-w3m user.

