Re: emacs-w3m, w3m-m17n and mule
>>>>> In [emacs-w3m : No.03694]
>>>>> Sergei Pokrovsky <pok@nbsp.nsk.su> wrote:
Sergei> Okay, now it lets me in, but still I see ??? for the non-ascii
Sergei> symbols.
ARISAWA Akihiro, one of the core developers of emacs-w3m, gave us
a hint. Here's a translation:
----------------------------------------------------------------
I've sorted the problems with http://www.mypsion.ru/ into the
following cases.
%% Mule-UCS + w3m will auto-detect the content as iso-8859-1.
The setting
(setq w3m-coding-system-priority-list '(cyrillic-koi8))
won't work.
>>>>> In [emacs-w3m : No.03691]
>>>>> Hideyuki SHIRAI <shirai@rdmg.mgcs.mei.co.jp> wrote:
> (detect-coding-with-priority (point-min) (point-max) '((coding-category-ccl koi8-r)))
> => (iso-latin-1-unix raw-text-unix no-conversion)
It returns cyrillic-koi8 when the setting
(set-language-environment "Cyrillic-KOI8")
is used. It seems that the following process in
set-language-environment-coding-systems produces the result:
(progn
  (set (coding-system-category 'cyrillic-koi8) 'cyrillic-koi8)
  (update-coding-systems-internal))
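To illustrate why the detection priority matters here, the following is a minimal sketch, not taken from the original thread. It assumes an Emacs in which the `cyrillic-koi8' coding system is defined, and `my-koi8-sample' is a hypothetical name used only for this example. The same three bytes decode without error under either coding system, so the bytes alone cannot tell Emacs which one is right.

```elisp
;; Sketch only: `my-koi8-sample' is a hypothetical name, not part of
;; emacs-w3m or Mule-UCS.
;; Encode three Cyrillic letters into raw KOI8-R bytes, roughly the
;; form in which a KOI8-R page arrives before decoding.
(defvar my-koi8-sample (encode-coding-string "БЯЕ" 'cyrillic-koi8)
  "Unibyte string holding the KOI8-R bytes of a short Cyrillic sample.")

;; Decoding with the matching coding system restores the original text.
(decode-coding-string my-koi8-sample 'cyrillic-koi8)

;; Decoding the very same bytes as iso-8859-1 also succeeds, but the
;; result is a run of accented Latin letters rather than Cyrillic.
(decode-coding-string my-koi8-sample 'iso-8859-1)

;; Ask Emacs to guess.  The list returned depends on the current
;; language environment and coding priorities.
(detect-coding-string my-koi8-sample)
```

Evaluating the last form before and after `(set-language-environment "Cyrillic-KOI8")` should show how the language environment changes the guess, although the exact list returned will vary between Emacs versions.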
%% Mule-UCS + w3m will show some ???s even if we specify
cyrillic-koi8 with the `C c' key.
>>>>> In [emacs-w3m : No.03687]
>>>>> Katsumi Yamaoka <yamaoka@jpl.org> wrote:
> I also see some ???s as the following picture, but I don't know
> what they should be or how do we fix them.
I think this can't be helped. Characters which can't be
expressed in euc-japan are converted to entity references,
except that such characters inside a tag's attribute values are
converted to "?" when w3m is run with -halfdump.
% cat t.html
<html>
<head><title>БЯЕ</title></head>
<body>БЯЕ</body>
</html>
% w3m -halfdump t.html
<head><title_alt title="???"></head>БЯЕ
<internal>
<title_alt title="???">
</internal>
%% Characters aren't displayed correctly with w3m-m17n.
In my environment, the content is badly mis-decoded, as in:
http://www.nijino.com/ari/tmp/emacs-w3m-m17n-ru.png
There's no problem when I specify koi8 using the `C c' key,
but the content is mis-decoded again, as in
http://www.nijino.com/ari/tmp/emacs-w3m-m17n-ru2.png
when I specify cyrillic-koi8.
It seems that w3m-m17n can't determine the charset correctly
when the -I option is not specified, or when a coding system that
w3m-m17n does not know is specified.
The following setting makes emacs-w3m work without any further
manual steps, though:
(setq w3m-halfdump-command-arguments
      '("-halfdump" "-o" "ext_halfdump=1"
        "-I" "KOI8-R" "-O" "ISO-2022-JP-2"
        "-o" "strict_iso2022=0"))