Re: emacs-w3m, w3m-m17n and mule
>>>>> In [emacs-w3m : No.03694]
>>>>> Sergei Pokrovsky <firstname.lastname@example.org> wrote:
Sergei> Okay, now it lets me in, but still I see ??? for the non-ascii
ARISAWA Akihiro, one of the core developers of emacs-w3m, gave us
a hint. Here's a translation:
I've sorted out some of the problems with http://www.mypsion.ru/.
%% Mule-UCS + w3m will auto-detect the contents as iso-8859-1.
(setq w3m-coding-system-priority-list '(cyrillic-koi8))
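To make the symptom concrete, here is a minimal sketch of what a wrong charset guess does to KOI8-R bytes. It uses Python's codecs purely as a stand-in; it is not how Mule-UCS or w3m detect or decode text, and the sample string is an assumption chosen for illustration.

```python
# Illustration only: Python's codecs stand in for the detection step.
# This is not the Mule-UCS or w3m implementation.
text = "Привет"                    # hypothetical sample of Cyrillic page text
raw = text.encode("koi8_r")        # the bytes a KOI8-R page would deliver

wrong = raw.decode("iso-8859-1")   # result when the bytes are taken for iso-8859-1
right = raw.decode("koi8_r")       # result when the bytes are taken for KOI8-R

print(repr(wrong))                 # accented Latin letters instead of Cyrillic
print(repr(right))                 # the original Cyrillic text
```

Every KOI8-R byte is also a valid iso-8859-1 byte, so the wrong guess never fails outright; it just yields the wrong letters. That is why putting cyrillic-koi8 first in the priority list matters.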
>>>>> In [emacs-w3m : No.03691]
>>>>> Hideyuki SHIRAI <email@example.com> wrote:
> (detect-coding-with-priority (point-min) (point-max) '((coding-category-ccl koi8-r)))
> => (iso-latin-1-unix raw-text-unix no-conversion)
It returns cyrillic-koi8 when the setting
is used. It seems that the following process in
set-language-environment-coding-systems produces the result:
(set (coding-system-category 'cyrillic-koi8) 'cyrillic-koi8)
%% Mule-UCS + w3m will show some ???s even if we specify
cyrillic-koi8 using the `C c' key.
>>>>> In [emacs-w3m : No.03687]
>>>>> Katsumi Yamaoka <firstname.lastname@example.org> wrote:
> I also see some ???s, as in the following picture, but I don't know
> what they should be or how to fix them.
I think there's no help for it. Characters which can't be
expressed in euc-japan will be converted to entity
references, except that characters in a tag's attributes will be
converted to "?" when running w3m with -halfdump.
% cat t.html
% w3m -halfdump t.html
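The two fallbacks described above (entity references in body text, "?" in tag attributes) can be imitated with Python's codec error handlers. This is only an analogy under stated assumptions: the helper name to_euc_jp and the in_attribute flag are hypothetical, and w3m's own conversion code is not shown here.

```python
# Analogy only: Python's "xmlcharrefreplace" and "replace" error handlers
# mimic the two fallbacks described for w3m -halfdump; this is not w3m code.
def to_euc_jp(text, in_attribute=False):
    """Encode text as EUC-JP, choosing a fallback for unencodable characters."""
    handler = "replace" if in_attribute else "xmlcharrefreplace"
    return text.encode("euc_jp", errors=handler)

sample = "\u20ac"                            # EURO SIGN, not representable in EUC-JP
body = to_euc_jp(sample)                     # becomes a numeric entity reference
attr = to_euc_jp(sample, in_attribute=True)  # becomes a question mark

print(body, attr)
```

An entity reference can still be resolved by the reader of the output, while "?" discards the original character, which matches the observation that the ???s cannot be recovered on the Emacs side.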
%% Characters won't be displayed correctly when using w3m-m17n.
In my environment, the contents are badly mis-decoded as:
Though there's no problem when I specify koi8 using the `C c' key,
they are also mis-decoded as
when I specify cyrillic-koi8.
It seems that w3m-m17n cannot determine the charset correctly
when the option -I is not specified, or when a coding system that
w3m-m17n does not know is specified.
The following setting makes emacs-w3m work without any other
'("-halfdump" "-o" "ext_halfdump=1" "-I" "KOI8-R" "-O" "ISO-2022-JP-2" "-o" "strict_iso2022=0"))