Re: emacs-w3m, w3m-m17n and mule



>>>>> In [emacs-w3m : No.03694]
>>>>>	Sergei Pokrovsky <pok@nbsp.nsk.su> wrote:

Sergei> Okay, now it lets me in, but still I see ??? for the non-ascii
Sergei> symbols.

ARISAWA Akihiro, one of the core developers of emacs-w3m, gave us
a hint.  Here's a translation:
----------------------------------------------------------------
I've sorted out some of the problems with http://www.mypsion.ru/.

%% Mule-UCS + w3m will auto-detect the content as iso-8859-1.
   The setting

   (setq w3m-coding-system-priority-list '(cyrillic-koi8))

   won't work.

>>>>> In [emacs-w3m : No.03691]
>>>>>	Hideyuki SHIRAI <shirai@rdmg.mgcs.mei.co.jp> wrote:

> (detect-coding-with-priority (point-min) (point-max) '((coding-category-ccl koi8-r)))
>  => (iso-latin-1-unix raw-text-unix no-conversion)

It returns cyrillic-koi8 when the setting

(set-language-environment "Cyrillic-KOI8")

is in effect.  The following step in
set-language-environment-coding-systems seems to be what produces
that result:

(progn
  (set (coding-system-category 'cyrillic-koi8) 'cyrillic-koi8)
  (update-coding-systems-internal))
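
As an illustration of why the guess can land on iso-8859-1: every
byte of a KOI8-R text is also a valid Latin-1 byte, so the raw data
alone does not rule Latin-1 out, and it seems to come down to the
category priority.  The sketch below is not emacs-w3m code.
my-koi8-decode-both is a hypothetical helper, and the sketch assumes
a current GNU Emacs in which the coding systems are named `koi8-r'
and `iso-latin-1' and `\u' escapes are accepted in strings.

```elisp
;; Sketch only: `my-koi8-decode-both' is a hypothetical name, not
;; part of emacs-w3m or Mule-UCS.
(defun my-koi8-decode-both (text)
  "Encode TEXT as KOI8-R, then decode the same bytes two ways.
Return a list (AS-LATIN-1 AS-KOI8-R)."
  (let ((bytes (encode-coding-string text 'koi8-r)))
    (list (decode-coding-string bytes 'iso-latin-1) ; the wrong guess
          (decode-coding-string bytes 'koi8-r))))   ; the right one

;; The three Cyrillic letters used in t.html (U+0411 U+042F U+0415).
(my-koi8-decode-both "\u0411\u042f\u0415")
```

Both decodings succeed without error.  The first element comes back
as accented Latin letters and the second as the original Cyrillic,
so nothing in the bytes themselves tells the two apart.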


%% Mule-UCS + w3m will show some ???s even if we specify
   cyrillic-koi8 with the `C c' key.

>>>>> In [emacs-w3m : No.03687]
>>>>>	Katsumi Yamaoka <yamaoka@jpl.org> wrote:

> I also see some ???s as the following picture, but I don't know
> what they should be or how do we fix them.

I don't think this can be helped.  Characters that can't be
expressed in euc-japan are converted to entity references,
except that such characters inside a tag's attribute are
converted to "?" when w3m is run with -halfdump.

% cat t.html
<html>
<head><title>&#x411;&#x42f;&#x415;</title></head>
<body>&#x411;&#x42f;&#x415;</body>
</html>
% w3m -halfdump t.html
<head><title_alt title="???"></head>&#x411;&#x42f;&#x415;
<internal>
<title_alt title="???">
</internal>
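
The entity references in the body are still recoverable on the Emacs
side, while the "?" in the title attribute is not, because the code
point is already gone.  A minimal sketch of the recoverable half
follows.  my-decode-hex-entities is a hypothetical helper, not the
routine emacs-w3m actually uses, and it assumes an Emacs whose
characters are Unicode code points.  It handles only the hexadecimal
form shown above.

```elisp
;; Sketch only: `my-decode-hex-entities' is a hypothetical name.
(defun my-decode-hex-entities (string)
  "Replace each &#xHHHH; reference in STRING with its character."
  (replace-regexp-in-string
   "&#x\\([0-9a-fA-F]+\\);"
   (lambda (m)
     ;; M is the matched text; group 1 holds the hex digits.
     (string (string-to-number (match-string 1 m) 16)))
   string t t))

;; The body line from the -halfdump output above.
(my-decode-hex-entities "&#x411;&#x42f;&#x415;")
```

Applied to title="???", the same function returns its input
unchanged, since there is no reference left to decode.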


%% Characters aren't displayed correctly with w3m-m17n.

In my environment, the content is badly mis-decoded, as in:

http://www.nijino.com/ari/tmp/emacs-w3m-m17n-ru.png

There's no problem when I specify koi8 with the `C c' key, but
the content is again mis-decoded, as in

http://www.nijino.com/ari/tmp/emacs-w3m-m17n-ru2.png

when I specify cyrillic-koi8.

It seems that w3m-m17n can't determine the charset correctly
when the -I option is not specified, or when a coding system that
w3m-m17n does not know is specified.

The following setting makes emacs-w3m work without any further
steps, though:

(setq w3m-halfdump-command-arguments
      '("-halfdump" "-o" "ext_halfdump=1"
        "-I" "KOI8-R" "-O" "ISO-2022-JP-2"
        "-o" "strict_iso2022=0"))