
Re: stealing a code snippet

From: Katsumi Yamaoka <yamaoka@xxxxxxx>
Date: Tue, 10 Nov 2020 07:26:34 +0900
X-ml-name: emacs-w3m
X-mail-count: 13697
References: <20201105130634.prqcitts36igkzcn@E15-2016.optimum.net> <b4mzh3vbg3y.fsf@jpl.org> <20201106075912.f56qt2wkwzztpoyg@E15-2016.optimum.net> <b4mv9efwhpt.fsf@jpl.org> <20201109134246.r4khc47ghcji7daf@E15-2016.optimum.net>

In [emacs-w3m:13696]
On Mon, 09 Nov 2020 08:42:46 -0500, Boruch Baum wrote:
>  (decode-coding-string
>   (url-unhex-string str)
>   (or file-name-coding-system
>       default-file-name-coding-system))

> Does this work properly for the common Japanese use-cases?

Yes, this works for most modern web sites, which use utf-8.  But
it is not applicable to *all* sites.  In Japan, MS-DOS/Windows
used to use japanese-shift-jis (a.k.a. sjis) even for file names,
and Unix people used to use euc-jp, so some old Japanese sites
may still use those coding systems.  Both euc-jp and sjis express
a wide character in two bytes, the first of which has MSB==1;
they are not easy to tell apart, and of course neither is
compatible with Unicode.  Though this may be a special case only
in Japan, the decisive point is that nothing guarantees that
Emacs' default coding system (or a user's choice) agrees with
that of a web site.  This is why the charset value exists in the
Content-Type header.
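
To illustrate the point, here is a minimal sketch, not code from
emacs-w3m.  The helper name `my-url-decode-string' and its CHARSET
argument are hypothetical; the assumption is that the caller has
already turned the charset value of the Content-Type header into an
Emacs coding system symbol.  Only when no charset is known does it
fall back to the file-name coding system as in the quoted snippet.

```elisp
(require 'url-util)

;; Hypothetical helper (an assumption for illustration only).
(defun my-url-decode-string (str &optional charset)
  "Unhex STR and decode the resulting bytes.
CHARSET is a coding system symbol, e.g. one derived from the
charset value of the Content-Type header.  If it is nil, fall
back to `file-name-coding-system' and then to
`default-file-name-coding-system'."
  (decode-coding-string
   (url-unhex-string str)
   (or charset
       file-name-coding-system
       default-file-name-coding-system)))

;; The same two characters, "日本", percent-encoded in three ways.
;; Each call decodes correctly only with the matching coding system.
(my-url-decode-string "%E6%97%A5%E6%9C%AC" 'utf-8)
(my-url-decode-string "%93%FA%96%7B" 'japanese-shift-jis)
(my-url-decode-string "%C6%FC%CB%DC" 'euc-jp)
```

Decoding the sjis or euc-jp bytes as utf-8 should not signal an
error as far as I know; it just yields an unreadable string, which
is why the mismatch is easy to overlook.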
