
Re: stealing a code snippet

In [emacs-w3m:13696]
On Mon, 09 Nov 2020 08:42:46 -0500, Boruch Baum wrote:
>  (decode-coding-string
>    (url-unhex-string str)
>      (or file-name-coding-system
>          default-file-name-coding-system))

> Does this work properly for the common Japanese use-cases?

Yes, this works for almost all modern web sites that use utf-8.
But it does not work for *all* sites.  In Japan, MS-DOS/Windows
used to use japanese-shift-jis (aka sjis) even for file names,
and Unix people used to use euc-jp, so some old Japanese sites
may still use those coding systems.  Both euc-jp and sjis express
a wide character as two bytes whose leading byte has the MSB set,
so they are not easy to tell apart, and of course neither is
compatible with Unicode.  Though this may be a situation peculiar
to Japan, the decisive point is that nothing guarantees that
Emacs' default coding system (or the one a user chose) agrees
with the one a web site uses.  This is why the charset parameter
exists in the Content-Type header.
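A minimal sketch of what that implies for the quoted snippet: prefer
the charset the server declared, and fall back to the file-name
coding system only when no usable charset is known.  The function
name `my-decode-url-string' is hypothetical, and it assumes the
caller has already extracted the charset name (e.g. "Shift_JIS")
from the Content-Type header; this is not how emacs-w3m itself does
it.

```elisp
(require 'url-util)   ; provides `url-unhex-string'

(defun my-decode-url-string (str &optional charset)
  "Percent-decode STR, then decode the resulting bytes.
CHARSET is a MIME charset name such as \"Shift_JIS\" or \"EUC-JP\",
assumed to come from the Content-Type header.  If it is nil or does
not name a known coding system, fall back to the file-name coding
system, as the quoted snippet does."
  (let ((coding (or (and charset
                         ;; MIME charset names are case-insensitive;
                         ;; Emacs coding-system names are lowercase.
                         (let ((cs (intern (downcase charset))))
                           (and (coding-system-p cs) cs)))
                    file-name-coding-system
                    default-file-name-coding-system)))
    ;; `url-unhex-string' yields raw bytes; the coding system
    ;; decides which characters those bytes represent.
    (decode-coding-string (url-unhex-string str) coding)))
```

For example, (my-decode-url-string "%82%A0" "Shift_JIS") and
(my-decode-url-string "%A4%A2" "EUC-JP") both return "あ", while
the same bytes decoded with the wrong coding system give different
text, which is exactly why the declared charset matters.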