Re: stealing a code snippet
In [emacs-w3m:13696]
On Mon, 09 Nov 2020 08:42:46 -0500, Boruch Baum wrote:
> (decode-coding-string
>  (url-unhex-string str)
>  (or file-name-coding-system
>      default-file-name-coding-system))
> Does this work properly for the common Japanese use-cases?
Yes, this works for almost all modern web sites, which use utf-8.
But it is not applicable to *all* sites. In Japan, MS-DOS/Windows
used to use japanese-shift-jis (aka sjis) even for file names,
and Unix people used to use euc-jp, so some old Japanese sites
may still use those coding systems. Both euc-jp and sjis express
a wide character in two bytes, the first of which has MSB==1, so
they are not easy to tell apart, and of course neither is
compatible with Unicode. Though this may be a special case only
in Japan, the decisive point is that nothing guarantees that
Emacs' default coding system (or a user's choice) agrees with
that of a web site. This is why the charset value exists in the
Content-Type header.
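
To illustrate the point, here is a minimal sketch of decoding with
the charset taken into account. The two function names are
hypothetical (they are not part of emacs-w3m), and the sketch assumes
the caller has already extracted the charset string from the
Content-Type header. It falls back to the file-name coding system
only when the charset does not name a coding system Emacs knows.

```elisp
(require 'url-util)   ; provides `url-unhex-string'

;; Hypothetical helper: map a charset name to an Emacs coding system.
(defun my-charset-to-coding-system (charset)
  "Return the coding system named by CHARSET, or nil if Emacs has none.
CHARSET is a string such as \"Shift_JIS\" from a Content-Type header."
  (let ((sym (and (stringp charset) (intern (downcase charset)))))
    ;; Guard against nil explicitly, because (coding-system-p nil) is t.
    (and sym (coding-system-p sym) sym)))

;; Hypothetical wrapper around the snippet quoted above.
(defun my-decode-url-string (str &optional charset)
  "Decode percent-encoded STR using CHARSET when it names a coding system.
Otherwise fall back to the file-name coding system."
  (decode-coding-string
   (url-unhex-string str)
   (or (my-charset-to-coding-system charset)
       file-name-coding-system
       default-file-name-coding-system)))

;; The same two characters, 日本, percent-encoded in three coding systems.
(my-decode-url-string "%E6%97%A5%E6%9C%AC" "UTF-8")
(my-decode-url-string "%C6%FC%CB%DC" "EUC-JP")
(my-decode-url-string "%93%FA%96%7B" "Shift_JIS")
```

With the matching charset, all three calls should give back the same
string. Without the second argument, the result depends on the local
file-name coding system, which is exactly the mismatch described
above.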