The code uses character encodings inconsistently. Perhaps because
parts of the code were written long ago, before utf-8 became a de
facto worldwide standard, other character encodings were set for
various files. The purpose of this commit is to make the project
consistently utf-8 throughout.
This commit was originally applied four months ago in the git
repository which I was using for development before the project had
its own official git repository. During the intervening four months it
was publicly available for testing and was the version which I was
using.
I received no complaints, and observed nothing suspicious; however, I
don't use Japanese, and I don't know whether anyone else bothered to
In practice, much of the work should be easy to test just by using the
menu system in Japanese, and by using emacs-w3m for the various
The character sets that had been in use included those which the W3
Consortium says are to be especially avoided.
Most files that declare an explicit encoding are set to
iso-2022-7bit, and the file w3m-bug.el is encoded as 'euc-japan'.
Since this was a huge and mind-numbing task, I automated it.
Step 1 was a few sed commands to change the first line of the *.el
files.
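The exact sed commands are not recorded here. As a rough sketch only, assuming the first line carries an Emacs-style "coding: NAME" cookie, a substitution restricted to line 1 could look like the following. The function name set_utf8_cookie is hypothetical, and it prints the result to stdout instead of editing in place.

```shell
# Hypothetical helper, not the commands actually used for this commit.
# Rewrites a "coding: NAME" cookie to "coding: utf-8", but only on
# line 1 of the file given as $1; all other lines pass through as-is.
set_utf8_cookie() {
    sed '1s/coding: *[A-Za-z0-9_-]*/coding: utf-8/' "$1"
}
```

Redirecting the output to a temporary file and renaming it afterwards avoids relying on in-place editing options, which differ between sed implementations.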
Step 2 was to run iconv on the files.
for file in *.el ;
do echo "$file" ;
iconv -c -f iso-2022-jp -t utf-8 "$file" > "$file".new ;
done
In general, the conversion operation succeeded, in that it did
transform Japanese text from unintelligible ASCII escape sequences
into Japanese characters (likewise unintelligible to me, but with
samples verified by Google Translate).
iconv complained about some files when run without the -c flag,
which forces completion, but that doesn't seem important:
iconv: illegal input sequence at position 47
iconv: illegal input sequence at position 1169
iconv: illegal input sequence at position 1463
iconv: illegal input sequence at position 2723
iconv: illegal input sequence at position 4937
iconv: illegal input sequence at position 61716
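To see which files trigger such complaints, one option is to run iconv without -c and discard the output, keeping only the exit status. This is a minimal sketch, not part of the original automation; the function name list_strict_failures is hypothetical, and it assumes iconv exits non-zero when it meets an illegal input sequence.

```shell
# Hypothetical helper: print the name of each file that iconv cannot
# convert strictly (without -c) from the encoding in $1 to utf-8.
# Remaining arguments are the files to check; nothing is written.
list_strict_failures() {
    from=$1
    shift
    for file in "$@"; do
        if ! iconv -f "$from" -t utf-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$file"
        fi
    done
}
```

Calling it as list_strict_failures iso-2022-jp *.el would give a short list of files worth inspecting by hand, since -c silently drops whatever it cannot convert.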
Step 3 was to eyeball the results, and edit obvious errors.
Latin diacritics were sometimes clobbered by the
For example, é ó ú Á ź in lists
of month names for European languages.
On a few rare occasions, '@' was clobbered in regexes and in
Some files, such as w3m-bug.el, were fine without iconv; iconv
ruined them.
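The eyeballing in step 3 might be narrowed down with a quick byte count on the original files. ISO-2022-JP is a 7-bit encoding, so any byte with the high bit set in an input file cannot be valid iso-2022-jp and is a candidate for being dropped by -c. That this explains the clobbered diacritics is an assumption, and the function name count_high_bytes is hypothetical.

```shell
# Hypothetical helper: print how many bytes in file $1 have the high
# bit set (values 0x80-0xFF). It deletes every 7-bit byte and counts
# what is left; the final tr strips padding that some wc versions add.
count_high_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

A non-zero count on a file that is supposed to be iso-2022-jp suggests it mixes in another encoding, such as Latin-1 or euc-japan, and deserves a manual look before and after conversion.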