
Czech characters displayed incorrectly

From: niels.giesen@xxxxxxxxx
Date: Sat, 26 Jan 2008 09:27:51 +0100
X-ml-name: emacs-w3m
X-mail-count: 09975

This bug report will be sent to the emacs-w3m development team,
 not to your local site managers!!
Please write in simple English, because the emacs-w3m developers
aren't good at English reading. ;-)

Please describe as succinctly as possible:
	- What happened.
	- What you thought should have happened.
	- Precisely what you were doing at the time.

Please also include any Lisp back-traces that you may have.
================================================================
Dear Bug Team!

There is a problem with Czech characters in emacs-w3m. For instance, the sequence říň
(in case it gets garbled: r with a haček, long i, and n with a haček) is displayed
incorrectly.

When I retrieve the page with wget and then visit it as a file, it is displayed
correctly. So this is the expected behaviour:

  character: ř (331897, #o1210171, #x51079, U+0159)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: #x20 #x79
     syntax: w 	which means: word
   category: l:Latin
buffer code: #x9C #xF4 #xA0 #xF9
  file code: #xC5 #x99 (encoded by coding system mule-utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 (#x159)

  character: í (2285, #o4355, #x8ed, U+00ED)
    charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
 code point: #x6D
     syntax: w 	which means: word
   category: l:Latin
buffer code: #x81 #xED
  file code: #xC3 #xAD (encoded by coding system mule-utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-1 (#xED)

  character: ň (331880, #o1210150, #x51068, U+0148)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: #x20 #x68
     syntax: w 	which means: word
   category: l:Latin
buffer code: #x9C #xF4 #xA0 #xE8
  file code: #xC5 #x88 (encoded by coding system mule-utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 (#x148)
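
As a cross-check of the values above, here is a minimal Python sketch that compares each reported "file code" with the UTF-8 encoding of the reported Unicode code point. The dictionary name `reported` and the helper `utf8_matches` are made up for this illustration; only the code points and byte values come from this report. It also includes U+22AD, the unexpected character that shows up in emacs-w3m.

```python
# Unicode code points from this report, mapped to the "file code" bytes
# that describe-char printed for them.
reported = {
    0x0159: bytes([0xC5, 0x99]),        # r with hacek
    0x00ED: bytes([0xC3, 0xAD]),        # long i
    0x0148: bytes([0xC5, 0x88]),        # n with hacek
    0x22AD: bytes([0xE2, 0x8A, 0xAD]),  # unexpected character seen in emacs-w3m
}


def utf8_matches(code_point, file_code):
    """Return True if file_code is the UTF-8 encoding of code_point."""
    return chr(code_point).encode("utf-8") == file_code


for code_point, file_code in reported.items():
    hex_bytes = " ".join(f"{b:02X}" for b in file_code)
    print(f"U+{code_point:04X}", hex_bytes, utf8_matches(code_point, file_code))
```

All four lines should end in `True`, which only confirms that the describe-char output is internally consistent; it says nothing yet about where the wrong characters come from.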

In emacs-w3m I got instead the following: 

  character: SPC (32, #o40, #x20, U+0020)
    charset: ascii (ASCII (ISO646 IRV))
 code point: #x20
     syntax:   	which means: whitespace
   category:
             a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0])
             l:Latin
 Properties: jisx0208: 53409;
buffer code: #x20
  file code: #x20 (encoded by coding system utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-1 (#x20)

  character: ⊭ (343277, #o1236355, #x53ced, U+22AD)
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: #x79 #x6D
     syntax: w 	which means: word
buffer code: #x9C #xF4 #xF9 #xED
  file code: #xE2 #x8A #xAD (encoded by coding system utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 (#x22AD),,

  character: SPC (32, #o40, #x20, U+0020)
    charset: ascii (ASCII (ISO646 IRV))
 code point: #x20
     syntax:   	which means: whitespace
   category:
             a:ASCII graphic characters 32-126 (ISO646 IRV:1983[4/0])
             l:Latin
 Properties: jisx0208: 53409;
buffer code: #x20
  file code: #x20 (encoded by coding system utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-1 (#x20)

and

  character: и (232, #o350, #xe8)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: #xE8
     syntax:   	which means: whitespace
buffer code: #xE8
  file code: #xE8 (encoded by coding system utf-8)
    display: by this font (glyph code)
     -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO8859-1 (#xE8),
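
One pattern in the "code point" fields may be worth noting. The Python sketch below assumes that mule-unicode-0100-24ff is laid out as a 96x96 grid starting at U+0100, with both positions offset by #x20; that layout is an assumption, although it reproduces every code point pair printed in this report. The function and variable names are invented for the illustration. It flattens the expected and the observed characters into their 7-bit code point values and compares the two sequences.

```python
def mule_unicode_0100_24ff(code_point):
    """Map U+0100..U+24FF to a (row, column) code point pair.

    Assumption: a 96x96 grid starting at U+0100, both positions
    offset by 0x20. This matches the pairs shown in the report.
    """
    if not 0x0100 <= code_point <= 0x24FF:
        raise ValueError("outside U+0100..U+24FF")
    offset = code_point - 0x0100
    return (offset // 96 + 0x20, offset % 96 + 0x20)


def flatten(groups):
    """Concatenate per-character code point tuples into one list."""
    return [value for group in groups for value in group]


# Expected: r with hacek, long i, n with hacek.
expected_groups = [
    mule_unicode_0100_24ff(0x0159),  # reported as #x20 #x79
    (0xED & 0x7F,),                  # latin-iso8859-1, reported as #x6D
    mule_unicode_0100_24ff(0x0148),  # reported as #x20 #x68
]

# Observed in emacs-w3m: SPC, U+22AD, SPC, raw byte #xE8.
observed_groups = [
    (0x20,),
    mule_unicode_0100_24ff(0x22AD),  # reported as #x79 #x6D
    (0x20,),
    (0xE8 & 0x7F,),                  # raw 8-bit byte with the high bit cleared
]

expected_stream = flatten(expected_groups)
observed_stream = flatten(observed_groups)
print(" ".join(f"{v:02X}" for v in expected_stream))  # 20 79 6D 20 68
print(" ".join(f"{v:02X}" for v in observed_stream))  # 20 79 6D 20 68
print(expected_stream == observed_stream)             # True
```

Both sequences come out as 20 79 6D 20 68. This suggests, but does not prove, that the same values arrive in both cases and are only grouped into characters differently on the emacs-w3m side.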

I checked the same page in stand-alone w3m, which displayed everything correctly. Please
note that these are not the only Czech characters displayed incorrectly. If you would
like me to provide the whole alphabet, I will. One strange thing I saw is that some other
characters seemed to be displayed as Thai characters. Please let me know if you want more
info.

Regards,

Niels Giesen

niels.giesen@xxxxxxxxx

================================================================

System Info to help track down your bug:
---------------------------------------
emacs-w3m-version
 => "1.4.4"
emacs-version
 => "GNU Emacs 22.1.1 (i486-pc-linux-gnu, X toolkit, Xaw3d scroll bars)\n of 2007-11-06 on terranova, modified by Ubuntu"
mule-version
 => "5.0 (SAKAKI)"
system-type
 => gnu/linux
w3m-version
 => "w3m/0.3.2+mee-p24-19+moe-1.5.0"
w3m-type
 => w3mmee
w3m-compile-options
 => ("lang=many" "kanji-symbols" "image" "color" "ansi-color" "mouse" "menu" "cookie" "ssl" "ssl-verify" "w3mmailer" "nntp" "gopher" "ipv6" "mark" "romaji")
w3m-language
 => nil
w3m-command-arguments
 => ("-o" "concurrent=0" "-F")
w3m-command-arguments-alist
 => nil
w3m-command-environment
 => (("W3MLANG" . "ja_JP.kterm"))
w3m-input-coding-system
 => ctext
w3m-output-coding-system
 => ctext
w3m-use-mule-ucs
 => nil

Follow-Ups:
- Re: Czech characters displayed incorrectly
  - From: Katsumi Yamaoka
