
Re: character issue

From: Katsumi Yamaoka <yamaoka@xxxxxxx>
Date: Wed, 03 Oct 2007 13:03:03 +0900
X-ml-name: emacs-w3m
X-mail-count: 09702
References: <87fy0tkets.fsf@comcast.net> <b4mhcl94utl.fsf@jpl.org> <87zlz1ipjb.fsf@comcast.net>

>>>>> In [emacs-w3m : No.09696] Robert D. Crawford wrote:

>> How does your speech synth speak the following apostrophe?
>>
>> Freedomâs Watch

That's strange.  The apostrophe that I sent was a single character.
In the raw (i.e., encoded) message it consists of three bytes,
octets 342, 200, and 231 (in octal), which are additionally
QP-encoded.  Those bytes should be decoded into a single character
because the Content-Type header specifies the utf-8 charset.
However, your Gnus and Emacs seem to have displayed three separate
bytes: the letter `a' with circumflex, octet 200, and octet 231.  I
guess the text cited above may even be displayed as six bytes when
you read this message.  I tried Gnus v5.11 on Emacs 23.0.50, the
same versions that you use, and found no problem.  So I suspect you
use something special in your Emacs.
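
A minimal Python sketch of the decoding steps described above.  The
sample string is an assumption reconstructed from the cited text, and
the Latin-1 step is only a guess at how the three bytes ended up being
shown separately:

```python
import quopri

# Assumed sample: the cited text with the apostrophe QP-encoded,
# as it would appear in the raw message body.
raw = b"Freedom=E2=80=99s Watch"

# Step 1: undo the quoted-printable transfer encoding only.
octets = quopri.decodestring(raw)

# The apostrophe occupies three bytes right after "Freedom" (7 bytes).
apostrophe = octets[7:10]
octal = [format(b, "o") for b in apostrophe]

# Step 2a: decode with the charset named in Content-Type (utf-8).
# The three bytes become one character, U+2019.
as_utf8 = octets.decode("utf-8")

# Step 2b: decode as a single-byte charset instead (assumed: Latin-1).
# Each byte becomes its own character: a-circumflex, then two
# invisible control characters.
as_latin1 = octets.decode("latin-1")

# Re-encoding the mis-decoded text as utf-8 turns the three bytes
# into six, which would match the "six bytes" guess.
double_encoded = as_latin1[7:10].encode("utf-8")
```

Here `octal` holds the three octet values named above, `as_utf8` is
15 characters long, and `as_latin1` is 17 characters long.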

> It speaks it as if it weren't there.  The synth acts as if there is a
> space there.  Spelling it out, it looks like this:

> Freedom s Watch

> But that is not so much a problem.  _Much_ better than this:

> Freedom question mark s Watch

Indeed.  I guess the synth ignores not only bytes displayed as octal
escapes but also the letter `a' with circumflex.  Isn't that the
reason you use the special tool?  I vaguely guess that it converts
some characters into others before they are decoded according to
the specified charset.  Although it might be indispensable to you,
it breaks non-ASCII text.  In this case, what breaks is a character
that cannot be mapped to a single 8-bit datum.

> The page I was viewing

> http://www.nytimes.com/2007/09/30/us/politics/30watch.html?ex=1348891200&en=02eb54b65d042599&ei=5124&partner=permalink&exprod=permalink

I was aware that the New York Times uses many non-ASCII characters
in its pages, since I wrote a Shimbun module for the New York Times.

> I don't think it is so much an issue with the synth.  As I understand
> it, the synth speaks what is on the screen (with some exceptions like u
> umlaut and c cedilla which are not characters in English).

Does the synth support only English text?  Is that why you use a
tool that converts non-English characters into others?  It would be
fine if the tool operated on decoded text, but I guess it works on
raw data that has not been decoded yet, and that leads emacs-w3m to
display question marks for some letters.
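
To illustrate why the order matters, here is a small Python sketch.
Both filter functions are hypothetical stand-ins; I don't know what
the actual tool does:

```python
# Assumed sample: the cited text as utf-8 encoded raw data.
raw = "Freedom\u2019s Watch".encode("utf-8")


def ascii_only_bytes(data: bytes) -> bytes:
    """Hypothetical filter run too early: every non-ASCII byte becomes '?'."""
    return bytes(b if b < 0x80 else ord("?") for b in data)


def ascii_only_text(text: str) -> str:
    """The same idea run after decoding: one '?' per non-ASCII character."""
    return "".join(ch if ord(ch) < 0x80 else "?" for ch in text)


# Filtering the raw bytes first destroys the multibyte sequence,
# so one apostrophe turns into three question marks.
early = ascii_only_bytes(raw).decode("utf-8")

# Decoding first keeps the apostrophe as a single unit, so the
# filter replaces it exactly once.
late = ascii_only_text(raw.decode("utf-8"))
```

With this input, `early` is "Freedom???s Watch" and `late` is
"Freedom?s Watch".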

> Here is
> another example that might help.  If I type the word "don't", it is
> spoken correctly.  If I were to encounter it in utf-8, it would sound
> like "don tee".  Still, this is preferable to "don question mark tee".

That's too bad.  I sympathize with your frustration.

[...]

> This could be an issue with the configuration as it pertains to the
> character set used in emacs itself.  The synth, as I understand it,
> chokes on multibyte characters.  emacspeak requires unibyte to be on.

I see.  So, what you need seems to be a filter that converts non-ASCII
decoded characters into others or nulls.
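
Such a filter could look roughly like the following Python sketch.
The replacement table and the function name are illustrative
assumptions, not an existing emacs-w3m or emacspeak feature:

```python
# Hypothetical replacement table for decoded (Unicode) text.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the apostrophe in question)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def to_ascii(text: str) -> str:
    """Map known non-ASCII characters to ASCII and drop the rest."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)                  # plain ASCII passes through
        else:
            # Known characters are replaced; unknown ones become
            # nothing (the "nulls" case).
            out.append(REPLACEMENTS.get(ch, ""))
    return "".join(out)


fixed = to_ascii("Freedom\u2019s Watch")
contraction = to_ascii("don\u2019t")
```

Because it runs on decoded characters, `fixed` is "Freedom's Watch"
and `contraction` is "don't", which the synth should speak normally.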

[...]

> The thing about this issue I don't understand is that emacs/w3 displays
> the page with the octal representation of the character.  I am afraid I
> was not clear before.  I do not "see" the octal character but only the
> octal representation of the character ("\xyz" instead of "'").

Perhaps Emacs/W3 doesn't decode newer characters that Emacs did
not support at the time Emacs/W3 was developed.
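
For reference, undecoded bytes shown as octal escapes would look like
the output of this Python sketch.  It only mimics the display; it is
an assumption about what you see, not what Emacs/W3 does internally:

```python
def octal_escapes(data: bytes) -> str:
    """Show ASCII bytes as-is and every other byte as a \\NNN octal escape."""
    return "".join(chr(b) if b < 0x80 else "\\%03o" % b for b in data)


# Assumed sample: the cited text as utf-8 bytes that were never decoded.
shown = octal_escapes("Freedom\u2019s Watch".encode("utf-8"))
```

Here `shown` reads Freedom\342\200\231s Watch, i.e., the same three
octets as in the raw message.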

>> Otherwise, we can offer a filter program that converts the
>> apostrophe in question into the ASCII character for emacs-w3m.

> While this sort of thing would be appreciated, it seems like a bad fix.

I don't think it's so bad as a stopgap until emacspeak is improved further.

> I might ask on the emacspeak list to see if anyone else is seeing this
> and if so what they are doing about it.

Please feel free to show my articles to them.  They are at:

http://news.gmane.org/group/gmane.emacs.w3m/thread=7055/force_load=t

> BTW, I did try to set the following variables to ascii a few months ago:

> w3m-input-coding-system
> w3m-output-coding-system

> but the result was the same.

I think fiddling with them will not solve the problem.

Regards,
