Re: character issue
>>>>> In [emacs-w3m : No.09696] Robert D. Crawford wrote:
>> How does your speech synth speak the following apostrophe?
>>
>> Freedomâs Watch
That's strange. The apostrophe that I sent was a single character.
In the raw (i.e., encoded) message, it consists of three bytes,
octets 342, 200, and 231 (in octal), which are in turn QP-encoded.
Those bytes should be decoded into a single character because
the Content-Type header specifies the utf-8 charset. However, your
Gnus and Emacs seem to have displayed three bytes: the letter `a'
with circumflex, octet 200, and octet 231. I guess the one cited
above might be displayed with six bytes when you read this message.
I tried Gnus v5.11 on Emacs 23.0.50, the same versions you use, but
found no problem. So, I suspect you use something special in your Emacs.
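The byte counts above can be checked with a short sketch. Python is
used here purely for illustration (it is not part of Gnus or
emacs-w3m), the QP string is reconstructed from the octets named
above, and the variable names are hypothetical.

```python
import quopri

# The phrase as it would look in a QP-encoded message body
# (reconstructed from the octets 342, 200, 231; an assumption).
qp = b"Freedom=E2=80=99s Watch"
raw = quopri.decodestring(qp)

# The three octets, written in octal as in the message.
assert raw[7:10] == bytes([0o342, 0o200, 0o231])

# Decoded with the charset from the Content-Type header, the three
# bytes become one character, U+2019.
good = raw.decode("utf-8")
print(hex(ord(good[7])), len(good))

# Treated as single-byte (Latin-1) data instead, each byte becomes
# its own character: a-circumflex, then octets 200 and 231.
bad = raw.decode("latin-1")
print([hex(ord(c)) for c in bad[7:10]])

# Re-encoding that misdecoded text as UTF-8 takes six bytes.
six = bad[7:10].encode("utf-8")
print(len(six))
```

If the citation is re-encoded as utf-8 after being misread as
single-byte data, the three characters take two bytes each, which
would account for the six bytes guessed above.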
> It speaks it as if it weren't there. The synth acts as if there is a
> space there. Spelling it out, it looks like this:
> Freedom s Watch
> But that is not so much a problem. _Much_ better than this:
> Freedom question mark s Watch
Indeed. I guess the synth ignores not only the raw octets but also
the letter `a' with circumflex. Isn't that the reason you use the
special tool? My vague guess is that it converts some characters
into others before they are decoded according to the specified
charset. Although it might be indispensable to you, it breaks
non-ASCII text. In this case, the victim is a character that
cannot be represented in a single 8-bit datum.
> The page I was viewing
> http://www.nytimes.com/2007/09/30/us/politics/30watch.html?ex=1348891200&en=02eb54b65d042599&ei=5124&partner=permalink&exprod=permalink
I was aware that the New York Times uses many non-ASCII characters
in its pages, since I made a Shimbun module for the New York Times.
> I don't think it is so much an issue with the synth. As I understand
> it, the synth speaks what is on the screen (with some exceptions like u
> umlaut and c cedilla which are not characters in English).
Does the synth support only English text? If so, isn't that why
you use a tool that converts non-English characters into others?
It would be fine if the tool operated on decoded text; however, I
guess it works on raw data that has not been decoded yet, and that
leads emacs-w3m to display question marks for some letters.
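To illustrate why the order matters, here is a minimal sketch in
Python. It is not a description of what the actual tool does, which
is only guessed at above; the function `filter_raw` is hypothetical
and stands in for any converter that touches bytes before decoding.

```python
text = "Freedom\u2019s Watch"
raw = text.encode("utf-8")          # what arrives on the wire

def filter_raw(data: bytes) -> bytes:
    """Hypothetical byte-level filter: replace every non-ASCII byte
    with "?" without knowing which charset the data is in."""
    return bytes(b if b < 0x80 else ord("?") for b in data)

# Filtering first destroys the multi-byte sequence, so decoding
# afterwards can no longer recover the apostrophe.
filtered_first = filter_raw(raw).decode("utf-8")
print(filtered_first)

# Decoding first keeps the apostrophe as one character, which a
# later character-level step could still recognize and replace.
decoded_first = raw.decode("utf-8")
print(len(decoded_first), decoded_first[7] == "\u2019")
```

In the first case one character turns into one question mark per
byte; in the second the text is still intact after decoding.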
> Here is
> another example that might help. If I type the word "don't", it is
> spoken correctly. If I were to encounter it in utf-8, it would sound
> like "don tee". Still, this is preferable to "don question mark tee".
That's too bad. I sympathize with your frustration.
[...]
> This could be an issue with the configuration as it pertains to the
> character set used in emacs itself. The synth, as I understand it,
> chokes on multibyte characters. emacspeak requires unibyte to be on.
I see. So, what you need seems to be a filter that converts non-ASCII
decoded characters into others or nulls.
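A minimal sketch of such a filter, written in Python only to show
the idea (a real one for emacs-w3m would be written in Emacs Lisp).
The table `SUBSTITUTES` and the function `to_ascii` are hypothetical
names, and the table entries are merely examples.

```python
# Hypothetical substitution table; entries are illustrative only.
SUBSTITUTES = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the apostrophe in question)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

def to_ascii(decoded: str) -> str:
    """Work on already-decoded text: keep ASCII, replace known
    non-ASCII characters, and drop everything else."""
    out = []
    for ch in decoded:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(SUBSTITUTES.get(ch, ""))
    return "".join(out)

print(to_ascii("Freedom\u2019s Watch"))
print(to_ascii("don\u2019t"))
```

Because the filter runs on decoded characters, the apostrophe is
handled as one unit and the synth would receive plain ASCII text.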
[...]
> The thing about this issue I don't understand is that emacs/w3 displays
> the page with the octal representation of the character. I am afraid I
> was not clear before. I do not "see" the octal character but only the
> octal representation of the character ("\xyz" instead of "'").
Perhaps Emacs/W3 doesn't decode characters that came into use
recently and that Emacs did not support when Emacs/W3 was developed.
>> Otherwise, we can offer a filter program that converts the
>> apostrophe in question into the ASCII character for emacs-w3m.
> While this sort of thing would be appreciated, it seems like a bad fix.
I don't think it's so bad as a stopgap until emacspeak is improved further.
> I might ask on the emacspeak list to see if anyone else is seeing this
> and if so what they are doing about it.
Please feel free to show my articles to them. They are at:
http://news.gmane.org/group/gmane.emacs.w3m/thread=7055/force_load=t
> BTW, I did try to set the following variables to ascii a few months ago:
> w3m-input-coding-system
> w3m-output-coding-system
> but the result was the same.
I think fiddling with them will not solve the problem.
Regards,