
Re: character issue



Katsumi Yamaoka <yamaoka@xxxxxxx> writes:

>>> Freedom’s Watch
>
> That's strange.  The apostrophe that I sent was a single character.
> In the raw (i.e., encoded) message, it consists of three bytes,
> octets 342, 200, and 231 (in octal).

I did notice this.
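For illustration, here is a minimal Python 3 sketch. It takes the three octets quoted above, which are written in octal the way Emacs shows raw bytes, and checks that together they form one UTF-8 sequence: U+2019 RIGHT SINGLE QUOTATION MARK, a typographic apostrophe.

```python
import unicodedata

# The three octets from the raw message, written in octal: 342, 200, 231.
octets = bytes([0o342, 0o200, 0o231])
print(octets.hex())  # e28099

# Decoded as UTF-8 (the charset named in Content-Type), they give ONE character.
ch = octets.decode("utf-8")
print(len(ch), hex(ord(ch)), unicodedata.name(ch))
# 1 0x2019 RIGHT SINGLE QUOTATION MARK
```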

> They are QP-encoded, though.  Those bytes should be decoded into a
> single letter because the Content-Type header specifies the utf-8
> charset.  However your Gnus and Emacs seem to have displayed three
> bytes, the letter `a' with circumflex, 

I only see the octal representation, not a circumflex. 

> octet 200, and octet 231.  I guess the line cited above might be
> displayed as six bytes when you read this message.  

It is.
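The effect Katsumi describes can be reproduced outside Emacs. The Python 3 sketch below assumes the apostrophe appeared in the raw message as the quoted-printable sequence `=E2=80=99`, which is simply the hex form of the octal octets above. It undoes the QP encoding and then decodes the bytes twice. The first decode correctly uses UTF-8. The second wrongly uses Latin-1, with one character per byte, which gives `a' with circumflex followed by the control characters 0x80 and 0x99. Re-encoding that mis-decoded text as UTF-8 turns three bytes into six, which matches the six-byte rendering of the cited line.

```python
import quopri

# Assumed quoted-printable form of the line (octal 342 200 231 = hex E2 80 99).
qp_text = b"Freedom=E2=80=99s Watch"
raw = quopri.decodestring(qp_text)
print(raw)  # b'Freedom\xe2\x80\x99s Watch'

# Correct: honour charset=utf-8 from the Content-Type header.
good = raw.decode("utf-8")

# Wrong: treat every byte as a separate Latin-1 character.
bad = raw.decode("latin-1")

print(len(good), len(bad))  # 15 17
print([hex(ord(c)) for c in bad[7:10]])  # ['0xe2', '0x80', '0x99']

# Quoting the mis-decoded text and encoding it as UTF-8 again doubles the bytes.
print(len(bad[7:10].encode("utf-8")))  # 6
```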

> I tried Gnus v5.11 on Emacs 23.0.50 that you use, but found no
> problem.  So, I suspect you use something special in your Emacs.

I'm not even sure where to start looking.  If I get a chance, I'll post
something to the emacspeak list today to see whether anyone there has an
idea.  

>> It speaks it as if it weren't there.  The synth acts as if there is a
>> space there.  Spelling it out, it looks like this:
>
>> Freedom s Watch
>
>> But that is not so much a problem.  _Much_ better than this:
>
>> Freedom question mark s Watch
>
> Indeed.  I guess the synth ignores not only octal bytes but also
> the letter `a' with circumflex.  Isn't that the reason you use the
> special tool?  

I am not sure what you are referring to here.  If you are referring to
emacspeak, then I am using it because I am visually impaired.

> I vaguely guess that it converts some characters into others before
> they are decoded according to the specified charset.  

I don't think there is any conversion going on.  It seems likely to me
that emacs itself is looking at the various octets and saying, "OK, here
is 'a', and here is 'B', and here is an exclamation point, and... hmm,
here is a sequence I don't understand, so I will print the raw octets to
the buffer."  As I'm sure you can see, I really have no idea what I'm
talking about.  Sorry.
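The "print the raw octets" behaviour described above resembles how Emacs shows undecoded 8-bit bytes in a unibyte buffer: as a backslash followed by three octal digits. The hypothetical helper below, `render_like_unibyte`, is only an approximation of that display and is not Emacs code. It passes ASCII bytes through and escapes every byte of 128 or above in octal.

```python
def render_like_unibyte(data: bytes) -> str:
    """Approximate Emacs's display of raw bytes: ASCII bytes are shown
    as-is, and each byte >= 128 is shown as a backslash plus 3 octal digits."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


raw = "Freedom\u2019s Watch".encode("utf-8")
print(render_like_unibyte(raw))  # Freedom\342\200\231s Watch
```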

> Although it might be indispensable to you, it breaks non-ASCII text.
> In this case, it is a character that cannot be represented by a
> single 8-bit byte.
>
>> The page I was viewing
>
>> http://www.nytimes.com/2007/09/30/us/politics/30watch.html?ex=1348891200&en=02eb54b65d042599&ei=5124&partner=permalink&exprod=permalink
>
> I was aware that the New York Times uses many non-ASCII characters
> in its pages, since I made a Shimbun module for the New York Times.
>
>> I don't think it is so much an issue with the synth.  As I understand
>> it, the synth speaks what is on the screen (with some exceptions like u
>> umlaut and c cedilla which are not characters in English).
>
> Does the synth support only English text?  Is that why you use a
> tool that converts non-English characters into other characters?

I _think_ mine only supports English text because that is the particular
language I have.  The synth comes in two parts, one of which is language
specific.  It is IBM ViaVoice, in case you were curious.

> 
> It might be fine if the tool worked on decoded text; however, I guess
> it works on raw data that are not decoded yet, and that leads emacs-w3m
> to display question marks for some letters.

I'm not sure, but I think the synth is the last part in the chain:

HTML > w3m > emacs-w3m > synth
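One plausible mechanism for the question marks is the following. This is an assumption and not a diagnosis of w3m, emacs-w3m, or emacspeak. When text is converted to a character set that has no slot for U+2019, such as ASCII or Latin-1 (ISO-8859-1), a strict conversion fails. A lenient conversion substitutes `?`, which yields exactly the "Freedom question mark s Watch" reading.

```python
text = "Freedom\u2019s Watch"

# Strict conversion to ASCII fails outright on U+2019.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("strict:", err.reason)  # ordinal not in range(128)

# Lenient conversion replaces the unmappable character with '?'.
# U+2019 is outside Latin-1 too, which only covers U+0000..U+00FF.
for charset in ("ascii", "latin-1"):
    print(charset, text.encode(charset, errors="replace"))
# ascii b'Freedom?s Watch'
# latin-1 b'Freedom?s Watch'
```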

>> This could be an issue with the configuration as it pertains to the
>> character set used in emacs itself.  The synth, as I understand it,
>> chokes on multibyte characters.  emacspeak requires unibyte to be on.
>
> I see.  So, what you need seems to be a filter that converts non-ASCII
> decoded characters into others or nulls.

As I said, that would be helpful, but it seems like a one-off solution.
Again, I think someone on the emacspeak list might have a more elegant
solution.  
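Katsumi's suggested filter would act on text that has already been decoded. A minimal sketch of such a filter might look like the code below. The names `asciify`, `filter_stream`, and `REPLACEMENTS` are hypothetical, and the replacement table is an assumption that covers only a few common typographic characters. Accented letters such as u umlaut and c cedilla are reduced to their base letter through Unicode NFKD normalization. Anything that still has no ASCII form is dropped rather than turned into `?`, so the synth would not read it aloud as "question mark".

```python
import unicodedata

# Hypothetical table: common typographic characters and ASCII stand-ins.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # no-break space
}


def asciify(text: str) -> str:
    """Map decoded text to ASCII: use the table first, then strip accents,
    and drop whatever still has no ASCII form."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        else:
            # NFKD splits e.g. u-umlaut into 'u' + a combining mark;
            # keeping only the ASCII part leaves the base letter.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)


def filter_stream(src, dst) -> None:
    """Copy lines from the text stream src to dst, passing each through asciify."""
    for line in src:
        dst.write(asciify(line))


print(asciify("Freedom\u2019s Watch"))  # Freedom's Watch
```

Called as `filter_stream(sys.stdin, sys.stdout)`, it would work as a stdin-to-stdout filter. Where it could hook into the HTML > w3m > emacs-w3m chain is left open here.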

>>> Otherwise, we can offer a filter program that converts the
>>> apostrophe in question into the ASCII character for emacs-w3m.
>
>> While this sort of thing would be appreciated, it seems like a bad fix.
>
> I don't think it's so bad as a stopgap until emacspeak is improved further.

This idea has been mentioned before.  I believe the reason it is not
"fixed" is that it would require a major rewrite and the number of
developers of emacspeak is small.

>> I might ask on the emacspeak list to see if anyone else is seeing this
>> and if so what they are doing about it.
>
> Feel free to show my articles to them.  They are at:
>
> http://news.gmane.org/group/gmane.emacs.w3m/thread=7055/force_load=t

Thanks.  I will pass this along.  

Thank you for all your help,
rdc
-- 
Robert D. Crawford                                      rdc1x@xxxxxxxxxxx

I don't care for the Sugar Smacks commercial.  I don't like the idea of
a frog jumping on my Breakfast.
		-- Lowell, Chicago Reader 10/15/82