Re: Unicode HTML character entities in w3m

Hi Robert,

thank you for your reply. 

Robert D. Crawford writes ("Re: Unicode HTML character entities  in w3m"):
> I am not sure what encoding I am using here.  I looked for a command
> that would tell me the encoding or character set currently in use and
> did not really find anything.  This is not unusual, as I have no
> experience  here and I think I use the defaults.

Unfortunately, I do not understand the way Emacs handles multibyte
characters as well as I would like either, but you can use commands
like describe-char, describe-language-environment and
describe-coding-system to get some information. Also, the value of
default-enable-multibyte-characters tells you whether you are in
unibyte or multibyte mode.

> Mine here shows exactly what yours does.
> I do not hear the characters you inserted  in this mail, nor do I hear
> them in w3m.  You might want to set tts-strip-octals to true and see if
> that helps.  

This will certainly remove the offending characters but it will also
remove other latin-1 characters I want to hear. If there really were
some ü (u-umlaut) in the buffer, I would need to know because it is a
regular character in German. I guess the reason why I can hear the
characters with ViaVoice at all is the partial multi-language support
patch I am using.

As I understand the problem at the moment, with the latin-1 language
environment one should not expect to be able to have Unicode
characters beyond code point 255 in a buffer (naturally). The
behaviour in multibyte mode with utf-8 is probably also ok; one would
only have to make sure that the whole path from the buffer through the
emacspeak speech layer to ViaVoice uses utf-8. I guess the solution I
have now is pretty usable because web pages are the only place where I
have to deal with Unicode. Getting this fixed properly would be a
larger project which should also bring better multi-language support
to emacspeak.
This is on my projects list but will not happen any time soon from my
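
To illustrate the code point 255 limit outside of Emacs, here is a
small Python sketch. It is only an illustration of the two encodings,
not of how Emacs or emacspeak handle them internally; the helper name
fits_latin1 and the sample characters are my own assumptions.

```python
def fits_latin1(ch):
    """Return True if the character can be encoded as one latin-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical samples: u-umlaut (U+00FC) and the euro sign (U+20AC).
samples = ["\u00fc", "\u20ac"]

for ch in samples:
    # latin-1 covers code points 0..255 only; utf-8 can encode both.
    print(f"U+{ord(ch):04X}",
          "latin-1 ok:", fits_latin1(ch),
          "utf-8 bytes:", ch.encode("utf-8"))
```

The u-umlaut fits into a single latin-1 byte, while the euro sign can
only be represented in utf-8, which is why every step between the
buffer and the synthesizer would have to agree on utf-8.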

What still puzzles me is the fact that w3 seems to do some replacement
of non-ASCII Unicode characters already.

Best regards, Lukas

