[BUG] speechd-up fails to detect one-character messages
Alexander E. Patrakov
patrakov at gmail.com
Mon Jan 14 14:39:18 CET 2008
2008/1/14, Alexander E. Patrakov <patrakov at gmail.com>:
> 3) speechd-up counts bytes for which isspace() returns false. However,
> since at this stage non-ASCII characters are represented by multibyte
> sequences, the count becomes greater than 1 (because each byte is
> counted separately). Since at this stage you know that the text is in
> UTF-8, you should count bytes less than 0x80 that are not spaces, and
> also bytes in the 0xc2..0xf4 range (inclusive). This excludes bytes in
> the 0x80..0xbf range, which are always "continuation" bytes (i.e., are
> part of the same character as the previous byte in the stream).
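A rough sketch of the counting rule quoted above, in case it helps. The function name count_nonspace_chars is made up for illustration and is not taken from the speechd-up sources; it assumes the input is a NUL-terminated UTF-8 string.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the non-space characters in a NUL-terminated UTF-8 string,
 * counting each multibyte character once.
 * Hypothetical helper, not part of speechd-up. */
size_t count_nonspace_chars(const char *text)
{
    const unsigned char *p = (const unsigned char *)text;
    size_t count = 0;

    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            /* ASCII byte: count it unless it is whitespace. */
            if (!isspace(*p))
                count++;
        } else if (*p >= 0xC2 && *p <= 0xF4) {
            /* First byte of a multibyte character: count it once. */
            count++;
        }
        /* Bytes in 0x80..0xBF are continuation bytes and are skipped.
         * Bytes 0xC0, 0xC1 and 0xF5..0xFF never occur in valid UTF-8
         * and are skipped as well. */
    }
    return count;
}
```

With this, a single Cyrillic letter such as "\xd0\xb6" gives a count of 1 rather than 2.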
And I forgot to say that using "%c" in printf is _always_ wrong with
UTF-8 text, precisely because a character may span several bytes. You
should pass the complete multibyte character (i.e., all of its bytes in
one CHAR command) to speechd-up, by making a string out of it (see the
table below for the number of bytes to copy) and using "%s" with
printf.
First byte    Number of bytes to copy
0x00-0x7F     1 byte
0xC2-0xDF     2 bytes
0xE0-0xEF     3 bytes
0xF0-0xF4     4 bytes
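
Here is a minimal sketch of how the table above could be applied. All three function names are invented for illustration, and the "CHAR %s\n" format string is only a placeholder for whatever command text is actually sent; I have not checked it against the real code.

```c
#include <stdio.h>
#include <string.h>

/* Number of bytes in the UTF-8 character that starts with `first`,
 * or 0 if `first` cannot start a character (continuation byte or
 * a byte that never occurs in valid UTF-8). */
int utf8_char_len(unsigned char first)
{
    if (first <= 0x7F)
        return 1;
    if (first >= 0xC2 && first <= 0xDF)
        return 2;
    if (first >= 0xE0 && first <= 0xEF)
        return 3;
    if (first >= 0xF0 && first <= 0xF4)
        return 4;
    return 0;
}

/* Copy the first complete character of `text` into `out` and
 * NUL-terminate it. Returns the number of bytes copied, or 0 if the
 * string is empty, starts with an invalid byte, or is truncated. */
int utf8_copy_char(const char *text, char out[5])
{
    int len, i;

    out[0] = '\0';
    if (text[0] == '\0')
        return 0;
    len = utf8_char_len((unsigned char)text[0]);
    if (len == 0)
        return 0;
    /* Every following byte must be a continuation byte (10xxxxxx).
     * This also stops at a premature NUL terminator. */
    for (i = 1; i < len; i++) {
        if (((unsigned char)text[i] & 0xC0) != 0x80)
            return 0;
    }
    memcpy(out, text, (size_t)len);
    out[len] = '\0';
    return len;
}

/* Format one character as a string with "%s" instead of "%c".
 * The command text here is an illustrative placeholder.
 * Returns the snprintf result, or -1 if no character was found. */
int format_char_command(const char *text, char *buf, size_t bufsize)
{
    char ch[5];

    if (utf8_copy_char(text, ch) == 0)
        return -1;
    return snprintf(buf, bufsize, "CHAR %s\n", ch);
}
```

For a string that starts with a two-byte character, utf8_copy_char copies both bytes, so the character reaches the "%s" intact instead of being cut after its first byte.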
--
Alexander E. Patrakov