symbolic voice-types versus synthesis voices
cerha at brailcom.org
Mon Nov 8 12:11:02 CET 2010
Dne 7.11.2010 21:11, Andrei Kholodnyi napsal(a):
>> Unfortunately, I don't have a very solid proposal. I just find it
>> slightly confusing that there are two ways to select a voice.
> yes, completely agree. I also think we need to stay with one way.
>> When I replied to Andrei's message yesterday, I commented that
>> it would be nice to do away with the synthesis voice entirely. From a
>> user's perspective, it is perhaps easier to think about voices in general
>> terms, such as male1 and female1.
> also agree, it makes sense to have the same look and feel for all
> voices regardless any module specifics.
> however at the moment we have SPDVoiceType which is restricted to 3
> types per gender per module.
Historically, SPDVoiceType was initially the only method. Then "list synthesis voices"
and "set synthesis voice" were added because of a requirement from Orca, which exposes
the available synthesis voices directly to the user for selection. This makes sense to
me, since users are accustomed to referring to voices by their real names.
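For illustration, the two selection mechanisms look roughly like this at the SSIP level
(a hedged sketch from memory; exact command names, replies, and the voice name shown
may differ between versions and synthesizers):

```
SET self VOICE_TYPE MALE1          # symbolic voice type (SPDVoiceType)

LIST SYNTHESIS_VOICES              # enumerate the synthesizer's real voices
SET self SYNTHESIS_VOICE en-us+f3  # pick one of them by its real name
```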
> Probably it is better to use approach defined in SSML?
> gender: Enumerated values are: "male", "female", "neutral", or
> the empty string "".
> age: preferred age in years (since birth) of the voice to
> speak the contained text.
> variant: a preferred variant of the other voice characteristics
> to speak the contained text. (e.g. the second male child voice).
> name: a processor-specific voice name to speak the contained text.
> languages: list of languages the voice is desired to speak.
Yes, voice selection by properties might be a good thing, but I believe it is an
independent feature. We could allow retrieval of voice properties for a particular
synthesis voice. If the client has this information, it can select the best matching
voice client-side. Alternatively, we could provide additional convenience functions,
such as limiting the result of synthesis-voice listing by given criteria.
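Client-side selection could then be a simple filter over the retrieved properties.
A minimal sketch (the record layout and property names here are assumptions for
illustration, not an actual Speech Dispatcher API):

```python
# Hypothetical voice records, as a client might assemble them after
# querying the server for each synthesis voice's properties.
VOICES = [
    {"name": "kal16", "gender": "male", "age": 35, "languages": ["en"]},
    {"name": "anna", "gender": "female", "age": 30, "languages": ["de", "en"]},
    {"name": "pavel", "gender": "male", "age": 40, "languages": ["cs"]},
]

def match_voices(voices, gender=None, language=None):
    """Return the voices matching the given SSML-style criteria."""
    result = []
    for v in voices:
        if gender and v["gender"] != gender:
            continue
        if language and language not in v["languages"]:
            continue
        result.append(v)
    return result

# E.g. all voices able to speak English:
english = match_voices(VOICES, language="en")
```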
> but it also gives a big diversity in naming, since different synths
> name voices differently.
But does this diversity matter? Even if these diverse names are exposed to the end user, I
think that is still better than exposing nicely aligned symbolic names, which carry no
information (except for the gender). The client can also expose voice properties to the
user if this is implemented (and available).
So if I had to choose between the two methods, I'd choose synthesis voices plus voice
parameters, since this allows greater flexibility. It also allows symbolic voice names to
be implemented client-side, while the reverse is not possible (implementing
synthesis voices client-side when the server only supports symbolic voice types).
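The claim that symbolic names can be layered on top of synthesis voices client-side can
be sketched as follows (the voice data and the mapping rule are illustrative
assumptions, not how any existing client does it):

```python
# Hypothetical synthesis-voice list with a gender property per voice.
VOICES = [
    {"name": "kal16", "gender": "male"},
    {"name": "awb", "gender": "male"},
    {"name": "anna", "gender": "female"},
]

def resolve_symbolic(voices, symbolic):
    """Map a symbolic type such as 'MALE2' or 'FEMALE1' to a real voice name.

    The n-th listed voice of the requested gender stands in for MALEn/FEMALEn;
    a real client would want a stable, user-visible ordering instead.
    """
    gender = "male" if symbolic.startswith("MALE") else "female"
    index = int(symbolic[len(gender):]) - 1
    candidates = [v["name"] for v in voices if v["gender"] == gender]
    return candidates[index] if 0 <= index < len(candidates) else None

# MALE2 resolves to the second male synthesis voice:
second_male = resolve_symbolic(VOICES, "MALE2")
```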
Best regards, Tomas