Design suggestion: The server just for synthesis

Tomas Cerha cerha at brailcom.org
Mon Nov 15 15:01:47 CET 2010


On 15.11.2010 13:38, Andrei Kholodnyi wrote:
> Does it mean we want to provide to the application system wide
> capabilities instead of particular driver capabilities?

I'd say NO.

> e.g. if a particular driver does not implement SSML we would implement
> SSML for it inside provider?

I'd say YES.

> and deliver "can_parse_ssml" to the application?

If the applications care whether they get "emulated" or "real" SSML, I'd say we should
be able to tell them.  But this doesn't mean they need to care.
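
As a rough illustration of "emulated" versus "real" SSML, the provider could fall back to
stripping the markup for drivers that cannot parse it, and report which of the two the
client is getting.  This is only a sketch: the names Driver, SsmlSupport, ssml_support and
prepare_text are made up for this example and are not part of SSIP or of any TTS API
draft, and dropping the tags is the crudest possible form of emulation.

```python
import re
from dataclasses import dataclass
from enum import Enum


class SsmlSupport(Enum):
    NATIVE = "native"      # the driver parses SSML itself
    EMULATED = "emulated"  # the provider handles the markup on the driver's behalf


@dataclass
class Driver:
    """Hypothetical driver description, used only for this example."""
    name: str
    can_parse_ssml: bool


def ssml_support(driver: Driver) -> SsmlSupport:
    """Tell the client whether SSML is real or emulated for this driver."""
    return SsmlSupport.NATIVE if driver.can_parse_ssml else SsmlSupport.EMULATED


def prepare_text(driver: Driver, ssml: str) -> str:
    """Return the text that would be handed to the driver for an SSML message."""
    if driver.can_parse_ssml:
        return ssml
    # Crude emulation: replace every tag with a space, then collapse whitespace.
    plain = re.sub(r"<[^>]+>", " ", ssml)
    return " ".join(plain.split())
```

A client that does not care simply sends SSML; a client that does care can check
ssml_support() first.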

There is currently no specification of the client API, so it is up for discussion which
features of the lower level API we want to expose to the client.  The current SSIP is a
good start and we can extend it with features needed by the clients.

> If yes, what about the capabilities which we can not implement e.g.
> generic drivers can not generate speech samples as output?

These would not be emulated.  There is a distinct set of capabilities which cannot be
emulated in principle, so the applications need to be able to handle both situations
in this case, or ignore the drivers which don't support the capability if it is essential.

> For me it is a key design question. someone shall aggregate/handle all
> these differences.
> If we do not do it, then each app shall do this job.

Sure. I believe we should do it wherever possible.

> E.g. app wants to know all voices it can get back as speech samples.
> currently it will probably do:
> for all drivers get capability "can_retrieve_audio"
> if can_retrieve_audio
>   list all voices
>   add them to my favorite list
> 
> whereas we can do it for the app with High level API like:
> list voices with capabilities can_retrieve_audio, i.e. hide particular
> driver capabilities
> 
> This I could imagine as a high level API on top of TTS API

If I understand what you mean, the difference is whether you think of a driver as a
property of a voice or vice versa.  Otherwise the two are equivalent.  Both approaches can
be implemented above TTS API.
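
To make the equivalence concrete, here is a small sketch of both views over the same
data.  Everything in it (the Driver and Voice records, the function names and the driver
names in the usage note) is hypothetical; only the capability name "can_retrieve_audio"
is taken from the example quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Voice:
    name: str
    driver: str  # the driver seen as a property of the voice


@dataclass
class Driver:
    name: str
    capabilities: set = field(default_factory=set)
    voices: list = field(default_factory=list)  # voice names offered by this driver


def voices_per_driver(drivers, capability):
    """Driver-centric: check each driver's capability, then collect its voices."""
    result = []
    for d in drivers:
        if capability in d.capabilities:
            result.extend(Voice(v, d.name) for v in d.voices)
    return result


def list_voices(drivers, capability):
    """Voice-centric: flatten all voices, then filter by the owning driver."""
    caps = {d.name: d.capabilities for d in drivers}
    all_voices = [Voice(v, d.name) for d in drivers for v in d.voices]
    return [v for v in all_voices if capability in caps[v.driver]]
```

With one driver "driver_a" that has can_retrieve_audio and another "driver_b" that does
not, both functions return the same list, namely the voices of driver_a.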

>> An SSIP bridge can also be written on top of the new API for backwards compatibility.
>> Libspeechd, Python library and other client libraries could run without a change through
>> this bridge.
> 
> the only difference in SSIP versus TTS API AFAIR are priority handling
> and history. Not sure how it can be smoothly integrated.
> probably it can be added on top of TTS API as well, but there are APIs
> missing for it,
> probably some tags can be incorporated in the messages?

Yes, TTS API is a low level API.  Priorities are handled within the layer above it.
Thus the client API must have some features not present in TTS API specification.  Also
many features present in TTS API specification do not need to be exposed to the client API.
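
A minimal sketch of what "handled within the layer above" could look like: the low level
call only speaks text, and a thin layer decides the order.  The PriorityLayer class and
its numeric priorities are assumptions made for this example; they deliberately do not
reproduce the actual SSIP priority semantics, only the layering.

```python
import heapq
import itertools


class PriorityLayer:
    """Hypothetical layer that orders messages before calling a low level speak()."""

    def __init__(self, speak):
        self._speak = speak                # low level call, unaware of priorities
        self._queue = []
        self._counter = itertools.count()  # keeps FIFO order within one priority

    def say(self, text, priority):
        # Lower number means more urgent.
        heapq.heappush(self._queue, (priority, next(self._counter), text))

    def flush(self):
        # Hand the queued messages to the low level call, most urgent first.
        while self._queue:
            _, _, text = heapq.heappop(self._queue)
            self._speak(text)
```

The low level side needs no changes for this; any callable that accepts a string will do
as speak().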

If it was not clear from the previous discussion, the ambition of TTS API is to become a
standard API for access to TTS engines.  Speech Dispatcher would be the consumer of this
API -- the layer between the clients and the drivers which implement TTS API.  Another
speech service (like Speech Dispatcher) should be able to use the same API and reuse the
same drivers to access speech engines.  This other service might have a different client
API but we can also decide to standardize the client API.  Standardization of the client
API would be a benefit for assistive technologies and other client applications.  On the
other hand, TTS API is good for output modules (TTS engine drivers).  One common driver
API can be used by different speech systems and the output drivers can be shared.  Both
levels of standardisation make sense, but we believe the low level API is easier to
standardize since it is easier to agree on a common set of low level features.  So we
started with this one.

Thanks everyone for your valuable input.

Best regards, Tomas


