The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM's speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.
The Speech to Text service provides the following endpoints:
/v1/models
returns information about the models (languages and sampling rates) available for transcription.
/v1/sessions
provides a collection of methods that let a client maintain a long, multi-turn exchange, or session, with the service or establish multiple parallel conversations with a particular instance of the service.
/v1/recognize
(sessionless) includes a single method that transcribes audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.
/v1/register_callback
(asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.
/v1/recognitions
(asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.
The following general information pertains to the transcription of audio:
To send audio in chunks, include the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription.
To opt out of data collection, set the X-Watson-Learning-Opt-Out request header to true for each request. Data is collected for any request that omits this header.
For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.
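The request headers and size limit described above can be illustrated with a minimal Python sketch that uses only the standard library. It builds, but does not send, a request to /v1/models and a chunked request to /v1/recognize. Several details are assumptions that do not come from this overview: the BASE_URL host is a placeholder, the audio/wav content type is only an example, "100 MB" is read as 100 * 1024 * 1024 bytes, authentication is omitted entirely, and the function names are hypothetical helpers rather than part of any IBM SDK.

```python
import urllib.request
from typing import Iterable, Iterator

# Placeholder host (assumption): this overview does not state the base URL.
BASE_URL = "https://example.com/speech-to-text/api"

# The 100 MB limit stated above, read here as binary megabytes (assumption).
MAX_AUDIO_BYTES = 100 * 1024 * 1024


def limited_chunks(chunks: Iterable[bytes],
                   limit: int = MAX_AUDIO_BYTES) -> Iterator[bytes]:
    """Yield audio chunks, raising once the running total exceeds the limit."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise ValueError(f"audio exceeds the {limit}-byte limit")
        yield chunk


def build_models_request() -> urllib.request.Request:
    """Build a GET request for the list of available models."""
    return urllib.request.Request(f"{BASE_URL}/v1/models", method="GET")


def build_recognize_request(chunks: Iterable[bytes],
                            content_type: str = "audio/wav",
                            opt_out: bool = True) -> urllib.request.Request:
    """Build a sessionless, chunked POST request for /v1/recognize."""
    headers = {
        "Content-Type": content_type,    # example audio format (assumption)
        "Transfer-Encoding": "chunked",  # audio is supplied in chunks
    }
    if opt_out:
        # Requests that omit this header are subject to data collection.
        headers["X-Watson-Learning-Opt-Out"] = "true"
    return urllib.request.Request(
        f"{BASE_URL}/v1/recognize",
        data=limited_chunks(chunks),     # an iterable of bytes objects
        headers=headers,
        method="POST",
    )
```

Calling build_recognize_request(chunks) with any iterable of bytes objects returns a urllib.request.Request. Passing it to urllib.request.urlopen would transmit it, but only after BASE_URL is replaced with a real service URL and the credentials that the service requires are added. The size check is enforced on the client side as chunks are consumed, so an oversized upload fails partway through rather than before it starts.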