Getting started

AppTek provides two API services for different use cases.

Batch APIs (HTTP)

AppTek Cloud Application Programming Interfaces (APIs) provide a set of speech and natural language services across a wide range of languages. All services are exposed as HTTP REST endpoints, so they are easy to call from practically any language, for example with the Python requests library, the Go net/http package, or the .NET System.Net.Http.HttpClient class. You can also debug individual requests from the command line with cURL, or call cURL from a shell script.
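As a rough illustration of calling a REST endpoint from Python, the sketch below builds a JSON request with the standard library and does not send it. The base URL, the paths, and the Authorization header are placeholders assumed for this example, not the documented AppTek values; substitute the ones from the API reference.

```python
import json
import urllib.request

# Hypothetical values: the real base URL and authentication scheme
# are not stated in this overview.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_request(path, payload=None, method="GET"):
    """Build (but do not send) a JSON request for a REST endpoint."""
    headers = {
        "Accept": "application/json",
        "Authorization": "Bearer " + API_KEY,  # assumed auth header
    }
    data = None
    if payload is not None:
        # Serialize the body as UTF-8 encoded JSON.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "/v1/services" is a made-up path used only to show the call shape.
request = build_request("/v1/services")
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to urllib.request.urlopen and read the response body; the same structure carries over to the requests library or to any other HTTP client.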


Service Discovery

Query the available services and models through a single endpoint that covers all API services. It returns a complete list of services and the models available for each one.


Transcription

Submit an audio file to the AppTek Cloud Speech-to-Text REST API for transcription by the Automatic Speech Recognition (ASR) engine.


Machine Translation

Submit a text file to the AppTek Cloud Machine Translation REST API for translation by the natural language translation engine.


Language Identification

Identify the language of the speech in an audio file with the AppTek Language Id API. The service returns a list of languages for each speech segment, along with a confidence score.
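One way to consume such a result is to keep only the most confident language for each segment. The sketch below assumes a response shape (a "segments" list whose entries carry "start", "end", and a "languages" list with "lang" and "confidence" fields); these field names are hypothetical and should be checked against the actual API response.

```python
# Hypothetical response shape: the real field names may differ.
sample_response = {
    "segments": [
        {"start": 0.0, "end": 4.2, "languages": [
            {"lang": "en", "confidence": 0.91},
            {"lang": "de", "confidence": 0.06},
        ]},
        {"start": 4.2, "end": 9.8, "languages": [
            {"lang": "es", "confidence": 0.55},
            {"lang": "pt", "confidence": 0.40},
        ]},
    ]
}


def top_languages(response, min_confidence=0.0):
    """Return (start, end, lang, confidence) for the best candidate per segment.

    Segments whose best candidate scores below min_confidence are skipped.
    """
    results = []
    for segment in response["segments"]:
        candidates = segment.get("languages", [])
        if not candidates:
            continue
        best = max(candidates, key=lambda c: c["confidence"])
        if best["confidence"] >= min_confidence:
            results.append(
                (segment["start"], segment["end"], best["lang"], best["confidence"])
            )
    return results


for start, end, lang, confidence in top_languages(sample_response):
    print(f"{start:6.1f}s to {end:6.1f}s  {lang}  ({confidence:.2f})")
```

Raising min_confidence filters out segments where the identification is ambiguous, which may be useful before routing audio to a language-specific transcription model.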


Audio Alignment

Submit text for audio alignment with the AppTek Text to Audio Alignment API to align an untimed reference transcript with its audio recording. The service first processes the audio file with AppTek's ASR engine to generate a full transcript of the audio, then applies alignment algorithms to produce a time-coded version of the input text.

The source transcript must be reasonably accurate with respect to the speech in the audio. Remove markup, symbols, and other non-spoken items from the reference text before submitting it. The more closely the transcript matches the audio, the better the final alignment.

The primary output of the service is the timestamped text as an SRT file, the industry-standard closed-captioning format. Additional outputs highlight the mismatches between the ASR-generated transcript and the submitted input text. These help you spot uncleaned or mismatched transcripts and other data preparation errors.
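Because the primary output is an SRT file, it can be read with a few lines of code. The following is a minimal sketch of an SRT parser that relies only on the standard SRT layout (cue index, timing line, caption text, blank line); the sample captions are invented for illustration and are not output from the service.

```python
import re
from datetime import timedelta

# Timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Parse SRT text into a list of (start, end, caption) tuples."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Everything after the timing line is the caption text.
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


sample_srt = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:03,600 --> 00:00:06,000
Today we talk about
speech recognition.
"""

for start, end, caption in parse_srt(sample_srt):
    print(start.total_seconds(), end.total_seconds(), repr(caption))
```

Blocks without a timing line are ignored, so stray text in the file does not raise an error.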


Text to Speech

Submit text to synthesize speech with the AppTek text-to-speech API. The input is a plain text document and the result is a WAV file containing the synthesized speech.
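Once the WAV result has been downloaded, a quick sanity check is to read its header. The sketch below uses Python's standard wave module, which handles uncompressed PCM files; whether the service always returns PCM is an assumption here. A silent clip generated in memory stands in for a real API response.

```python
import io
import wave


def wav_info(data):
    """Return basic properties of PCM WAV audio given as bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def make_silence(seconds=1.0, rate=16000):
    """Create a mono 16-bit silent WAV in memory (stand-in for real output)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


print(wav_info(make_silence(seconds=0.5, rate=8000)))
```

With a real response, pass the downloaded bytes to wav_info in place of the generated clip.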


Named Entity Identification

Submit text for Named Entity Identification. The service identifies and tags named entities within the text. The input is plain text to be analyzed, and the tagged results are returned in a JSON object.


Sentiment Analysis

Submit text for Sentiment Analysis. The input is a JSON array containing the text elements to be analyzed. The output is a JSON object with a sentiment score for each text element, ranging from 0.0 (negative) to 1.0 (positive).
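The sketch below shows one possible way to prepare the JSON array and turn the returned scores into labels. The response layout (a "scores" list in input order) and the 0.4 and 0.6 thresholds are assumptions made for this example; only the 0.0 to 1.0 score range comes from the description above.

```python
import json

texts = [
    "I love this product.",
    "The delivery was late again.",
    "It arrived on Tuesday.",
]

# Request body: a JSON array of the texts to analyze.
payload = json.dumps(texts)

# Hypothetical response body; the real field names may differ.
response_body = '{"scores": [0.94, 0.08, 0.51]}'


def label(score, low=0.4, high=0.6):
    """Map a 0.0-1.0 score to a coarse label using assumed thresholds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


def annotate(texts, response_body):
    """Pair each input text with its score and label."""
    scores = json.loads(response_body)["scores"]
    if len(scores) != len(texts):
        raise ValueError("expected one score per input text")
    return [(text, score, label(score)) for text, score in zip(texts, scores)]


for text, score, tag in annotate(texts, response_body):
    print(f"{score:.2f}  {tag:8s}  {text}")
```

The thresholds are a design choice for the consuming application; a stricter neutral band may suit cases where only clearly polarized text matters.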


Streaming API (gRPC)

Protocol Buffer Documentation

See the Protocol Buffer documentation: Streaming gRPC API