Documentation
ASR RESTful API - audio file transcription
Service: asr.api.yating.tw/v1/
This page explains how to transcribe a short audio file to text using synchronous speech recognition.
Synchronous speech recognition returns the recognized text and labeled speakers from the audio. ASR can process audio content stored anywhere it can be reached via a URL.
This API can automatically detect the number of speakers in your audio file, and each sentence in the transcription text can be associated with its speaker.
Please note that speakerDiarization and sentiment are in beta and are free for now.
Submit an audio file
In the code sample below, we show how to submit the URL of your audio/video file to the API for transcription. After submitting your POST request, you will get a response that includes an ID key and a status key. Audio files must always be shorter than 2 hours.
The status key shows the status of your transcription. It starts at "pending", moves to "ongoing", and finally becomes "completed".
Request
URL: /transcriptions
Method: POST
Header

| Name | Type | Info |
| --- | --- | --- |
| *key | String | |
| *Content-Type | String | Only "application/json" |
Body

| Name | Type | Info |
| --- | --- | --- |
| *audioUri | String | MP3, WAV, MOV, MP4 |
| *modelConfig | Object | See variables in modelConfig table |
| featureConfig | Object | See variables in featureConfig table |
Variables in modelConfig

| Variables | Type | Info |
| --- | --- | --- |
| *model | String | See Model codes in the appendix |
| *customLm | String | You can create a custom LM and put its lmID here. |
Variables in featureConfig

| Variables | Type | Info |
| --- | --- | --- |
| *speakerDiarization | boolean | False by default |
| *speakerCount | int | Default = 0. If 0, the model counts the speakers; if > 0, the count is assigned by the client. |
| *sentiment | boolean | If true, angry = -1 and others = 0. If false, all sentiment values will be 0. |
| *punctuation | boolean | |
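The sketch below shows one way to build this POST request with Python's requests library. The audio URL, API key, and lmID are placeholders, and the config values simply mirror the tables above.

```python
import requests

API_URL = "https://asr.api.yating.tw/v1/transcriptions"
API_KEY = "<your-api-key>"  # placeholder: replace with your key

payload = {
    "audioUri": "https://example.com/meeting.mp3",  # placeholder: MP3, WAV, MOV, or MP4
    "modelConfig": {
        "model": "asr-zh-en-std",   # see Appendix: Model codes
        "customLm": "<your-lmID>",  # placeholder: ID of a custom LM you created
    },
    "featureConfig": {
        "speakerDiarization": True,  # beta: label each sentence with a speaker
        "speakerCount": 0,           # 0 = let the model count the speakers
        "sentiment": False,
        "punctuation": True,
    },
}

resp = requests.post(
    API_URL,
    headers={"key": API_KEY, "Content-Type": "application/json"},
    json=payload,
)
resp.raise_for_status()  # 400 responses signal customLm errors (see table below)
print(resp.json()["ids"])  # transaction IDs, e.g. ["key1", "key2"]
```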
Response

| Name | Type | Info |
| --- | --- | --- |
| ids | Array | ["key1", "key2"] Transaction IDs; use these IDs to get the processing status |

| HTTP status | status | Info |
| --- | --- | --- |
| 400 | customLmNotExists | |
| 400 | customLmNotAvailable | |
| 400 | customLmNotMatch | This customLm is not compatible with the model. |
Get status
After you submit audio files for processing, the status key goes from "pending" to "ongoing" and finally to "completed". If something goes wrong, it becomes "error". You can make a GET request, as shown below, to check for updates on the status of your transcription.
You'll have to make repeated GET requests until the status is "completed" or "error". Once the status key shows "completed", you can retrieve the transcription as described in Get results below.
Request
URL: /transcriptions?page=1&perPage=10&status=pending
Method: GET
Header

| Name | Type | Info |
| --- | --- | --- |
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
| --- | --- | --- |
| id | Object | status: pending, completed, ongoing, error, expired, not exists; expiredAt: RFC3339 timestamp |
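A polling sketch under stated assumptions: the exact response shape is only loosely specified above, so the line that reads the status (a body keyed by transcription id) is an assumption, and wait_until_done is a hypothetical helper name.

```python
import time

import requests

BASE_URL = "https://asr.api.yating.tw/v1"
API_KEY = "<your-api-key>"  # placeholder: replace with your key

def wait_until_done(transcription_id: str, interval: float = 10.0) -> str:
    """Repeat GET requests until the transcription reaches a terminal status."""
    headers = {"Authorization": f"Bearer {API_KEY}"}
    while True:
        resp = requests.get(
            f"{BASE_URL}/transcriptions",
            params={"page": 1, "perPage": 10},
            headers=headers,
        )
        resp.raise_for_status()
        # Assumption: the body maps each transcription id to an object
        # with "status" and "expiredAt" fields, per the table above.
        status = resp.json()[transcription_id]["status"]
        if status in ("completed", "error", "expired", "not exists"):
            return status
        time.sleep(interval)  # keep polling while "pending" or "ongoing"
```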
Get results
Once an audio transcription's status key shows "completed", you can query the text, words, and other keys, including the results of any Audio Intelligence features you enabled, with the results of your transcription populated in the JSON response.
These results are preserved for up to 24 hours after completion.
Request
URL: /transcriptions/{id}
Method: GET
Header

| Name | Type | Info |
| --- | --- | --- |
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
| --- | --- | --- |
| id | String | |
| status | String | pending, completed, ongoing, error, not exists |
| modelConfig | JSON | Original settings from the submit audio request |
| sentences | JSON[] | speakerId: 0, 1, 2, ... and unknown |
| expiredAt | Datetime | RFC3339 timestamp |
If the result has expired, the API returns 404.
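A minimal sketch of this final step; fetch_result is a hypothetical helper, and the per-sentence fields beyond speakerId are not specified above, so the loop only prints what the tables guarantee.

```python
import requests

BASE_URL = "https://asr.api.yating.tw/v1"
API_KEY = "<your-api-key>"  # placeholder: replace with your key

def fetch_result(transcription_id: str) -> dict:
    """Fetch a completed transcription as JSON; 404 means the result expired."""
    resp = requests.get(
        f"{BASE_URL}/transcriptions/{transcription_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    if resp.status_code == 404:
        raise RuntimeError("Result expired: it is preserved only 24 hours after completion")
    resp.raise_for_status()
    return resp.json()

result = fetch_result("key1")  # one of the ids returned when you submitted the audio
for sentence in result.get("sentences", []):
    # each sentence carries a speakerId: 0, 1, 2, ... or "unknown"
    print(sentence.get("speakerId"), sentence)
```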
Limit
Max concurrent audio transcriptions = 3
Appendix: Model codes
| Language code | Info | Language |
| --- | --- | --- |
| asr-zh-en-std | Use it when speakers speak Chinese more than English. | Mandarin and English |
| asr-zh-tw-std | Use it when speakers speak Chinese and Taiwanese. | Mandarin and Taiwanese |
| asr-zh-tw-health | Use it when speakers speak Chinese and Taiwanese in the health domain. | Mandarin and Taiwanese |