Documentation

ASR RESTful API - audio file transcription

Service: asr.api.yating.tw/v1/

This page explains how to transcribe a short audio file to text using synchronous speech recognition.
Synchronous speech recognition returns the recognized text and labeled speakers from the audio. ASR can process audio content stored anywhere it can be accessed via a URL.
The API can automatically detect the number of speakers in your audio file, and each sentence in the transcript can be associated with its speaker.
Please note that speakerDiarization and sentiment are in beta and are currently free.

Submit an audio

The sample below shows how to submit the URL of your audio/video file to the API for transcription. After submitting your POST request, you will receive a response that includes an id key and a status key. Audio files must be shorter than 2 hours.
The status key shows the state of your transcription. It starts as "pending", moves to "ongoing", and finishes as "completed".
Request
URL: /transcriptions
Method: POST
Header

| Name | Type | Info |
|---|---|---|
| *key | String | Your API key |
| *Content-Type | String | Only "application/json" |

Body

| Name | Type | Info |
|---|---|---|
| *audioUri | String | MP3, WAV, MOV, MP4 |
| *modelConfig | Object | See "Variables in modelConfig" table |
| featureConfig | Object | See "Variables in featureConfig" table |
```json
{
   "audioUri": "audioUri",
   "modelConfig": {
       "model": "asr-zh-en-std",
       "customLm": ""
   },
   "featureConfig": {
       "speakerDiarization": true,
       "speakerCount": 0,
       "sentiment": true,
       "punctuation": true
   }
}
```
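As a sketch, the submit request can be built with Python's standard library. This assumes the service is reachable over https at the root stated above; `YOUR_KEY` and the audio URL are placeholders:

```python
import json
import urllib.request

API_BASE = "https://asr.api.yating.tw/v1"  # service root from this doc (https assumed)

def build_transcription_request(api_key: str, audio_uri: str,
                                model: str = "asr-zh-en-std",
                                custom_lm: str = "",
                                speaker_diarization: bool = True,
                                speaker_count: int = 0,
                                sentiment: bool = True,
                                punctuation: bool = True) -> urllib.request.Request:
    """Build the POST /transcriptions request described above."""
    body = {
        "audioUri": audio_uri,
        "modelConfig": {"model": model, "customLm": custom_lm},
        "featureConfig": {
            "speakerDiarization": speaker_diarization,
            "speakerCount": speaker_count,
            "sentiment": sentiment,
            "punctuation": punctuation,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/transcriptions",
        method="POST",
        headers={"key": api_key, "Content-Type": "application/json"},
        data=json.dumps(body).encode("utf-8"),
    )

# To actually submit (requires a valid key and a reachable audio URL):
# req = build_transcription_request("YOUR_KEY", "https://example.com/audio.mp3")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```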
Variables in modelConfig

| Variable | Type | Info |
|---|---|---|
| *model | string | See Model codes |
| *customLm | string | You can create a custom LM and put its lmId here. |

Variables in featureConfig

| Variable | Type | Info |
|---|---|---|
| *speakerDiarization | boolean | false by default |
| *speakerCount | int | Default 0. 0: the model counts the speakers; >0: the count is assigned by the client. |
| *sentiment | boolean | If true, angry = -1 and all others = 0; if false, all sentiment values are 0. |
| *punctuation | boolean | |
Response

| Name | Type | Info |
|---|---|---|
| ids | Array | ["key1", "key2"]. Transaction IDs; use an ID to get the processing status. |
```json
[
   {
       "uid": "f59937c7-a9fc-415e-a149-1b4140f09640",
       "audioUri": "audioUri",
       "model": "asr-zh-en-std",
       "customLm": "",
       "isPunctuation": 1,
       "isSpeakerDiarization": 0,
       "speakerCount": 2,
       "status": "pending",
       "createdAt": "2022-08-31T09:22:06.759Z",
       "updatedAt": "2022-08-31T09:22:06.759Z"
   }
]
```
HTTP status

| Status | Info |
|---|---|
| 400 | customLmNotExists |
| 400 | customLmNotAvailable |
| 400 | customLmNotMatch: this customLm is not compatible with the model. |

Get status

After you submit audio files for processing, the "status" key goes from "pending" to "ongoing" and finally to "completed". If something goes wrong, it goes to "error". You can make a GET request, as shown below, to check the status of your transcription.
Make repeated GET requests until the status is "completed" or "error". Once the status is "completed", you can fetch the transcription as described in Get results.
Request
URL: /transcriptions?page=1&perPage=10&status=pending
Method: GET
Header

| Name | Type | Info |
|---|---|---|
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
|---|---|---|
| id | object | status: pending, completed, ongoing, error, expired, not exists; expiredAt: RFC3339 timestamp |
```json
{
   "page": 1,
   "perPage": 10,
   "total": 2,
   "nextPage": null,
   "data": [
       {
           "id": "75911e73-4d3d-4104-a3e5-6ed4f7966b7a",
           "key": "DNf9qx7KUMgKYyMdO9hS",
           "audioUri": "audioUri",
           "model": "asr-zh-en-std",
           "languageModelId": "",
           "isPunctuation": 1,
           "isSpeakerDiarization": 1,
           "speakerCount": 0,
           "status": "pending",
           "createdAt": "2022-08-04T18:02:38.093Z",
           "updatedAt": "2022-08-04T18:02:38.093Z"
       },
       {
           "id": "8d65db58-8f76-40f3-8ad0-b97aad803a8b",
           "key": "DNf9qx7KUMgKYyMdO9hS",
           "audioUri": "audioUri",
           "model": "asr-zh-en-std",
           "languageModelId": "",
           "isPunctuation": 1,
           "isSpeakerDiarization": 1,
           "speakerCount": 0,
           "status": "pending",
           "createdAt": "2022-08-08T09:33:39.011Z",
           "updatedAt": "2022-08-08T09:33:39.011Z"
       }
   ]
}
```
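A minimal polling loop might look like the following sketch; `fetch_status` is a hypothetical caller-supplied function that wraps the GET request above and returns one parsed record:

```python
import time
from typing import Callable, Mapping

# Statuses after which no further polling is useful (from this doc).
TERMINAL_STATUSES = {"completed", "error", "expired"}

def poll_until_done(fetch_status: Callable[[], Mapping],
                    interval_s: float = 5.0,
                    max_attempts: int = 120) -> Mapping:
    """Repeatedly call fetch_status() until the job reaches a terminal status."""
    for _ in range(max_attempts):
        record = fetch_status()
        if record["status"] in TERMINAL_STATUSES:
            return record
        time.sleep(interval_s)
    raise TimeoutError("transcription did not finish in time")
```

The polling interval and attempt cap are illustrative; tune them to your audio length.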

Get results

Once a transcription's status key is shown as "completed", you can query the text, words, and other keys, including the results of any audio-intelligence features you enabled, with the results of your transcription populated in the JSON response.
These results are preserved for up to 24 hours after completion.
Request
URL: /transcriptions/{id}
Method: GET
Header

| Name | Type | Info |
|---|---|---|
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
|---|---|---|
| id | string | |
| status | String | pending, completed, ongoing, error, not exists |
| modelConfig | JSON | The original settings from the submit request |
| sentences | JSON[] | speakerId: 0, 1, 2, … and unknown |
| expiredAt | Datetime | RFC3339 timestamp |
```json
{
    "uid": "8c6ef8d3-1b2a-4faa-8ea8-61bd70491e25",
    "audioUri": "audioUri",
    "model": "asr-zh-en-std",
    "customLm": "",
    "isPunctuation": 1,
    "isSpeakerDiarization": 1,
    "speakerCount": 0,
    "isSentiment": 1,
    "status": "completed",
    "createdAt": "2022-10-03T08:12:46.942Z",
    "updatedAt": "2022-10-03T08:29:47.000Z",
    "sentences":[
      {
         "sentenceId":"u8923dy8923",
         "sentence":"天氣很好",
         "start":438600,
         "end":245499,
         "confidence":0.9132,
         "sentiment":0,
         "speakerId":"speakerId",
         "words":[
            {
               "word":"天氣",
               "start":5486554,
               "end":623434
            },
            {
               "word":"很",
               "start":647543,
               "end":823234
            },
            {
               "word":"好",
               "start":867654,
               "end":932324
            }
         ]
      }
   ],
   "expiredAt":"2022-06-26T17:16:08Z"
}
```
If the result has expired, the request returns 404.
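For readability, the sentences array can be flattened into speaker-labelled lines. This sketch assumes the start timestamps are in milliseconds (the unit is not stated above):

```python
def format_transcript(result: dict) -> list[str]:
    """Turn the 'sentences' array from GET /transcriptions/{id} into
    speaker-labelled transcript lines."""
    lines = []
    for s in result.get("sentences", []):
        start_s = s["start"] / 1000.0  # assumption: timestamps are milliseconds
        lines.append(f"[{start_s:8.1f}s] speaker {s['speakerId']}: {s['sentence']}")
    return lines
```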

Limit

A maximum of 3 audio transcriptions can run concurrently.
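A client can stay under this limit by capping its own worker pool. In this sketch, `submit_and_wait` is a hypothetical helper that submits one file and polls until it completes:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # service limit stated above

def transcribe_all(submit_and_wait, audio_uris):
    """Process many files while never exceeding the concurrency limit.
    submit_and_wait(uri) is a caller-supplied submit+poll helper."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as ex:
        # ex.map preserves input order and never runs more than
        # MAX_CONCURRENT jobs at once.
        return list(ex.map(submit_and_wait, audio_uris))
```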

Appendix: Model codes

| Model code | Info | Language |
|---|---|---|
| asr-zh-en-std | Use it when speakers speak Chinese more than English. | Mandarin and English |
| asr-zh-tw-std | Use it when speakers speak Chinese and Taiwanese. | Mandarin and Taiwanese |
| asr-zh-tw-health | Use it when speakers speak Chinese and Taiwanese in the health domain. | Mandarin and Taiwanese |