Documentation

ASR RESTful API - audio file transcription

Service: asr.api.yating.tw/v1/

This page explains how to transcribe a short audio file to text using synchronous speech recognition.
Synchronous speech recognition returns the recognized text and labeled speakers from the audio. ASR can process audio content stored anywhere it can be accessed via a URL.
The API can automatically detect the number of speakers in your audio file, and each sentence in the transcript can be associated with its speaker.
Please note that speakerDiarization and sentiment are in beta and are currently free.

Submit an audio

The sample below shows how to submit the URL of your audio/video file to the API for transcription. After submitting your POST request, you will receive a response that includes an id key and a status key. Audio files must be shorter than 2 hours.
The status key shows the state of your transcription. It starts as "pending", moves to "ongoing", and finishes as "completed".
Request
URL: /transcriptions
Method: POST
Header

| Name | Type | Info |
|---|---|---|
| *key | String | Your API key |
| *Content-Type | String | Only "application/json" |

Body

| Name | Type | Info |
|---|---|---|
| *audioUri | String | MP3, WAV, MOV, MP4 |
| *modelConfig | Object | See "Variables in modelConfig" table |
| featureConfig | Object | See "Variables in featureConfig" table |
```json
{
   "audioUri": "audioUri",
   "modelConfig": {
       "model": "asr-zh-en-std",
       "customLm": ""
   },
   "featureConfig": {
       "speakerDiarization": true,
       "speakerCount": 0,
       "sentiment": true,
       "punctuation": true
   }
}
```
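As a sketch, the submit request can be built with Python's standard library. This assumes the service is reachable over https at the root stated above; `YOUR_KEY` and the audio URL are placeholders:

```python
import json
import urllib.request

API_BASE = "https://asr.api.yating.tw/v1"  # service root from this doc (https assumed)

def build_transcription_request(api_key: str, audio_uri: str,
                                model: str = "asr-zh-en-std",
                                custom_lm: str = "",
                                speaker_diarization: bool = True,
                                speaker_count: int = 0,
                                sentiment: bool = True,
                                punctuation: bool = True) -> urllib.request.Request:
    """Build the POST /transcriptions request described above."""
    body = {
        "audioUri": audio_uri,
        "modelConfig": {"model": model, "customLm": custom_lm},
        "featureConfig": {
            "speakerDiarization": speaker_diarization,
            "speakerCount": speaker_count,
            "sentiment": sentiment,
            "punctuation": punctuation,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/transcriptions",
        method="POST",
        headers={"key": api_key, "Content-Type": "application/json"},
        data=json.dumps(body).encode("utf-8"),
    )

# To actually submit (requires a valid key and a reachable audio URL):
# req = build_transcription_request("YOUR_KEY", "https://example.com/audio.mp3")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```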
Variables in modelConfig

| Variable | Type | Info |
|---|---|---|
| *model | string | See Model codes |
| *customLm | string | You can create a custom LM and put its lmId here. |

Variables in featureConfig

| Variable | Type | Info |
|---|---|---|
| *speakerDiarization | boolean | false by default |
| *speakerCount | int | Default 0. 0: the model counts the speakers; >0: the count is assigned by the client. |
| *sentiment | boolean | If true, angry = -1 and all others = 0; if false, all sentiment values are 0. |
| *punctuation | boolean | |
Response

| Name | Type | Info |
|---|---|---|
| ids | Array | ["key1", "key2"]. Transaction IDs; use an ID to get the processing status. |
```json
[
   {
       "uid": "f59937c7-a9fc-415e-a149-1b4140f09640",
       "audioUri": "audioUri",
       "model": "asr-zh-en-std",
       "customLm": "",
       "isPunctuation": 1,
       "isSpeakerDiarization": 0,
       "speakerCount": 2,
       "status": "pending",
       "createdAt": "2022-08-31T09:22:06.759Z",
       "updatedAt": "2022-08-31T09:22:06.759Z"
   }
]
```
HTTP status

| Status | Info |
|---|---|
| 400 | customLmNotExists |
| 400 | customLmNotAvailable |
| 400 | customLmNotMatch: this customLm is not compatible with the model. |

Get status

After you submit audio files for processing, the "status" key goes from "pending" to "ongoing" and finally to "completed". If something goes wrong, it goes to "error". You can make a GET request, as shown below, to check the status of your transcription.
Make repeated GET requests until the status is "completed" or "error". Once the status is "completed", you can fetch the transcription as described in Get results.
Request
URL: /transcriptions?page=1&perPage=10&status=pending
Method: GET
Header

| Name | Type | Info |
|---|---|---|
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
|---|---|---|
| id | object | status: pending, completed, ongoing, error, expired, not exists; expiredAt: RFC3339 timestamp |
```json
{
   "page": 1,
   "perPage": 10,
   "total": 2,
   "nextPage": null,
   "data": [
       {
           "id": "75911e73-4d3d-4104-a3e5-6ed4f7966b7a",
           "key": "DNf9qx7KUMgKYyMdO9hS",
           "audioUri": "audioUri",
           "model": "asr-zh-en-std",
           "languageModelId": "",
           "isPunctuation": 1,
           "isSpeakerDiarization": 1,
           "speakerCount": 0,
           "status": "pending",
           "createdAt": "2022-08-04T18:02:38.093Z",
           "updatedAt": "2022-08-04T18:02:38.093Z"
       },
       {
           "id": "8d65db58-8f76-40f3-8ad0-b97aad803a8b",
           "key": "DNf9qx7KUMgKYyMdO9hS",
           "audioUri": "audioUri",
           "model": "asr-zh-en-std",
           "languageModelId": "",
           "isPunctuation": 1,
           "isSpeakerDiarization": 1,
           "speakerCount": 0,
           "status": "pending",
           "createdAt": "2022-08-08T09:33:39.011Z",
           "updatedAt": "2022-08-08T09:33:39.011Z"
       }
   ]
}
```
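A minimal polling loop might look like the following sketch; `fetch_status` is a hypothetical caller-supplied function that wraps the GET request above and returns one parsed record:

```python
import time
from typing import Callable, Mapping

# Statuses after which no further polling is useful (from this doc).
TERMINAL_STATUSES = {"completed", "error", "expired"}

def poll_until_done(fetch_status: Callable[[], Mapping],
                    interval_s: float = 5.0,
                    max_attempts: int = 120) -> Mapping:
    """Repeatedly call fetch_status() until the job reaches a terminal status."""
    for _ in range(max_attempts):
        record = fetch_status()
        if record["status"] in TERMINAL_STATUSES:
            return record
        time.sleep(interval_s)
    raise TimeoutError("transcription did not finish in time")
```

The polling interval and attempt cap are illustrative; tune them to your audio length.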

Get results

Once a transcription's status key is shown as "completed", you can query the text, words, and other keys, including the results of any audio-intelligence features you enabled, with the results of your transcription populated in the JSON response.
These results are preserved for up to 24 hours after completion.
Request
URL: /transcriptions/{id}
Method: GET
Header

| Name | Type | Info |
|---|---|---|
| Authorization | String | Bearer {key} |
Response

| Name | Type | Info |
|---|---|---|
| id | string | |
| status | String | pending, completed, ongoing, error, not exists |
| modelConfig | JSON | The original settings from the submit request |
| sentences | JSON[] | speakerId: 0, 1, 2, … and unknown |
| expiredAt | Datetime | RFC3339 timestamp |
```json
{
    "uid": "8c6ef8d3-1b2a-4faa-8ea8-61bd70491e25",
    "audioUri": "audioUri",
    "model": "asr-zh-en-std",
    "customLm": "",
    "isPunctuation": 1,
    "isSpeakerDiarization": 1,
    "speakerCount": 0,
    "isSentiment": 1,
    "status": "completed",
    "createdAt": "2022-10-03T08:12:46.942Z",
    "updatedAt": "2022-10-03T08:29:47.000Z",
    "sentences":[
      {
         "sentenceId":"u8923dy8923",
         "sentence":"天氣很好",
         "start":438600,
         "end":245499,
         "confidence":0.9132,
         "sentiment":0,
         "speakerId":"speakerId",
         "words":[
            {
               "word":"天氣",
               "start":5486554,
               "end":623434
            },
            {
               "word":"很",
               "start":647543,
               "end":823234
            },
            {
               "word":"好",
               "start":867654,
               "end":932324
            }
         ]
      }
   ],
   "expiredAt":"2022-06-26T17:16:08Z"
}
```
If the result has expired, the request returns 404.
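For readability, the sentences array can be flattened into speaker-labelled lines. This sketch assumes the start timestamps are in milliseconds (the unit is not stated above):

```python
def format_transcript(result: dict) -> list[str]:
    """Turn the 'sentences' array from GET /transcriptions/{id} into
    speaker-labelled transcript lines."""
    lines = []
    for s in result.get("sentences", []):
        start_s = s["start"] / 1000.0  # assumption: timestamps are milliseconds
        lines.append(f"[{start_s:8.1f}s] speaker {s['speakerId']}: {s['sentence']}")
    return lines
```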

Limit

A maximum of 3 audio transcriptions can run concurrently.
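A client can stay under this limit by capping its own worker pool. In this sketch, `submit_and_wait` is a hypothetical helper that submits one file and polls until it completes:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # service limit stated above

def transcribe_all(submit_and_wait, audio_uris):
    """Process many files while never exceeding the concurrency limit.
    submit_and_wait(uri) is a caller-supplied submit+poll helper."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as ex:
        # ex.map preserves input order and never runs more than
        # MAX_CONCURRENT jobs at once.
        return list(ex.map(submit_and_wait, audio_uris))
```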

Appendix: Model codes

| Model code | Info | Language |
|---|---|---|
| asr-zh-en-std | Use it when speakers speak Chinese more than English. | Mandarin and English |
| asr-zh-tw-std | Use it when speakers speak Chinese and Taiwanese. | Mandarin and Taiwanese |
| asr-zh-tw-health | Use it when speakers speak Chinese and Taiwanese in the health domain. | Mandarin and Taiwanese |