ASR Speech-to-Text: Offline Audio File Transcription

Service: asr.api.yating.tw

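This service fetches an audio file from a URL and converts the speech to text. It can also detect the number of speakers automatically, so that each sentence is linked to the speaker who said it.

This service performs better than other solutions on Chinese, English, and mixed Chinese-English speech as spoken in Taiwan.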

Transcribing an Audio File

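The example below shows how to submit the URL of your audio file to the transcription API. After you submit the POST request, you will receive a UID and a status. The audio file must not be longer than two hours. If you have no cloud storage to host the file, you can upload it first with "語音轉文字-檔案上傳" (Speech-to-Text File Upload) and then transcribe it.

Note that speaker diarization (speakerDiarization) and sentiment recognition (sentiment) are separate features that incur additional charges. Both are currently in beta, so they are free for now; any changes will be announced later.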

Request

URL: /v1/transcriptions
Method: POST

Header

Name	Type	Info
*key	String
*Content-Type	String	Only “application/json”

Body

Name	Type	Info
*audioUri	String	Supported formats: MP3, WAV, MOV, MP4. Examples: `yd://8fc4806c-88d3-4ac7-b358-186bc7349c91`, `http://some_domain.com/your_file.mp3`. Note: some special encodings may not be recognized; if transcription keeps failing, try converting the file to MP3 first.
*modelConfig	Object	See variables in modelConfig table
*featureConfig	Object	See variables in featureConfig table

{
   "audioUri": "audioUri",
   "modelConfig": {
       "model": "asr-zh-en-std",
       "customLm": ""
   },
   "featureConfig": {
       "speakerDiarization": false,
       "speakerCount": 0,
       "sentiment": false
   }
}

Variables in modelConfig

Variables	Type	Info
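*model	String	Choose the language model to use from "Appendix: Language Model Codes".
*customLm	String	Leave empty, or pass the uid obtained from the custom model API. Note that a custom model can be used only when its status is completed.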

Variables in featureConfig

Variables	Type	Info
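*speakerDiarization	boolean	Speaker diarization switch; off by default. This feature is not yet supported, so keep it false.
*speakerCount	int	Defaults to 0, which lets the model decide how many speakers there are. If you already know the number of speakers, pass an integer greater than 0.
*sentiment	boolean	If true, angry = -1 and all others = 0. If false, every sentiment value is 0.
punctuation	boolean	Deprecated; this value is always true.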
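
The request described by the tables above can be assembled in code. The following is a minimal Python sketch, not an official client: the helper name `build_transcription_request` is hypothetical, and the `https://` scheme in front of the service host is an assumption. The header names `key` and `Content-Type` and all body fields are taken from the tables above.

```python
import json
import urllib.request

# Assumption: the service host listed on this page is reached over HTTPS.
BASE_URL = "https://asr.api.yating.tw"


def build_transcription_request(api_key, audio_uri, model="asr-zh-en-std",
                                custom_lm="", speaker_diarization=False,
                                speaker_count=0, sentiment=False):
    """Build (but do not send) the POST /v1/transcriptions request."""
    if speaker_count < 0:
        raise ValueError("speaker_count must be 0 (auto) or a positive integer")
    payload = {
        "audioUri": audio_uri,
        "modelConfig": {"model": model, "customLm": custom_lm},
        "featureConfig": {
            "speakerDiarization": speaker_diarization,
            "speakerCount": speaker_count,
            "sentiment": sentiment,
        },
    }
    return urllib.request.Request(
        BASE_URL + "/v1/transcriptions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The sketch stops at building the request so that it has no network side effects; sending it, for example with `urllib.request.urlopen`, is left to the caller.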

Response

[
   {
       "uid": "313fb766-cf44-421c-b818-6127ed91d739",
       "audioUri": "audioUri",
       "model": "asr-zh-en-std",
       "customLm": "",
       "isPunctuation": 1,
       "isSpeakerDiarization": 1,
       "speakerCount": 2,
       "status": "pending",
       "createdAt": "2022-08-26T11:40:42.401Z",
       "updatedAt": "2022-08-26T11:40:42.401Z"
   }
]

HTTP status	Status	Info
400	customLmNotExists
400	customLmNotAvailable
400	customLmNotMatch	This customLm is not compatible with the model.

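Checking Transcription Status

Once processing starts, status changes from "pending" to "ongoing" and finally to "completed". If something goes wrong, it becomes "error". You can issue a GET request, as shown below, to check for updates to the transcription status.

You must repeat the GET request until the status is "completed" or "error". Once the status key shows "completed", you can retrieve the transcription.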

Request
URL: /v1/transcriptions?page=1&perPage=10
Method: GET

Header

Name	Type	Info
*key	String

Response

Name	Type	Info
data	Array of Objects	status: pending, completed, ongoing, error, expired, not exists

{
   "page": 1,
   "perPage": 10,
   "total": 2,
   "nextPage": null,
   "data": [
      {
         "uid": "75911e73-4d3d-4104-a3e5-6ed4f7966b7a",
         "audioUri": "audioUri",
         "model": "asr-zh-en-std",
         "customLm": "",
         "isPunctuation": 1,
         "isSpeakerDiarization": 1,
         "speakerCount": 0,
         "isSentiment": 1,
         "status": "completed",
         "audioDuration": 4153,
         "createdAt": "2022-08-04T18:02:38.093Z",
         "updatedAt": "2022-08-04T18:02:38.093Z"
      },
      {
         "uid": "8d65db58-8f76-40f3-8ad0-b97aad803a8b",
         "audioUri": "audioUri",
         "model": "asr-zh-en-std",
         "customLm": "",
         "isPunctuation": 1,
         "isSpeakerDiarization": 1,
         "speakerCount": 0,
         "isSentiment": 1,
         "status": "pending",
         "audioDuration": 0,
         "createdAt": "2022-08-08T09:33:39.011Z",
         "updatedAt": "2022-08-08T09:33:39.011Z"
      }
   ]
}
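
The repeated GET requests described in this section amount to a polling loop. The sketch below is one possible shape, under stated assumptions: `fetch_status` is a hypothetical caller-supplied function that performs the GET request and returns the status string, the 5-second interval and 120-attempt cap are arbitrary defaults, and treating every status other than "pending" and "ongoing" as final is an assumption of this sketch.

```python
import time

# Statuses that mean the job is still being processed. Treating everything
# else (completed, error, expired, not exists) as final is an assumption.
IN_PROGRESS = {"pending", "ongoing"}


def status_from_page(page, uid):
    """Return the status of job `uid` from one list response, or None."""
    for job in page.get("data", []):
        if job.get("uid") == uid:
            return job.get("status")
    return None


def wait_until_done(fetch_status, uid, interval_s=5.0, max_attempts=120,
                    sleep=time.sleep):
    """Call fetch_status(uid) until it returns a status that is not in progress.

    Any value outside IN_PROGRESS, including None, is returned to the caller.
    """
    for attempt in range(max_attempts):
        status = fetch_status(uid)
        if status not in IN_PROGRESS:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next check
    raise TimeoutError(f"job {uid} still in progress after {max_attempts} checks")
```

Passing the fetch function and `sleep` as parameters keeps the loop independent of any HTTP library and makes it easy to exercise without network access.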

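Retrieving the Transcription Result

Once the status shows "completed", you can query the text, words, and other keys, including the results of any audio intelligence features you enabled. The transcription results are populated in the JSON response.

These results are kept for 24 hours after the transcription finishes.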

Request
URL: /v1/transcriptions/{uid}
Method: GET

Header

Name	Type	Info
*key	String

Response

Name	Type	Info
uid	string
status	string	pending, completed, ongoing, error, not exists
sentences	JSON[]	speakerId: may be 0, 1, 2, … or unknown

{
   "uid": "8c6ef8d3-1b2a-4faa-8ea8-61bd70491e25",
   "audioUri": "audioUri",
   "model": "asr-zh-en-std",
   "customLm": "",
   "isPunctuation": 1,
   "isSpeakerDiarization": 1,
   "speakerCount": 0,
   "isSentiment": 1,
   "status": "completed",
   "createdAt": "2022-10-03T08:12:46.942Z",
   "updatedAt": "2022-10-03T08:29:47.000Z",
   "sentences":[
      {
         "sentenceId":"u8923dy8923",
         "sentence":"天氣很好",
         "start":438600,
         "end":245499,
         "confidence":0.9132,
         "sentiment":0,
         "speakerId":"speakerId",
         "words":[
            {
               "word":"天氣",
               "start":5486554,
               "end":623434
            },
            {
               "word":"很",
               "start":647543,
               "end":823234
            },
            {
               "word":"好",
               "start":867654,
               "end":932324
            }
         ]
      }
   ]
}

If the result has expired, a 404 is returned.
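
Once the JSON response has been parsed, the `sentences` array can be turned into a readable transcript. The following is a small sketch that assumes the response shape shown in the example above; the function names `format_transcript` and `angry_sentences` are hypothetical.

```python
def format_transcript(result):
    """Return one line per sentence, prefixed with its speakerId."""
    if result.get("status") != "completed":
        raise ValueError(f"transcription not ready: {result.get('status')}")
    lines = []
    for item in result.get("sentences", []):
        # speakerId may be "0", "1", ... or "unknown" according to the table
        speaker = item.get("speakerId", "unknown")
        lines.append(f"[speaker {speaker}] {item['sentence']}")
    return "\n".join(lines)


def angry_sentences(result):
    """Return the text of sentences whose sentiment is -1 (angry)."""
    return [item["sentence"] for item in result.get("sentences", [])
            if item.get("sentiment") == -1]
```

Checking `status` before reading `sentences` avoids treating a pending or failed job as an empty transcript.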

Usage Limits

Max concurrent audio transcriptions = 3
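
A client that processes many files can stay within this limit by capping its own parallelism. The sketch below uses a thread pool with three workers; `transcribe_one` is a hypothetical caller-supplied function that submits one audio URI and blocks until that job reaches a final status. This is one way to respect the limit on the client side, not a description of how the service enforces it.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # limit stated above


def transcribe_all(audio_uris, transcribe_one):
    """Run transcribe_one(uri) for every URI, at most MAX_CONCURRENT at a time."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        # pool.map yields results in the same order as the input URIs
        return list(pool.map(transcribe_one, audio_uris))
```

Because each worker thread handles one job from submission to completion, no more than three transcriptions started by this client are in flight at once.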

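Appendix: Language Model Codes

Language Code	Info	Language
asr-zh-en-std	Use when the speaker speaks more Chinese than English.	Chinese and English
asr-zh-tw-std	Use when the speaker speaks Chinese and Taiwanese.	Chinese and Taiwanese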