Text to speech v2

TTS Speeches v2

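Provides text-to-speech: your text is converted into synthesized speech through a REST API, with natural-sounding intonation and support for specific languages. The following explains how to use HTTP requests to access the text-to-speech service.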

Host: https://tts.api.yating.tw

Step 1: Send a POST request to /v2/speeches/short.

Step 2: Base64-decode `audioContent` and save the result as an audio file.

Sample code for the integration is available on the Yating GitHub.

Submit a speech synthesis request

Once all input has been processed, the response time is normally under 120 seconds.

Request

URL: /v2/speeches/short

Method: POST

Header

Name	Type	Info
*key	String
*Content-Type	String	Only “application/json”

Body

Name	Type	Info
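*input	JSON	text: the text to synthesize. For plain text content, the maximum length is 600 characters. Note that a Chinese character counts as 2 characters, a full-width symbol as 2 characters, and a half-width symbol or a space as 1 character.
*voice	JSON	Voice settings; see "Voice settings".
*audioConfig	JSON	Output audio format; see "AudioConfig settings".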
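The counting rule for the 600-character limit can be approximated before sending a request. The sketch below is an assumption about how the service counts: it treats every character that Unicode classifies as wide or full-width as 2 and everything else as 1. The names `text_units` and `fits_limit` are hypothetical helpers, not part of the API.

```python
import unicodedata

MAX_TEXT_UNITS = 600  # limit stated for plain-text input


def text_units(text: str) -> int:
    """Count wide/full-width characters as 2 and all other characters as 1.

    Assumption: the Unicode East Asian Width classes "W" and "F" match what
    the service treats as Chinese characters and full-width symbols.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def fits_limit(text: str) -> bool:
    """Return True if the text stays within the documented 600-unit limit."""
    return text_units(text) <= MAX_TEXT_UNITS
```

Under this rule the example text "這是測試" counts as 8 units, so roughly 300 Chinese characters fit into one request.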

Input settings

variables	Type	Info
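*text	string	The text to convert to speech.
*type	string	Format of the input. Currently supports `text` and `ssml`: the input can be plain text, or SSML if you want more variation and flexibility in the speech. For the SSML features supported, see the SSML documentation.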

Voice settings

variables	Type	Info
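*model	string	The voice to use; see the voice model list under "Model settings in Voice".
*speed	number	Speaking rate, between 0.5 and 1.5. A smaller value gives a shorter audio file and faster speech; a larger value gives a longer audio file and slower speech.
*pitch	number	Pitch, between 0.5 and 1.5. A larger value gives a higher, sharper-sounding voice; a smaller value gives a lower, deeper-sounding voice.
*energy	number	Energy, between 0.5 and 1.5: the force of the model's articulation. The default value of 1 is recommended.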
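Since all three numeric voice settings share the range 0.5 to 1.5, they can be checked locally before a request is sent. This is a minimal sketch; `build_voice` is a hypothetical helper, and the default of 1.0 for `speed` and `pitch` is an assumption (the table only states a default for `energy`).

```python
def build_voice(
    model: str,
    speed: float = 1.0,   # assumed default, not stated in the table
    pitch: float = 1.0,   # assumed default, not stated in the table
    energy: float = 1.0,  # recommended default per the table
) -> dict:
    """Build the `voice` object, rejecting values outside 0.5 to 1.5."""
    for name, value in (("speed", speed), ("pitch", pitch), ("energy", energy)):
        if not 0.5 <= value <= 1.5:
            raise ValueError(f"{name} must be between 0.5 and 1.5, got {value}")
    return {"model": model, "speed": speed, "pitch": pitch, "energy": energy}
```

For example, `build_voice("zh_en_female_2", speed=0.8, pitch=1.3)` produces a `voice` object with `energy` left at 1.0.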

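Model settings in Voice

Two female voices and one male voice are currently supported, each available for Mandarin and English and for Taiwanese. Choose the voice you want and set it as the model.

Model	Voice	Info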
zh_en_female_1	Female1 (Yating)	Mandarin and English, support 22K/16K
zh_en_male_1	Male1 (Jiahao)	Mandarin and English, support 22K/16K
zh_en_female_2	Female2 (Yiqing)	Mandarin and English, support 22K/16K
tai_female_1	Female1 (Yating)	Taiwanese, support 16K
tai_male_1	Male1 (Jiahao)	Taiwanese, support 16K
tai_female_2	Female2 (Yiqing)	Taiwanese, support 16K
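Because the Taiwanese models only support 16K while the Mandarin and English models support both rates, a small lookup can catch an unsupported combination before it reaches the service. The table contents below are copied from the list above; the helper name `check_sample_rate` is hypothetical.

```python
# Sample rates per model, as listed in the model table.
SUPPORTED_SAMPLE_RATES = {
    "zh_en_female_1": {"16K", "22K"},
    "zh_en_male_1": {"16K", "22K"},
    "zh_en_female_2": {"16K", "22K"},
    "tai_female_1": {"16K"},
    "tai_male_1": {"16K"},
    "tai_female_2": {"16K"},
}


def check_sample_rate(model: str, sample_rate: str) -> None:
    """Raise ValueError if the model is unknown or the rate is not listed for it."""
    try:
        rates = SUPPORTED_SAMPLE_RATES[model]
    except KeyError:
        raise ValueError(f"unknown model: {model}") from None
    if sample_rate not in rates:
        raise ValueError(f"{model} does not support sampleRate {sample_rate}")
```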

AudioConfig settings

variables	Type	Info
*encoding	string	See "Encoding settings in AudioConfig".
*sampleRate	string	Supports 16K and 22K. A higher sample rate gives more realistic-sounding audio.

Encoding settings in AudioConfig

Besides the voice type, you can also choose the audio output format. MP3 and LINEAR16 are currently supported.

codec	Name	Note
LINEAR16	Linear PCM	WAV audio format. 16-bit linear pulse-code modulation (PCM) encoding. The header must include the sample rate.
MP3	MP3	MP3 format

Request Body Example

{
   "input":{
      "text":"這是測試",
      "type":"text"
   },
   "voice":{
      "model":"zh_en_female_2",
      "speed":0.8,
      "pitch":1.3,
      "energy":1.0
   },
   "audioConfig":{
      "encoding":"MP3",
      "sampleRate":"16K"
   }
}
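A request with this body could be assembled and sent using only the Python standard library, as sketched below. The host, path, and header names come from this page; the function names `build_request` and `synthesize` are hypothetical, and the 120-second timeout is an assumption based on the stated typical response time.

```python
import json
import urllib.request

HOST = "https://tts.api.yating.tw"
PATH = "/v2/speeches/short"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request with the `key` and `Content-Type` headers."""
    return urllib.request.Request(
        HOST + PATH,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"key": api_key, "Content-Type": "application/json"},
    )


def synthesize(api_key: str, body: dict, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response body.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    request = build_request(api_key, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating `build_request` from `synthesize` keeps the request construction testable without network access.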

Success Response Body Example

{
   "audioContent":"//NExAARqoIIAAhEuWAAAGNmBGMY4EBcxvABAXBPmPIAF3/o...",
   "audioConfig":{
      "encoding":"MP3",
      "sampleRate":"16K"
   }
}
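Turning a success response into a playable file takes one base64 decode and one write. The sketch below assumes the response has already been parsed into a dict; `save_audio` and the extension mapping are hypothetical helpers, with `.wav` chosen for LINEAR16 because that codec is described as the WAV format.

```python
import base64
from pathlib import Path

# File extension per encoding (assumed naming; LINEAR16 is documented as WAV).
EXTENSIONS = {"MP3": ".mp3", "LINEAR16": ".wav"}


def save_audio(response: dict, stem: str) -> Path:
    """Decode `audioContent` and write it to `<stem>.mp3` or `<stem>.wav`."""
    audio_bytes = base64.b64decode(response["audioContent"])
    encoding = response["audioConfig"]["encoding"]
    path = Path(stem + EXTENSIONS[encoding])
    path.write_bytes(audio_bytes)
    return path
```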

Failure Response Body Example

{
  "statusCode": 400,
  "message": ["<error message>"],
  "error": "Bad Request"
}

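[201] created: `audioContent` holds the base64-encoded audio. Decode it and save it to a file to play it.

[400] invalid request format: the request parameters do not meet the requirements, for example the audio encoding does not exist or the sampleRate is not supported.

[401] unauthorized: the key does not exist or has exceeded its limit.

[422] pipeline error: see the table below (Pipeline Error message) for details.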

[500] internal server error
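The status codes above can be folded into a single check that either returns the success payload or raises an error carrying the messages from the failure body. This is a sketch under the assumption that every failure body follows the `statusCode` / `message` / `error` shape shown earlier; `TTSError` and `check_response` are hypothetical names.

```python
STATUS_MEANINGS = {
    201: "created",
    400: "invalid request format",
    401: "unauthorized",
    422: "pipeline error",
    500: "internal server error",
}


class TTSError(Exception):
    """Raised for any response other than 201."""

    def __init__(self, status: int, messages: list):
        self.status = status
        self.messages = messages
        meaning = STATUS_MEANINGS.get(status, "unexpected status")
        super().__init__(f"[{status}] {meaning}: {'; '.join(messages)}")


def check_response(status: int, payload: dict) -> dict:
    """Return the payload on 201, otherwise raise TTSError."""
    if status == 201:
        return payload
    messages = payload.get("message", [])
    if isinstance(messages, str):  # tolerate a single string instead of a list
        messages = [messages]
    raise TTSError(status, list(messages))
```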

Pipeline Error message

error message	type	Info
internal pipeline error: unknown payload [for reqId(...) segId(...)]	Internal pipeline error
internal pipeline error: inferencer unavailable	Internal pipeline error
ssml validation error: no text to synthesis	Ssml validation error	An empty SSML document was submitted.
ssml validation error: parsing error at line: {line}, column: {col}: {msg}	Ssml validation error	SSML syntax error.
ssml validation error: context error at line: {line}, column: {col}: {msg}	Ssml validation error
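encoding error: encode error for reqId(...) segId(...)	Encoding error	The encoder received and processed the message, but an error occurred along the way.
vocoding error: vocode error for reqId(...) segId(...)	Vocoding error	The vocoder received and processed the message, but an error occurred along the way.
unknown error	Unknown error	Unknown error.
service busy	Service busy	The system is currently busy; please try again later.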
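Each pipeline error message starts with its type, so a client can classify a 422 response by the text before the first colon. The sketch below assumes the messages keep the prefixes listed in the table; `pipeline_error_type` and `should_retry` are hypothetical helpers, and mapping unrecognized messages to "unknown error" is a choice made here, not documented behavior.

```python
PIPELINE_ERROR_TYPES = (
    "internal pipeline error",
    "ssml validation error",
    "encoding error",
    "vocoding error",
    "unknown error",
    "service busy",
)


def pipeline_error_type(message: str) -> str:
    """Return the error type, taken from the text before the first colon."""
    prefix = message.split(":", 1)[0].strip().lower()
    return prefix if prefix in PIPELINE_ERROR_TYPES else "unknown error"


def should_retry(message: str) -> bool:
    """Only "service busy" is documented as worth retrying later."""
    return pipeline_error_type(message) == "service busy"
```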

Limits

Maximum number of requests processed concurrently per key: 3
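A client that synthesizes many texts can stay within this limit by capping its own number of in-flight requests. The sketch below uses a thread pool with three workers; `run_limited` is a hypothetical helper, and each task is assumed to be a zero-argument callable that performs one request.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 3  # documented per-key limit


def run_limited(tasks, limit: int = MAX_CONCURRENT) -> list:
    """Run zero-argument callables with at most `limit` running at once.

    Results are returned in the same order as `tasks`; an exception raised
    by a task is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]
```

This only limits concurrency within one process; several processes sharing a key would need to coordinate separately.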