Documentation

TTS Voice Cloning (BETA)

You can train a dedicated voice model by uploading an audio file, then use that model to read your content aloud.

Create your model

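You need to prepare an mp3 audio file longer than five minutes, and make sure it contains only one person speaking in a consistent tone.
Note that each speakerId, together with its corresponding modelId, is one independent voice, and each key can keep at most five voices. If you already have five voices, adding a sixth will fail, so you need to delete one of the existing models before you can continue.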

Each voice training run finishes within about half an hour.

Submit a voice cloning request

Request
URL: /v3/voice-cloning/models
Method: POST

Header

Name	Type	Info
*key	String
*Content-Type	String	Only “application/json”

Body

{
    "speakers": [
        {
            "speakerId": "speaker_name",
            "audioUri": "https://ia801409.us.archive.org/27/items/13_20220920/%E4%B8%AD%E8%8B%B113%E5%88%86%E9%90%98.mp3"
        }
    ]
}

Response

{
    "uid": "ab31c64e-c79d-45f5-9fea-4866427a5972",
    "taskId": "08e99e03-dd42-427b-97d0-09d3495f8868",
    "status": "pending",
    "createdAt": "2024-04-25T05:16:23.545Z",
    "updatedAt": "2024-04-25T05:16:23.000Z"
}
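As a minimal sketch of the request above, the following Python builds the POST with the standard library only. The host in BASE_URL and the helper name build_clone_request are assumptions made for illustration; this document only specifies the path, the headers, and the body.

```python
import json
import urllib.request

# Hypothetical host: the document gives only the path, not the server address.
BASE_URL = "https://api.example.com"


def build_clone_request(api_key: str, speaker_id: str, audio_uri: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts voice training."""
    body = {"speakers": [{"speakerId": speaker_id, "audioUri": audio_uri}]}
    return urllib.request.Request(
        url=BASE_URL + "/v3/voice-cloning/models",
        data=json.dumps(body).encode("utf-8"),
        headers={"key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Usage (sending is left commented out because the host is a placeholder):
# req = build_clone_request("YOUR_KEY", "speaker_name", "https://example.com/voice.mp3")
# with urllib.request.urlopen(req) as resp:
#     task = json.load(resp)  # expected to contain uid, taskId, status, ...
```

Keeping request construction separate from sending makes it easy to inspect the payload before any network call is made.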

Check training progress

Request
URL: /v3/voice-cloning/models/
Method: GET

Response

{
    "page": 1,
    "perPage": 10,
    "total": 2,
    "nextPage": null,
    "data": [
        {
            "uid": "ab31c64e-c79d-45f5-9fea-4866427a5972",
            "taskId": "08e99e03-dd42-427b-97d0-09d3495f8868",
            "status": "completed",
            "createdAt": "2024-04-25T05:16:23.545Z",
            "updatedAt": "2024-04-25T05:30:45.000Z"
        },
        {
            "uid": "e52bec3e-d657-4980-b535-18313e77aec4",
            "taskId": "c26314db-fe18-40a4-ad0f-272771eacead",
            "status": "completed",
            "createdAt": "2024-04-25T03:31:41.550Z",
            "updatedAt": "2024-04-25T03:50:33.000Z"
        }
    ]
}
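A list response like the one above can be summarized by grouping model uids by status. This is only a sketch: the helper name group_by_status is hypothetical, and it assumes each entry in data carries uid and status as in the sample.

```python
from typing import Any


def group_by_status(listing: dict[str, Any]) -> dict[str, list[str]]:
    """Group the model uids in a list response by their status value."""
    groups: dict[str, list[str]] = {}
    for item in listing.get("data", []):
        groups.setdefault(item["status"], []).append(item["uid"])
    return groups


sample = {
    "page": 1,
    "perPage": 10,
    "total": 2,
    "nextPage": None,
    "data": [
        {"uid": "ab31c64e-c79d-45f5-9fea-4866427a5972", "status": "completed"},
        {"uid": "e52bec3e-d657-4980-b535-18313e77aec4", "status": "pending"},
    ],
}
print(group_by_status(sample))
```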

Get speakerId information

Request
URL: /v3/voice-cloning/models/:uid
Method: GET

Response

{
    "uid": "ab31c64e-c79d-45f5-9fea-4866427a5972",
    "taskId": "08e99e03-dd42-427b-97d0-09d3495f8868",
    "status": "completed",
    "createdAt": "2024-04-25T05:16:23.545Z",
    "updatedAt": "2024-04-25T05:30:45.000Z",
    "speakers": [
        {
            "speakerId": "speaker_name",
            "audioUri": "https://ia801409.us.archive.org/27/items/13_20220920/%E4%B8%AD%E8%8B%B113%E5%88%86%E9%90%98.mp3",
            "createdAt": "2024-04-25T05:16:23.611Z",
            "updatedAt": "2024-04-25T05:16:23.611Z"
        }
    ]
}
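The detail response above holds what a later TTS request needs in its voice object. The sketch below assumes that modelId is the model's uid, which the matching values in this document's examples suggest but which is not stated outright; the helper name voice_settings is hypothetical.

```python
def voice_settings(model: dict, lang: str = "zh_tw") -> list[dict]:
    """Turn a model detail response into one TTS `voice` object per speaker."""
    if model.get("status") != "completed":
        raise ValueError("model is not ready: status=" + str(model.get("status")))
    return [
        # Assumption: modelId equals the model uid (the document's examples match).
        {"modelId": model["uid"], "speakerId": speaker["speakerId"], "lang": lang}
        for speaker in model.get("speakers", [])
    ]


detail = {
    "uid": "ab31c64e-c79d-45f5-9fea-4866427a5972",
    "status": "completed",
    "speakers": [{"speakerId": "speaker_name"}],
}
print(voice_settings(detail))
```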

Delete a trained model

Request
URL: /v3/voice-cloning/models/:uid
Method: DELETE
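Because each key keeps at most five voices, a client may want to check for a free slot and delete a model before training a new one. The following is a sketch only: the host, the helper names, and the assumption that the DELETE call takes the same key header as the other endpoints are not confirmed by this document.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host
MAX_VOICES = 5  # limit per key, as stated in this document


def has_free_slot(voice_count: int) -> bool:
    """Return True when another voice can still be trained under the limit."""
    return voice_count < MAX_VOICES


def build_delete_request(api_key: str, uid: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one trained model."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v3/voice-cloning/models/{uid}",
        headers={"key": api_key},  # assumption: same auth header as other calls
        method="DELETE",
    )
```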

Generate TTS audio with your voice (asynchronous)

Submit a TTS request

After all input has been processed, the response time is typically about half the length of the audio.

Request

URL: /v3/voice-cloning/speeches
Method: POST

Header

Name	Type	Info
*key	String
*Content-Type	String	Only “application/json”

Body

{
    "input": {
        "text": "這是一個測試",
        "type": "text"
    },
    "voice": {
        "modelId": "7b0fecb2-2fc8-4883-93a5-a31523e17399",
        "speakerId": "speaker_name",
        "lang": "zh_tw"
    },
    "audioConfig": {
        "encoding": "LINEAR16",
        "maxLength": 600000
    }
}

AudioConfig settings

Variables	Type	Info
*encoding	String	See "Encoding (audio encoding)"

Encoding (audio encoding)
In addition to the voice type, you can choose the audio output format. MP3 and LINEAR16 are currently supported.

codec	Name	Note
LINEAR16	Linear PCM	WAV audio encoding. 16-bit linear pulse-code modulation (PCM) encoding. The header must include the sample rate.
MP3	MP3	MP3 format
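To illustrate the note that a LINEAR16 header carries the sample rate, the sketch below reads the rate from a WAV header with Python's standard wave module. It assumes LINEAR16 output is delivered in a WAV container, as the table suggests; the helper names are hypothetical, and the silent clip is generated locally only so the example is self-contained.

```python
import io
import wave


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Create a short mono 16-bit PCM WAV clip of silence, for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit linear PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate stored in the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


print(wav_sample_rate(make_silent_wav(22050)))
```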

Request Body Example

{
    "input": {
        "text": "哈哈哈:",
        "type": "text"
    },
    "voice": {
        "modelId": "ab31c64e-c79d-45f5-9fea-4866427a5972",
        "speakerId": "speaker_name",
        "lang": "zh_tw"
    },
    "audioConfig": {
        "encoding": "LINEAR16",
        "maxLength": 600000
    }
}
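A request body like the one above can be assembled with a small helper that rejects unsupported encodings before anything is sent. This is a sketch: build_speech_body is a hypothetical name, and the defaults simply mirror the example values in this document.

```python
SUPPORTED_ENCODINGS = {"LINEAR16", "MP3"}  # the two formats this document lists


def build_speech_body(
    text: str,
    model_id: str,
    speaker_id: str,
    lang: str = "zh_tw",
    encoding: str = "LINEAR16",
    max_length: int = 600000,
) -> dict:
    """Assemble the JSON body for POST /v3/voice-cloning/speeches."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    return {
        "input": {"text": text, "type": "text"},
        "voice": {"modelId": model_id, "speakerId": speaker_id, "lang": lang},
        "audioConfig": {"encoding": encoding, "maxLength": max_length},
    }
```

Validating the encoding on the client side is a design choice, not an API requirement; it only turns a likely Body Parameter Error into an earlier local failure.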

Response Body Example

{
    "uid": "7c21f758-e982-4513-a406-1228bab64195",
    "inputText": "哈哈哈:",
    "inputType": "text",
    "voiceLang": "zh_tw",
    "audioEncoding": "LINEAR16",
    "audioMaxLength": 600000,
    "status": "pending",
    "createdAt": "2024-04-25T08:10:03.377Z",
    "updatedAt": "2024-04-25T08:10:03.000Z"
}

TTS generation: check processing status

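After submission, the status key moves from "pending" to "ongoing" and finally to "completed". If something goes wrong, it becomes "error". You can issue a GET request, as shown below, to check for updates to the generation status.

Repeat the GET request until the status is "completed" or "error". Once the status key shows "completed", you can get the new link from path and download the file directly.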
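The repeat-until-done procedure described above can be sketched as a small polling loop. The fetch callable stands in for whatever performs the GET and returns the parsed JSON; it, the interval, and the attempt limit are assumptions, since this document does not recommend a polling frequency.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "error"}


def wait_for_speech(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until the status is completed or error."""
    for _ in range(max_attempts):
        speech = fetch()
        if speech.get("status") in TERMINAL_STATUSES:
            return speech
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Usage with a fake fetch that walks through the documented status values.
statuses = iter(["pending", "ongoing", "completed"])
result = wait_for_speech(lambda: {"status": next(statuses)}, sleep=lambda s: None)
print(result["status"])
```

Passing fetch and sleep in as parameters keeps the loop independent of any HTTP library and lets it be exercised without a network.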

Request

URL: /v3/voice-cloning/speeches?page=1&perPage=10
Method: GET

Header

Name	Type	Info
*key	String

Query String Parameter

Name	Info
page	number
perPage	number
status	pending, ongoing, completed, error. If omitted, all items are listed

Response

{
    "page": 1,
    "perPage": 10,
    "total": 5,
    "nextPage": null,
    "data": [
        {
            "uid": "16faaf9a-feed-4855-9466-09b1cbb6cd13",
            "inputText": "哈哈哈，你是誰啊",
            "inputType": "text",
            "voiceLang": "zh_tw",
            "audioEncoding": "LINEAR16",
            "audioMaxLength": 600000,
            "status": "completed",
            "createdAt": "2024-04-25T08:13:15.395Z",
            "updatedAt": "2024-04-25T08:13:25.000Z"
        }
    ]
}
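The query string for the list request above can be built with urllib.parse.urlencode. The helper name speeches_list_path is hypothetical; the parameter names and status values are the ones listed in this section.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "ongoing", "completed", "error"}


def speeches_list_path(page: int = 1, per_page: int = 10, status: Optional[str] = None) -> str:
    """Build the path and query string for listing TTS generation tasks."""
    params = {"page": page, "perPage": per_page}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status filter: {status!r}")
        params["status"] = status
    return "/v3/voice-cloning/speeches?" + urlencode(params)


print(speeches_list_path())                    # /v3/voice-cloning/speeches?page=1&perPage=10
print(speeches_list_path(status="completed"))  # adds &status=completed
```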

TTS generation: get the generated audio file

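Once your audio file's status is completed, you can get the link to the processed audio file from resultUrl below and download it. The response example below carries this link in audioFile.url.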

Note that, for security reasons, a completed audio file is kept for only 24 hours; after that it is deleted automatically.

Request

URL: /v3/voice-cloning/speeches/:uid
Method: GET

Header

Name	Type	Info
*key	String

Path Parameter

Name	Info
uid	string, uid.

Response Body Example

{
    "uid": "16faaf9a-feed-4855-9466-09b1cbb6cd13",
    "inputText": "哈哈哈，你是誰啊",
    "inputType": "text",
    "voiceLang": "zh_tw",
    "audioEncoding": "LINEAR16",
    "audioMaxLength": 600000,
    "status": "completed",
    "createdAt": "2024-04-25T08:13:15.395Z",
    "updatedAt": "2024-04-25T08:13:25.000Z",
    "audioFile": {
        "fileName": "1b41f08b-0f6a-4655-a444-e55541ce5d07_speed_up.wav",
        "fileHash": "c801a275f8565cd7fb5bcd522e8d8585cab173bf",
        "url": "https://lab5-k8s.corp.ailabs.tw/tts-minio/tts-public/tts-audio/1b41f08b-0f6a-4655-a444-e55541ce5d07_speed_up.wav?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minioadmin%2F20240425%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240425T081321Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=ec552b51af427cc47780f860c8a7ade862c89a6af08ada9c8442d18d049ca53d",
        "expiredAt": "",
        "errorMessage": ""
    }
}
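The sample url above is a presigned link whose query string carries X-Amz-Date and X-Amz-Expires (86400 seconds, which matches the 24-hour retention). As a sketch, a client could read those two parameters to estimate when the link stops working, since expiredAt is empty in the example. The helper name is hypothetical, and it assumes real responses use the same URL style as the sample.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a presigned URL is expected to expire."""
    query = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Shortened stand-in for the sample link; only the two relevant parameters are kept.
example_url = (
    "https://example.com/tts-audio/sample.wav"
    "?X-Amz-Date=20240425T081321Z&X-Amz-Expires=86400"
)
print(signed_url_expiry(example_url).isoformat())  # 2024-04-26T08:13:21+00:00
```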

Error messages

key is not authorized
Bad Request
Body Parameter Error
Send Task Distribution Error
Query Parameter Error
Voice Cloning Model Limit Error
Voice Cloning Model Id Type Error
Voice Cloning Model Not Found
Voice Cloning Model Speaker Not Found
Voice Cloning Speeches Id Type Error
Voice Cloning Speeches Not Found
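A client can keep the messages above in one place and attach a hint where this document implies a remedy. This is a sketch under assumptions: the document does not say how a message is delivered (HTTP status, JSON field), so the helper takes the message string directly, and linking "Voice Cloning Model Limit Error" to the five-voice limit is an inference from its name.

```python
KNOWN_ERROR_MESSAGES = frozenset({
    "key is not authorized",
    "Bad Request",
    "Body Parameter Error",
    "Send Task Distribution Error",
    "Query Parameter Error",
    "Voice Cloning Model Limit Error",
    "Voice Cloning Model Id Type Error",
    "Voice Cloning Model Not Found",
    "Voice Cloning Model Speaker Not Found",
    "Voice Cloning Speeches Id Type Error",
    "Voice Cloning Speeches Not Found",
})


def describe_error(message: str) -> str:
    """Return the message, with a hint when one can be inferred."""
    if message not in KNOWN_ERROR_MESSAGES:
        return "unrecognized error: " + message
    if message == "Voice Cloning Model Limit Error":
        # Assumed to correspond to the five-voices-per-key limit.
        return message + " (delete an existing model, then retry)"
    return message
```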