Gemini 3.1 Flash TTS Preview

curl --request POST \
  --url https://api.shuyou.ai/v1/predictions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "gemini-3.1-flash-tts-preview",
  "function": "audio",
  "input": {
    "prompt": "Say cheerfully: Have a wonderful day!",
    "style_instructions": "Say the following in a warm, upbeat tone.",
    "voice": "Kore",
    "language": "en-US",
    "output_format": "mp3"
  },
  "webhook": "https://api.shuyou.ai/backend/api/callback"
}
'

{
  "data": {
    "task_id": "2c4d50261173430290971a2395a3b607",
    "task_status": "processing"
  }
}

POST /v1/predictions

Prompt (`input.prompt`)

Required. Text to synthesize (max 32,000 characters). Supports expressive markup tags in the text:

Tag	Effect
`[sigh]`	Sigh
`[laughing]`	Laughter
`[whispering]`	Whisper
`[shouting]`	Shout
`[extremely fast]`	Very fast delivery
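As a rough illustration of the markup above, the following sketch composes a prompt from the listed tags and checks the 32,000-character limit before a request is sent. The helper names `tag` and `check_prompt` are hypothetical, and treating the five tags in the table as the complete set is an assumption; the API may accept others. Python's `len` counts code points, which may not match how the service counts characters.

```python
MAX_PROMPT_CHARS = 32_000  # limit stated for input.prompt

# Tags listed in the table above (assumed here to be the full set).
KNOWN_TAGS = {"sigh", "laughing", "whispering", "shouting", "extremely fast"}


def tag(name: str) -> str:
    """Return the bracketed markup for a listed tag, e.g. "sigh" -> "[sigh]"."""
    if name not in KNOWN_TAGS:
        raise ValueError(f"unlisted tag: {name!r}")
    return "[" + name + "]"


def check_prompt(prompt: str) -> str:
    """Reject empty or over-long prompts locally, before calling the API."""
    if not prompt:
        raise ValueError("input.prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt has {len(prompt)} characters, max is {MAX_PROMPT_CHARS}")
    return prompt


prompt = check_prompt(tag("whispering") + " Keep your voice down. " + tag("sigh") + " Too late.")
print(prompt)
```

Validating locally avoids spending a round trip on a request that the documented limits would reject anyway.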

Style instructions (`input.style_instructions`)

Optional (max 1,000 characters). Natural-language directions for tone, pace, accent, and emotion. Default: `Say the following.`

Voice presets (`input.voice`)

Optional. Default: `Kore`.

`voice`
`Achernar`
`Achird`
`Algenib`
`Algieba`
`Alnilam`
`Aoede`
`Autonoe`
`Callirrhoe`
`Charon`
`Despina`
`Enceladus`
`Erinome`
`Fenrir`
`Gacrux`
`Iapetus`
`Kore`
`Laomedeia`
`Leda`
`Orus`
`Pulcherrima`
`Puck`
`Rasalgethi`
`Sadachbia`
`Sadaltager`
`Schedar`
`Sulafat`
`Umbriel`
`Vindemiatrix`
`Zephyr`
`Zubenelgenubi`
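A client may want to catch a misspelled voice name before submitting a task. The sketch below checks a name against the presets listed above and returns it in the listed casing. `resolve_voice` is a hypothetical helper, and the case-insensitive matching is a convenience of this sketch; whether the API itself is case-sensitive is not stated here.

```python
from typing import Optional

# Voice presets copied from the list above.
VOICES = (
    "Achernar", "Achird", "Algenib", "Algieba", "Alnilam", "Aoede",
    "Autonoe", "Callirrhoe", "Charon", "Despina", "Enceladus", "Erinome",
    "Fenrir", "Gacrux", "Iapetus", "Kore", "Laomedeia", "Leda",
    "Orus", "Pulcherrima", "Puck", "Rasalgethi", "Sadachbia", "Sadaltager",
    "Schedar", "Sulafat", "Umbriel", "Vindemiatrix", "Zephyr", "Zubenelgenubi",
)
DEFAULT_VOICE = "Kore"  # documented default

# Lookup table from lowercase name to the listed spelling.
_BY_LOWER = {name.lower(): name for name in VOICES}


def resolve_voice(name: Optional[str] = None) -> str:
    """Return a listed voice name, falling back to the default when none is given."""
    if name is None:
        return DEFAULT_VOICE
    try:
        return _BY_LOWER[name.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown voice: {name!r}") from None


print(resolve_voice("puck"))
```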

Language (`input.language`)

Optional BCP-47 language code. Default: `en-US`.

Code	Code	Code	Code
`af-ZA`	`am-ET`	`ar-001`	`ar-EG`
`az-AZ`	`be-BY`	`bg-BG`	`bn-BD`
`ca-ES`	`ceb-PH`	`cmn-CN`	`cmn-tw`
`cs-CZ`	`da-DK`	`de-DE`	`el-GR`
`en-AU`	`en-GB`	`en-IN`	`en-US`
`es-419`	`es-ES`	`es-MX`	`et-EE`
`eu-ES`	`fa-IR`	`fi-FI`	`fil-PH`
`fr-CA`	`fr-FR`	`gl-ES`	`gu-IN`
`he-IL`	`hi-IN`	`hr-HR`	`ht-HT`
`hu-HU`	`hy-AM`	`id-ID`	`is-IS`
`it-IT`	`ja-JP`	`jv-JV`	`ka-GE`
`kn-IN`	`ko-KR`	`kok-IN`	`la-VA`
`lb-LU`	`lo-LA`	`lt-LT`	`lv-LV`
`mai-IN`	`mg-MG`	`mk-MK`	`ml-IN`
`mn-MN`	`mr-IN`	`ms-MY`	`my-MM`
`nb-NO`	`ne-NP`	`nl-NL`	`nn-NO`
`or-IN`	`pa-IN`	`pl-PL`	`ps-AF`
`pt-BR`	`pt-PT`	`ro-RO`	`ru-RU`
`sd-IN`	`si-LK`	`sk-SK`	`sl-SI`
`sq-AL`	`sr-RS`	`sv-SE`	`sw-KE`
`ta-IN`	`te-IN`	`th-TH`	`tr-TR`
`uk-UA`	`ur-PK`	`vi-VN`

Output format (`input.output_format`)

Optional. Default: `mp3`.

Value	Description
`mp3`	MP3 audio
`wav`	WAV audio
`ogg_opus`	Ogg Opus audio
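The `input` fields described above can be assembled along these lines. This is a minimal sketch, not a client library: `build_input` is a hypothetical helper, and it deliberately omits optional fields that were not supplied so that the documented server-side defaults apply.

```python
from typing import Optional

OUTPUT_FORMATS = {"mp3", "wav", "ogg_opus"}  # values from the table above
MAX_STYLE_CHARS = 1_000  # limit stated for input.style_instructions


def build_input(
    prompt: str,
    style_instructions: Optional[str] = None,
    voice: Optional[str] = None,
    language: Optional[str] = None,
    output_format: Optional[str] = None,
) -> dict:
    """Build the `input` object, leaving out optional fields that were not set."""
    payload = {"prompt": prompt}
    if style_instructions is not None:
        if len(style_instructions) > MAX_STYLE_CHARS:
            raise ValueError("style_instructions exceeds 1,000 characters")
        payload["style_instructions"] = style_instructions
    if voice is not None:
        payload["voice"] = voice
    if language is not None:
        payload["language"] = language
    if output_format is not None:
        if output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output_format: {output_format!r}")
        payload["output_format"] = output_format
    return payload


print(build_input("Have a wonderful day!", voice="Kore", output_format="wav"))
```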

Authorizations

`Authorization` (string, header, required)

Bearer token, sent as `Authorization: Bearer YOUR_API_KEY`.
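One way to keep the key out of source code is to read it from the environment when building the headers. In this sketch the helper `auth_headers` and the environment variable name `SHUYOU_API_KEY` are both assumptions made for illustration; neither is defined by the API.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Return the request headers, taking the key from the argument or the environment."""
    # SHUYOU_API_KEY is a hypothetical variable name chosen for this example.
    key = api_key or os.environ.get("SHUYOU_API_KEY", "")
    if not key:
        raise RuntimeError("no API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


print(auth_headers("YOUR_API_KEY"))
```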

Body

application/json

`model` (enum<string>, required, default: `gemini-3.1-flash-tts-preview`)

Model ID. Use `gemini-3.1-flash-tts-preview` for this endpoint.

Available options: `gemini-3.1-flash-tts-preview`

Example: `"gemini-3.1-flash-tts-preview"`

`function` (enum<string>, required)

Task type. Must be `audio` for text-to-speech.

Available options: `audio`

Example: `"audio"`

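`input` (object, required)

Child attributes: `prompt`, `style_instructions`, `voice`, `language`, and `output_format`, each described in its own section above.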

`webhook` (string<uri>)

Optional. HTTPS callback URL that is called when the task completes, fails, or is cancelled.
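Putting the body fields together, the request from the curl example could be built in Python roughly as follows. This is a sketch using only the standard library; `build_request` is a hypothetical helper, and the HTTPS check on `webhook` is a local precaution based on the field description rather than documented server behavior. The request is constructed but not sent, so the snippet runs offline.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse

API_URL = "https://api.shuyou.ai/v1/predictions"  # from the curl example
MODEL = "gemini-3.1-flash-tts-preview"


def build_request(api_key: str, input_obj: dict, webhook: Optional[str] = None) -> urllib.request.Request:
    """Assemble the POST request; the caller decides when to send it."""
    body = {"model": MODEL, "function": "audio", "input": input_obj}
    if webhook is not None:
        parts = urlparse(webhook)
        if parts.scheme != "https" or not parts.netloc:
            raise ValueError("webhook must be an absolute HTTPS URL")
        body["webhook"] = webhook
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("YOUR_API_KEY", {"prompt": "Say cheerfully: Have a wonderful day!"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would submit the task.
```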

Response

Async task created

`data` (object, required)

Child attributes: `task_id` and `task_status`, as shown in the example response.
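Because the task is created asynchronously, a client typically needs to keep the returned `task_id`. The sketch below extracts it from a response shaped like the example on this page. `parse_task` is a hypothetical helper; only the `processing` status appears in the example, so no other status values are assumed.

```python
import json
from typing import Optional, Tuple


def parse_task(response_text: str) -> Tuple[str, Optional[str]]:
    """Return (task_id, task_status) from the task-creation response body."""
    body = json.loads(response_text)
    data = body.get("data") if isinstance(body, dict) else None
    if not isinstance(data, dict) or "task_id" not in data:
        raise ValueError("response has no data.task_id")
    return data["task_id"], data.get("task_status")


# Sample body copied from the example response.
sample = '{"data": {"task_id": "2c4d50261173430290971a2395a3b607", "task_status": "processing"}}'
task_id, status = parse_task(sample)
print(task_id, status)
```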
