Text to speech (TTS)
Turn any text into natural, expressive speech. Describe the tone in plain language to shape the delivery.
Text to speech · Voice cloning · Multilingual · Speed/pitch control · MP3 / WAV export
Generate natural, expressive AI speech from text or reference audio. Seed Audio 1.0 supports 20 preset voices, voice cloning, and speed/pitch control.
Type your script and get expressive AI speech in seconds.
Reproduce a voice from up to 3 reference clips (30s each).
Supports six languages, including Japanese.
Examples generated with Seed Audio 1.0 — listen to the real voices right here.
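"Today's topic is how to put AI voice to work. Just paste in your script to create a read-aloud that sounds like natural conversation."
"Keep your brand's voice consistent across all your content. Set it up once, and generate in the same tone as many times as you like."
"Seed Audio supports multiple languages, including Japanese and English."
"Wow, amazing! I can't believe speech this natural really takes just a few seconds to make."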
From text to natural speech. Text-to-speech, voice cloning and multilingual output in one generator, with flexible controls for any use case.
Reproduce a voice from reference audio (up to 3 clips, 30s each). Keep one brand voice across all content.
Pick from narration, character or news voices to match your tone and use case.
English, Chinese, Japanese, Spanish, Indonesian and Portuguese, including mixed-language text.
Fine-tune speed (0.5–2.0x), pitch (±12 semitones) and volume.
Export as MP3, WAV, PCM or OGG Opus, up to 48 kHz.
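The limits listed above can be captured in a small validation helper. This is a minimal sketch, not Seed Audio's API: the `SpeechSettings` class, its field names, and the format identifiers are hypothetical, and only the ranges (0.5–2.0x speed, ±12 semitones, four export formats, 48 kHz ceiling) come from this page. It also uses the standard equal-temperament relation that a shift of n semitones scales frequency by 2^(n/12).

```python
from dataclasses import dataclass

# Limits taken from the feature list above; all names are hypothetical.
SPEED_RANGE = (0.5, 2.0)        # speed multiplier
PITCH_RANGE = (-12.0, 12.0)     # semitones
FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}
MAX_SAMPLE_RATE_HZ = 48_000


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical settings container (not an official API)."""
    speed: float = 1.0
    pitch_semitones: float = 0.0
    audio_format: str = "mp3"
    sample_rate_hz: int = 48_000

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means valid."""
        problems = []
        if not SPEED_RANGE[0] <= self.speed <= SPEED_RANGE[1]:
            problems.append(f"speed {self.speed} is outside 0.5-2.0x")
        if not PITCH_RANGE[0] <= self.pitch_semitones <= PITCH_RANGE[1]:
            problems.append(f"pitch {self.pitch_semitones} is outside +/-12 semitones")
        if self.audio_format not in FORMATS:
            problems.append(f"unsupported format: {self.audio_format}")
        if not 0 < self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
            problems.append(f"sample rate {self.sample_rate_hz} Hz exceeds 48 kHz or is not positive")
        return problems

    def pitch_ratio(self) -> float:
        """Frequency ratio for the pitch shift: 2 ** (semitones / 12)."""
        return 2.0 ** (self.pitch_semitones / 12.0)
```

For example, `SpeechSettings(pitch_semitones=12).pitch_ratio()` is `2.0`, so the maximum upward shift is one octave, and `SpeechSettings(speed=3.0).validate()` reports a single problem for the out-of-range speed.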
For any workflow that needs a voice. Creators, developers and teams ship audio fast — without a recording studio.
Narrate YouTube videos, ads and explainers with a consistent voice. Regenerate a line in seconds when the script changes.
Turn long scripts into clear narration that keeps tone and pacing steady across chapters.
Pick a steady voice and speed for a clear, broadcast-style read.
Build a reusable brand voice from a short sample and keep it consistent across videos, courses and social.
Generate speech in 6 languages, so videos and courses for global audiences share one workflow.
Add speech to assistants, menus, game dialogue and more.
Enter text and generate AI speech right away — no complex setup.
Type the script you want to voice. Add a mood like “calm tone” to shape the delivery.
Pick from 20 preset voices and adjust speed, pitch and volume. You can also clone a voice.
Generate in seconds, preview, and export as MP3 / WAV. Regenerate any line to fine-tune.
Answers about Seed Audio 1.0.
Seed Audio 1.0 is an audio generation model by ByteDance. It generates natural speech from text or reference audio, offering text-to-speech (TTS) and voice cloning, and you can use it right here in your browser.
Yes. It supports English, Chinese, Japanese, Spanish, Indonesian and Portuguese, and it generates natural speech from Japanese text.
Yes — from reference audio (up to 3 clips, 30 seconds each), referenced in the prompt as @Audio1–@Audio3.
New accounts get free credits to try your first generations, then you spend credits based on usage.
Please review your intended use and the applicable terms. See the pricing page and FAQ for details.
MP3, WAV, PCM or OGG Opus, with sample rates up to 48 kHz.
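The voice cloning answer above mentions up to 3 reference clips of 30 seconds each, referenced in the prompt as @Audio1–@Audio3. A pre-flight check for those limits might look like the sketch below. The function name `check_references` is hypothetical, and treating 30 seconds as a per-clip maximum is an assumption; the page does not document how the service itself validates prompts.

```python
import re

# Limits stated on this page; the 30 s value is assumed to be a maximum.
MAX_CLIPS = 3
MAX_CLIP_SECONDS = 30.0


def check_references(prompt: str, clip_durations: list[float]) -> list[str]:
    """Hypothetical helper: list problems with clips and @AudioN references."""
    problems = []
    if len(clip_durations) > MAX_CLIPS:
        problems.append(f"{len(clip_durations)} clips supplied; limit is {MAX_CLIPS}")
    # Clips are numbered from 1 to match the @Audio1, @Audio2, ... tags.
    for index, seconds in enumerate(clip_durations, start=1):
        if seconds > MAX_CLIP_SECONDS:
            problems.append(f"@Audio{index} is {seconds:.0f}s; limit is 30s")
    # Every @AudioN tag in the prompt must point at a supplied clip.
    for match in re.finditer(r"@Audio(\d+)", prompt):
        number = int(match.group(1))
        if not 1 <= number <= len(clip_durations):
            problems.append(f"@Audio{number} has no matching clip")
    return problems
```

Calling `check_references("Read this in the voice of @Audio1.", [12.0])` returns an empty list, while a prompt that mentions `@Audio2` with only one clip supplied yields one problem.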
Create natural speech from text with Seed Audio 1.0. Your first generation is covered by free credits.