Enter text and generate natural AI speech in seconds. This guide walks through everything from creating an account to writing prompts, cloning a voice, and exporting your audio with Seed Audio 1.0.
Sign up with email or a social login to receive free credits, enough to try your first Seed Audio 1.0 generations. You can check your remaining credit balance before every generation.
Paste the script or sentence you want read aloud into the editor. Seed Audio speaks English, Chinese, Japanese, Spanish, Indonesian and Portuguese. Add a short description of the tone you want, such as calm or upbeat, to shape the delivery.
Choose from 20 preset voices to match your use case. To use your own voice or a brand voice, upload reference audio to clone it. You can provide up to 3 reference clips (each 30 seconds or less), referenced in the prompt as @Audio1, @Audio2 and @Audio3.
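As a minimal sketch of how clips map to prompt tags, the Python below assigns @Audio1, @Audio2 and @Audio3 to up to three clips. The `reference_tags` function, the file names, and the assumption that tags follow upload order are illustrative only; they are not taken from Seed Audio itself.

```python
# Hypothetical helper: the guide allows up to 3 reference clips, mentioned in
# a prompt as @Audio1, @Audio2 and @Audio3. Numbering by upload order is an
# assumption made for this sketch.
MAX_REFERENCE_CLIPS = 3


def reference_tags(clip_paths: list[str]) -> dict[str, str]:
    """Map each uploaded clip to the @AudioN tag used to mention it in a prompt."""
    if len(clip_paths) > MAX_REFERENCE_CLIPS:
        raise ValueError(
            f"at most {MAX_REFERENCE_CLIPS} reference clips are allowed, got {len(clip_paths)}"
        )
    # Tags are numbered from 1: @Audio1, @Audio2, @Audio3.
    return {f"@Audio{i}": path for i, path in enumerate(clip_paths, start=1)}


tags = reference_tags(["host.wav", "guest.wav"])
print(tags)  # {'@Audio1': 'host.wav', '@Audio2': 'guest.wav'}
```

A prompt could then mention a speaker by its tag, for example writing @Audio2 wherever the guest's voice should be used.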
Tune speaking speed (0.5–2.0x), pitch (±12 semitones) and volume (0.5–2.0x). Export as MP3, WAV, PCM or OGG Opus, with sample rates up to 48 kHz. Choose the settings that fit your project, then drop the audio straight into a video or podcast.
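If you keep export settings in your own scripts or notes, a small check against these ranges can catch typos before you generate. This is a sketch under stated assumptions: the `AudioSettings` class, its field names, the `"ogg_opus"` format label and the neutral defaults are hypothetical, and because the guide gives only the 48 kHz upper bound, the sketch does not check which specific sample rates are offered.

```python
from dataclasses import dataclass

# Limits as stated in this guide.
SPEED_RANGE = (0.5, 2.0)   # speed multiplier
PITCH_RANGE = (-12, 12)    # semitones
VOLUME_RANGE = (0.5, 2.0)  # volume multiplier
FORMATS = {"mp3", "wav", "pcm", "ogg_opus"}  # hypothetical labels for the four export formats
MAX_SAMPLE_RATE_HZ = 48_000


@dataclass
class AudioSettings:
    """Hypothetical mirror of the settings panel; defaults are neutral values chosen for the sketch."""

    speed: float = 1.0
    pitch: float = 0.0
    volume: float = 1.0
    fmt: str = "mp3"
    sample_rate_hz: int = 48_000

    def problems(self) -> list[str]:
        """Describe every out-of-range setting; an empty list means all checks passed."""
        issues = []
        for name, value, (low, high) in (
            ("speed", self.speed, SPEED_RANGE),
            ("pitch", self.pitch, PITCH_RANGE),
            ("volume", self.volume, VOLUME_RANGE),
        ):
            if not low <= value <= high:
                issues.append(f"{name}={value} is outside {low}..{high}")
        if self.fmt not in FORMATS:
            issues.append(f"format {self.fmt!r} is not one of {sorted(FORMATS)}")
        # Only the 48 kHz ceiling is documented, so lower rates are not checked individually.
        if not 0 < self.sample_rate_hz <= MAX_SAMPLE_RATE_HZ:
            issues.append(f"sample rate {self.sample_rate_hz} Hz is not in 1..{MAX_SAMPLE_RATE_HZ}")
        return issues


podcast = AudioSettings(speed=1.1, fmt="wav")
print(podcast.problems())  # []
print(AudioSettings(pitch=15).problems())  # ['pitch=15 is outside -12..12']
```

Returning a list of problems, rather than raising on the first one, lets you see every setting that needs adjusting in a single pass.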
Press generate and Seed Audio turns your text into audio. Preview it in place, then download it, for example as MP3 or WAV. To fine-tune any line, change the text or settings and regenerate.
Seed Audio 1.0 prompts are written in plain sentences; there is no special syntax to memorize. Alongside the line you want spoken, describe the scene or emotion (for example, "in a calm tone" or "sounding cheerful") to change the delivery.
Describing a scene like “a late-night convenience store, suspenseful” nudges the tone and pacing toward that mood. Naming the use case in a few words, such as narration, an ad, or a drama, also helps.
Commas, periods and line breaks create natural pauses and make speech easier to follow. Splitting long passages keeps the pacing steady.
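To split a long script before pasting it in, you could group whole sentences into chunks of bounded length. The sketch below is a simple heuristic, assuming that sentence-ending punctuation (including the full-width 。！？ used in Chinese and Japanese) or a line break marks a good break point; the `split_for_tts` name and the 200-character default are arbitrary choices, not Seed Audio limits.

```python
import re

# Break after ASCII sentence punctuation followed by whitespace, directly after
# full-width 。！？ (Chinese and Japanese rarely put a space there), or at line breaks.
# This is a simple heuristic for the sketch, not how Seed Audio parses text.
_BREAK = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])|\n+")


def split_for_tts(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    Sentences are rejoined with one space, which suits space-separated languages.
    """
    sentences = [s.strip() for s in _BREAK.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


script = "Welcome back. Today we test three voices! Which one fits best?"
for chunk in split_for_tts(script, max_chars=30):
    print(chunk)
```

With a 30-character limit, each sentence in the example prints on its own line, because any two of them together are longer than 30 characters. Requiring whitespace after an ASCII period keeps decimals such as 3.5 intact.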
Pick a preset voice and speed that fit narration, character, or news reading. The same script sounds very different across voices, so preview and compare.
Voice cloning reproduces a speaker best from clean, low-noise reference audio. Each clip can be up to 30 seconds, with a maximum of 3 clips, referenced in the prompt as @Audio1 to @Audio3. Note that a reference image and reference audio cannot be used together.
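Before uploading, you could check clips against these limits. The Python sketch below reads the length of uncompressed WAV files with the standard `wave` module and flags more than 3 clips, clips over 30 seconds, or audio combined with a reference image. The function names are hypothetical, and labelling problem clips as @Audio1, @Audio2, ... in upload order is an assumption.

```python
import wave

MAX_CLIPS = 3
MAX_CLIP_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file: frame count divided by frames per second."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_problems(clip_seconds: list[float], has_reference_image: bool = False) -> list[str]:
    """List every way a planned upload breaks the limits in this guide; empty means OK."""
    issues = []
    if len(clip_seconds) > MAX_CLIPS:
        issues.append(f"{len(clip_seconds)} clips supplied; the limit is {MAX_CLIPS}")
    # Numbering clips @Audio1, @Audio2, ... in upload order is an assumption.
    for index, seconds in enumerate(clip_seconds, start=1):
        if seconds > MAX_CLIP_SECONDS:
            issues.append(f"@Audio{index} is {seconds:.1f} s; each clip must be 30 s or less")
    if has_reference_image and clip_seconds:
        issues.append("a reference image and reference audio cannot be used together")
    return issues


print(reference_problems([12.0, 31.5]))  # ['@Audio2 is 31.5 s; each clip must be 30 s or less']
```

For real files, pass `[wav_duration_seconds(p) for p in paths]`; compressed formats such as MP3 need a third-party decoder, which this sketch leaves out. The check says nothing about background noise, so listening back to each clip is still worthwhile.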
Speed ranges 0.5–2.0x, pitch ±12 semitones, and volume 0.5–2.0x. If delivery is too fast, lower the speed; to make a voice lower and calmer, drop the pitch.
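Pitch in semitones maps to a frequency ratio. Assuming the standard equal-tempered semitone (a factor of 2^(1/12)), the ±12 range spans one octave in each direction: −12 halves the pitch and +12 doubles it. The guide does not say how Seed Audio applies the shift internally, so treat this as the musical meaning of the setting rather than a description of the model; `pitch_ratio` is a name chosen for the sketch.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of `semitones` in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


# The guide's pitch range runs from -12 to +12 semitones.
for shift in (-12, -5, 0, 5, 12):
    print(f"{shift:+d} semitones -> x{pitch_ratio(shift):.3f}")
```

At the lowest setting, a voice whose fundamental frequency sits near 200 Hz would move to about `200 * pitch_ratio(-12)`, or 100 Hz.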
Seed Audio supports English, Chinese, Japanese, Spanish, Indonesian and Portuguese, and can read text that mixes languages.
If the delivery is off, try splitting the text into shorter parts, adding punctuation, or choosing a different voice. Because Seed Audio 1.0 is a new model, a generation can occasionally fail; generating again often resolves it.
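If you trigger generations from your own tooling, the "generate again" advice can be automated with a bounded retry. In this sketch `generate` is a placeholder for whatever starts a generation on your side; it is hypothetical, not a Seed Audio function, and the three-attempt and one-second defaults are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def generate_with_retry(
    generate: Callable[[], T], attempts: int = 3, delay_seconds: float = 1.0
) -> T:
    """Call `generate` until it returns, trying at most `attempts` times in total.

    `generate` is a hypothetical placeholder for whatever starts one generation
    in your own tooling; it is not a Seed Audio function.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return generate()
        except Exception as error:  # treat any failure as a failed generation
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before generating again
    raise RuntimeError(f"generation failed after {attempts} attempts") from last_error
```

Retrying unchanged input only helps with occasional failures; if the delivery is consistently off, shorter parts, extra punctuation, or a different voice are the better fix.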
Every generation on this site uses Seed Audio 1.0, which supports text to speech and voice cloning.
Open the generator and turn text into AI speech. Your first generation is covered by free credits.