Freepik

    Audio nodes

    Spaces includes a set of audio nodes that let you generate voiceovers, music, and sound effects — all from text descriptions. Use them to add narration, soundtracks, and ambient audio to your video workflows without recording or licensing anything.


    Available audio nodes

    Node | What it does
    --- | ---
    Voiceover | Converts text into natural-sounding speech using AI voices
    Music Generator | Composes original music tracks from text descriptions
    Sound Effects | Generates AI-powered foley and ambient sounds from text
    Video Audio Mix | Combines multiple audio tracks with video

    All audio nodes are new additions to Spaces.

    Voiceover, Music Generator, and Sound Effects are being rolled out gradually and may not be available in your account yet.

    Voiceover

    The Voiceover node converts text into natural-sounding speech. Choose from hundreds of AI voices across multiple providers to narrate scripts, create dialogue, or produce voice content.

    How to use the Voiceover node

    1. Add the node. Add a Voiceover node to your Space.
    2. Enter your script. Type or paste your script into the text field, or connect a Text node to the input port.
    3. Select a model. Choose ElevenLabs v2, ElevenLabs v3, or Gemini 2.5 Pro.
    4. Pick a voice. Click the voice chip to open the voice library, then browse and preview voices before selecting one.
    5. Adjust parameters. If needed, set Speed, Stability, and Similarity Boost (ElevenLabs models) or Temperature, System Instruction, and Language (Gemini 2.5 Pro).
    6. Generate. Set the number of generations (1 to 10) and run the node.

    Voice parameters

    Parameter | Range | What it does
    --- | --- | ---
    Speed | 0.7x to 1.2x | Speaking rate; lower is slower, higher is faster
    Stability | 0 to 1 | Voice consistency; lower is more expressive, higher is more stable
    Similarity Boost | 0 to 1 | How closely the output matches the selected voice
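If you keep notes on the settings you use across projects, a small check like the one below can flag out-of-range values before you enter them on the card. It is a minimal sketch that only encodes the ranges from the table above plus the 1-to-10 generation count; `VoiceSettings` and `validate` are hypothetical names for illustration, not part of any Spaces API.

```python
from dataclasses import dataclass

# Ranges listed for the Voiceover node's parameters (ElevenLabs models).
SPEED_RANGE = (0.7, 1.2)
UNIT_RANGE = (0.0, 1.0)  # shared by Stability and Similarity Boost
GENERATIONS_RANGE = (1, 10)


@dataclass
class VoiceSettings:
    """Hypothetical record of Voiceover settings; not a Spaces API object."""
    speed: float
    stability: float
    similarity_boost: float
    generations: int


def validate(settings: VoiceSettings) -> list[str]:
    """Return human-readable problems; an empty list means every value is in range."""
    checks = [
        ("speed", settings.speed, SPEED_RANGE),
        ("stability", settings.stability, UNIT_RANGE),
        ("similarity_boost", settings.similarity_boost, UNIT_RANGE),
        ("generations", settings.generations, GENERATIONS_RANGE),
    ]
    problems = []
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


if __name__ == "__main__":
    draft = VoiceSettings(speed=1.5, stability=0.3, similarity_boost=0.8, generations=3)
    print(validate(draft))  # ['speed=1.5 is outside 0.7 to 1.2']
```

Returning a list of problems rather than raising on the first one lets you see every out-of-range value at once.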

    Input and output

    The Voiceover node accepts a Text input (your script) and outputs Audio (the generated voiceover). You can type directly on the card or connect a Text node for dynamic scripts.

    Use cases

    • Video narration — Write a script, generate a voiceover, then combine it with your video using Video Audio Mix.
    • Podcast creation — Generate voiceovers for individual segments and combine them into a full episode.
    • Character dialogue — Use multiple Voiceover nodes with different voices, then mix them together for conversation scenes.
    • Lip-sync video — Generate a voiceover, then connect the audio output to a lip-sync video model so your character speaks in sync.

    Voiceover models

    Three AI models are available for voice generation. Each has different strengths — pick the one that matches your project.

    Model | Provider | Speed | Quality | Best for
    --- | --- | --- | --- | ---
    ElevenLabs v2 Turbo | ElevenLabs | Fast | Good | Quick narration, batch processing
    ElevenLabs v3 | ElevenLabs | Moderate | High | Final production, emotional narration
    Gemini 2.5 Pro | Google | Moderate | High | Multi-language content, conversational tone

    ElevenLabs v2 is the fastest option. Use it when you need to generate many voiceovers quickly or iterate on scripts during production. It supports Speed, Stability, and Similarity Boost parameters.

    ElevenLabs v3 delivers more natural prosody and intonation than v2 — pacing, emphasis, and emotional tone feel closer to a human read. Use it for final output when quality matters most. Same parameters as v2.

    Gemini 2.5 Pro excels at multi-language content and conversational delivery. It supports Temperature, System Instruction, and Language selection. Multi-speaker configuration is possible for dialogue-style output.

    Gemini 2.5 Pro is in gradual rollout and may not be available to all users yet.

    Quick guide: which model to choose

    You need... | Use this
    --- | ---
    Fast turnaround | ElevenLabs v2
    Best audio quality | ElevenLabs v3
    Multiple languages | Gemini 2.5 Pro
    Emotional, expressive narration | ElevenLabs v3
    Conversational or dialogue tone | Gemini 2.5 Pro
    Batch processing many clips | ElevenLabs v2
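The quick guide above works as a lookup table. The sketch below encodes it as a dictionary and also returns the parameters each model exposes, as described earlier in this article. The need keys such as `"batch_processing"` are illustrative labels chosen for this example, and `pick_voiceover_model` is a hypothetical helper, not a Spaces feature.

```python
# Recommendations from the "which model to choose" guide for Voiceover.
# The need keys are illustrative labels, not values used by Spaces.
VOICEOVER_MODEL_FOR_NEED = {
    "fast_turnaround": "ElevenLabs v2",
    "best_quality": "ElevenLabs v3",
    "multiple_languages": "Gemini 2.5 Pro",
    "expressive_narration": "ElevenLabs v3",
    "conversational_tone": "Gemini 2.5 Pro",
    "batch_processing": "ElevenLabs v2",
}

# Adjustable parameters per model, as described in this article.
MODEL_PARAMETERS = {
    "ElevenLabs v2": ("Speed", "Stability", "Similarity Boost"),
    "ElevenLabs v3": ("Speed", "Stability", "Similarity Boost"),
    "Gemini 2.5 Pro": ("Temperature", "System Instruction", "Language"),
}


def pick_voiceover_model(need: str) -> tuple[str, tuple[str, ...]]:
    """Return the recommended model for a need and the parameters it exposes."""
    if need not in VOICEOVER_MODEL_FOR_NEED:
        known = ", ".join(sorted(VOICEOVER_MODEL_FOR_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {known}")
    model = VOICEOVER_MODEL_FOR_NEED[need]
    return model, MODEL_PARAMETERS[model]
```

For example, `pick_voiceover_model("multiple_languages")` points you to Gemini 2.5 Pro and reminds you that its controls are Temperature, System Instruction, and Language rather than Stability and Similarity Boost.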

    Music Generator

    The Music Generator node creates original music from text descriptions. Describe the mood, genre, tempo, and instruments — the AI composes a unique track.

    How to use the Music Generator node

    1. Add the node. Add a Music Generator node to your Space.
    2. Describe your music. Write a description of the music you want in the prompt field, or connect a Text node to the input port.
    3. Select a model. Choose Google Lyria or ElevenLabs Music.
    4. Set the duration. Google Lyria supports up to 30 seconds; ElevenLabs Music supports up to 10 seconds.
    5. Generate. Set the number of generations (1 to 10) and run the node.

    Input and output

    The Music Generator node accepts a Text input (your music description) and outputs Audio (the generated track).

    Use cases

    • Video soundtrack — Describe a mood and generate a background track, then combine it with your video using Video Audio Mix.
    • Podcast intro music — Generate a short, branded intro with ElevenLabs Music.
    • Background music — Create lo-fi, ambient, or genre-specific loops for content.
    • Compare styles — Write several genre descriptions, generate them all, and pick the best fit for your project.

    Music Generator models

    Two AI models are available for music generation, each optimized for different use cases.

    Model | Provider | Max duration | Best for
    --- | --- | --- | ---
    Google Lyria | Google | 30 seconds | Background music, soundtracks, ambient, varied genres
    ElevenLabs Music | ElevenLabs | 10 seconds | Jingles, intros, sound logos, short loops

    Google Lyria excels at longer compositions with natural musical structure. It handles a wide range of genres and can produce pieces with evolving arrangement — intros, builds, and transitions.

    ElevenLabs Music is optimized for short, concentrated pieces. The output is clean and well-defined — ideal for branding elements, transitions, and loop-ready clips.

    Quick guide: which model to choose

    You need... | Use this
    --- | ---
    More than 10 seconds | Google Lyria
    Short, punchy audio | ElevenLabs Music
    Varied instrumentation | Google Lyria
    Quick generation | ElevenLabs Music
    Soundtrack or background music | Google Lyria
    Jingle, intro, or sound logo | ElevenLabs Music
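Because the two music models have different maximum durations, the choice depends on both length and purpose. The sketch below combines the duration limits from the model table with the quick guide: anything over 10 seconds can only go to Google Lyria, and short-form branding or quick clips go to ElevenLabs Music. The purpose labels and the `pick_music_model` function are hypothetical, for illustration only.

```python
# Maximum durations per Music Generator model, in seconds.
MAX_DURATION_S = {"Google Lyria": 30, "ElevenLabs Music": 10}

# Purposes the quick guide sends to ElevenLabs Music; labels are illustrative.
SHORT_FORM_PURPOSES = {"jingle", "intro", "sound_logo", "short_loop", "quick_generation"}


def pick_music_model(duration_s: float, purpose: str) -> str:
    """Suggest a Music Generator model for a target duration and purpose."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s > MAX_DURATION_S["Google Lyria"]:
        raise ValueError(f"{duration_s} s is longer than either model can generate")
    if duration_s > MAX_DURATION_S["ElevenLabs Music"]:
        return "Google Lyria"  # only Lyria goes past 10 seconds
    if purpose in SHORT_FORM_PURPOSES:
        return "ElevenLabs Music"
    return "Google Lyria"  # soundtracks, background music, varied instrumentation
```

Checking duration first means a 20-second "jingle" still lands on Google Lyria, since ElevenLabs Music cannot produce a clip that long.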

    Sound Effects

    The Sound Effects node generates AI-powered audio from text descriptions. Describe any sound — from rain on a tin roof to a spaceship engine humming — and the AI creates it. Use it to add atmosphere and foley to video projects.

    How to use the Sound Effects node

    1. Add the node. Add a Sound Effects node to your Space.
    2. Describe the sound. Write a description in the prompt field, or connect a Text node to the input port.
    3. Set the duration. Choose the desired length for the sound effect.
    4. Enable Loop if needed. Turn on Loop for sounds that need to play continuously, such as ambient or background audio.
    5. Generate. Set the number of generations (1 to 10) and run the node.

    Input and output

    The Sound Effects node accepts a Text input (your sound description) and outputs Audio (the generated effect).

    Use cases

    • Video foley — Describe scene sounds, generate them, and layer them onto your video with Video Audio Mix.
    • Ambient loops — Create continuous background audio like coffee shop ambiance or rain, with Loop enabled.
    • Podcast intros — Create unique audio branding with dramatic stings or transition sounds.
    • Game audio — Generate UI sounds, environmental ambience, or action effects for game prototyping.

    Video Audio Mix

    The Video Audio Mix node combines multiple audio tracks with a video. Use it as the final step in your audio workflow — connect your voiceover, music, and sound effects, then mix them with your generated or uploaded video.

    Typical audio workflow

    A common pattern for producing narrated video with a soundtrack in Spaces:

    1. Write your script. Use a Text node or type directly into the Voiceover node.
    2. Generate voiceover. The Voiceover node converts your script to natural speech.
    3. Add music. The Music Generator creates a background track from a mood description.
    4. Layer sound effects. The Sound Effects node generates ambient audio or foley.
    5. Mix everything. Connect all audio outputs and your video to a Video Audio Mix node.

    You can connect as many audio sources as you need before combining them with the final video.
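It can help to think of this workflow as a small directed graph: each node depends on the nodes wired into it, and every branch converges on Video Audio Mix. The sketch below models that wiring with Python's standard `graphlib` module to derive one valid order in which the nodes could run. The node names are illustrative labels, and the sketch only models the connections; it says nothing about how Spaces actually schedules node runs.

```python
from graphlib import TopologicalSorter

# Each key is a node; its value is the set of nodes wired into its inputs.
# Names are illustrative labels for canvas nodes, not Spaces identifiers.
workflow = {
    "script_text": set(),
    "voiceover": {"script_text"},
    "music": set(),
    "sfx_rain": set(),
    "sfx_traffic": set(),
    "video": set(),
    "video_audio_mix": {"voiceover", "music", "sfx_rain", "sfx_traffic", "video"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one order in which every node comes after all of its inputs."""
    return list(TopologicalSorter(graph).static_order())


def audio_sources(graph: dict[str, set[str]], mix_node: str = "video_audio_mix",
                  video_node: str = "video") -> list[str]:
    """List the audio nodes feeding the mix, excluding the video input."""
    return sorted(graph[mix_node] - {video_node})


if __name__ == "__main__":
    print(run_order(workflow))
    print(audio_sources(workflow))
```

Adding another Sound Effects layer is just one more key and one more entry in the mix node's input set, which mirrors connecting another audio source on the canvas.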

    Prompting tips

    Good prompts lead to better audio. Here is what works for each node.

    Voiceover

    Your voiceover prompt is the script itself — write it exactly as you want it spoken. Keep sentences natural and conversational. If you need specific pacing or emphasis, choose the right model and adjust the voice parameters rather than trying to encode delivery instructions in the text.

    Music Generator

    Include genre, mood, tempo, and instruments for the most control.

    Cinematic orchestral piece, dramatic, building tension, strings and brass, 120 BPM

    Acoustic folk guitar, warm and nostalgic, fingerpicking style, 90 BPM

    For ElevenLabs Music — shorter pieces — keep descriptions focused on a single mood or purpose.

    Upbeat electronic jingle, happy, synth lead

    Dark ambient drone, eerie, low frequency hum
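The example prompts above follow a simple pattern: a genre or subject first, then comma-separated mood and instrument descriptors, then an optional tempo. A sketch of that pattern as a helper is below; `build_music_prompt` is a hypothetical function for composing prompt text consistently, and the node simply receives the resulting string as its prompt.

```python
from typing import Optional


def build_music_prompt(subject: str, descriptors: list[str], bpm: Optional[int] = None) -> str:
    """Join a genre or subject, mood and instrument descriptors, and an optional
    BPM value into one comma-separated prompt string."""
    parts = [subject.strip()]
    parts.extend(d.strip() for d in descriptors if d.strip())  # skip empty descriptors
    if bpm is not None:
        if bpm <= 0:
            raise ValueError("bpm must be positive")
        parts.append(f"{bpm} BPM")
    return ", ".join(parts)
```

For an ElevenLabs Music prompt, leave out `bpm` and pass only one or two descriptors to keep the description focused on a single mood or purpose.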

    Sound Effects

    Be descriptive and specific. The more detail you include, the closer the result will match what you hear in your head.

    "Heavy rain on a tin roof with distant thunder" works better than just "rain".

    "Busy coffee shop ambiance with quiet conversation and espresso machine" works better than "cafe sounds".

    Tips and best practices

    Preview voices before committing. The voice library includes sample playback for every voice — listen before you generate.

    Lower stability for expression. If your voiceover sounds too flat or robotic, try reducing the Stability parameter. This adds more variation and emotional range to the delivery.

    Generate multiple variations. Audio generation — especially music and sound effects — is highly variable. Generate several versions of the same prompt and pick the best one.

    Use Loop for ambient sounds. Enable the Loop toggle on the Sound Effects node when you need continuous background audio like rain, traffic, or office ambiance.

    Draft with v2, finish with v3. Use ElevenLabs v2 for fast iteration on scripts, then switch to v3 for the final voiceover when quality matters.

    Be specific about tempo in music prompts. Adding a BPM value like 120 BPM gives the AI a concrete target and produces more consistent results.

    Use multiple Voiceover nodes for dialogue. Add several Voiceover nodes with different voices to create conversation scenes, then combine them with Video Audio Mix.

    Combine Sound Effects for rich soundscapes. Layer multiple Sound Effects nodes — for example, rain plus distant traffic plus indoor echo — and mix them together for depth.
