Freepik

    Speak

    Turn images and videos into speaking clips with AI. Write a script, choose a voice, and generate lip-synced video in seconds.

    Speak takes a still image or a video and makes the character in it talk. You provide the words (by writing a script or uploading audio) and the AI handles lip sync, facial movement, and timing. Open Speak to get started.

    How it works

    Speak combines two inputs: a visual base (an image or a video with a character in it) and audio (either generated from a script or uploaded by you). The AI analyzes the character in the visual, generates or receives the audio, and produces a video where the character speaks with synchronized lip movement.

    There are two ways to provide the audio:

    Mode | How it works | Best for
    Script | You write what the character says. Speak generates the voice using AI text-to-speech. | Quick creation, testing ideas, characters that need a specific AI voice.
    Add audio | You upload your own audio recording. | Using your real voice, professional voiceovers, or audio from other sources.
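
    If you plan several clips ahead of time, the two modes can be sketched as a small record that checks each clip has exactly one audio source. This is a hypothetical planning helper for your own notes, not a Freepik API; the class name SpeakJobPlan, its fields, and the file names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakJobPlan:
    """Hypothetical planning record for one clip; not a Freepik API object."""

    visual_path: str                  # start image or video with a visible face
    script: Optional[str] = None      # Script mode: the words the character says
    voice_name: Optional[str] = None  # Script mode: voice picked from the library
    audio_path: Optional[str] = None  # Add audio mode: your own recording

    def mode(self) -> str:
        """Return which tab this plan corresponds to, or raise if it is ambiguous."""
        has_script = self.script is not None
        has_audio = self.audio_path is not None
        if has_script == has_audio:
            raise ValueError("Provide either a script or an audio file, not both or neither.")
        if has_script and not self.voice_name:
            raise ValueError("Script mode also needs a voice.")
        return "Script" if has_script else "Add audio"


# Example: one clip per mode (file names are placeholders).
scripted = SpeakJobPlan("hero.png", script="[happy] Hi there!", voice_name="Alexander Davis")
recorded = SpeakJobPlan("hero.png", audio_path="take1.mp3")
print(scripted.mode(), "/", recorded.mode())
```

    The check mirrors the table above: a script needs a voice, while uploaded audio stands on its own.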

    Create a speaking clip from a script

    1. Open Speak

    Go to freepik.com/pikaso/video-speak.

    2. Upload your content

    Add a start image or a video as the base for your speaking clip. The image or video should contain a visible character with a clear face.

    3. Select the Script tab

    Make sure the Script tab is active (not Add audio).

    4. Write your script

    Type what the character says in the text field. You can include emotion tags in square brackets to control how the character delivers the line.

    5. Select a voice

    Click Select a voice to open the voice library. Browse All voices or My voices, preview each one, and pick the voice that fits your character.

    6. Generate

    Click Generate. The AI produces a lip-synced video of your character speaking the script.

    Create a speaking clip from your own audio

    1. Open Speak

    Go to freepik.com/pikaso/video-speak.

    2. Upload your content

    Add a start image or a video as the base.

    3. Select the Add audio tab

    Switch to the Add audio tab.

    4. Choose your audio

    Browse your audio files from Favorites, History, Uploads, Downloads, or Personal project. Select the recording you want to use.

    5. Generate

    Click Generate. The AI syncs the character's lip movements to your uploaded audio.

    Choosing a voice

    The voice library gives you two sections:

    • All voices. A library of AI voices with different names, languages, and accents. Each voice has a preview so you can listen before selecting.
    • My voices. Voices you have saved or created previously.

    Each voice shows the name and language/accent (for example, Alexander Davis, English - American, or Arthur Kensington, English - British). Click the play button next to any voice to hear a sample before committing.

    Writing a good script

    The script field accepts plain text with optional emotion tags. Write naturally, as if the character were speaking out loud.

    Emotion tags

    Add emotion tags in square brackets before the text to control the delivery tone. For example:

    • [angry] I told them this was a mistake... but nobody listened.
    • [happy] This is the best news I have heard all week!
    • [sad] I really thought things would be different this time.

    Emotion tags influence how the AI voice delivers the line: pacing, intonation, and emphasis change based on the tag you choose.
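
    When drafting longer scripts outside the tool, it can help to see which tag applies to which stretch of text. The sketch below splits a draft into (tag, text) pairs. It is a hypothetical local helper, not part of Speak, and it assumes tags are single lowercase words in square brackets like the three examples above; Speak's own parsing rules may differ.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a tag is one lowercase word in square brackets, e.g. [angry].
TAG_PATTERN = re.compile(r"\[([a-z]+)\]\s*")


def split_emotion_segments(script: str) -> List[Tuple[Optional[str], str]]:
    """Split a draft script into (tag, text) pairs; tag is None for untagged text."""
    segments = []
    current_tag = None
    position = 0
    for match in TAG_PATTERN.finditer(script):
        # Text before this tag belongs to the previous tag (or to no tag).
        text = script[position:match.start()].strip()
        if text:
            segments.append((current_tag, text))
        current_tag = match.group(1)
        position = match.end()
    # Whatever follows the last tag belongs to that tag.
    tail = script[position:].strip()
    if tail:
        segments.append((current_tag, tail))
    return segments


draft = "[angry] I told them this was a mistake... but nobody listened."
for tag, text in split_emotion_segments(draft):
    print(tag, "->", text)
```

    Reviewing the pairs before pasting the script into the text field makes it easier to spot a line that ended up with no tag or with the wrong one.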

    Script tips

    • Write in a conversational tone. The AI sounds more natural with everyday language.
    • Keep sentences short to medium length. Very long sentences can sound flat.
    • Use punctuation to guide pacing. Periods create pauses. Ellipses (...) create longer pauses.
    • Test different voices with the same script. The same words can sound very different depending on the voice.
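
    The sentence-length tip can be checked on a draft before generating. The following is a minimal sketch of such a check, not a feature of Speak: the 25-word threshold is an arbitrary assumption (the article does not give a number), and the tag pattern assumes single lowercase words in square brackets.

```python
import re
from typing import List

MAX_WORDS = 25  # assumed threshold; the article gives no specific limit


def long_sentences(script: str, max_words: int = MAX_WORDS) -> List[str]:
    """Return the sentences in a draft script that exceed max_words words."""
    # Drop emotion tags so they do not count as words.
    plain = re.sub(r"\[[a-z]+\]", " ", script).strip()
    # Split on whitespace that follows '.', '!' or '?'. An ellipsis followed by
    # a space also splits here, which matches its role as a longer pause.
    sentences = re.split(r"(?<=[.!?])\s+", plain)
    return [s for s in sentences if len(s.split()) > max_words]


draft = "[sad] I really thought things would be different this time."
print(long_sentences(draft))
```

    Any sentence the function returns is a candidate for splitting into two shorter lines.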

    Tips and best practices

    Use a clear, front-facing image. The AI needs to see the character's face clearly. Images with the face partially hidden, turned away, or at extreme angles produce weaker results.

    Good lighting matters. Well-lit faces with visible features give the AI more to work with. Dark or heavily shadowed images make lip sync less accurate.

    Try emotion tags to add personality. A script without emotion tags sounds neutral. Adding [angry], [happy], or [sad] before key lines makes the delivery feel more human.

    Preview voices before generating. Every voice sounds different. Spend a moment listening to samples in the voice library to find the right fit for your character.

    Upload your own audio for maximum control. If the AI voices do not match what you need, record your own audio and use the Add audio tab. This also works for languages or accents not covered by the voice library.

    Start with an image, then try video. Images are simpler to work with and produce consistent results. Once you are comfortable, try using a video as the base for more dynamic output.

    Speak vs Lip Sync in the Clip Editor. Speak is a standalone tool for creating speaking clips from scratch. The Lip Sync feature in the Clip Editor works on videos you have already generated and is part of the post-production workflow. The two tools complement each other.
