Speak

Turn images and videos into speaking clips with AI. Write a script, choose a voice, and generate lip-synced video in seconds.

Speak takes a still image or a video and makes the character in it talk. You provide the words (by writing a script or uploading audio) and the AI handles lip sync, facial movement, and timing. Open Speak to get started.

How it works

Speak combines two inputs: a visual base (an image or a video with a character in it) and audio (either generated from a script or uploaded by you). The AI analyzes the character in the visual, generates or receives the audio, and produces a video where the character speaks with synchronized lip movement.

There are two ways to provide the audio:

Mode	How it works	Best for
Script	You write what the character says. Speak generates the voice using AI text-to-speech.	Quick creation, testing ideas, characters that need a specific AI voice.
Add audio	You upload your own audio recording.	Using your real voice, professional voiceovers, or audio from other sources.

Create a speaking clip from a script

Open Speak

Go to freepik.com/pikaso/video-speak.

Upload your content

Add a start image or a video as the base for your speaking clip. The image or video should contain a visible character with a clear face.

Select the Script tab

Make sure the Script tab is active (not Add audio).

Write your script

Type what the character says in the text field. You can include emotion tags in square brackets to control how the character delivers the line.

Select a voice

Click Select a voice to open the voice library. Browse All voices or My voices, preview each one, and pick the voice that fits your character.

Generate

Click Generate. The AI produces a lip-synced video of your character speaking the script.

Create a speaking clip from your own audio

Open Speak

Go to freepik.com/pikaso/video-speak.

Upload your content

Add a start image or a video as the base.

Select the Add audio tab

Switch to the Add audio tab.

Choose your audio

Browse your audio files from Favorites, History, Uploads, Downloads, or Personal project. Select the recording you want to use.

Generate

Click Generate. The AI syncs the character's lip movements to your uploaded audio.

Choosing a voice

The voice library has two sections:

All voices. A library of AI voices with different names, languages, and accents. Each voice has a preview so you can listen before selecting.
My voices. Voices you have saved or created previously.

Each voice shows its name and its language and accent (for example, Alexander Davis, English - American, or Arthur Kensington, English - British). Click the play button next to any voice to hear a sample before selecting it.

Writing a good script

The script field accepts plain text with optional emotion tags. Write naturally, as if the character were speaking out loud.

Emotion tags

Add an emotion tag in square brackets before a line to control its tone of delivery. For example:

[angry] I told them this was a mistake... but nobody listened.
[happy] This is the best news I have heard all week!
[sad] I really thought things would be different this time.

Emotion tags influence how the AI voice delivers the line: pacing, intonation, and emphasis change based on the tag you choose.
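If you prepare scripts outside Speak and paste them in, the tag format shown above can be produced with a small helper. This is a minimal sketch, not part of Speak: the function names are hypothetical, and the allow-list contains only the three tags that appear in this article, since the full set of supported tags is not listed here.

```python
from typing import List, Optional, Tuple

# Tags taken from the examples in this article. The full set Speak accepts
# is not documented here, so this allow-list is an assumption.
KNOWN_TAGS = {"angry", "happy", "sad"}


def tag_line(text: str, emotion: Optional[str] = None) -> str:
    """Return a script line, prefixed with an emotion tag when one is given."""
    text = text.strip()
    if emotion is None:
        return text
    emotion = emotion.strip().lower()
    if emotion not in KNOWN_TAGS:
        raise ValueError(f"unrecognized emotion tag: {emotion!r}")
    return f"[{emotion}] {text}"


def build_script(lines: List[Tuple[Optional[str], str]]) -> str:
    """Join (emotion, text) pairs into one script, one line per pair."""
    return "\n".join(tag_line(text, emotion) for emotion, text in lines)


script = build_script([
    ("happy", "This is the best news I have heard all week!"),
    ("sad", "I really thought things would be different this time."),
])
print(script)
```

The printed text can be pasted into the script field as-is. Passing `None` as the emotion leaves a line untagged, which matches the neutral delivery described for scripts without tags.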

Script tips

Write in a conversational tone. The AI sounds more natural with everyday language.
Keep sentences short to medium length. Very long sentences can sound flat.
Use punctuation to guide pacing. Periods create pauses. Ellipses (...) create longer pauses.
Test different voices with the same script. The same words can sound very different depending on the voice.
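The sentence-length tip above can be checked before you paste a script into Speak. The sketch below is one possible approach under stated assumptions: the 25-word threshold is arbitrary and chosen only for illustration (Speak does not publish a limit in this article), and the function name is hypothetical.

```python
import re
from typing import List

MAX_WORDS = 25  # arbitrary threshold for illustration; not a Speak limit


def long_sentences(script: str, max_words: int = MAX_WORDS) -> List[str]:
    """Return the sentences in `script` that contain more than `max_words` words."""
    # Drop bracketed emotion tags such as [happy] so they are not counted as words.
    text = re.sub(r"\[[^\]]*\]", " ", script)
    # Split on whitespace that follows sentence-ending punctuation
    # (a period, exclamation mark, question mark, or the end of an ellipsis).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if len(s.split()) > max_words]


for sentence in long_sentences("[happy] This is the best news I have heard all week!"):
    print("Consider shortening:", sentence)
```

An empty result means no sentence exceeded the threshold. Lower `max_words` if a voice still sounds flat on longer lines.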

Tips and best practices

Use a clear, front-facing image. The AI needs to see the character's face clearly. Images with the face partially hidden, turned away, or at extreme angles produce weaker results.

Good lighting matters. Well-lit faces with visible features give the AI more to work with. Dark or heavily shadowed images make lip sync less accurate.

Try emotion tags to add personality. A script without emotion tags sounds neutral. Adding [angry], [happy], or [sad] before key lines makes the delivery feel more human.

Preview voices before generating. Every voice sounds different. Spend a moment listening to samples in the voice library to find the right fit for your character.

Upload your own audio for maximum control. If the AI voices do not match what you need, record your own audio and use the Add audio tab. This also works for languages or accents not covered by the voice library.

Start with an image, then try video. Images are simpler to work with and produce consistent results. Once you are comfortable, try using a video as the base for more dynamic output.

Speak vs. Lip Sync in the Clip Editor. Speak is a standalone tool for creating speaking clips from scratch. The Lip Sync feature in the Clip Editor works on videos you have already generated and is part of the post-production workflow. The two tools complement each other.
