Speak
Turn images and videos into speaking clips with AI. Write a script, choose a voice, and generate lip-synced video in seconds.
Speak takes a still image or a video and makes the character in it talk. You provide the words (by writing a script or uploading audio) and the AI handles lip sync, facial movement, and timing. Open Speak to get started.
In this article
- How it works
- Create a speaking clip from a script
- Create a speaking clip from your own audio
- Choosing a voice
- Writing a good script
- Tips and best practices
How it works
Speak combines two inputs: a visual base (an image or a video with a character in it) and audio (either generated from a script or uploaded by you). The AI analyzes the character in the visual, generates or receives the audio, and produces a video where the character speaks with synchronized lip movement.
There are two ways to provide the audio:
| Mode | How it works | Best for |
|---|---|---|
| Script | You write what the character says. Speak generates the voice using AI text-to-speech. | Quick creation, testing ideas, characters that need a specific AI voice. |
| Add audio | You upload your own audio recording. | Using your real voice, professional voiceovers, or audio from other sources. |
Create a speaking clip from a script
1. Open Speak. Go to freepik.com/pikaso/video-speak.
2. Upload your content. Add a start image or a video as the base for your speaking clip. The image or video should contain a visible character with a clear face.
3. Select the Script tab. Make sure the Script tab is active (not Add audio).
4. Write your script. Type what the character says in the text field. You can include emotion tags in square brackets to control how the character delivers the line.
5. Select a voice. Click Select a voice to open the voice library. Browse All voices or My voices, preview each one, and pick the voice that fits your character.
6. Generate. Click Generate. The AI produces a lip-synced video of your character speaking the script.
Create a speaking clip from your own audio
1. Open Speak. Go to freepik.com/pikaso/video-speak.
2. Upload your content. Add a start image or a video as the base.
3. Select the Add audio tab. Switch to the Add audio tab.
4. Choose your audio. Browse your audio files from Favorites, History, Uploads, Downloads, or Personal project. Select the recording you want to use.
5. Generate. Click Generate. The AI syncs the character's lip movements to your uploaded audio.
Choosing a voice
The voice library gives you two sections:
- All voices. A library of AI voices with different names, languages, and accents. Each voice has a preview so you can listen before selecting.
- My voices. Voices you have saved or created previously.
Each voice shows its name and its language and accent (for example, "Alexander Davis, English - American" or "Arthur Kensington, English - British"). Click the play button next to any voice to hear a sample before you select it.
Writing a good script
The script field accepts plain text with optional emotion tags. Write naturally, as if the character were speaking out loud.
Emotion tags
Add emotion tags in square brackets before the text to control the delivery tone. For example:
- [angry] I told them this was a mistake... but nobody listened.
- [happy] This is the best news I have heard all week!
- [sad] I really thought things would be different this time.
Emotion tags influence how the AI voice delivers the line: pacing, intonation, and emphasis change based on the tag you choose.
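To make the tag format concrete, here is a minimal Python sketch that splits a script line into its leading emotion tag and the spoken text. It only illustrates the `[tag] text` convention described above; it is not how Speak processes scripts internally, and the names `parse_script_line` and `ScriptLine` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: an optional leading [tag], then the spoken text.
TAG_PATTERN = re.compile(r"^\s*\[(?P<tag>[^\[\]]+)\]\s*(?P<text>.*)$", re.DOTALL)


class ScriptLine(NamedTuple):
    emotion: Optional[str]  # None when the line has no leading tag
    text: str


def parse_script_line(line: str) -> ScriptLine:
    """Split one script line into (emotion, text). Hypothetical helper."""
    match = TAG_PATTERN.match(line)
    if match is None:
        # No leading tag: treat the whole line as spoken text.
        return ScriptLine(emotion=None, text=line.strip())
    return ScriptLine(
        emotion=match.group("tag").strip().lower(),
        text=match.group("text").strip(),
    )


if __name__ == "__main__":
    print(parse_script_line("[sad] I really thought things would be different this time."))
    print(parse_script_line("This line has no tag."))
```

A helper like this can be handy for checking a batch of lines before you paste them into the script field, for example to confirm that every key line starts with the tag you intended.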
Script tips
- Write in a conversational tone. The AI sounds more natural with everyday language.
- Keep sentences short to medium in length. Very long sentences can sound flat.
- Use punctuation to guide pacing. Periods create pauses. Ellipses (...) create longer pauses.
- Test different voices with the same script. The same words can sound very different depending on the voice.
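If you draft scripts outside Speak, a quick check like the following Python sketch can flag sentences that may be too long before you paste them in. The 20-word limit and the function name `long_sentences` are assumptions chosen for illustration; Speak does not publish a specific length limit in this article.

```python
import re
from typing import List


def long_sentences(script: str, max_words: int = 20) -> List[str]:
    """Return sentences with more than max_words words (assumed threshold)."""
    # Drop emotion tags such as [happy] so they are not counted as words.
    spoken = re.sub(r"\[[^\[\]]+\]", " ", script)
    # Split after sentence-ending punctuation; an ellipsis also ends a segment,
    # since it is read as a pause.
    sentences = re.split(r"(?<=[.!?])\s+", spoken.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    draft = "[happy] This is the best news I have heard all week! Thank you."
    print(long_sentences(draft))  # nothing flagged for this short draft
```

Any sentence the check returns is a candidate for splitting into two shorter ones.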
Tips and best practices
Use a clear, front-facing image. The AI needs to see the character's face clearly. Images with the face partially hidden, turned away, or at extreme angles produce weaker results.
Good lighting matters. Well-lit faces with visible features give the AI more to work with. Dark or heavily shadowed images make lip sync less accurate.
Try emotion tags to add personality. A script without emotion tags sounds neutral. Adding [angry], [happy], or [sad] before key lines makes the delivery feel more human.
Preview voices before generating. Every voice sounds different. Spend a moment listening to samples in the voice library to find the right fit for your character.
Upload your own audio for maximum control. If the AI voices do not match what you need, record your own audio and use the Add audio tab. This also works for languages or accents not covered by the voice library.
Start with an image, then try video. Images are simpler to work with and produce consistent results. Once you are comfortable, try using a video as the base for more dynamic output.
Speak vs. Lip Sync in the Clip Editor. Speak is a standalone tool for creating speaking clips from scratch. The Lip Sync feature in the Clip Editor works on videos you have already generated and is part of the post-production workflow. The two tools complement each other.