AI video generation prompting guide for Sora 2 and Higgsfield.ai
This skill auto-activates when you're working with:
For Advanced Topics:
OpenAI Sora 2 creates videos from your text descriptions or images. Think of it like hiring a filmmaker—the clearer your instructions, the better your video turns out.
Higgsfield.ai brings together multiple AI video tools (Sora 2, Google Veo, WAN, Kling, and more) in one place. It gives you easy camera controls and editing tools to make professional-looking videos.
Start with this simple structure:
[Style]. [Subject] [action]. [Camera movement]. [Environment]. [Lighting].
Prompt:
Cinematic style. A golden retriever runs across a sunny meadow.
Camera pans left to right. Bright afternoon sunlight, green grass, blue sky.
Why this works:
How to adapt it:
Prompt:
1970s film aesthetic. A chef in white apron dices onions in three quick cuts,
scrapes them into bowl. Medium shot from 45-degree angle. Restaurant kitchen
with copper pots on back wall. Warm overhead pendant lights.
Colors: brass, cream, sage green.
Why this works:
How to adapt it:
Prompt:
Archival documentary, 16mm grain. Woman in red coat stands at rain-soaked
window, looking out. Medium close-up, eye level. Soft window light from left.
Colors: crimson, navy, amber, steel gray.
Dialogue:
- Woman: "It's still raining..."
Why this works:
How to adapt it:
"Clarity wins."
Replace vague words with specific visual details.
Instead of "beautiful" → say "wet asphalt with neon reflections"
Instead of "moves quickly" → say "pedals three times, brakes, stops"
| ❌ Weak | ✅ Strong |
|---|---|
| "A beautiful street at night" | "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles" |
| "Person moves quickly" | "Cyclist pedals three times, brakes, stops at crosswalk" |
| "Cinematic look" | "Anamorphic 2.0x lens, shallow depth of field, volumetric light" |
| "Nice lighting" | "Soft window light with warm lamp fill, cool rim from hallway" |
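You can catch some weak wording before you generate. Here is a minimal Python sketch of that idea. It is not part of Sora 2 or Higgsfield: the `VAGUE_TERMS` list and the `find_vague_terms` helper are hypothetical, seeded from the weak examples in the table above, and you would extend the list for your own prompts.

```python
import re

# Hypothetical word list, seeded from the "weak" column of the table above.
VAGUE_TERMS = ("beautiful", "nice", "quickly", "fast", "cinematic look")


def find_vague_terms(prompt: str) -> list[str]:
    """Return the vague terms found in a prompt, in VAGUE_TERMS order."""
    lowered = prompt.lower()
    found = []
    for term in VAGUE_TERMS:
        # Word boundaries keep "fast" from matching "breakfast".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found.append(term)
    return found


weak = "A beautiful street at night"
strong = "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles"
print(find_vague_terms(weak))    # ['beautiful']
print(find_vague_terms(strong))  # []
```

An empty result does not mean the prompt is good. It only means none of the listed words appear.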
These five principles form the foundation of effective video generation prompting.
Style is your most powerful tool. Start with the look you want—it guides all the other visual choices the AI makes.
Why this works: The AI has likely seen a lot of 1970s film footage in training, so when you say "1970s film," it tends to add grain, warm colors, and that era's camera style.
Quick tip: Always start your prompt with the style or look you want.
Movement is hard for AI—keep it simple.
Each shot should have:
Count your actions to give the AI a sense of timing.
| ❌ Weak | ✅ Strong |
|---|---|
| "Actor walks across room" | "Actor takes four steps to window, pauses, pulls curtain in final second" |
| "Car drives fast" | "Car accelerates in three seconds, reaches 60 mph, tires screech" |
| "Bird flies" | "Hawk dives downward for two seconds, spreads wings, glides" |
Why this works: "Actor walks" doesn't tell the AI how long to walk or when to stop. "Takes four steps to window, pauses" gives it a count and a clear stopping point.
Quick tip: Count things. Use words like "quick," "slow," "gradual," or "in final second."
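If you build prompts in a script, counting can be made explicit. The sketch below is one possible way to do it, assuming you want each action stored as a separate beat. The `Beat` class and `render_beats` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Beat:
    """One countable action, optionally with a duration in whole seconds."""
    action: str
    seconds: Optional[int] = None


def render_beats(subject: str, beats: list[Beat]) -> str:
    """Join beats into one sentence, adding 'for N seconds' where given."""
    if not beats:
        raise ValueError("a shot needs at least one beat")
    parts = []
    for beat in beats:
        if beat.seconds is None:
            parts.append(beat.action)
        else:
            unit = "second" if beat.seconds == 1 else "seconds"
            parts.append(f"{beat.action} for {beat.seconds} {unit}")
    return f"{subject} {', '.join(parts)}."


print(render_beats("Hawk", [
    Beat("dives downward", seconds=2),
    Beat("spreads wings"),
    Beat("glides"),
]))
# Hawk dives downward for 2 seconds, spreads wings, glides.
```

Keeping beats in a list also makes it easy to see when a shot has too many actions and should be split.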
Lighting sets the mood. If you don't specify lights and colors, the AI will pick random ones that might change throughout your video.
Describe:
Example:
Soft window light from left with warm desk lamp fill from right,
cool backlight creating rim on shoulders
Choose 3-5 exact colors to keep your video consistent:
Why this works: Naming exact colors keeps the AI from randomly changing colors between frames.
Quick tip: Say "burnt sienna" or "slate blue" instead of "earthy tones" or "cool colors."
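The 3-5 color rule is easy to check in code. This is a rough sketch under stated assumptions: `build_palette` and the `VAGUE_COLOR_WORDS` list are hypothetical, and the vague-word check is a crude heuristic based only on the two phrases in the quick tip above.

```python
# Hypothetical list of words that signal a vague palette description.
VAGUE_COLOR_WORDS = ("tones", "colors", "hues", "shades")


def build_palette(colors: list[str]) -> str:
    """Validate a palette of 3-5 named colors and render a 'Colors:' line."""
    if not 3 <= len(colors) <= 5:
        raise ValueError(f"expected 3-5 colors, got {len(colors)}")
    for color in colors:
        # "earthy tones" or "cool colors" name a family, not a color.
        if any(word in color.lower().split() for word in VAGUE_COLOR_WORDS):
            raise ValueError(f"too vague: {color!r}")
    return "Colors: " + ", ".join(colors) + "."


print(build_palette(["crimson", "navy", "amber", "steel gray"]))
# Colors: crimson, navy, amber, steel gray.
```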
Important: Keep dialogue separate from your scene description.
[Visual description of shot and environment]
Dialogue:
- Character A: "Short, natural line here"
- Character B: "Brief response"
✅ DO:
❌ DON'T:
Why this works: When you mix dialogue with scene description ("She says 'hello' while waving"), the AI might not know this needs lip-sync. A separate section makes it clear this is spoken dialogue.
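To keep the two parts apart when you assemble prompts in a script, you could use something like the sketch below. The `dialogue_block` helper is a hypothetical name, and its check for double quotes in the visual description is only a simple heuristic for spotting speech that ended up in the wrong place.

```python
def dialogue_block(visuals: str, lines: list[tuple[str, str]]) -> str:
    """Append a separate 'Dialogue:' section after the visual description."""
    # Crude heuristic: double quotes in the visuals usually mean spoken
    # words were mixed into the scene description.
    if '"' in visuals:
        raise ValueError("move quoted speech out of the visual description")
    rendered = [f'- {speaker}: "{line}"' for speaker, line in lines]
    return "\n".join([visuals.strip(), "Dialogue:", *rendered])


prompt = dialogue_block(
    "Woman in red coat stands at rain-soaked window, looking out.",
    [("Woman", "It's still raining...")],
)
print(prompt)
# Woman in red coat stands at rain-soaked window, looking out.
# Dialogue:
# - Woman: "It's still raining..."
```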
Starting with an image helps keep your videos consistent.
Example workflow:
Why this works: The same words can be turned into very different pictures. Starting with an actual image gives the AI a fixed visual starting point.
Quick tip: Make reference images of your characters or locations when creating multiple related videos.
Prompt:
Cinematic ad. iPhone 15 Pro rotating slowly on marble pedestal.
Dolly zoom in. Minimalist studio, soft shadows.
Lighting: rim light with cool backlight. Colors: titanium, deep blue, white.
Breakdown:
Why it works: Simple, focused, one subject doing one thing. All elements clearly specified.
How to adapt:
Prompt:
Documentary style. A deer walks through morning mist in forest clearing.
Camera static, eye level. Dappled golden hour sunlight through trees.
Colors: forest green, gold, earth brown, soft white mist.
Breakdown:
Why it works: A static camera is easier for the AI to handle. One animal doing one thing. Natural scenes work well for AI.
How to adapt:
Prompt:
Handheld camera, authentic lighting. Young woman in yellow raincoat
walks forward four steps, stops, looks up at sky.
Residential street, autumn leaves on sidewalk. Overcast daylight.
Colors: yellow, burgundy leaves, gray pavement, muted blue sky.
Breakdown:
Why it works: Counting the steps gives clear timing. Simple actions in order (walk, stop, look). Everyday setting the AI knows well.
How to adapt:
Copy these templates and fill in the brackets with your specific details.
Use for: Product videos, commercial content, social media ads
[Style preset: Cinematic/Archival/Modern]. [Product name] [key action:
rotate/emerge/unfold]. [Higgsfield camera preset: Dolly In/Crash Zoom/etc.].
[Minimal environment: 2-3 elements]. [Lighting: type + direction].
[Palette: brand colors + 2 supporting colors].
Example:
Cinematic commercial. Wireless headphones rotating slowly on wooden surface.
Dolly zoom in. Minimalist studio with soft shadows, blurred plant background.
Soft overhead lighting with cool rim light from left. Colors: matte black,
walnut wood, sage green, silver accents.
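If you reuse a template often, you can fill it from code. Below is a minimal sketch, not an official tool. It assumes a simplified version of the product template in which each long bracketed description is replaced by a short slot name. Those slot names, `PRODUCT_TEMPLATE`, and `fill_template` are all hypothetical.

```python
import re

# Simplified version of the product template above; the short slot names
# are hypothetical stand-ins for the longer bracketed descriptions.
PRODUCT_TEMPLATE = (
    "[style]. [product] [action]. [camera]. "
    "[environment]. [lighting]. Colors: [palette]."
)


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each [slot] with its value; fail loudly on a missing slot."""
    def substitute(match: re.Match) -> str:
        slot = match.group(1)
        if slot not in values:
            raise KeyError(f"no value for slot [{slot}]")
        return values[slot]

    return re.sub(r"\[([a-z_]+)\]", substitute, template)


print(fill_template(PRODUCT_TEMPLATE, {
    "style": "Cinematic commercial",
    "product": "Wireless headphones",
    "action": "rotating slowly on wooden surface",
    "camera": "Dolly zoom in",
    "environment": "Minimalist studio with soft shadows, blurred plant background",
    "lighting": "Soft overhead lighting with cool rim light from left",
    "palette": "matte black, walnut wood, sage green, silver accents",
}))
```

Raising an error on a missing slot keeps a stray `[bracket]` from reaching the video model as literal text.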
Use for: Instagram Reels, TikTok, authentic creator content
[Handheld/authentic camera style]. [Person description] speaks directly
to camera, [natural gesture/movement]. [Location: home/outdoor/casual setting].
Natural/window light. [Mood: energetic/calm/conversational].
Dialogue:
- [Name]: "[Enthusiastic, brief statement - under 15 words]"
Example:
Handheld camera, natural lighting. Young creator in graphic tee sits on
bedroom floor, gestures excitedly while speaking to camera. Bedroom with
string lights and posters in soft focus background. Window light from right.
Energetic, authentic mood.
Dialogue:
- Creator: "You won't believe what happened today!"
Use for: Setting scenes, mood creation, B-roll
[Wide shot or aerial preset]. [Location with distinctive features].
[Time of day + weather]. [Camera: slow drift/static/orbit].
[Lighting: environmental source]. [Palette: mood-appropriate 3-5 colors].
[Audio: ambient soundscape].
Example:
Aerial wide shot. Coastal lighthouse on rocky cliff with waves crashing below.
Foggy dawn, mist rolling in from ocean. Camera: slow crane up revealing more
coastline. Soft pre-sunrise light, diffused by fog. Colors: slate gray rocks,
white lighthouse, navy ocean, soft pink dawn, white foam.
Audio: Waves crashing, distant foghorn, seabirds calling, wind.
"A beautiful street scene at night with nice lighting"
Why it fails: The AI has to guess everything—time, place, weather, lights, camera position, what happens, and colors. Results will be random.
"Film noir style. Wet cobblestone street with zebra crosswalk, neon bar
signs reflecting in puddles. Static shot, eye level. Midnight, light rain.
Single streetlamp creating pool of warm light, neon signs providing cool
blue-pink accent. Colors: deep blue shadows, warm amber streetlight, neon
pink-blue, wet black pavement."
What changed: Style defined, specific environmental elements, camera position, time/weather, light sources with direction, color palette.
"Person walks across room quickly while waving and talking on phone"
Why it fails: Three things happening at once (walking, waving, talking) confuse the AI. No timing information.
"Woman takes six quick steps across living room, phone to ear with right
hand. Camera pans right following movement, eye level. Mid-sentence, raises
left hand in brief wave gesture toward someone off-camera, then drops hand.
One action beat per 2 seconds."
What changed: Actions broken into steps with counts. Clear timing. Gestures happen one at a time (not all together). Camera separate from action.
Better option: Split into two shots—one for walking/talking, one for waving.
"Interior office scene with manager at desk"
Why it fails: Without lighting info, the AI picks randomly—might be bright office lights, dim mood lighting, or mixed inconsistent sources.
"Interior office scene. Manager at wooden desk with laptop and papers.
Overhead fluorescent tubes providing cool even light, warm desk lamp
adding fill from left side, window behind creating slight backlight.
Late afternoon."
What changed: Three light sources specified (overhead, desk lamp, window) with quality (cool, warm) and direction.
"Cinematic coffee shop scene with warm tones"
Why it fails: "Warm tones" is vague—could be orange, red, yellow, brown, or any combination. Colors may drift frame-to-frame.
"Cinematic coffee shop scene. Colors: rich espresso brown, cream ceramic
mugs, brass espresso machine, warm Edison bulb amber light, dark walnut
furniture."
What changed: Five specific color anchors keep the palette stable across the whole clip.
"Woman in cafe says 'I can't believe this happened' while looking worried
and holding coffee cup nervously"
Why it fails: When dialogue is mixed with actions, the AI might treat it as part of the scene instead of spoken words that need lip-sync.
"Woman in cafe holds coffee cup, shifts nervously in seat, speaks with
worried expression. Medium close-up, slight angle from across table.
Cafe background with blurred patrons, warm pendant lights overhead."
Dialogue:
- Woman: "I can't believe this happened."
What changed: Dialogue isolated in separate block. Visual description focuses on actions and camera.
[Style]. [Subject] [action in beats]. [Camera]. [Environment].
[Lighting + direction]. [Colors: 3-5 anchors].
Dialogue:
- Character: "Brief line"
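The quick-reference structure above can also be held as data. The sketch below shows one way to do that, assuming one field per slot. `ShotPrompt` and its field names are hypothetical, and the sample values come from the chef prompt earlier in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """One shot, with a field for each slot in the quick-reference structure."""
    style: str
    subject_action: str
    camera: str
    environment: str
    lighting: str
    colors: list[str]
    dialogue: list[tuple[str, str]] = field(default_factory=list)

    def render(self) -> str:
        sentences = [self.style, self.subject_action, self.camera,
                     self.environment, self.lighting,
                     "Colors: " + ", ".join(self.colors)]
        # Strip trailing periods so each slot ends with exactly one.
        text = " ".join(s.strip().rstrip(".") + "." for s in sentences)
        if self.dialogue:
            lines = [f'- {who}: "{line}"' for who, line in self.dialogue]
            text += "\nDialogue:\n" + "\n".join(lines)
        return text


shot = ShotPrompt(
    style="1970s film aesthetic",
    subject_action="A chef in white apron dices onions in three quick cuts, "
                   "scrapes them into bowl",
    camera="Medium shot from 45-degree angle",
    environment="Restaurant kitchen with copper pots on back wall",
    lighting="Warm overhead pendant lights",
    colors=["brass", "cream", "sage green"],
)
print(shot.render())
```

Because every slot is a required field except dialogue, a forgotten camera or lighting line shows up as an error in your script instead of a guess by the AI.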
See Progressive Examples for:
See Technical Reference for:
See Advanced Workflows for:
Last updated: January 2025. Based on current Sora 2 and Higgsfield.ai capabilities.