The difference between a prompt that produces raw AI slop and one that produces a directed cinematic frame isn't creativity - it's structure. The AI video community has converged on four distinct prompt formats in 2026, each optimized for different use cases and models.
Here's when to use each one, with real examples from creators whose work has gone viral.
Format 1: Timeline Prompts
Best for: Seedance 2.0, Sora 2, Kling 2.6+
Best use case: Multi-beat sequences, music videos, action scenes, anything with pacing
Timeline prompting is the dominant professional format in 2026. You break the video into timestamp blocks, each scripting a specific moment with its own camera, action, and mood.
Structure
FORMAT: [duration] / [tempo or energy] / [continuity rule]
[0:00-0:03]: Shot description with camera, action, SFX
[0:03-0:06]: Next beat
[0:06-0:10]: Resolution
Real Example (@aimikoda - "Fashion Sequence")
FORMAT: 15s / 128 BPM / ONE CONTINUOUS SHOT / camera accelerates between poses
0:00-0:01.5: MCU, centered symmetry. Pose 1, she faces forward with one hand touching the KODA headphones. Camera nearly still with a restrained push-in. 85mm, shallow depth. SFX: shutter click, satin whisper.
0:01.5-0:03.0: Pose 2, she turns three-quarter and lifts her chin. Camera accelerates in a descending arc and brakes briefly on the eyes. 50mm to 35mm. SFX: heel shift, cloth whisper, shutter chatter.
Why It Works
Each timestamp block is essentially its own mini-prompt. The model processes them sequentially, maintaining context from previous blocks. This gives you editing-level control over pacing without post-production cuts.
The FORMAT header at the top sets global rules: duration, energy level, and whether cuts are allowed. "ONE CONTINUOUS SHOT" prevents the model from inserting transitions.
Tips
- Keep each block under 40 words
- One camera movement per block
- SFX descriptions set the energy level even when the model doesn't generate audio
- Use sub-second timestamps (0:01.5) for precise beat-matching
- The first block establishes the visual rules - spend the most detail there
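The assembly rules above are easy to mechanize. Here is a minimal sketch of a timeline-prompt builder that emits the FORMAT header plus timestamped blocks and enforces the ~40-word budget; the `Block` dataclass and `build_timeline_prompt` helper are illustrative names, not part of any model's API.

```python
from dataclasses import dataclass

@dataclass
class Block:
    start: str  # e.g. "0:00"
    end: str    # e.g. "0:01.5" - sub-second stamps are allowed
    text: str   # shot description: camera, action, SFX

def build_timeline_prompt(header: str, blocks: list[Block]) -> str:
    """Assemble a timeline prompt: FORMAT header plus timestamped blocks.
    Rejects blocks over the ~40-word budget recommended above."""
    lines = [f"FORMAT: {header}"]
    for b in blocks:
        if len(b.text.split()) > 40:
            raise ValueError(f"block {b.start}-{b.end} exceeds 40 words")
        lines.append(f"{b.start}-{b.end}: {b.text}")
    return "\n".join(lines)

prompt = build_timeline_prompt(
    "15s / 128 BPM / ONE CONTINUOUS SHOT",
    [
        Block("0:00", "0:01.5", "MCU, centered symmetry. Restrained push-in. 85mm, shallow depth."),
        Block("0:01.5", "0:03", "Camera arcs down, brakes on the eyes. 50mm to 35mm."),
    ],
)
print(prompt)
```

Keeping blocks as data rather than raw text makes it trivial to reorder beats or swap a single block between iterations.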
Format 2: JSON Structured
Best for: Veo 3.1, API workflows, complex multi-parameter scenes
Best use case: Reproducible results, iterating on single parameters, pipeline automation
JSON isolates every visual parameter into its own field. Change the lighting without redescribing the subject. Swap the lens without touching the framing.
Structure
```json
{
  "shot_type": "...",
  "camera_movement": "...",
  "lens_spec": "...",
  "lighting": "...",
  "subject_details": "...",
  "environment_details": "...",
  "vfx_elements": "...",
  "color_palette": "...",
  "framing": "...",
  "shutter_speed": "..."
}
```
Real Example (@sebatheepan - "Solar Sail Gliders")
```json
[
  {
    "shot_type": "Extreme long shot transitioning to a dizzying centrifugal orbital spiral",
    "camera_movement": "360-degree barrel roll with a snap-focus lock, simulating high-G vibration",
    "lens_spec": "22mm wide-angle prime, heavy chromatic aberration at the edges, T1.5",
    "lighting": "Harsh unshielded solar radiation, blinding white-hot rim lighting, deep chiaroscuro voids",
    "subject_details": "Squadron of chrome-plated solar-sail gliders with iridescent nanotech wings",
    "environment_details": "The golden rings of a gas giant, saturated with crystalline ice shards",
    "color_palette": "Molten gold, deep violet, obsidian black, and iridescent teal highlights"
  }
]
```
Why It Works
The separation of concerns is the key advantage. When your sci-fi scene's lighting is wrong, you change the lighting field without risking the subject description. This makes iteration 3-5x faster on complex scenes.
JSON also maps directly to API parameters. Studios building automated pipelines (batch processing, A/B testing prompt variations) use JSON because it's programmatically manipulable.
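That separation of concerns is exactly what a parameter sweep exploits. A minimal Python sketch, assuming nothing beyond the standard library (the `sweep` helper and the trimmed `base` prompt are illustrative, not any vendor's API):

```python
import json
from copy import deepcopy

# Trimmed-down base prompt in the JSON structured format
base = {
    "shot_type": "Extreme long shot",
    "camera_movement": "360-degree barrel roll with snap-focus lock",
    "lens_spec": "22mm wide-angle prime, T1.5",
    "lighting": "Harsh unshielded solar radiation, white-hot rim light",
    "subject_details": "Chrome-plated solar-sail gliders",
}

def sweep(base_prompt: dict, field: str, values: list[str]) -> list[str]:
    """Serialize one prompt variant per value, touching only `field`.
    Every other parameter stays byte-for-byte identical across variants."""
    out = []
    for v in values:
        variant = deepcopy(base_prompt)
        variant[field] = v
        out.append(json.dumps(variant, indent=2))
    return out

variants = sweep(base, "lighting", ["soft dawn haze", "hard neon backlight"])
```

Each string in `variants` is ready to submit; A/B testing a single field becomes a one-line change.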
Tips
- Include 5-7 fields at minimum - fewer gives the model too little to anchor on
- The "lens_spec" field has outsized impact on the final look
- Color palette descriptions actually influence the output significantly
- Don't include fields you don't care about - empty or vague fields dilute the prompt
- Wrap in array brackets [ ] for multi-shot sequences
Format 3: Shot List
Best for: Seedance 2.0, Sora 2, production-minded workflows
Best use case: Action sequences, storyboard-to-video, director-style prompting
Shot lists treat the AI like a cinematographer receiving a production brief. Numbered shots with explicit timing, framed as if you're directing a crew.
Structure
Shot 01 (0:00-0:02): Camera position, subject action, key visual detail. Audio description.
Shot 02 (0:02-0:04): Transition or cut, new framing, continued action.
...
STYLE NOTES: Global rules for the entire sequence.
Real Example (@TheDorBrothers - "Medieval Battlefield")
Shot 01 (0:00-0:02): Camera starts at ankle level. A wounded soldier is dragged across the mud by another - passes directly across frame as a horse charges through, nearly trampling both. Audio: Dragging friction, hooves, strained breathing.
Shot 02 (0:02-0:03.5): Camera weaves between bodies and running soldiers - a spear thrust passes inches from lens. Audio: Shouts, metal impacts, chaos.
STYLE NOTES: Camera feels dragged and redirected constantly, never choosing its path - only reacting.
Why It Works
The shot-list format maps directly to how films are planned. Directors and cinematographers think in shots - numbered, timed, with specific framing. This format leverages that existing mental model.
The STYLE NOTES section at the bottom is powerful. It sets a behavioral rule for the entire sequence without cluttering individual shot descriptions. "Camera never choosing its path - only reacting" gives Seedance a directive that influences every shot.
Tips
- Number shots sequentially even within a "continuous take" - it helps the model understand progression
- Include audio/SFX descriptions for energy pacing
- End with STYLE NOTES for global behavioral rules
- Keep individual shots to 2-3 sentences maximum
- Describe the camera as a physical entity ("camera gets pulled," "camera stumbles")
Format 4: Natural Language (Paragraph)
Best for: Quick iteration, concepting, Kling 3.0, any model
Best use case: Simple single-shot concepts, rapid prototyping, non-technical users
A single paragraph describing the scene. No structure, no timestamps, no JSON. Just language.
Real Example (@LudovicCreator - "Rocket Wingsuit")
A wingsuit flyer launches from a stratospheric balloon above Earth, tiny rocket boosters attached to the suit. At the 2-second mark the boosters ignite and the flyer accelerates through a storm cloud layer. Lightning flashes around the wingsuit as the flyer dives through a canyon of storm clouds. The flyer pulls out inches above a desert highway before gliding into a narrow canyon. Rocket wingsuit dive, lightning storm flythrough, extreme speed descent, cinematic aerial motion, 4K.
Why It Works
Speed. You can write a natural language prompt in 30 seconds. For rapid concepting - generating 20-30 variations to find a direction - paragraph prompts are the most efficient format.
The trailing keywords ("rocket wingsuit dive, lightning storm flythrough, extreme speed descent, cinematic aerial motion, 4K") act as style tags that reinforce the core visual identity. Most successful natural language prompts end with a keyword summary.
Tips
- Front-load the most important visual element
- Include temporal markers ("at the 2-second mark") for basic pacing control
- End with comma-separated style keywords
- Keep it under 100 words - the model loses focus beyond that
- Use this format for first drafts, then upgrade to timeline format for the final version
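The last two tips can be checked mechanically before you spend credits. A minimal sketch, with a deliberately rough heuristic for "ends with a keyword summary" (at least two commas in the final sentence); `check_paragraph_prompt` is an illustrative helper, not an established tool:

```python
def check_paragraph_prompt(prompt: str) -> list[str]:
    """Flag the two most common paragraph-prompt issues noted above:
    going past ~100 words and missing a trailing keyword summary."""
    warnings = []
    if len(prompt.split()) > 100:
        warnings.append("over 100 words - the model may lose focus")
    # crude heuristic: the last sentence should be a comma-separated tag list
    last_sentence = prompt.rstrip(" .").rsplit(". ", 1)[-1]
    if last_sentence.count(",") < 2:
        warnings.append("no comma-separated keyword summary at the end")
    return warnings
```

Run it over a batch of draft prompts and fix anything it flags before generating.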
Which Format for Which Model
| Model | Best Format | Why |
|---|---|---|
| Seedance 2.0 | Timeline or Shot List | Handles temporal scripting natively, 6-step formula built in |
| Kling 3.0 | Timeline with beat markers | Audio-visual sync requires precise timestamp control |
| Veo 3.1 | JSON Structured | Reference ingredient system maps to JSON fields |
| Runway Gen-4.5 | Natural Language + Motion Brush | Physics descriptions ("momentum causes fishtail") work well |
| Sora 2 | Timeline or Shot List | Causal physics ("due to hydroplaning...") in structured format |
| Grok Imagine | Natural Language | Speed-focused, simple prompts iterate fastest |
The Meta-Prompting Workflow
An emerging technique: use an LLM to generate your video prompts. Feed it your concept, shot references, and target model, and have it output a properly structured prompt in the right format.
This works particularly well for:
- Converting storyboards to timeline prompts
- Generating A/B test variations of the same scene
- Translating a simple concept into model-specific structured format
- Batch-generating 10-20 prompt variations for rapid prototyping
The creators producing the most consistent output in 2026 aren't writing every prompt from scratch. They have template libraries, format converters, and LLM-assisted prompt generation in their workflow.
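A format converter is the simplest piece of that workflow to build yourself. This sketch renders a rough shot breakdown (the kind an LLM might produce from your concept, or that you jot down by hand) into the shot-list format from Format 3; `shots_to_prompt` and the dict keys are illustrative assumptions, not a standard schema:

```python
def shots_to_prompt(shots: list[dict], style_note: str) -> str:
    """Render a shot breakdown into the shot-list format.
    Each dict needs 'start', 'end', 'action', and 'audio' keys."""
    lines = [
        f"Shot {i:02d} ({s['start']}-{s['end']}): {s['action']} Audio: {s['audio']}"
        for i, s in enumerate(shots, start=1)
    ]
    lines.append(f"STYLE NOTES: {style_note}")
    return "\n".join(lines)

prompt = shots_to_prompt(
    [
        {"start": "0:00", "end": "0:02",
         "action": "Camera at ankle level; a soldier is dragged across the mud.",
         "audio": "Dragging friction, hooves."},
        {"start": "0:02", "end": "0:04",
         "action": "Camera weaves between running soldiers.",
         "audio": "Shouts, metal impacts."},
    ],
    "Camera only reacts, never chooses its path.",
)
```

The same shot data can feed a second renderer for timeline or JSON output, which is what a template library really is.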
The Core Principle
Structure isn't a creative constraint - it's creative leverage. The prompt format you choose determines how much control you have over the final output. Match the format to your goal:
Exploring an idea: Natural language. Write fast, generate fast, iterate fast.
Producing a deliverable: Timeline or shot list. Script the sequence, control the pacing, direct the camera.
Building a pipeline: JSON. Isolate parameters, automate variations, scale production.
The tools are available to everyone. The prompt structure is what separates a raw generation from a directed frame.
A note on reference images
The most underrated factor in prompt engineering is what you feed the model alongside the prompt. A perfect text prompt with no reference images will lose every time to a mediocre prompt with strong reference conditioning.
This is why image-to-video pipelines have largely replaced pure text-to-video for serious work. If you want a faster way to build reference image libraries to feed your prompts, Freepik consolidates 39+ image models, 250M stock assets, and a direct image-to-video handoff into Kling O1 on one platform, removing the friction of generating references in one tool, downloading, then re-uploading to your video model. For deeper coverage of how to structure that workflow, see our image-to-video pipeline guide.
Common prompt mistakes that kill output quality
After reviewing hundreds of prompts from working studios over the last year, we keep seeing the same five mistakes - and each one noticeably degrades output quality.
Mistake 1: describing the subject instead of the motion. When you feed a reference image into a video model, it can already see the subject. Adding "a young woman with brown hair in a red jacket standing in a forest" on top of a reference image of exactly that is wasted tokens. Use your prompt budget to describe what should happen over time: the subtle tilt of the head, the wind catching the jacket, the camera tracking sideways. Describe change, not state.
Mistake 2: over-stacking adjectives. "Cinematic beautiful stunning professional gorgeous masterpiece golden hour magical ethereal" is not a prompt; it is a pile of adjectives the model has to reconcile. Strong prompts pick 2-3 specific modifiers that describe a concrete visual quality - "low-angle, golden hour, 35mm film grain" is more useful than 15 generic adjectives.
Mistake 3: inconsistent grammar between shots. If your timeline prompt says "0-2s: hero walks forward" in one block and "2-4s: the camera follows as she turns" in the next, you have switched subject focus mid-sequence. Models handle this badly. Keep the grammatical subject consistent across blocks within a single generation.
Mistake 4: contradictory camera and motion instructions. "Camera pushes in slowly while tracking rapidly right" is a contradiction the model cannot resolve cleanly. Pick one dominant camera move per block. If you need both, split into two generations and cut them together in post.
Mistake 5: ignoring aspect ratio in the prompt itself. Models interpret framing differently for 16:9 vs 9:16 vs 1:1. A prompt that works for landscape will often fail in vertical because the model reframes the composition badly. Specify the aspect ratio and framing in the prompt when producing for social formats: "vertical 9:16, subject centered, shoulders-up framing."
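Mistake 4 in particular is easy to lint for before generating. A rough sketch: scan each block for camera-move verb stems and flag anything with more than one. The stem list is illustrative, not exhaustive (it will miss phrasings like "barrel roll"), and `camera_moves` is an assumed helper name:

```python
import re

CAMERA_MOVE_STEMS = ["push", "pull", "track", "pan", "tilt",
                     "orbit", "dolly", "crane", "zoom", "whip"]

def camera_moves(block: str) -> list[str]:
    """Return the distinct camera-move stems mentioned in one prompt block.
    More than one result suggests contradictory camera instructions."""
    text = block.lower()
    return [s for s in CAMERA_MOVE_STEMS if re.search(rf"\b{s}", text)]
```

For example, the contradictory block quoted above trips the check, while "Slow push in on the subject's face" passes with a single move.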
Advanced pattern: the dual-pass workflow
The single highest-leverage technique in 2026 is the dual-pass workflow. Instead of trying to get a finished shot in one generation, you split the work into two passes and use a different prompt structure for each.
Pass 1 - composition. Use a natural language prompt to establish composition, subject, and mood. Do not worry about camera move or exact motion. Run 10-20 generations, pick the one with the best composition, export the best frame as a still image.
Pass 2 - motion. Take the still frame from pass 1 and feed it as a reference image into a fresh generation, now using a timeline or JSON prompt that describes only the motion. Because the composition is locked, the model's full budget is spent on the motion work. This produces noticeably more controlled and cinematic results than a single-pass text-to-video.
Most professional studios use this dual-pass approach for hero shots. The total credit cost is 2-3x a single generation but the output quality is closer to 10x. For social content where quality matters less than volume, single-pass is fine. For brand hero work, dual-pass is the default.
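The dual-pass flow itself is just an orchestration pattern. This sketch keeps it vendor-neutral by injecting the model calls as callables - `generate_stills`, `pick_best`, and `animate` are placeholders for whatever API (or human frame-picking step) you actually use, not real library functions:

```python
from typing import Callable

def dual_pass(
    composition_prompt: str,
    motion_prompt: str,
    generate_stills: Callable[[str, int], list[bytes]],
    pick_best: Callable[[list[bytes]], bytes],
    animate: Callable[[str, bytes], bytes],
    n_candidates: int = 15,
) -> bytes:
    """Two-pass generation: lock composition first, then spend the
    second pass entirely on motion."""
    candidates = generate_stills(composition_prompt, n_candidates)  # pass 1
    locked_frame = pick_best(candidates)  # human review or an automatic scorer
    return animate(motion_prompt, locked_frame)  # pass 2: image-to-video
```

In practice `pick_best` is usually a human eyeballing the batch; the structure matters more than any single step's implementation.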
Prompt libraries worth stealing from
Three prompt libraries consistently produce better output than what most creators write from scratch:
The studio-shared libraries on StudioList. Several studios listed on StudioList publish their prompt patterns publicly as part of their case studies. Browse a few studio profiles and copy the prompt structures that produce work you like. These are battle-tested on real client projects.
The community libraries on Civitai's video section. Heavily biased toward open-source workflows (ComfyUI, Wan 2.7, LTX), but the timeline patterns for narrative work are transferable to commercial models.
The internal references in Kling and Runway's own example galleries. Both platforms publish prompt examples alongside their output reels. These are curated to highlight the model's strengths, but they are also a crash course in how the respective teams think prompts should be structured for their tools.
Build your own prompt library as you work. Every successful generation gets saved with the exact prompt, the reference images used, and a note on why it worked. Over six months this library becomes more valuable than any tool subscription.
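A JSON-lines file is a perfectly good starting point for that library - one record per successful generation, append-only, greppable. A minimal sketch using only the standard library (`save_prompt_record` and the field names are our own choices, not a standard schema):

```python
import json
import time
from pathlib import Path

def save_prompt_record(library: Path, prompt: str,
                       reference_images: list[str], note: str) -> None:
    """Append one successful generation to a JSON-lines prompt library:
    the exact prompt, the reference images used, and why it worked."""
    record = {
        "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "prompt": prompt,
        "reference_images": reference_images,
        "why_it_worked": note,
    }
    with library.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Call it once per keeper, e.g. `save_prompt_record(Path("library.jsonl"), final_prompt, ["ref_01.png"], "locked composition on pass 1")`, and six months of generations become a searchable archive.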