Executive Summary
This document synthesizes key principles and methodologies for effective video generation using the Sora 2 model, based on the official prompting guide. The central theme is that successful prompting is a creative, iterative process, balancing detailed instruction for control against open-endedness for creative discovery.
Key takeaways include:
- Dual Prompting Philosophies: Detailed, specific prompts provide greater control and consistency, while shorter, more general prompts grant the model creative freedom, often leading to surprising and novel outcomes. The choice depends entirely on the project’s goals.
- API vs. Prose Control: Core video attributes such as model type (sora-2 or sora-2-pro), resolution (size), and duration (seconds) are exclusively controlled via API parameters. Prompt text cannot override these settings; it governs the video’s content, including subject, motion, lighting, and style.
- The Power of Specificity: Vague descriptions yield unpredictable results. The guide emphasizes using precise, tangible nouns and verbs to describe action, lighting, and composition. For example, instead of “cinematic look,” specify “Anamorphic 2.0x lens, shallow DOF, volumetric light.”
- Style as a Primary Lever: Establishing a clear aesthetic style (e.g., “1970s film,” “16mm black-and-white film,” “IMAX-scale scene”) is one of the most powerful ways to guide the model’s output, framing all subsequent details.
- Advanced Control Features: The model supports several advanced features for fine-grained control. Image inputs can anchor the composition and style of the first frame. Dialogue should be placed in a dedicated block for clarity. The remix functionality allows for controlled, incremental changes to existing videos.
- Iteration is Essential: The generation process is not exact. The same prompt will produce different results on subsequent runs, which is considered a feature. Users are encouraged to iterate on prompts, making small adjustments to camera, lighting, or action to refine the output.
Core Prompting Principles
The guide frames prompting as an act of briefing a cinematographer. The level of detail provided directly correlates to the level of control over the final output.
- Control vs. Creative Freedom: The guide presents two valid and powerful approaches to prompting:
  - Detailed Prompts: Offer high control and consistency, making the model adhere closely to a specific vision. This is ideal for matching storyboards or maintaining continuity.
  - Lighter Prompts: Provide the model with creative freedom, allowing it to improvise and generate unexpected, varied interpretations. This is useful for exploration and discovery.
  The guide states: “detailed prompts give you control and consistency, while lighter prompts open space for creative outcomes.”
- The Iterative Process: Prompting is described as a collaborative process with the model, not an exact science. Small changes to a prompt can dramatically alter the outcome. Running the same prompt several times is encouraged, since a later result may be better than the first.
Technical Parameters and Constraints
Certain fundamental attributes of the video are governed strictly by API parameters and cannot be influenced by the descriptive text of the prompt. These parameters define the “container” for the generated video.
| Parameter | Description | Supported Values |
| model | The specific Sora model to use. | sora-2 or sora-2-pro |
| size | The video’s resolution as a {width}x{height} string. | sora-2: 1280x720, 720x1280; sora-2-pro: |
| seconds | The desired length of the video clip. The default value is “4”. | "4", "8", "12" |
- Impact of Resolution: Higher resolutions result in greater visual fidelity, with more accurate detail, texture, and lighting transitions. Lower resolutions can introduce softness or artifacts.
- Impact of Duration: The model follows instructions more reliably in shorter clips. For longer sequences, the guide suggests that stitching together two 4-second clips in editing may produce better results than generating a single 8-second clip.
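The split between API parameters and prompt text can be sketched as a small validation step that runs before any request is made. This is a minimal illustration, not the official client: `build_video_params` is a hypothetical helper, it only assembles a plain dictionary, and it checks sizes for sora-2 only because the size list for sora-2-pro is not given in the table above.

```python
# Values copied from the parameter table. The size list for sora-2-pro is
# not given in this document, so sizes for that model are not checked.
SUPPORTED_MODELS = ("sora-2", "sora-2-pro")
SUPPORTED_SECONDS = ("4", "8", "12")
SORA_2_SIZES = ("1280x720", "720x1280")


def build_video_params(prompt, model, size, seconds="4"):
    """Hypothetical helper: validate the container settings, return a dict."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if seconds not in SUPPORTED_SECONDS:
        raise ValueError(f"seconds must be one of {SUPPORTED_SECONDS}")
    if model == "sora-2" and size not in SORA_2_SIZES:
        raise ValueError(f"unsupported size for sora-2: {size!r}")
    if not prompt.strip():
        raise ValueError("prompt must describe the shot")
    # The prompt carries content only; model, size, and seconds stay separate.
    return {"model": model, "size": size, "seconds": seconds, "prompt": prompt}


params = build_video_params(
    "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles",
    model="sora-2",
    size="1280x720",
)
```

Keeping the container settings out of the prompt string makes it harder to write something like "make it 12 seconds long" in prose and expect it to take effect.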
Crafting Effective Prompts
A successful prompt clearly describes a shot by defining its visual and narrative components.
Anatomy of a Prompt
A clear prompt structure includes:
- Camera Framing: The type of shot and its angle (e.g., wide establishing shot, eye level).
- Depth of Field: The focus area (e.g., shallow focus).
- Action Beats: Specific, sequential actions.
- Lighting and Palette: The quality, direction, and color of the light.
When describing a sequence of shots within a single prompt, each shot should be treated as a distinct creative unit with its own camera setup, subject action, and lighting.
The Importance of Style and Specificity
Style is identified as one of the most powerful levers for guiding the model. Establishing an overall aesthetic early helps ensure consistency. Vague language should be avoided in favor of concrete, descriptive terms that point to visible results.
| Weak Prompt | Strong Prompt |
| “A beautiful street at night” | “Wet asphalt, zebra crosswalk, neon signs reflecting in puddles” |
| “Person moves quickly” | “Cyclist pedals three times, brakes, and stops at crosswalk” |
| “Cinematic look” | “Anamorphic 2.0x lens, shallow DOF, volumetric light” |
Core Elements of Control
- Camera and Framing: Direction and framing dictate the shot’s emotional impact. A wide shot can establish context, while a close-up focuses on emotion.
  - Good Framing Examples: wide shot, tracking left to right with the charge; medium close-up shot, slight angle from behind.
  - Good Motion Examples: slowly tilting camera; handheld eng camera.
- Motion and Timing: To achieve clear motion, prompts should be simple, with one clear camera move and one clear subject action per shot. Actions are more effective when described in discrete beats or counts.
  - Weak: Actor walks across the room.
  - Strong: Actor takes four steps to the window, pauses, and pulls the curtain in the final second.
- Lighting and Color: Lighting is crucial for setting the mood. To ensure consistency across multiple clips for seamless editing, prompts should specify both the quality of light and key color anchors.
  - Weak: brightly lit room
  - Strong: soft window light with warm lamp fill, cool rim from hallway. Palette anchors: amber, cream, walnut brown
The Ultra-Detailed Prompting Method
For maximum control and cinematic realism, prompts can utilize professional production terminology. This approach involves specifying details such as:
- Format & Look: Shutter angle, film stock emulation, grain, halation.
- Lenses & Filtration: Specific lens focal lengths (e.g., 32mm) and filters (e.g., Black Pro-Mist).
- Grade / Palette: Color treatment for highlights, mids, and blacks.
- Lighting & Atmosphere: Direction and quality of light sources, use of bounce or negative fill, and atmospheric effects like mist.
- Sound: Specifying diegetic sound only and providing details like ambient hum or specific sound effects.
- Optimized Shot List: Breaking down the clip’s duration into timed shots with specific camera moves and narrative purposes.
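The timed shot list in the last item is simple arithmetic: each shot gets a duration, and the durations must add up to the clip length. The sketch below shows one way to turn durations into start and end timestamps. `timed_shot_list` and its output format are assumptions made for illustration, not something the guide prescribes.

```python
def timed_shot_list(clip_seconds, shots):
    """Turn (duration_seconds, description) pairs into timestamped lines.

    Hypothetical helper: raises if the durations do not fill the clip exactly.
    """
    total = sum(duration for duration, _ in shots)
    if abs(total - clip_seconds) > 1e-6:
        raise ValueError(f"shots cover {total}s, but the clip is {clip_seconds}s")
    lines = []
    start = 0.0
    for duration, description in shots:
        end = start + duration
        lines.append(f"{start:.2f}-{end:.2f}: {description}")
        start = end
    return lines


shot_lines = timed_shot_list(
    4,
    [
        (2.4, "wide establishing shot, slow push-in"),
        (1.6, "medium close-up, static camera"),
    ],
)
print("\n".join(shot_lines))
```

Checking the total up front catches a shot list that silently runs longer or shorter than the seconds value requested through the API.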
Advanced Control Mechanisms
Image Input
For precise control over composition and style, an image can be provided as a visual reference. This locks in elements like character design, wardrobe, or set dressing for the first frame of the video, while the text prompt dictates the subsequent action.
- Implementation: Use the input_reference parameter in the API request.
- Requirements: The input image must match the target video’s resolution (size).
- Supported Formats: image/jpeg, image/png, image/webp.
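The format and resolution requirements above can be checked locally before uploading a reference image. The following is a rough sketch using only the standard library. It reads dimensions from PNG files only (JPEG and WebP would need their own parsers or an imaging library), and `check_reference_image` is a hypothetical helper rather than part of any API.

```python
import struct

SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png", "image/webp")
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers after the chunk type.
    return struct.unpack(">II", data[16:24])


def check_reference_image(data, mime_type, size):
    """Hypothetical pre-upload check against a '{width}x{height}' size string."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported format: {mime_type}")
    width, height = (int(part) for part in size.split("x"))
    # Assumption: only PNG dimensions are verified in this sketch.
    if mime_type == "image/png" and png_dimensions(data) != (width, height):
        raise ValueError(f"image is not {size}")
    return True


# A minimal PNG header describing a 1280x720 image, enough for the check.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1280, 720)
ok = check_reference_image(header, "image/png", "1280x720")
```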
Dialogue and Audio
Dialogue should be described directly in the prompt, preferably within a distinct block to separate it from the visual description.
- Best Practices: Keep lines concise and natural to fit the clip’s duration. For multiple characters, label speakers consistently.
- Audio Cues: Even for silent shots, suggesting a small background sound (e.g., “distant traffic hiss”) can provide a rhythm cue for pacing.
Iterating with the Remix Functionality
The remix feature is designed for making controlled, incremental adjustments to a generated video.
- Methodology: Make one change at a time and state the change explicitly in the prompt (e.g., “same shot, switch to 85 mm”).
- Troubleshooting: If a shot consistently fails, the guide recommends stripping it back to a simpler version (e.g., static camera, simple action) and then layering complexity back in step-by-step.
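The strip-back-and-layer approach from the troubleshooting advice can be pictured as a list of prompts, each adding exactly one element to the previous one. The helper below is a hypothetical illustration of that bookkeeping. It does not call any API, and the prompt wording is an example only.

```python
def layered_prompts(base, layers):
    """Return prompts that start from `base` and add one layer at a time."""
    prompts = [base]
    for layer in layers:
        # Each step keeps everything that already worked and adds one change.
        prompts.append(f"{prompts[-1]} {layer}")
    return prompts


steps = layered_prompts(
    "Static camera. Actor stands at the window.",
    [
        "Actor pulls the curtain in the final second.",
        "Soft window light with warm lamp fill.",
    ],
)
for number, prompt in enumerate(steps, start=1):
    print(f"Attempt {number}: {prompt}")
```

Generating each step in order makes it clear which added element causes a shot to fail.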
Prompt Templates and Examples
The guide provides a flexible template for structuring descriptive prompts, emphasizing that not all elements are required. Leaving some sections open-ended can encourage more creative outputs from the model.
Descriptive Prompt Template
[Prose scene description in plain language. Describe characters, costumes, scenery, weather and other details. Be as descriptive as needed to generate a video that matches your vision.]
Cinematography:
Camera shot: [framing and angle, e.g. wide establishing shot, eye level]
Mood: [overall tone, e.g. cinematic and tense, playful and suspenseful, luxurious anticipation]
Actions:
- [Action 1: a clear, specific beat or gesture]
- [Action 2: another distinct beat within the clip]
- [Action 3: another action or dialogue line]
Dialogue: [If the shot has dialogue, add short natural lines here or as part of the actions list. Keep them brief so they match the clip length.]
The provided examples demonstrate how to combine these elements to create rich, detailed scenes, specifying everything from the style (“Hand-painted 2D/3D hybrid animation,” “1970s romantic drama”) to specific camera lenses, lighting setups, character actions, dialogue, and background sounds.
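Because not every section of the template is required, it can help to fill it in programmatically and drop whatever is left open. The sketch below assumes a hypothetical `render_prompt` function. The section labels follow the template above, while the speaker-labelled dialogue format is an assumption.

```python
def render_prompt(scene, camera_shot=None, mood=None, actions=(), dialogue=()):
    """Fill the descriptive template; sections left empty are omitted."""
    parts = [scene.strip()]
    cinematography = []
    if camera_shot:
        cinematography.append(f"Camera shot: {camera_shot}")
    if mood:
        cinematography.append(f"Mood: {mood}")
    if cinematography:
        parts.append("\n".join(["Cinematography:"] + cinematography))
    if actions:
        parts.append("\n".join(["Actions:"] + [f"- {beat}" for beat in actions]))
    if dialogue:
        # Assumed format: one labelled line per speaker, kept short.
        lines = [f'- {speaker}: "{line}"' for speaker, line in dialogue]
        parts.append("\n".join(["Dialogue:"] + lines))
    return "\n\n".join(parts)


text = render_prompt(
    "Wet asphalt, zebra crosswalk, neon signs reflecting in puddles.",
    camera_shot="wide establishing shot, eye level",
    actions=["Cyclist pedals three times", "Cyclist brakes and stops at crosswalk"],
)
print(text)
```

Leaving `mood` and `dialogue` unset here keeps those sections out of the prompt, which matches the guide's point that open-ended sections give the model room to improvise.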


