Tools · 8 min read · April 6, 2026

Wan 2.7: The Open-Source Model That Solved Character Consistency

Alibaba's Wan 2.7 maintains character identity across multiple shots with 150+ reference frames. Here's how to use it for narrative AI video - and why it changes the open-source game.


StudioList Editorial

AI Video Research Team

The single biggest problem in AI video production isn't quality - it's consistency. Generate a character in shot one, and by shot three they've changed hair color, lost their jacket, or aged 10 years. Every studio working on narrative content has hit this wall.

Alibaba's Wan 2.7 (released late March 2026) is the first open-source model to seriously address it. Using a Mixture-of-Experts architecture with 14 billion parameters trained on 1.5 billion videos, Wan 2.7 maintains character identity across multiple shots using up to 150+ reference frames.

And it's free.

What Changed from Wan 2.6

Wan 2.6 (December 2025) introduced the reference-to-video (R2V) mode and multi-shot generation. Wan 2.7 builds on that with:

Enhanced motion quality. Smoother, more natural movement - the jitter and float that plagued earlier versions are significantly reduced.

Advanced controls. More precise camera and subject control, bringing it closer to Kling 3.0's professional camera toolset.

9-grid image-to-video. Feed 9 reference images and Wan 2.7 uses them to maintain consistency across the generated sequence. This is the key feature for character work.

Native audio. Synchronized audio generation - Wan 2.7 joins Kling 3.0 and Veo 3.1 in the native audio club. No more post-production audio patching for basic sound.

Character consistency across shots. The headline feature. Using 150+ reference frames, Wan 2.7 preserves facial structure, wardrobe, body proportions, and lighting continuity between separate generations. Not perfect, but dramatically better than anything else in open source.

Why Character Consistency Matters

Most AI video tools generate isolated shots. Each generation starts fresh with no memory of previous outputs. For social clips and single-shot content, this is fine. For anything with narrative - brand campaigns, music videos, short films, explainer content - it's a dealbreaker.

Consider a 60-second brand commercial with 8 shots featuring the same spokesperson. Without character consistency, you're generating each shot independently and hoping the model produces a similar-looking person. The hit rate is low. Studios end up spending 10-20x more on generation, manually curating shots that happen to match.

Wan 2.7's reference system changes the math. Feed it reference frames of your character and it maintains identity across shots. Not perfectly - subtle variations still occur - but consistently enough for production work.

How to Use It

Setup

Wan 2.7 runs through ComfyUI (the dominant open-source workflow platform). Requirements: a GPU with 24GB+ VRAM. With the new NVFP4 quantization, you can run it on lower-spec hardware with roughly 60% less VRAM usage.
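If you want a quick sanity check before installing anything, a few lines of PyTorch will tell you where your GPU lands relative to these requirements (a minimal sketch; the 24GB and 60% figures come from the paragraph above):

```python
import torch

# Minimal sketch: check local VRAM against the 24GB baseline quoted above.
# With NVFP4 quantization (~60% less VRAM), roughly 10GB becomes workable.
FULL_PRECISION_GB = 24
NVFP4_GB = FULL_PRECISION_GB * 0.4  # ~60% reduction

if not torch.cuda.is_available():
    print("No CUDA GPU detected - use a hosted API instead (see below).")
else:
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Detected {vram_gb:.1f} GB VRAM")
    if vram_gb >= FULL_PRECISION_GB:
        print("OK for full-precision local inference.")
    elif vram_gb >= NVFP4_GB:
        print("Try the NVFP4-quantized weights.")
    else:
        print("Below spec - use a hosted API (Alibaba Cloud, fal.ai, Atlas Cloud).")
```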

It's also available through cloud APIs: Alibaba Cloud, fal.ai, and Atlas Cloud offer hosted Wan 2.7 access without local hardware requirements.
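For the hosted route, fal.ai's Python client gives a feel for the integration. A minimal sketch - the endpoint ID and argument names here are hypothetical placeholders, so check your provider's model page for the actual Wan 2.7 route and schema:

```python
import fal_client  # pip install fal-client; needs FAL_KEY in the environment

# Hosted text-to-video call via fal.ai. The endpoint ID and argument
# names below are hypothetical placeholders, not a documented schema.
result = fal_client.subscribe(
    "fal-ai/wan-2.7/text-to-video",  # hypothetical endpoint ID
    arguments={
        "prompt": "A spokesperson walks through a sunlit office, medium shot",
        "resolution": "1080p",
    },
)
print(result)  # typically contains a URL to the generated video
```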

The Reference Frame Workflow

1. Generate or select your character key frames. Use Midjourney V8, Luma Uni-1, or any image model to create 3-9 reference images of your character from different angles, expressions, and lighting conditions.

2. Feed references into Wan 2.7. Use the 9-grid input or R2V mode. The more reference frames you provide, the stronger the identity lock.

3. Generate each shot with text prompts. Describe the action, camera, and environment for each shot. Wan 2.7 uses the references to maintain character appearance while following your shot-specific directions.

4. Iterate on weak shots. If a particular shot drifts from the character reference, regenerate with additional reference frames from angles closer to the target shot. (A scripted version of this loop is sketched below.)
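In script form, the whole workflow collapses to a loop over shot prompts with a fixed reference set. A hedged sketch using the same hypothetical fal.ai endpoint as above - the R2V route and the reference_image_urls parameter are assumptions, not a documented schema:

```python
import fal_client  # reusing the hosted route from the setup section

# Step 1: 3-9 character key frames, hosted as URLs (placeholder paths).
REFERENCES = [
    "https://example.com/refs/front.png",
    "https://example.com/refs/profile_left.png",
    "https://example.com/refs/three_quarter.png",
]

# Step 3: one text prompt per shot - action, camera, environment.
SHOTS = [
    "Medium shot, she greets a customer at the counter, warm key light",
    "Close-up, she smiles at the camera, same wardrobe and lighting",
]

clips = []
for prompt in SHOTS:
    result = fal_client.subscribe(
        "fal-ai/wan-2.7/reference-to-video",  # hypothetical R2V endpoint
        arguments={
            "prompt": prompt,
            "reference_image_urls": REFERENCES,  # step 2: identity lock
        },
    )
    clips.append(result)

# Step 4 (iteration) stays manual: review each clip, and for shots that
# drift, rerun with extra references from angles closer to the target.
```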

Best Practices

More references = better consistency. 3 references is the minimum for basic identity. 9 references covers most angles and expressions. 50+ references is where professional productions operate.

Front-load the reference work. Spend time getting your reference images right before generating any video. The quality of your references directly determines the consistency of your output.

Keep lighting consistent across references. If your references have wildly different lighting, Wan 2.7 will average them and the result looks flat. Match the lighting across reference frames to the target scene.

Use the same aspect ratio. Reference images and output video should share the same aspect ratio for best results.
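That last check is easy to automate. A minimal Pillow sketch that flags references whose aspect ratio doesn't match the target output (the refs/ directory and 16:9 target are assumptions for illustration):

```python
from pathlib import Path

from PIL import Image  # pip install pillow

# Verify every reference image matches the target output aspect ratio
# before spending any generation budget.
TARGET_RATIO = 16 / 9  # match this to your output video

for path in sorted(Path("refs").glob("*.png")):
    width, height = Image.open(path).size
    ratio = width / height
    if abs(ratio - TARGET_RATIO) > 0.01:
        print(f"{path.name}: {width}x{height} ({ratio:.3f}) - crop or regenerate")
```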

How It Compares

Feature               | Wan 2.7         | Kling 3.0         | Seedance 2.0 | Runway Gen-4.5
Character consistency | 150+ ref frames | Limited           | Limited      | Limited
Multi-shot planning   | Yes             | 6 connected shots | No           | Multi-shot via extension
Max resolution        | 1080p           | 4K @ 60fps        | 1080p        | 2K
Native audio          | Yes             | Yes               | Yes          | No
Camera controls       | Advanced        | Professional      | Basic        | Basic + Motion Brush
Open source           | Yes (free)      | No ($6.99+/mo)    | No (~$10/mo) | No ($12+/mo)
Local hardware        | 24GB+ VRAM      | Cloud only        | Cloud only   | Cloud only

Wan 2.7 wins on: Character consistency, open-source access, cost (free locally), multi-shot narrative.

Kling 3.0 wins on: Resolution (4K), professional camera controls, global API, overall polish.

Seedance 2.0 wins on: Visual quality (highest Elo), cinematic aesthetics.

Runway Gen-4.5 wins on: Physics simulation, Motion Brush precision.

The Multi-Model Narrative Workflow

The smart play for narrative content in April 2026:

Character design: Luma Uni-1 or Midjourney V8 for reference images. Spend time here - it determines everything downstream.

Narrative shots with character consistency: Wan 2.7 via ComfyUI. Use the reference system to maintain identity across all speaking/acting shots.

Hero cinematic shots: Seedance 2.0 or Kling 3.0 for the money shots where visual quality matters more than character lock.

Action and physics: Kling 3.0 or Runway Gen-4.5 for sequences requiring realistic motion and interaction.

Avatar and talking head: OmniHuman 1.5 ($0.14/sec) for dialogue-heavy scenes requiring precise lip sync.

Post-production: DaVinci Resolve for color matching across shots from different models. ElevenLabs for dialogue. Sound design and foley for everything else.
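If you orchestrate this pipeline in code, the split can start as a simple routing table keyed by shot type. An illustrative sketch - the shot-type labels and pick_model helper are ours, not any real API:

```python
# Illustrative routing table for the multi-model pipeline above; the
# shot-type labels and model names mirror the article, not any real API.
SHOT_ROUTING = {
    "character_design": "Luma Uni-1 / Midjourney V8",
    "narrative_consistency": "Wan 2.7 (ComfyUI, reference frames)",
    "hero_cinematic": "Seedance 2.0 / Kling 3.0",
    "action_physics": "Kling 3.0 / Runway Gen-4.5",
    "talking_head": "OmniHuman 1.5",
}

def pick_model(shot_type: str) -> str:
    """Return the model for a shot type, defaulting to the consistency workhorse."""
    return SHOT_ROUTING.get(shot_type, SHOT_ROUTING["narrative_consistency"])

print(pick_model("hero_cinematic"))  # -> "Seedance 2.0 / Kling 3.0"
```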

What This Means for Studios

Open source just became viable for narrative production work. Studios that previously dismissed open-source video models as "not production ready" need to reassess. Wan 2.7 doesn't replace Kling 3.0 or Seedance 2.0 for hero shots - but for the 70% of shots in a project that need to be good (not spectacular) and consistent, it's a free alternative to $500+/month in API costs.

The studios that integrate Wan 2.7 into their pipeline for consistency-critical work while using premium models for hero frames will produce better work at lower cost than studios using a single model for everything.
