X-Portrait 2: Turning Static Images Into Expressive Videos with Facial Animation

In this article, I’m going to walk you through one of the most fascinating projects I’ve seen recently—X-Portrait 2 by ByteDance.

You may already be familiar with the idea of creating animations or expressive videos from just a photo. This tool takes a single portrait image and transforms it into a video animation with natural facial movements, emotions, and gestures, all driven by a reference video.

Here we'll explore what X-Portrait 2 does, how it works, and why it could be a major step forward for creators, developers, and anyone interested in animation.

What Is X-Portrait 2?

X-Portrait 2 is an AI-based model developed by ByteDance. Its main function is to convert static images into animated videos by copying facial expressions and motions from a video clip. That means:

  • You provide a photo (a single frame)
  • You provide a short video (used to guide the animation)
  • The model combines the two to produce a new video, where the photo appears to mimic all the expressions and gestures from the original video

This system supports realistic human portraits, anime characters, and even painted or drawn artwork. What makes it stand out is how well it handles complex movements and micro-expressions, which are often tricky to get right.

What Makes X-Portrait 2 Unique?

There are already several animation tools out there, but this one caught my eye because of the quality and precision in its output.

  • Single video input: No need for multiple camera angles or extensive data. One photo + one video = complete animation.
  • Supports realistic and animated styles: Whether it’s a selfie or a cartoon image, X-Portrait 2 works with both.
  • Facial detail accuracy: Expressions like blinking, pouting, eye rolling, tongue movements, and even tilted head gestures are handled really well.
  • Supports fast motion: Most models struggle with fast or sudden movements, but X-Portrait 2 keeps the output stable in those scenarios.

Key Features of X-Portrait 2

  • Expression Transfer with High Detail – The model is able to transfer facial movements—including subtle and fast-changing ones—with surprising accuracy. From happy smiles to angry frowns, from blinking to tongue gestures, it all comes through smoothly in the output video.
  • Works With Multiple Image Types – This includes realistic photos, anime portraits, and painted or illustrated faces. It doesn't matter whether the input is stylized or photorealistic—the animation output still looks expressive and consistent.
  • Handles Head Movements – One thing that impressed me was how well it could handle head tilts, nods, and shifts in direction. Most tools distort the image when head movement is involved, but X-Portrait 2 keeps things smooth and believable.

Visual Examples and Demonstrations

Let me walk you through how it looks when put to work. In one example I saw:

  • A static portrait image was paired with a short reference video.
  • The reference video was shown in the corner, and right next to it was the animated result.
  • What happened was amazing—the image replicated the exact expressions from the video, including a tilted face, blinks, and even slight eye shifts.

Here’s a brief summary of a few examples:

Reference Video | Input Image | Animated Output
Woman smiling with tilted head | Static cartoon image | Cartoon smiling with same tilt
Man frowning and blinking | Selfie photo | Photo shows identical expressions
Anime character talking | Anime still image | Character mouth moves in sync

Works With Cartoons and Anime Too

This was another big surprise for me.

X-Portrait 2 isn’t limited to realistic photos. It also works well with cartoons and anime.

Here’s how that looked:

  • I provided a simple anime image.
  • I paired it with a short video of someone talking.
  • The result was the anime character talking and blinking just like the person in the video.

This opens up exciting use cases for anime creators, storytellers, and even meme makers who want to add expressive animations to still images.

Step-by-Step: How X-Portrait 2 Works

Let’s walk through how the system works in simple steps:

Step 1: Choose a Portrait Image

Start with a static image—this could be a photo of yourself, a friend, or even an illustration.

Step 2: Provide a Reference Video

Use any short video clip that includes the facial movements or expressions you want the image to replicate.

Step 3: Feed Both Into the Model

The system analyzes both inputs:

  • It extracts facial landmarks and motion data from the video.
  • It maps those movements onto the image.
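
To make the idea of "extract motion, then map it onto the image" concrete, here is a minimal Python sketch. It is a toy illustration only: X-Portrait 2's code has not been released, so this is not ByteDance's method. It assumes faces are already reduced to a few 2D landmark points, and the names `motion_offsets`, `face_width`, and `transfer_motion` are hypothetical helpers invented for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Landmarks = List[Point]  # assumed format: one (x, y) pair per facial landmark


def motion_offsets(neutral: Landmarks, frame: Landmarks) -> List[Point]:
    """Displacement of each landmark relative to the neutral (first) frame."""
    return [(fx - nx, fy - ny) for (nx, ny), (fx, fy) in zip(neutral, frame)]


def face_width(landmarks: Landmarks) -> float:
    """Horizontal extent of the landmarks, used as a rough face-size measure."""
    xs = [x for x, _ in landmarks]
    return max(xs) - min(xs)


def transfer_motion(portrait: Landmarks,
                    driving_frames: List[Landmarks]) -> List[Landmarks]:
    """Apply the driving video's landmark motion to the portrait's landmarks.

    Offsets are rescaled so a small face in the video can drive a large
    face in the portrait (or the other way around).
    """
    neutral = driving_frames[0]
    neutral_width = face_width(neutral)
    if neutral_width == 0:
        raise ValueError("driving landmarks have zero width")
    scale = face_width(portrait) / neutral_width

    animated = []
    for frame in driving_frames:
        offsets = motion_offsets(neutral, frame)
        animated.append([(px + dx * scale, py + dy * scale)
                         for (px, py), (dx, dy) in zip(portrait, offsets)])
    return animated


# Three landmarks per face: left eye, right eye, mouth (made-up coordinates).
portrait = [(100, 100), (200, 100), (150, 180)]
driving = [
    [(10, 10), (30, 10), (20, 26)],  # frame 0: neutral face
    [(10, 10), (30, 10), (20, 30)],  # frame 1: mouth moves down by 4 px
]
animated = transfer_motion(portrait, driving)
print(animated[1][2])  # portrait mouth moved down by 4 px times the scale of 5
```

A real system would still need to render pixels from the moved landmarks, and a learned model may not use explicit landmarks at all. The sketch only shows why one photo plus one video is enough: the video supplies the motion and the photo supplies the appearance.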

Step 4: Generate the Animated Output

The result is a video animation that makes the portrait image move, blink, talk, or express whatever the person in the video was doing.

Real-World Applications

There are so many possible uses for this tool. Some of the ones I’m most excited about include:

  • Social media reels: Imagine creating expressive content just from your photos.
  • Virtual agents: Customer support or chatbot avatars that appear to speak naturally.
  • Movie production: Animate still characters for visual effects.
  • Game development: Add personality to character portraits without full motion capture.

How It Compares to Others

Right now, Runway is a known player in this space, offering similar features. However, based on my observations, X-Portrait 2 has better animation quality, especially when it comes to:

  • Fast motions
  • Small facial changes
  • Expression accuracy

Another advantage is that it works equally well on realistic photos and stylized illustrations, which isn’t always the case with other tools.

Limitations and Concerns

As much as I loved trying it out, there are a few downsides that I noticed:

  • Not open-source: Unfortunately, the code and model weights haven’t been released.
  • No public checkpoints: I searched on Hugging Face, GitHub, and even a few Chinese platforms, but couldn’t find anything to download.
  • Platform-locked (for now): It’s not clear if or when the tool will be made available to the public or integrated into other apps.

Possible Future Integrations

I personally believe that X-Portrait 2 has strong potential to be integrated directly into platforms like TikTok. If that happens, users could animate their selfies using in-app tools—no need for third-party software or editing skills.

And if it becomes available freely or at low cost, it could offer creators a powerful way to produce unique, expressive videos using nothing more than a photo.

Final Thoughts

To sum up my experience with X-Portrait 2:

  • It takes one image and one video, and turns them into realistic animated facial expressions.
  • It works on photos, anime, and artwork.
  • It handles fine-grained facial details better than most models I’ve seen.
  • It could be used across entertainment, games, social media, and more.