What is DreamO: The Ultimate Unified Framework for Image Customization

DreamO is a single system that lets you edit and create images with many types of controls at once. You can guide it with a face, a full character, clothing items, or a style, and mix these inputs to get the result you want.

DreamO: The Ultimate Unified Framework for Seamless Image Customization

It runs on your own computer, supports consumer GPUs, and also has a simple app you can start with one command. The team built special tricks to keep faces, outfits, and styles consistent in the final image.

Tip: For background on the team and related work, see our short profile of Bytedance.

DreamO: The Ultimate Unified Framework for Image Customization Overview

Here’s a quick look at what DreamO offers and how you can use it.

teaser

Item	Details
Type	Open-source image customization framework (Diffusion + Transformer)
Creator	ByteDance (official implementation)
Latest version	v1.1 (improved quality, fewer body errors, better look)
Main purpose	Create and edit images with identity, subject, clothing, and style inputs
Key features	Multi-condition control, feature routing, placeholder tokens, 3-stage training, Turbo LoRA option
Supported tasks	IP (character/object), ID (face), Try-On (tops/bottoms/glasses/hats), Style, Multi Condition
Demo options	Local Gradio app, ComfyUI node, HuggingFace demo
Devices	NVIDIA GPUs (8GB, 16GB, 24GB with quantization and offload), Apple Silicon (M1–M4) via MPS, CPU support
Quantization	int8 (optimum-quanto) and Nunchaku
Setup time	Minutes (conda + pip)
Quick start	python app.py
Project page	mc-e.github.io/project/DreamO
Repo	github.com/bytedance/DreamO

Read more hands-on notes here: Dreamo quick notes.

DreamO: The Ultimate Unified Framework for Image Customization Key Features

Capablities

One tool for many inputs. Guide the model with a face, a full character, clothing pieces, or a reference style.
Mix inputs. Combine ID, IP, and Try-On to get more control in one run.
Strong face and character match. It keeps key traits of a person or character.
Try-On with more than one garment. Add tops, bottoms, glasses, and hats together.
Placeholder positions. You can tie a condition to a spot in the image prompt so the right thing shows up in the right place.
Feature routing. The model “picks” the right details from each reference so they do not clash.
Three-stage training. Starts simple, scales up, then fixes quality, which helps keep results balanced.
Turbo LoRA option. By default it uses a fast LoRA to cut steps from 25+ to 12 for quicker results.

Note: For a simple overview of our AI tools hub, check Omnihuman 1.Com.

DreamO: The Ultimate Unified Framework for Image Customization Use Cases

IP (Character/Object)

Give a full-body character, object, or animal as a guide. The system encodes features to keep identity details clear.

ID (Face)

Focus only on the face. This task targets facial traits without clothing. If the face looks too glossy, lower the guidance scale.

Try-On (Clothes and Accessories)

Feed tops, bottoms, glasses, and hats to test outfits. Even with no multi-garment pairs in training, it still handles many pieces together well.

Style

Give a style image to affect the look. In the current release, style is less stable and cannot be mixed with other conditions yet.

Multi Condition

Mix ID, IP, and Try-On in one shot. The feature routing design helps reduce conflicts between inputs.

Performance & Showcases

More Results

Showcase 1 — Recently, extensive research on image customization (e.g., identity, subject, style, background, etc.) demonstrates strong customization capabilities. This clip shows how one model can support many types of input in a single workflow.

How DreamO Works

DreamO takes in your prompt plus one or more reference images. It then runs a diffusion transformer that reads all inputs in a uniform way.

A “feature routing” rule guides which parts of the reference are used for the face, outfit, or style. A “placeholder” trick lets you tie a condition to a specific spot in the prompt so the right thing appears in the right place.

Overview of our proposed DreamO, which can uniformly handle commonly used consistency-aware generation control in a DiT framework.

The team trained the model in three stages. First on simple tasks to lock core consistency, then on a large mixed set to grow skills, and finally a quality pass to fix bias from lower-quality data.

Installation & Setup

Note for v1.1: In order to use Nunchaku for model quantization, we have updated the diffusers version to 0.33.1. If you have the older version 0.31.0 installed, please update diffusers; otherwise, the code will throw errors.

Copy these steps exactly.

# clone DreamO repo
git clone https://github.com/bytedance/DreamO.git
cd DreamO
# create conda env
conda create --name dreamo python=3.10
# activate env
conda activate dreamo
# install dependent packages
pip install -r requirements.txt

(optional) Nunchaku: If you want to use Nunchaku for model quantization, please refer to the original repo for installation guide.

Quick Inference — Local Gradio Demo

Start the app:

python app.py

Available options:

options:
 --version {v1.1,v1} default will use the latest v1.1 model, you can also switch back to v1
 --offload Enable 'quant=nunchaku' and 'offload' to reduce the original 24GB VRAM to 6.5GB.
 --no_turbo Use turbo to reduce the original 25 steps to 12 steps.
 --quant {none,int8,nunchaku}
 Quantize to use: none(bf16), int8, nunchaku
 --device DEVICE Device to use: auto, cuda, mps, or cpu

We observe strong compatibility between DreamO and the accelerated FLUX LoRA variant (FLUX-turbo), and thus enable Turbo LoRA by default, reducing inference to 12 steps (vs. 25+ by default). Turbo can be disabled via --no_turbo, though our evaluation shows mixed results; we therefore recommend keeping Turbo enabled.

tips: If you observe limb distortion or poor text generation, try increasing the guidance scale; if the image appears overly glossy or over-saturated, consider lowering the guidance scale.

For consumer-grade GPUs

Currently, the code supports two quantization schemes: int8 from optimum-quanto and Nunchaku. You can choose either one based on your needs and the actual results.

For 8GB GPUs:

python app.py --nunchaku --offload

For 24GB GPUs:

python app.py --quant int8

python app.py --quant nunchaku

For 16GB GPUs:

python app.py --int8 --offload

Note: Offload cuts memory but also slows down speed.

For macOS Apple Silicon (M1/M2/M3/M4)

DreamO supports Apple Silicon via MPS. The app will pick MPS by default when available.

Run the app:

python app.py

Pick a device:

python app.py --device mps

python app.py --device cpu

Combine MPS with quantization on low-memory devices:

python app.py --device mps --int8

Make sure your PyTorch build supports MPS (requirements include PyTorch 2.6.0+).

The Technology Behind DreamO

Uniform input handling: The diffusion transformer lets the model read very different inputs in a single flow.
Feature routing: It directs the right bits of info from each reference so they do not interfere with each other.
Placeholder mapping: Each condition can be tied to a spot in the prompt, helping with layout.
Training stages: Start simple for stability, scale up for skills, then adjust for quality.

What’s New in v1.1

Better overall image quality.
Fewer body composition mistakes.
Nicer look and color control.
Support for consumer GPUs (16GB and 24GB) with quantization and offload choices.
Native ComfyUI node and macOS MPS support.
Nunchaku quantization support and diffusers 0.33.1 note.

If you want a tiny, fast summary of the project and updates, check our short guide: Dreamo quick notes.

Getting Results in ComfyUI and Online Demo

ComfyUI: Use the native node “ComfyUI-DreamO” to build a node graph with your inputs.
Online: Try the HuggingFace demo in your browser.

For more background on our AI coverage and related tools, visit our main hub.

FAQ

Can I mix face identity with clothing items?

Yes. Use ID for face and Try-On for tops, bottoms, glasses, and hats in the same run. The feature routing design helps keep each part clean.

What if the face looks too glossy or colors look too strong?

Lower the guidance scale. If limbs look warped or text is weak, raise the guidance scale.

Does it run on an 8GB GPU?

Yes, with Nunchaku quantization and offload. Use the exact command shown above to fit memory.

Can I use styles with other conditions right now?

Not yet. Style is a solo input in the current version, and mixing will come later.

Is there a quick way to try it without coding?

Yes. Run the local Gradio app with one command or use the HuggingFace demo. ComfyUI is also ready if you prefer node graphs.

Read More: Bytedance

Image source: DreamO: The Ultimate Unified Framework for Image Customization