ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning

What is ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning
ComfyMind is a system that plans its steps before it creates. You give it a simple instruction, and it figures out the best path to make images, short videos, edits, or even step-by-step answers.

It runs on top of ComfyUI, a popular tool for building creative AI workflows. ComfyMind tests different options like a decision tree and uses feedback to improve the result.
ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning Overview
Here is a quick look at what the project offers and how you can use it right now.
| Item | Details |
|---|---|
| Type | Open-source AI system for general-purpose generation |
| Built On | ComfyUI |
| Purpose | Unified creation: image generation, video generation, image editing, and reasoning tasks |
| Planning Style | Tree-based planning with reactive feedback |
| Interfaces | Command line (Python script) and a Gradio web app |
| Inputs | Text instructions; optional reference images |
| Outputs | Images, short videos, edited images, reasoning outputs |
| Benchmarks | ComfyBench, GenEval, Reason-Edit |
| Reported Results | Outperforms many open-source projects on all three benchmarks; approaches GPT-Image-1 |
| Requirements | Python 3.12, Conda, ComfyUI with needed models and extensions |
| Demo | Online demo listed in project News; Gradio demo script provided |
| Evaluation | Ready-to-run scripts; public result sheets linked |
| Repository | github.com/LitaoGuo/ComfyMind |

Note: For related industry context and research trends, see our short take on ByteDance.
ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning Key Features
- Plans before it creates. It builds a tree of options, tests them, and picks the best path.
- Works across tasks. It can make new images, edit photos, create short videos, and write reasoned outputs.
- Uses reactive feedback. It checks the result at each step and adjusts.
- Fits into ComfyUI. You can bring in your own models and nodes through ComfyUI-Manager and Hugging Face.
- Strong results on public tests. It performs well on ComfyBench, GenEval, and Reason-Edit.
ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning Use Cases
- Product photos: remove items, swap styles, or expand the shot to fit a layout.
- Marketing and ads: turn a base image into a poster-ready version with clean lighting.
- Short clips: generate a few seconds of video for social posts or concepts.
- Education: explain science or common facts with clear, step-by-step answers.
- Creative edits: age a face, replace buildings, or outpaint to wider views.
Performance & Showcases
Showcase 1 — Forest light and calm wildlife. Prompt: "Generate a 4 seconds high-quality video of sunlight filters through the forest, deer herd drinks from a stream." This shows control over light, water, and soft motion in nature scenes.
Showcase 2 — A winged figure in the sky. Prompt: "Generate a 4 seconds high-quality video of a winged woman hovers in the desolate skies above the wasteland." This highlights texture, atmosphere, and steady framing.
Showcase 3 — Night fire by the sea. Prompt: "Generate a 8 seconds high-quality video of a bonfire burning on the seaside." You can see smooth flames, glow, and reflections on the shore.
Showcase 4 — Eggs in the pan. Prompt: "Generate a 8 seconds high-quality video of Fried egg sizzle in the skillet." The system handles close-up detail, heat, and fine motion.
Showcase 5 — Over-the-shoulder city scout. Prompt: "Generate a 4 seconds high-quality video of a survivor in an exoskeleton scavenges the wastecity, framed by an over-the-shoulder shot." It keeps composition stable while showing action.
Showcase 6 — Pipeline view with seaside fire example. Prompt: "Generate a 8 seconds high-quality video of a bonfire burning on the seaside." Use this to understand how a prompt flows through the build steps.
How It Works
- You write a short instruction, and you can add one or two reference images if needed.
- ComfyMind makes a plan like a small decision tree. It tries different paths, checks the results, and updates the plan.
- The system then runs the best path through ComfyUI, using the models and nodes you have set up.
It can create from scratch or edit an input image. For videos, it sets up a short clip with motion and style. For reasoning, it writes clear steps to show how it reached the answer.
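To make the planning loop concrete, here is a minimal Python sketch of tree-based planning with reactive feedback. It is an illustration under assumed interfaces: the names PlanNode, expand, and evaluate are inventions for this example, not ComfyMind's actual internals.

# Minimal sketch of tree-based planning with reactive feedback.
# PlanNode, expand, and evaluate are illustrative assumptions,
# not ComfyMind's real API.
class PlanNode:
    def __init__(self, step, parent=None):
        self.step = step        # e.g., a candidate ComfyUI sub-workflow
        self.parent = parent
        self.children = []
        self.score = None       # filled in by the feedback step

def plan(instruction, expand, evaluate, max_depth=3):
    """Grow a small decision tree of candidate steps, score each child
    with reactive feedback, and greedily follow the best branch."""
    root = PlanNode(step=None)
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for step in expand(instruction, node):  # propose options
                child = PlanNode(step, parent=node)
                child.score = evaluate(child)       # reactive feedback
                node.children.append(child)
                next_frontier.append(child)
        if not next_frontier:
            break
        next_frontier.sort(key=lambda n: n.score, reverse=True)
        frontier = next_frontier[:1]                # keep the best branch
    best, path = frontier[0], []
    while best.parent is not None:                  # walk back to the root
        path.append(best.step)
        best = best.parent
    return list(reversed(path))

In the real system, expansion proposes ComfyUI workflow pieces and the scores come from checking intermediate results; here both are plain callables so the control flow stays visible.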
The Technology Behind It
ComfyMind is built on ComfyUI. You need to prepare the models and extensions first using ComfyUI-Manager and model hubs.
It supports a Gradio web demo and a command-line script. Tests on ComfyBench, GenEval, and Reason-Edit show strong scores, close to GPT-Image-1 on many subtasks.
If you like short-form video tools, compare this with our note on Goku Video Generation.
Installation & Setup
Follow these steps exactly as provided by the project.
Step-by-step Installation
- Clone the repository, then create and activate a conda environment:
git clone https://github.com/LitaoGuo/ComfyMind.git
cd ComfyMind
conda create -n comfymind python=3.12
conda activate comfymind
- Install dependencies:
pip install -r requirements.txt
⚠️ Before using ComfyMind, please prepare your ComfyUI with the necessary models and extensions:
- ComfyUI Installation: set up a working ComfyUI instance.
- ComfyUI-Manager: offers custom node management and installation.
- Hugging Face: offers model downloads.
Configuration
⚠️ Modify config.yaml to set your API keys:
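As a rough illustration only — the real field names are defined in the repository's config.yaml, so the keys below are placeholders, not the actual schema:

# Hypothetical config.yaml sketch; check the repository's
# config.yaml for the real key names.
api_key: "your-api-key-here"
base_url: "https://api.example.com/v1"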
Running ComfyMind
Execute the main script
python main.py \
--instruction "The generation instruction" \
--resource1 "<optional>path/to/the/reference1" \
--resource2 "<optional>path/to/the/reference2"
--save_path "path/to/save/result"
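For instance, a hypothetical text-to-video run using only the documented flags (the instruction text and output path here are made up for illustration):

python main.py \
--instruction "Generate a 8 seconds high-quality video of a bonfire burning on the seaside" \
--save_path results/bonfire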
Gradio Demo
python main_gradio.py
Evaluation
The results of ComfyMind on the evaluation benchmarks can be viewed at https://drive.google.com/drive/folders/1pR5vCQoo-W0Tr3vodyfQEWrI3om15Dak?usp=drive_link.
To run the evaluation for ComfyMind, execute the following commands:
python Evaluation/eval_geneval.py
python Evaluation/eval_reason_edit.py
Examples of Image Editing Tasks
Here are examples shared by the project team that show what the system can do with photos:
- Remove objects: delete a windmill; delete knife and fork from a dinner photo.
- Structural edits: cut a triangular slice from a whole cake while keeping it natural.
- Style for ads: turn a cherries photo into an ad-ready version with exhibition stand lighting.
- Face edit: age a young man’s photo to an older version while keeping the same identity.
- Product mockup: put a pigeon scribble logo on a ceramic cup on an office table.
- Outpainting: extend both left and right sides of a New York city shot by 512 pixels with the prompt “A spectacular view of New York City's skyline at dusk.”
- Replacement: change a castle to a Chinese traditional temple.
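As a worked example, the replacement edit above could be passed to the script with a reference image (the file paths here are hypothetical; the flags are the ones documented under Running ComfyMind):

python main.py \
--instruction "Change the castle to a Chinese traditional temple" \
--resource1 path/to/castle_photo.png \
--save_path path/to/edited_result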
If you enjoy creative tools that turn movement or media into new formats, see how we mapped actions to notes in our short explainer on Dance2Midi.
Tips for Best Results
- Write clear, short instructions. Add 1–2 reference images when helpful.
- Prepare ComfyUI first with needed nodes and models, then run ComfyMind.
- Use the Gradio demo to test ideas fast, and switch to the script for batches.
FAQ
What do I need before running ComfyMind?
You need Python 3.12, Conda, and a working ComfyUI setup with the right models and extensions. Install the Python packages from requirements.txt. Then run the script or the Gradio app.
Can I use my own models with it?
Yes. Install your preferred models in ComfyUI first. ComfyMind will then use what you have set up.
How do I pass reference images?
Use the --resource1 and --resource2 flags in the command. They should point to your image file paths. The web demo lets you upload them directly.
Where can I see test results?
The team shares their scores on ComfyBench, GenEval, and Reason-Edit. The results link is included above under Evaluation.
How do I save outputs?
Use the --save_path argument in the script. In Gradio, you will see a download option after generation.
Image source: ComfyMind: The Future of General-Purpose Generation via Tree-Based Planning