We propose a novel plan-then-populate framework centered on Macro-from-Micro Planning (MMPL) for scalable, high-quality long video generation. Experiments on standard benchmarks show that our method outperforms existing video generation models in both visual quality and temporal stability.
Macro-from-Micro Planning (MMPL): a long-video generation paradigm that mitigates temporal drift and color shift while enabling multi-GPU parallelization to generate longer videos.
Overall framework of Macro-from-Micro Planning. Our method operates on two planning levels: (1) Micro Plans, which predict a sequence of future frames within each segment to mitigate local error accumulation, and (2) a Macro Plan, formed as an Autoregressive Chain of Micro Plans, in which the planning frames of the first segment autoregressively generate those of subsequent segments, ensuring long-horizon temporal consistency. A minimal code sketch of this two-level procedure follows below.
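To make the plan-then-populate procedure concrete, here is a minimal sketch in Python. It is an illustration of the paradigm only, not the released implementation; `model.plan_micro` and `model.generate_segment` are hypothetical interfaces standing in for the planning and population networks.

```python
# Minimal plan-then-populate sketch (illustration only; `plan_micro` and
# `generate_segment` are hypothetical interfaces, not the released code).
import torch

def mmpl_generate(model, prompt_emb, num_segments, frames_per_segment, plan_size):
    # 1) Macro Plan: an autoregressive chain of Micro Plans, where each
    #    segment's planning frames condition the next segment's plan.
    micro_plans, prev_plan = [], None
    for _ in range(num_segments):
        plan = model.plan_micro(prompt_emb, context=prev_plan, n_frames=plan_size)
        micro_plans.append(plan)
        prev_plan = plan

    # 2) Populate: fill in each segment from its own Micro Plan. Since all
    #    plans already exist, this loop has no cross-segment dependency and
    #    can run in parallel (see the multi-GPU sketch below).
    segments = [
        model.generate_segment(prompt_emb, plan=p, n_frames=frames_per_segment)
        for p in micro_plans
    ]
    return torch.cat(segments, dim=0)  # (total_frames, C, H, W)
```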
Adaptive Multi-GPU Workload Scheduling for Balanced Execution and Fast Autoregressive Video Generation.
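Because the Macro Plan removes cross-segment dependencies, segment population can be distributed across devices. The sketch below uses a simple round-robin assignment to illustrate the idea; the adaptive scheduler in the figure above presumably weighs estimated per-segment workloads instead, and `generate_segment` is again a hypothetical per-segment decoder.

```python
# Round-robin multi-GPU population sketch (assumed uniform per-segment cost;
# an adaptive scheduler would balance estimated workloads instead).
from concurrent.futures import ThreadPoolExecutor
import torch

def populate_parallel(micro_plans, generate_segment, num_gpus):
    def worker(idx, plan):
        device = torch.device(f"cuda:{idx % num_gpus}")  # segment i -> GPU i % N
        with torch.no_grad():
            return idx, generate_segment(plan.to(device))

    # One thread per in-flight segment; GPU kernels overlap across devices.
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        futures = [pool.submit(worker, i, p) for i, p in enumerate(micro_plans)]
        results = dict(f.result() for f in futures)

    # Reassemble the video in temporal order.
    return torch.cat([results[i].cpu() for i in sorted(results)], dim=0)
```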
Our model generates high-quality 480P videos and supports streaming generation for extended durations. Below, we present 20-second videos (top), extended 30-second videos (middle), and 1-minute videos (bottom), all produced by our model without noticeable drift or color shift across time.
Our method delivers substantially superior performance on 30-second long video generation, surpassing MAGI, SkyReels, CausVid, and Self-Forcing in both visual quality and temporal consistency. It robustly mitigates frame drift and flickering while effectively addressing over-saturation and color imbalance, yielding more stable and photorealistic outputs.
Our framework is not restricted to the text-to-video (T2V) task; it can be seamlessly extended to image-to-video (I2V) generation without introducing any architectural modifications or additional image encoders. This flexibility derives from the unified autoregressive design, which only requires lightweight adjustments to the number and ordering of autoregressive steps.
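Under the unified autoregressive design, I2V can be sketched as seeding the first planning frame with the encoded input image and predicting one fewer autoregressive step. The `vae` encoder and `plan_micro` interface below are assumed names for illustration, not the actual API.

```python
# Hedged I2V sketch: insert the encoded input image as the first planning
# frame and predict one fewer autoregressive step. `vae` and `plan_micro`
# are assumed names for illustration, not the actual API.
import torch

def i2v_plan(model, vae, image, prompt_emb, plan_size):
    first = vae.encode(image).unsqueeze(0)   # latent of the given image, (1, C, H, W)
    rest = model.plan_micro(
        prompt_emb,
        context=first,                       # condition the chain on the image
        n_frames=plan_size - 1,              # one step is already fixed
    )
    return torch.cat([first, rest], dim=0)   # same Micro Plan shape as T2V
```

Text-to-video is recovered by dropping the image condition (`context=None`) and predicting all `plan_size` frames, which is why no architectural change or extra image encoder is needed.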
Our approach can be seamlessly integrated with self-forcing strategies without any architectural modifications. Specifically, it only requires adjusting the attention visibility range and the prediction order during both training and inference.
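One plausible reading of "adjusting the attention visibility range" is a block-causal mask in which planning-frame tokens stay globally visible while all other tokens attend causally within a local window. The sketch below is an assumption about the mask's structure, not the authors' exact formulation.

```python
# Assumed block-causal visibility mask: planning-frame tokens are globally
# visible; other tokens attend causally within a local window.
import torch

def visibility_mask(num_tokens, plan_token_ids, window):
    mask = torch.zeros(num_tokens, num_tokens, dtype=torch.bool)
    for q in range(num_tokens):
        lo = max(0, q - window + 1)
        mask[q, lo:q + 1] = True          # causal local window
        mask[q, plan_token_ids] = True    # planning frames are always visible
    return mask  # True = may attend; pass as an attention mask
```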
Although MMPL mitigates error accumulation in long video generation, the substantial temporal span of long videos means that a single text prompt often aligns only with the early content and fails to capture the full video semantics. As generation progresses, the limitations of a static prompt lead to repetitive or even collapsed content in later segments. The examples below illustrate content repetition and quality degradation caused by using a single static prompt.
@article{xiang2025macro,
  title={Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation},
  author={Xiang, Xunzhi and Chen, Yabo and Zhang, Guiyu and Wang, Zhongyu and Gao, Zhe and Xiang, Quanming and Shang, Gonghu and Liu, Junqi and Huang, Haibin and Gao, Yang and others},
  journal={arXiv preprint arXiv:2508.03334},
  year={2025}
}