Collaborative Multi-Robot Non-Prehensile Manipulation via Flow-Matching Co-Generation

Model Variants Comparison

GCo-DC
(Discrete-Continuous)

Co-generates discrete contact points and continuous manipulation trajectories.

GCo-CC
(Continuous-Continuous)

Co-generates continuous contact formations and continuous manipulation trajectories.

GCo-CT
(Continuous-Trajectory)

Generates only an unconstrained continuous trajectory. The first configuration of the trajectory is treated as the contact point.

Note: In these animations of GCo-DC and GCo-CC, trajectories are translated to begin at the contact points for visualization purposes. In practice, each trajectory is generated with its initial configuration at the origin.
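The translation described in the note can be sketched as follows. This is a minimal illustration, not the authors' visualization code: it assumes planar (x, y) waypoints, and the names `translate_to_contact`, `local_traj`, and `world_traj` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumption: planar (x, y) waypoints


def translate_to_contact(trajectory: List[Point], contact: Point) -> List[Point]:
    """Shift a trajectory so that its first waypoint coincides with `contact`."""
    if not trajectory:
        return []
    x0, y0 = trajectory[0]
    # Offset that moves the first waypoint (the origin, in practice) onto the contact point.
    dx, dy = contact[0] - x0, contact[1] - y0
    return [(x + dx, y + dy) for x, y in trajectory]


# A trajectory generated with its initial configuration at the origin.
local_traj = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.5)]
# The same trajectory, shifted to start at a contact point for display.
world_traj = translate_to_contact(local_traj, contact=(2.0, 3.0))
```

Only the offset changes; the shape of the trajectory is preserved, which is why the translation is safe to apply purely for display.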

GCo-DC: Contact points are chosen from the perceived space as discrete choices over pixels.
GCo-CC: Contact points remain unconstrained in the continuous space.
GCo-CT: No explicit co-generation of contact formations.
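To make the idea of a discrete choice over pixels concrete, here is a small sketch. It is an assumption-laden illustration rather than the GCo-DC model: the per-pixel `scores` grid stands in for whatever the network outputs, and `sample_contact_pixel`, `pixel_to_workspace`, and `metres_per_pixel` are hypothetical names.

```python
import math
import random
from typing import List, Tuple


def softmax_flat(scores: List[List[float]]) -> List[float]:
    """Turn a 2D grid of scores into a flat, row-major probability vector."""
    flat = [s for row in scores for s in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in flat]
    total = sum(exps)
    return [e / total for e in exps]


def sample_contact_pixel(scores: List[List[float]], rng: random.Random) -> Tuple[int, int]:
    """Sample one (row, col) pixel from the categorical distribution over the grid."""
    width = len(scores[0])
    probs = softmax_flat(scores)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    row, col = divmod(index, width)
    return (row, col)


def pixel_to_workspace(pixel: Tuple[int, int], metres_per_pixel: float) -> Tuple[float, float]:
    """Map a pixel to the (x, y) coordinates of its centre (hypothetical image scale)."""
    row, col = pixel
    return ((col + 0.5) * metres_per_pixel, (row + 0.5) * metres_per_pixel)


# A 3x4 score grid in which one pixel dominates all others.
scores = [[0.0] * 4 for _ in range(3)]
scores[1][2] = 50.0
pixel = sample_contact_pixel(scores, random.Random(0))
contact_xy = pixel_to_workspace(pixel, metres_per_pixel=0.1)
```

Because the candidates are pixels of the observation, the set of possible contact points is finite and tied to what the robot perceives, in contrast to the unconstrained continuous contact points of GCo-CC.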

Flow-Matching Co-Generation

Model illustration. Our flow-matching co-generation framework addresses the fundamental challenge of jointly reasoning about contact formation and manipulation trajectories. The model takes visual observations of the environment, a robot budget, and a required transformation for the observed object, and co-generates discrete contact points alongside continuous manipulation trajectories. This dual representation ties contact planning to the perceptual space, avoiding unnecessary reasoning over large continuous spaces, while retaining the flexibility to generate smooth manipulation trajectories.
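For readers unfamiliar with flow matching, the continuous half of the generation process can be sketched as numerical integration of a velocity field from noise at t = 0 to a sample at t = 1. The code below is a toy under stated assumptions, not the model described above: `make_toy_velocity` replaces the trained, observation-conditioned network with a closed-form field that points at a fixed target, the trajectory is a flattened list of planar waypoints, and the discrete contact-point generation is not shown.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]


def euler_sample(velocity: VelocityField, x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityField:
    """Stand-in for a learned network: velocity of the straight path towards `target`."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Hypothetical target: three planar waypoints flattened to [x0, y0, x1, y1, x2, y2].
target = [0.0, 0.0, 0.5, 0.0, 1.0, 0.5]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]  # initial sample at t = 0
sample = euler_sample(make_toy_velocity(target), noise, num_steps=10)
```

In a trained model the velocity would come from a network conditioned on the observation, the robot budget, and the required object transformation, so different noise draws would yield different plausible trajectories rather than a single fixed target.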

Abstract

Coordinating a team of robots to reposition multiple objects in cluttered environments requires reasoning jointly about where robots should establish contact, how to manipulate objects once contact is made, and how to navigate safely and efficiently at scale. Prior approaches typically fall into two extremes, either learning the entire task or relying on privileged information and hand-designed planners, both of which struggle to handle diverse objects in long-horizon tasks. To address these challenges, we present a unified framework for collaborative multi-robot, multi-object non-prehensile manipulation that integrates flow-matching co-generation with anonymous multi-robot motion planning. Within this framework, a generative model co-generates contact formations and manipulation trajectories from visual observations, while a novel motion planner conveys robots at scale. Crucially, the same planner also supports coordination at the object level, assigning manipulated objects to larger target structures and thereby unifying robot- and object-level reasoning within a single algorithmic framework. Experiments in challenging simulated environments demonstrate that our approach outperforms baselines in both motion planning and manipulation tasks, highlighting the benefits of generative co-design and integrated planning for scaling collaborative manipulation to complex multi-agent, multi-object settings.

Video summary.

Balancing flow-matching discrete-continuous co-generation with scalable multi-robot motion planning enables flexible multi-robot manipulation.
