Collaborative Multi-Robot Non-Prehensile Manipulation via Flow-Matching Co-Generation

Unknown University



In this work, we propose a unified framework for collaborative multi-robot, multi-object non-prehensile manipulation that combines flow-matching co-generation with anonymous multi-robot motion planning. Our main contributions are:

  • The Generative Collaboration (GCo) framework, which interleaves generative modeling with model-based planning to exploit known structure while learning components that are difficult to model.


  • The Goal Swapping with Priority Inheritance (Gspi) algorithm, an anonymous multi-robot motion planner that supports GCo by planning object motions that make progress toward their goals and robot motions that reach the generated contact points.

  • A novel contact modeling approach with discrete-continuous flow-matching co-generation.

    Multi-Robot Multi-Object Manipulation

    Balancing flow-matching discrete-continuous co-generation with scalable multi-robot motion planning enables flexible multi-robot manipulation. We evaluated GCo on increasingly complex manipulation tasks with up to nine robots, heterogeneous object shapes and types, and long-horizon, obstacle-laden scenes.

    Three robots manipulate two objects around a small obstacle.
    Long-horizon manipulation in clutter, with nine robots and four objects.
    Five robots manipulate three objects. Robots avoid a wall and switch objects as necessary.
    We trained our models on data collected in MuJoCo.

    Manipulation Experimental Results

    This section reports compiled success rates for single-object and multi-object manipulation experiments. All experiments require objects to arrive at specified poses (translation and orientation). We notice a clear gradient in performance (and color!): GCo-DC performs most robustly, followed by GCo-CC and GCo-CT, and finally the baselines. We include a model comparison (DC, CC, CT) further down the page, and more results in our manuscript.


    Single-object manipulation success
    Single-object multi-robot manipulation.
    Multi-object manipulation success
    Multi-object multi-robot manipulation.

    Model variants: DC = Discrete-Continuous, CC = Continuous-Continuous, CT = Continuous-Trajectory. Higher values indicate better success. The number following each method's name (e.g., 1, 2, 3, 6, or 9) is the number of robots available to it.

    Out-Of-Distribution Manipulation Results

    We have trained our flow-matching co-generation models on cylinders and boxes of varying sizes. In these experiments, we evaluate the models' ability to generalize to shapes and sizes that are significantly different from those seen during training. In particular, we conducted two test suites (120 runs overall), where we varied object shapes from familiar rectangles to non-convex general polygons and "T" shapes (as seen below) and required robots to move the objects to specified translations and rotations.


    We generated out-of-distribution object shapes by progressively deforming eight vertices initially placed as a rectangle. We also experimented with "T" shapes, as seen below.
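    The deformation procedure described above can be sketched roughly as follows. This is a minimal illustration, not our actual data generator: the vertex placement (four corners plus four edge midpoints), the uniform per-axis noise, the object dimensions, and the noise levels are all assumptions made for the example, and the helper names `rectangle_vertices` and `deform` are hypothetical.

```python
import random


def rectangle_vertices(width, height):
    """Eight vertices on a rectangle (corners plus edge midpoints), counter-clockwise.

    Assumption: the source only says that eight vertices start as a rectangle;
    this particular placement is illustrative.
    """
    hw, hh = width / 2.0, height / 2.0
    return [(-hw, -hh), (0.0, -hh), (hw, -hh), (hw, 0.0),
            (hw, hh), (0.0, hh), (-hw, hh), (-hw, 0.0)]


def deform(vertices, noise, rng):
    """Displace every vertex by uniform noise in [-noise, noise] on each axis."""
    return [(x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))
            for x, y in vertices]


rng = random.Random(0)
base = rectangle_vertices(0.4, 0.2)      # hypothetical object size
noise_levels = (0.0, 0.02, 0.05, 0.1)    # hypothetical noise levels
shapes = {level: deform(base, level, rng) for level in noise_levels}
```

    Larger noise values move the polygon further from the rectangles seen in training and can make it non-convex, which is the property the out-of-distribution tests are meant to probe.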

    GCo is able to properly move and rotate unseen objects ("T" shapes and deformed rectangles) in clutter.
    Heavy noise in an obstacle-laden environment. Three robots move objects to goals, despite objects being significantly out-of-distribution.
    While GCo performs robustly with out-of-distribution objects, stability is predictably better in-distribution.
    Heavy noise in free space. The object goals are initially assigned in a shuffled manner. GCo reassigns goals effectively and ends up pairing each object with a nearby goal.

    Results Summary

    Aggregated

    Success rate (averaged across robots)
    GCo-DC is generally robust to noise, while the other variants decline slowly.
    Cost per robot vs noise
    Average cost per robot increases with noise, indicating that execution gradually becomes less accurate.

    Obstacle-Free

    Success rate on empty map
    Success rates. GCo-DC performs best and improves with more robots.
    Cost per robot (empty)
    Cost per robot is lower when more robots are used, showing good utilization of resources.

    Obstacle Avoidance

    Success rate with wall obstacle
    More difficult tests lead to more variability in performance. GCo-DC still solves 75% of problems.
    Cost per robot (wall)
    Cost is higher in more complex environments where the manipulation horizon is longer.

    The plots above compare success rates and per-robot costs as object shape noise increases. The left column shows overall trends, while the other two columns show results split by environment type (empty vs. obstacle-laden). Together these figures illustrate how GCo handles out-of-distribution objects. We observe that GCo is surprisingly resilient to variation in object geometry and can still solve upward of 75% of problems even with maximal noise applied to objects. GCo-DC outperforms the other variants.

    Anonymous Multi-Robot Motion Planning

    Gspi visualization.

    A central contribution of our work is Gspi, an anonymous multi-robot motion planner. This section visualizes our experimental analysis of Gspi. In all tests, the initial assignment of robots to goals is random, and experiments were repeated multiple times to ensure reliability. In most tests, we varied the number of robots to assess scalability. Please see our paper for details. Each visualization is annotated with its robot team size.


    Stress Tests

    These tests push the limits of Gspi, specifically in scenarios that require robustness to extreme robot and obstacle densities.

    Dense obstacles (114 robots).

    Dense obstacles (57 robots).

    Dense packing (3 robots).
    Robots must move together.

    Dense packing escape (9 robots).

    Narrow Corridor (9 robots).

    Dense packing escape (3 robots).
    Robots must move together.

    Obstacle-Rich Environments

    We evaluated Gspi in environments with numerous obstacles to test its navigation and goal-swapping capabilities in conditions that force robots into dense congestion. These settings are normally challenging for multi-robot motion planning algorithms.

    Slalom (35 robots).

    Circle Opening (56 robots).

    Funnel (30 robots).

    Gspi results summary in obstacle-rich environments.


    Large Scale in Open Spaces

    We tested Gspi in large-scale scenarios without obstacles to assess its ability to effectively swap goals between robots and streamline their movements. As always, all goals were initially assigned randomly to robots. Motions that look natural, or short, are a result of effective goal swapping.
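    To give some intuition for why goal swapping shortens motions, here is a small sketch of a pairwise swap heuristic. It is not Gspi: it ignores collisions, priorities, and motion planning entirely, and only swaps goals between two robots when doing so reduces their combined straight-line distance. The function names `total_distance` and `swap_goals` are hypothetical.

```python
import math
import random


def total_distance(robots, goals, assignment):
    """Sum of straight-line distances from each robot to its assigned goal."""
    return sum(math.dist(robots[i], goals[assignment[i]])
               for i in range(len(robots)))


def swap_goals(robots, goals, assignment):
    """Swap goals between robot pairs while any swap lowers their summed distance.

    Terminates because every accepted swap strictly decreases the total
    distance and there are finitely many assignments.
    """
    assignment = list(assignment)
    improved = True
    while improved:
        improved = False
        for i in range(len(robots)):
            for j in range(i + 1, len(robots)):
                gi, gj = assignment[i], assignment[j]
                current = (math.dist(robots[i], goals[gi])
                           + math.dist(robots[j], goals[gj]))
                swapped = (math.dist(robots[i], goals[gj])
                           + math.dist(robots[j], goals[gi]))
                if swapped < current - 1e-12:
                    assignment[i], assignment[j] = gj, gi
                    improved = True
    return assignment


# Two robots whose goals are initially crossed.
robots = [(0.0, 0.0), (1.0, 0.0)]
goals = [(0.0, 1.0), (1.0, 1.0)]
crossed = [1, 0]  # robot 0 -> goal 1, robot 1 -> goal 0
uncrossed = swap_goals(robots, goals, crossed)
```

    In this toy case the crossed assignment is replaced by the one in which each robot drives straight to the goal in front of it, mirroring the short, natural-looking motions seen in the videos.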

    Circle to square (300 robots).

    Square to circle (150 robots).

    Square to Square (150 robots). The initial goal assignment is random.

    Gspi results summary in obstacle-free environments.


    Flow-Matching Co-Generation

    Model illustration. Our flow-matching co-generation framework addresses the fundamental challenge of jointly reasoning about contact formation and manipulation trajectories. The model takes visual observations of the environment, a robot budget, and a required transformation for the observed object, and co-generates discrete contact points alongside continuous manipulation trajectories. This dual representation ties contact planning to the perceptual space, avoiding reasoning over large continuous spaces unnecessarily, while maintaining flexibility for generating smooth manipulation trajectories.
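    The sampling side of this idea can be sketched as below, under heavy simplifications. The continuous part uses Euler integration of a flow-matching velocity field, where the learned network is replaced by a toy closed-form field (the conditional velocity of a straight-line path toward a fixed target). The discrete part is reduced to a single softmax sample over a pixel score map, which is a stand-in for discrete flow matching rather than an implementation of it. All names, the score map, and the target trajectory are hypothetical.

```python
import math
import random


def sample_contact_pixel(scores, rng):
    """Sample one (row, col) pixel from a softmax over a 2D score map."""
    flat = [(r, c, s) for r, row in enumerate(scores) for c, s in enumerate(row)]
    top = max(s for _, _, s in flat)
    weights = [math.exp(s - top) for _, _, s in flat]  # numerically stable softmax
    r, c, _ = rng.choices(flat, weights=weights, k=1)[0]
    return r, c


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity field.

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the conditional
    velocity at x_t is (x_1 - x_t) / (1 - t); here x_1 is a fixed target.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def integrate_flow(x0, target, steps=20):
    """Euler-integrate the velocity field over t = 0, dt, ..., 1 - dt."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
scores = [[0.0, 0.0, 0.0],
          [0.0, 5.0, 0.0],
          [0.0, 0.0, 0.0]]                     # hypothetical contact scores
target = [0.0, 0.0, 0.1, 0.0, 0.2, 0.05]       # flattened (x, y) waypoints
contact = sample_contact_pixel(scores, rng)    # discrete part
noise = [rng.gauss(0.0, 1.0) for _ in target]  # continuous part starts as noise
trajectory = integrate_flow(noise, target)
```

    Choosing the contact from a finite set of observed pixels while leaving the trajectory continuous is what keeps the contact search tied to the perceptual space in this sketch.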

    Collaborative

    Collaborative result.

    Our approach enables collaboration between multiple robots for manipulating multiple objects. GCo handles scenarios where the number of objects is greater than the number of robots and vice versa.

    Long-Horizon

    Long-horizon result.

    The framework can handle long-horizon manipulation tasks in complex, cluttered environments. To do so, GCo leverages flow-matching co-generation for learning the portions that are hard to model, and turns to scalable planning for those that can be modeled well (object and robot non-interacting motions).

    Model Variants Comparison

    GCo-DC
    (Discrete-Continuous)

    Co-generates discrete contact points and continuous manipulation trajectories.

    GCo-CC
    (Continuous-Continuous)

    Co-generates continuous contact formations and continuous manipulation trajectories.

    GCo-CT
    (Continuous-Trajectory)

    Generates only an unconstrained continuous trajectory. The first configuration of the trajectory is treated as the contact point.

    Note: In these animations of GCo-DC and GCo-CC, trajectories are translated to begin at the contact points for visualization purposes. In practice, trajectories are generated with their initial configurations being the origin.

    • GCo-DC: Contact points are chosen from the perceived space as discrete choices over pixels.
    • GCo-CC: Contact points remain unconstrained in the continuous space.
    • GCo-CT: No explicit co-generation of contact formations.
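    The differences in how each variant obtains a contact point, and the translation applied for visualization, can be illustrated with a few helper functions. This is a sketch under stated assumptions: the pixel-to-metric mapping (pixel centers, a uniform `resolution`, an `origin` offset) is hypothetical, as are all function names.

```python
def contact_from_pixel(pixel, resolution, origin=(0.0, 0.0)):
    """DC-style: map a chosen (row, col) pixel to the metric position of its center.

    Assumption: a uniform grid with `resolution` meters per pixel.
    """
    row, col = pixel
    return (origin[0] + (col + 0.5) * resolution,
            origin[1] + (row + 0.5) * resolution)


def contact_from_point(point):
    """CC-style: the generated continuous contact point is used as-is."""
    return (float(point[0]), float(point[1]))


def contact_from_trajectory(trajectory):
    """CT-style: the first configuration of the trajectory is the contact point."""
    return trajectory[0]


def translate_to_contact(trajectory, contact):
    """Shift a trajectory so that its first configuration sits at the contact point."""
    x0, y0 = trajectory[0]
    cx, cy = contact
    return [(x - x0 + cx, y - y0 + cy) for x, y in trajectory]


# A trajectory generated with its initial configuration at the origin.
local_trajectory = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
dc_contact = contact_from_pixel((1, 2), resolution=0.1)
shifted = translate_to_contact(local_trajectory, dc_contact)
```

    For DC and CC the trajectory and the contact are produced separately and combined by the translation step, whereas for CT the trajectory itself determines the contact.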

    Abstract

    Coordinating a team of robots to reposition multiple objects in cluttered environments requires reasoning jointly about where robots should establish contact, how to manipulate objects once contact is made, and how to navigate safely and efficiently at scale. Prior approaches typically fall into two extremes--either learning the entire task or relying on privileged information and hand-designed planners--both of which struggle to handle diverse objects in long-horizon tasks. To address these challenges, we present a unified framework for collaborative multi-robot, multi-object non-prehensile manipulation that integrates flow-matching co-generation with anonymous multi-robot motion planning. Within this framework, a generative model co-generates contact formations and manipulation trajectories from visual observations, while a novel motion planner conveys robots at scale. Crucially, the same planner also supports coordination at the object level, assigning manipulated objects to larger target structures and thereby unifying robot- and object-level reasoning within a single algorithmic framework. Experiments in challenging simulated environments demonstrate that our approach outperforms baselines in both motion planning and manipulation tasks, highlighting the benefits of generative co-design and integrated planning for scaling collaborative manipulation to complex multi-agent, multi-object settings.