PINO generates multi-character interactions by decomposing them into a sequence of pairwise motion-generation steps.
- It starts from a pretrained two-person diffusion model.
- To add a new character, the method pairs the newcomer with one of the existing characters, supplies a text prompt describing only that pair, and generates their motion with the same model.
- The initial noise fed to the model is then optimized so that the newcomer's motion fits the entire group: it avoids overlapping with any other character and keeps appropriate distances and orientations.
- This add-and-optimize cycle is repeated to build interactions among any number of characters, and the same idea can be used to extend motion sequences over time.
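The add-and-optimize loop can be illustrated with a toy sketch. It is not PINO's implementation, and everything in it is an assumption made for illustration. Characters are reduced to 2-D root trajectories. The frozen two-person diffusion model is replaced by a hypothetical linear map `pair_model`, and text conditioning and orientation terms are omitted. The penalty `group_penalty`, its constants `MIN_DIST` and `PAIR_DIST`, and the helper `add_character` are hypothetical names. Gradients are taken by finite differences so the example needs only NumPy. The sketch covers growing the group, not extending sequences over time.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8            # frames in the toy sequence
MIN_DIST = 1.0   # hypothetical personal-space radius
PAIR_DIST = 1.2  # hypothetical preferred distance to the prompted partner


def pair_model(noise, partner):
    """Hypothetical stand-in for the frozen two-person diffusion sampler.

    Maps initial noise (T, 2) to the newcomer's 2-D root trajectory given the
    partner's trajectory. A real sampler runs a full reverse-diffusion chain;
    this fixed linear map only mimics "noise in, conditioned motion out".
    """
    return partner + np.array([PAIR_DIST, 0.0]) + 0.3 * noise


def group_penalty(motion, group, partner_idx):
    """Toy physical/spatial loss of the newcomer against the whole group."""
    loss = 0.0
    for i, other in enumerate(group):
        d = np.linalg.norm(motion - other, axis=1)          # per-frame distance
        loss += np.sum(np.maximum(0.0, MIN_DIST - d) ** 2)  # overlap penalty
        if i == partner_idx:
            loss += 0.1 * np.sum((d - PAIR_DIST) ** 2)      # keep pair spacing
    return loss


def numerical_grad(f, x, eps=1e-5):
    """Central finite differences; a real system would backpropagate instead."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        hi, lo = x.copy(), x.copy()
        hi[idx] += eps
        lo[idx] -= eps
        g[idx] = (f(hi) - f(lo)) / (2 * eps)
    return g


def add_character(group, partner_idx, steps=200, lr=0.05):
    """Generate a newcomer paired with group[partner_idx], then optimize only
    its initial noise so the motion fits the whole group (model stays frozen).
    Returns the motion and the objective value before and after optimization."""
    partner = group[partner_idx]
    noise = rng.standard_normal((T, 2))

    def objective(z):
        return group_penalty(pair_model(z, partner), group, partner_idx)

    before = objective(noise)
    for _ in range(steps):
        noise = noise - lr * numerical_grad(objective, noise)
    return pair_model(noise, partner), (before, objective(noise))


# Character A stands at the origin. B and C are both generated as partners of
# A, so without noise optimization they would start out on top of each other.
group = [np.zeros((T, 2))]
losses = []
for _ in range(2):
    motion, (before, after) = add_character(group, partner_idx=0)
    group.append(motion)
    losses.append((before, after))
```

Because `pair_model` never changes, all adaptation to the group happens in noise space. A real system would presumably differentiate through the sampler with an autodiff framework rather than using finite differences.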
Because physical and spatial penalties are built into the noise-optimization step, PINO offers fine-grained control over motion composition without any additional training of the diffusion model.
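The phrase "without any additional training" can be made precise by writing the noise-optimization step as a minimization over the noise alone. The formulation below is one plausible form under stated assumptions, not the paper's exact objective. It assumes $G_\theta(z; c, x_p)$ is the frozen sampler with weights $\theta$, initial noise $z$, pair prompt $c$, and partner motion $x_p$. It also assumes $X$ is the set of motions already in the group. The loss terms $\mathcal{L}_{\text{pen}}$ (overlap), $\mathcal{L}_{\text{dist}}$ (distance), and $\mathcal{L}_{\text{ori}}$ (orientation), together with their weights $\lambda$, are hypothetical names.

$$
\begin{aligned}
\mathcal{L}(z) &= \lambda_{\text{pen}}\,\mathcal{L}_{\text{pen}}\big(G_\theta(z; c, x_p), X\big) + \lambda_{\text{dist}}\,\mathcal{L}_{\text{dist}}\big(G_\theta(z; c, x_p), X\big) + \lambda_{\text{ori}}\,\mathcal{L}_{\text{ori}}\big(G_\theta(z; c, x_p), X\big),\\
z &\leftarrow z - \eta\,\nabla_z \mathcal{L}(z), \qquad \theta \text{ held fixed.}
\end{aligned}
$$

Only $z$ receives gradient updates, so the same pretrained weights $\theta$ serve every pairwise step.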