Same architecture, same dataset, same loss, same seed. The only variable changed is the geometry of the target space. That alone is enough to qualitatively change the convergence behavior: FM keeps reinforcing a slightly wrong trajectory until late in training, while AToM never commits to a wrong trajectory in the first place. The point isn't a large final FID gap; it's that the failure mode disappears entirely.