Point clouds do not load on Firefox mobile.
A couple of other recent similar papers (this UniK3D might be the best for single-image input, but I'm not sure):
https://vgg-t.github.io/
https://zhanghe3z.github.io/FLARE/
ETH has been insane the last few years. Just last year: literally dirt-cheap, literally lossless seasonal energy storage: https://ethz.ch/en/news-and-events/eth-news/news/2024/08/iro...
Now they have a better 3D perception algorithm than Tesla after 10 years of Musk's fraudulent AI claims...
There are quite a few monocular depth estimation models out there, and have been for years. This one looks pretty good. That said, the temporal stability seems pretty wobbly; I don't think I'd use it for a self-driving car.
The most impressive example was the point cloud they generated from the extreme fisheye lens; that was nice.
Predicting that the background on Cloud City was a flat matte painting is also impressive, in a way. It does seem to collapse all far-field objects into a single plane. That's a decent compromise for many things.
Another leading monocular depth estimation model, Marigold [1], is also from ETH.
[1] https://marigoldmonodepth.github.io/
My Tesla can't see a paper box in my garage.
Would it be possible to "auto" layer the outputs from multiple images?
I've seen tools for generative AI art that do something a lot like that [1].
Generate an image of subject A, then apply a depth estimation ML model to figure out which pixels belong to the foreground and cut them out.
Then generate an image of background B with a different prompt, and paste the subject from the first step on top. A sort of automatic collage-making process (roughly the pipeline sketched after the reference below).
[1] https://github.com/Extraltodeus/multi-subject-render
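For concreteness, here's a minimal sketch of that depth-cut-and-collage idea using Hugging Face diffusers and transformers. The checkpoints and the 70th-percentile foreground threshold are my own guesses for illustration, not what multi-subject-render actually does:

    # Rough sketch of the collage pipeline described above. Model choices
    # and the foreground threshold are illustrative assumptions.
    import numpy as np
    from PIL import Image
    from diffusers import StableDiffusionPipeline
    from transformers import pipeline

    sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    depth_model = pipeline("depth-estimation", model="Intel/dpt-large")

    # Step 1: generate subject A and estimate per-pixel depth.
    subject = sd("a photo of a corgi, studio lighting").images[0]
    depth = np.array(depth_model(subject)["depth"], dtype=np.float32)

    # Step 2: call the nearest ~30% of pixels "foreground". DPT predicts
    # inverse depth, so larger values mean closer to the camera.
    mask = depth > np.percentile(depth, 70)

    # Step 3: generate background B with a different prompt and paste the
    # masked subject on top -- the automatic collage step.
    background = sd("a misty alpine valley at dawn").images[0].resize(subject.size)
    background.paste(subject, (0, 0), Image.fromarray(mask.astype(np.uint8) * 255))
    background.save("collage.png")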
Once it's 3D it can be trivially converted to a point cloud (if it isn't exportable as a point cloud already), and then you can scale, overlay, splice, and merge anything with anything. Game over!
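For anyone curious what that conversion involves: it's just the standard pinhole unprojection. A minimal numpy sketch, assuming you know (or guess) the camera intrinsics fx, fy, cx, cy:

    # Turn an (H, W) metric depth map into an (H*W, 3) XYZ point cloud
    # via pinhole unprojection. The intrinsics below are illustrative.
    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx  # back-project pixel columns
        y = (v - cy) * depth / fy  # back-project pixel rows
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

    # Example: a flat wall 2 m away, seen by a 640x480 camera.
    pts = depth_to_point_cloud(np.full((480, 640), 2.0), 525.0, 525.0, 320.0, 240.0)
    print(pts.shape)  # (307200, 3)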
The EU has plans, and is taking steps, to get the main highway infrastructure ready for autonomous vehicles. They have been changing the colors on road signs for years (vertical and horizontal) so cars have a higher probability of decoding them, introducing C-ITS, etc. This can be another great addition to get us there safely, and not by listening to AI nonsense bros. While Europe is trying to build stuff while being attacked by Russia in more ways than one (https://www.theguardian.com/world/2024/dec/04/up-to-100-susp...), US "elites" are doing nothing but talking about impending doom and buying land on the Hawaiian islands or in New Zealand for apocalyptic shelters...
There's enough crazy stuff happening in image and video now that it would have turned heads at SIGGRAPH just a few years ago. It feels like we'll have fully programmable and deformable visual worlds soon.
I can't wait for that. But with so many new methods and papers coming out every week, I don't see how tooling can become stable enough to have a unified workflow that doesn't need a clean-slate takeover every 6 months.
> I don't see how tooling can become stable enough to have a unified workflow that doesn't need a clean-slate takeover every 6 months
That's a good thing. There's so much opportunity to disrupt incumbents now.
Build for what we have, but don't overfit. Be flexible for what's coming next.