Meta had a big win last year with “Segment Anything,” a machine learning model that could quickly and reliably identify and outline almost anything in an image. The sequel, which CEO Mark Zuckerberg debuted onstage at SIGGRAPH on Monday, brings the model into the video realm, showing how rapidly the field is evolving.
Segmentation is the technical term for when a vision model looks at an image and picks out the parts: “This is a dog, and this is the tree behind the dog,” and hopefully not “This is the tree growing out of the dog.” This has been done for decades, but recently it has gotten far better and faster, and Segment Anything was a big step forward.
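To make that concrete, here is roughly what point-prompted segmentation looks like with the original Segment Anything model, as a minimal sketch using Meta's public segment-anything package; the image file, checkpoint path, and click coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained ViT-H checkpoint (the .pth file is downloadable
# from the segment-anything repo; this path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image as an HxWx3 RGB array and compute its embedding once.
image = cv2.cvtColor(cv2.imread("dog.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click (label 1) at a hypothetical pixel on the dog;
# the model returns candidate masks for the object under that click.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
```

The click-plus-label prompt is the whole interface: the model doesn't need to be told what a “dog” is, only where you clicked.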
Segment Anything 2 (SA2) is a natural successor in that it applies natively to video, not just still images; you could, of course, run the first model on every frame of a video individually, but that is a far less efficient workflow.
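SA2's video interface follows the same prompt-and-predict pattern, with a propagation step added so a single click can be tracked across the whole clip instead of re-segmenting each frame from scratch. The sketch below is lightly adapted from the usage shown in Meta's SAM 2 repository; the config name, checkpoint file, and frame directory are placeholders, and exact function names may have shifted since release.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Config and checkpoint names are placeholders; see Meta's repo for the real files.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml", "sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # Point the predictor at a directory of extracted video frames.
    state = predictor.init_state(video_path="./video_frames")

    # One foreground click (label 1) on the target object in frame 0...
    frame_idx, object_ids, masks = predictor.add_new_points(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[500, 375]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # ...and the model carries that mask through the rest of the video.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # each iteration yields the tracked masks for one frame
```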
“Scientists use this to study coral reefs and natural habitats, things like that. But being able to do this on video, zero-shot, and tell it what you want is pretty cool,” Zuckerberg said in a conversation with Nvidia CEO Jensen Huang.
Naturally, video processing demands much more compute, and the fact that SA2 can run without overwhelming a data center is a testament to the efficiency gains being made across the industry. It is still a huge model that needs serious hardware to run, but fast, flexible segmentation like this was practically impossible even a year ago.
Image credit: Meta
Like the first model, SA2 is open and free to use, though there is no mention of a hosted version of the kind AI companies sometimes offer. There is a free demo available, however.
Naturally, training such a model requires a great deal of data, and Meta has also published a large annotated database of 50,000 videos that it created for this purpose. The paper describing SA2 also mentions a separate database of over 100,000 “internally available” videos used in training, and that one is not being made public; I have asked Meta for more information about what it is and why it is being withheld. (Our guess is that it is sourced from public Instagram and Facebook profiles.)
Example of labeled training data. Image credit: Meta
Meta has been a leader in the “open” AI space for a few years now, though in fact (as Zuckerberg noted in the conversation) it has been at this far longer, with tools like PyTorch. More recently, LLaMa, Segment Anything, and a few other models the company has released for free have become a relatively accessible bar for AI performance in their respective areas, even if their “openness” is a matter of debate.
Zuckerberg allowed that Meta's openness is not entirely out of the goodness of its heart, but that doesn't mean the company's intentions are impure:
“This is not just software you can build; you need an ecosystem around it. It almost wouldn't work if we didn't open source it, right? We're not doing this because we're altruistic people, though we do think it will help the ecosystem; we're doing it because we think it will make what we're building the best.”
Either way, it is sure to see plenty of use. You can check it out on GitHub here.