Inception, a new Palo Alto-based company started by Stanford computer science professor Stefano Ermon, claims to have developed a novel AI model based on "diffusion" technology. Inception calls it a diffusion-based large language model, or "DLM" for short.
The generative AI models receiving the most attention today can be broadly divided into two types: large language models (LLMs) and diffusion models. LLMs, built on the transformer architecture, are used for text generation. Diffusion models, which power AI systems such as Midjourney and OpenAI's Sora, are primarily used to create images, video, and audio.
According to the company, Inception's model offers the capabilities of traditional LLMs, including code generation and question answering, but with significantly faster performance and reduced computing costs.
Ermon told TechCrunch that he had long been studying how to apply diffusion models to text in his lab at Stanford. His research was rooted in the idea that traditional LLMs are relatively slow compared to diffusion techniques.
With LLMs, "you cannot generate the second word until you've generated the first one; you cannot generate the first two words," Ermon said.
Ermon was looking for a way to apply a diffusion approach to text because, unlike LLMs, which operate sequentially, diffusion models start with a rough estimate of the data they are generating (a photograph, say) and then bring it all into focus at once.
Ermon hypothesized that generating and modifying large blocks of text in parallel was possible with diffusion models. After years of attempts, Ermon and a student achieved a major breakthrough, which they detailed in a research paper published last year.
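The contrast between the two decoding styles can be sketched in a toy example. This is not Inception's method or any real model, just an illustration of the structural difference: the autoregressive loop must emit tokens one after another, while the diffusion-style loop holds a draft of every position and refines them all in parallel at each step. The vocabulary, target sentence, and refinement rule here are invented for demonstration.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
# Stand-in for the sentence a trained model would converge to.
TARGET = ["the", "cat", "sat", "on", "a", "mat"]


def autoregressive_generate(n_tokens):
    """LLM-style decoding: token i cannot be produced until
    tokens 0..i-1 already exist, so generation is sequential."""
    out = []
    for i in range(n_tokens):
        # A real LLM would sample from P(next token | out).
        out.append(TARGET[i])
    return out


def diffusion_generate(n_tokens, steps=5):
    """Diffusion-style decoding: start with noise over *all*
    positions at once, then refine every position in parallel."""
    draft = [random.choice(VOCAB) for _ in range(n_tokens)]  # pure noise
    for _ in range(steps):
        # A real DLM would denoise the whole block; here each slot
        # is simply nudged toward the target with some probability.
        draft = [t if random.random() < 0.7 else d
                 for d, t in zip(draft, TARGET)]
    return draft


print(autoregressive_generate(6))  # built one token at a time
print(diffusion_generate(6))       # whole draft refined in parallel
```

The practical point of the parallel form is that each refinement step touches every position, which maps naturally onto the batched matrix operations GPUs are good at.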
Seeing an opportunity for commercialization, Ermon founded the startup last summer, tapping two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company.
Ermon declined to discuss Inception's funding, but TechCrunch understands that the Mayfield Fund has invested.
By addressing the critical need for lower AI latency and higher speed, Inception has already attracted several customers, including unnamed Fortune 100 companies, Ermon said.
“What we found is that models can make use of GPUs much more efficiently,” Ermon said, referring to the computer chips commonly used to run models in production. “I think this is a big deal. This will change the way people build language models.”
Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of ready-to-use DLMs for a variety of use cases. The company claims its DLMs can run up to 10 times faster than traditional LLMs while costing 10 times less.
"Our 'small' coding model is as good as [OpenAI's] GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our 'mini' model outperforms small open-source models like [Meta's] Llama 3.1 8B and achieves more than 1,000 tokens per second."
"Tokens" is the industry term for bits of raw text data. Assuming Inception's claims hold up, 1,000 tokens per second is an impressive speed indeed.