AI is permeating every corner of biotech and pharmaceutical research, but as in other industries, implementation is not as easy as it seems. Converge Bio has built tools to help companies put biology-focused LLMs to work, from “enriching” data to explaining answers. The company has raised $5.5 million in a seed round to expand its products.
“A model is just a model. It's not enough,” said Dov Gertz, CEO and co-founder. “We need to create a pipeline so that companies can actually use the models in their own R&D processes. We want to be able to integrate it and use it anywhere. We want to be that place.”
If you're not a machine learning engineer working in drug discovery, this may not be a familiar problem. But fundamentally, there are powerful foundation models out there: large language models trained not on books or the internet, but on huge databases of DNA, protein structures, and genomic data.
These are powerful and versatile models, but like the LLMs used in products like ChatGPT and Cursor, they require a lot of work to get into a shape that people can actually use on a daily basis. This task is particularly difficult in specialized fields such as microbiology and immunology. Taking a “raw” LLM trained on billions of protein sequences and making it something that research engineers can use as part of their regular research is not a trivial problem.
Gertz offered antibody research as an example. LLMs trained on antibody-specific biology do exist, but they are very general. Converge Bio provides a set of improvements that can be performed securely using a company's own IP.
From left: Converge Bio chief scientific officer Ido Weiner, CEO Dov Gertz, and CTO Oded Caleb. Image credit: Omer Hacohen / Converge Bio
The first is “data enrichment,” which augments the antibody LLM with important relevant data such as antigen-antibody and protein-protein interactions. Then, having been loaded with that more specific knowledge, the model can be fine-tuned on the specific antigen the team is targeting, about which they may have their own in-dish data.
“Now we have a complete application. The input is a sequence and the output is a binding affinity,” Gertz said. Next, the platform provides another important layer: explainability. Researchers can drill down into the output to learn not just that “this sequence works better than that one,” but which parts of the sequence appear to work better, pinpointed down to the amino acid or base pair level.
Finally, the platform generates new sequences that provide improved results, with similar explainability. Gertz said he was surprised by how popular explainability has been with customers, but it makes sense: it lets experts apply their field expertise (e.g., in protein interactions) to this new and unfamiliar area of bioinformatics and machine learning.
Image credit: Converge Bio
Converge uses many of the open source and free foundation models out there, but it is also working on creating its own. Gertz said the company already has its own process in place for the explainability part. The data enrichment “curriculum” is also entirely its own, and it is not an easy process. He pointed out that training methodologies are among the few secrets closely guarded by the most successful AI companies.
This is part of the moat they hope to build upon, along with the fact that, as Gertz says, “this is probably the biggest opportunity in biotech in the last 50 years.”
However, many, perhaps most, biotech companies have no specialized solution for doing LLM-related work in their field, or are actively pursuing niche areas where generic solutions do not apply. Not yet, at least.
“The idea is to be a one-stop shop for GenAI in biotech and use that as a wedge to deliver more over time,” Gertz said. “The trend in the pharmaceutical and biotech industries is that once you have a trusted vendor connection, you want to use it for other use cases, whether it's antibody design or vaccine design. I think it's the best in the market right now.”
Investors seem to agree, having put $5.5 million into a seed round led by TLV Partners.
The company plans to use the money to hire staff and acquire customers, as startups often do at this stage, but it will also publish scientific papers on antibody design (using its own systems, of course) and train “the right base models.”