If your startup has even a passing connection to data pipelines, you're probably wondering how to take advantage of the current AI moment. Enterprises are looking for ways to get the most out of their data to power their generative AI products, and they need robust data services to do so.
Founded in 2020, Airbyte started with a focus on building a low-code/no-code open-source data integration platform. Since then, Airbyte has raised $181.2 million in total, including a massive $150 million Series B round that came at a somewhat unconventional time in late 2021.
Four years later, the company has released Airbyte 1.0. The focus, of course, is on AI, both as an addition to Airbyte's own tools and as a way for users to build their own AI-based services.
In fact, the company is now using AI to extend its low-code/no-code philosophy: Its models will be able to look at API documentation and automatically create connectors based on it. You just provide the documentation and it handles the rest (at least in theory; time will tell how well it works in practice).
Michel Tricot, co-founder and CEO of Airbyte, told me that he sees one area where large language models are transforming how companies use data: by making unstructured data much more useful and usable.
“Structured data is just the tip of the iceberg when it comes to unlocking the full potential of data,” he says. “The rise of LLMs has enabled us to efficiently leverage previously inaccessible unstructured data. … The demand for processing multimodal data is huge. Our latest developments are focused on supporting intelligent and context-aware pipelines, optimizing frameworks such as RAG, and automating pipeline creation based on customer data workflows. These innovations are essential to enable advanced use cases and improve the performance of LLMs.”
Airbyte has significantly improved its ability to handle unstructured data, allowing users to process it through their existing pipelines without relying on additional tools.
In non-AI news, Airbyte's connectors now also support GraphQL, giving users access to a number of additional datasets without having to build custom pipelines.
With this release, Airbyte is also making its self-managed enterprise service generally available. As with the enterprise offerings of many open source companies, this version, available on the AWS and GCP marketplaces, offers features like single sign-on (SSO) and role-based access control (RBAC), as well as Airbyte-specific capabilities like sensitive data masking and advanced monitoring.
Airbyte says it currently has more than 170,000 deployments and 7,000 enterprise customers, ranging from Calendly and Coupa to Perplexity AI and Siemens.
“Every company is a data company, driving decision-making and providing the foundation for AI initiatives,” said Tricot. “Only Airbyte, with an open source strategy that enables hundreds of connectors, can enable companies to leverage any data they choose. As AI continues to drive transformation, we are delivering the technology and ecosystem organizations need to build the data infrastructure they need for AI-driven innovation.”