AsiaTechDaily – Asia's Leading Tech and Startup Media Platform
SingleStore, a real-time data platform, recently announced a bi-directional integration with Apache Iceberg, along with several other key enhancements. These advancements aim to change how businesses manage and use their data, unlocking new potential in real-time AI, analytics, and intelligent applications.
Data exists in various formats and types, often scattered across different sources. The disconnection between these formats leads to costly and time-consuming data movement, ultimately resulting in a poor user experience. SingleStore’s data platform aims to address this by allowing users to seamlessly work with both unstructured and structured data, simplifying the process and enabling easy conversion into a unified format.
The integration with Apache Iceberg is particularly significant. Many of SingleStore’s customers store Iceberg data in data lakehouses and data lakes. However, using this data for AI applications that require real-time context has been challenging. The Apache Iceberg integration seeks to resolve this issue by increasing data accessibility and allowing enterprises to combine all their data into a single repository.
In a recent interview with AsiaTechDaily, Madhukar Kumar, the Chief Marketing Officer at SingleStore, delves into the company’s latest innovations and their impact on enterprises.
Kumar discusses the technical challenges addressed by these updates, the unique features that set SingleStore apart, and the future of cloud products and data processing.
How critical is real-time AI for enterprises today, and what role does SingleStore play in enabling this?
Today’s enterprises are rapidly integrating AI into their operations. However, if these AI systems are trained on outdated and inadequate data, the technology fails to generate useful and relevant outputs. Essentially, Large Language Models (LLMs) that rely on data frozen in time have limited knowledge, lack full contextual awareness, and are prone to hallucinations. No enterprise will derive value from a system that produces insufficient and inaccurate responses.
That’s why at SingleStore, we’re helping enterprises overcome these challenges by building a single location to store, manage and curate all knowledge and data. Our latest product innovations double down on our commitment to helping developers and data scientists build intelligent applications. Through faster vector search, keyword matching and auto-scaling, we’ve created an enterprise-grade data platform that enables users to transact, analyze and contextualize data in milliseconds. These key features are supporting “live” retrieval augmented generation (RAG), a pioneering contextual format for speedier and more efficient data processing and analytics. Altogether, we’re helping enterprises realize the full potential of their data to enable successful innovation.
Can you explain how the bi-directional integration with Apache Iceberg specifically benefits enterprises struggling with ‘frozen’ data in their data lakehouses?
This advancement reduces the complications of data ingestion. Enterprises have multiple data sources that store massive amounts of data, but they need a pipeline to get the knowledge from each source. It’s often expensive and time-consuming to extract, transform, and load (ETL) data from one source to the next.
SingleStore’s integration with Apache Iceberg enables quicker ingestion, so data can be queried almost instantly as it flows between sources. The data flows into SingleStore’s platform in a zero-ETL fashion, which makes these processes simpler. When it’s easier to extract data from sources and mix it with live streaming data for all modern applications, including gen AI services, it no longer remains “frozen” or unused in data lakehouses.
The update mentions faster vector search and enhanced full-text search. Can you elaborate on the specific improvements and how they impact the performance of real-time applications?
Vector search has been a staple of our data platform since 2017. At SingleStore, we understood that vectors play a critical role in Retrieval Augmented Generation (RAG) use cases that most enterprises have started to rely on. The fewer extractions and transformations, the more seamless the data querying and processing will be. This understanding led us to support faster vector capabilities over time, and with this latest version, we’ve integrated vector search with HNSW that is 40% faster than previous versions.
Our IVF Flat index is between 47x and 100x quicker than pgvector, and the build times are now 2-3x faster than Milvus and pgvector. These advancements, coupled with keyword matching, range searches, and filters, enable enterprises to build intelligent applications generating real-time outputs. All of this is coupled with the fact that customers can add multiple vector indexes to the same data in order to support queries requiring either more accuracy or lower latency.
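To make the accuracy-versus-latency trade-off Kumar describes more concrete, here is a minimal, illustrative sketch of the general IVF Flat idea in plain Python. It is not SingleStore's implementation: the function names (`exact_search`, `build_ivf_flat`, `ivf_search`) and the `n_lists` and `n_probe` parameters are hypothetical, and the centroids are simply sampled from the data, whereas real systems typically train them with k-means.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors, query, k):
    """Brute force: compare the query with every vector (most accurate, slowest)."""
    ranked = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))
    return ranked[:k]


def build_ivf_flat(vectors, n_lists, seed=0):
    """Partition vectors into n_lists buckets, one per centroid.

    Assumption for brevity: centroids are sampled from the data
    rather than trained with k-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(centroids[c], v))
        lists[nearest].append(i)
    return centroids, lists


def ivf_search(vectors, index, query, k, n_probe):
    """Scan only the n_probe buckets whose centroids are closest to the query.

    A small n_probe means fewer comparisons (lower latency) but may miss
    true neighbours; n_probe equal to the number of buckets scans everything.
    """
    centroids, lists = index
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(centroids[c], query))[:n_probe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(vectors[i], query))
    return candidates[:k]
```

In this sketch, calling `ivf_search` with a low `n_probe` favors latency, while raising `n_probe` toward `n_lists` favors accuracy. A graph-based index such as HNSW makes a similar trade-off through different parameters, which is one reason a platform might let users keep more than one index on the same data.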
Could you comment on how the Helios BYOC (bring your own cloud) deployment model addresses the needs of customers with strict regulatory and governance requirements?
Data regulations are rapidly evolving, just as the number of cybersecurity threats is increasing. For enterprises, it’s difficult to effectively negate these challenges. Keeping up with the latest government and industry regulations is not an easy task. However, providing a virtual private cloud within our data platform means enterprise IT leaders no longer have to carry the burden of managing data governance practices on their own. We follow industry best practices and ensure their data is secure, providing them with some peace of mind.
What feedback have you received from customers participating in the private preview on AWS?
BYOC originated from requirements raised by large enterprise customers. These companies have several petabytes of data that they want to use for modern applications but don’t necessarily want to manage all the operations related to it.
BYOC allows them to maintain the data locality and governance in their own virtual private cloud on AWS with the rest of their infrastructure, all the while SingleStore manages it through a common control plane. This is a huge differentiator for customers already using AWS for all their workloads.
Private preview customers are pleased that they’re receiving the same performance at scale in a much easier-to-use BYOC package that takes care of scaling, observability, and automatic updates while their data remains in their AWS account.
What innovations are you excited about in real-time AI and data processing that SingleStore is working on or exploring?
For the first time, we now have web applications that are evolving to become experiences for a segment of one. Instead of just text messages back and forth with an LLM, we are now seeing rich data coming back to the users in the form of analytics and agentic widgets that users can then store as their personal pages, similar to the pages feature recently launched by Perplexity.
This evolution relies heavily on rich data curation and analysis in split seconds. This has been a SingleStore strength for years and with our new features of unlocking Iceberg data, faster hybrid search and enterprise-grade features like auto-scaling, we feel we are singularly poised to become the single knowledge store for all enterprise applications.
Madhukar Kumar spoke about the company’s latest advancements and their potential impact on data management and AI applications. Kumar also highlighted the importance of real-time data processing and the need for seamless integration of diverse data formats.
SingleStore’s recent innovations, including the integration with Apache Iceberg and enhancements to vector and text search capabilities, aim to address the challenges of managing and utilizing diverse data formats in real time. These updates are designed to streamline data processes, reduce costs, and improve application performance.