This article was originally published on Synnada on February 13, 2023.
The world of AI and data is undergoing a rapid transformation. The traditional approach of analyzing data in batches and then converting the resulting insights into actionable models is no longer effective, due to two key ongoing shifts:
- Real-time data analysis has become increasingly critical as consumers demand immediate gratification. Gone are the days when streaming data was just a competitive advantage; it is now a necessary aspect of business operations, with entire organizations being built around the principle of real-time operations.
- The advent of versatile AI models such as ChatGPT has changed the way we approach data-driven systems, enabling the creation of real-time agents that can perform tasks previously reserved for humans. The prospect of fully autonomous intelligent agents operating in real-time has long been a captivating goal that should now be within our reach.
However, despite recent advancements, the majority of existing solutions continue to be exclusive and only accessible to a limited group of organizations and/or individuals. So, what are the challenges and obstacles in democratizing these technological opportunities?
The initial challenge we must overcome is within the infrastructure layer, which must effectively and continuously channel data to intelligent agents. The field of data engineering is facing a growing challenge due to the ever-increasing complexity of tools and technologies, from big data platforms to data warehousing and lakehouse tools and query engines. This complexity makes it difficult for professionals to stay up-to-date with the latest developments and acquire the skills they need to excel in the field. The result is a talent gap in the data-driven economy, with organizations struggling to find and retain qualified data engineers to meet the rapidly growing demand.
This isn’t the first time the industry has faced a challenge of this kind. A somewhat similar situation arose with the adoption of NoSQL databases in the early 2010s. At the time, there was a clear need for scalable and flexible data management solutions, but engineering talent skilled in these technologies was rare. As a result, big data analysis using NoSQL remained a niche, premium capability for certain breeds of companies for a relatively long time. It wasn’t until prominent players released products with standard SQL support (or at least SQL-like interfaces) that the technology saw widespread adoption across the community, among engineers as well as non-engineers such as data analysts.
Stream processing is facing a similar challenge. Historically, the methods for analyzing at-rest data (SQL) and in-motion data (non-SQL) have been separate, leading to slower adoption of stream processing than many had anticipated. Furthermore, even the SQL-like interfaces provided by stream processing solutions evolved into their own specific dialects, requiring special functions and features to express data transformations that could have been accomplished with standard SQL. This was often driven by a focus on low-latency, real-time use cases, or by a desire to use proprietary syntax for competitive advantage. As a result, processing data streams remains an elusive, esoteric task for many.
Fortunately, the data ecosystem seems to be moving in a direction that will help overcome this challenge. The previously popular Lambda architecture, which decouples batch and stream processing, is giving way to its streaming-first sibling, the Kappa architecture. Kappa’s simplicity, scalability, and lower latency aside, the rise of event-driven architecture and continuous delivery in software engineering seems to be the main driver of this shift. This trend presents a great opportunity to solve the challenge in question: If the data infrastructure community interprets and implements the Kappa architecture with a strong emphasis on unification, tool complexity may soon start to diminish and the user base of these technologies will grow.
At Synnada, we strongly support the Kappa architecture and believe that achieving a real unification of batch and stream processing is key to the widespread adoption of streaming-first approaches. That’s why we’re working hard to make streaming capabilities in Apache DataFusion accessible to anyone with proficiency in standard SQL (and just standard SQL), making it easier to build and maintain real-time data pipelines, applications, or products.
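To make the unification idea concrete, here is a minimal, purely illustrative Python sketch. It is not how Apache DataFusion or any Synnada product is implemented; the function names `running_sum_by_key`, `final_totals`, and `live_source` are hypothetical. It shows one incremental operator, roughly equivalent to `SUM(value) ... GROUP BY key`, serving both a bounded dataset and a stream, because a batch is treated as a stream that happens to end.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # (key, value)


def running_sum_by_key(events: Iterable[Event]) -> Iterator[Event]:
    """Maintain SUM(value) GROUP BY key incrementally.

    Emits the updated total for a key every time a new event arrives,
    so it never needs to see the whole input at once.
    """
    totals: Dict[str, float] = defaultdict(float)
    for key, value in events:
        totals[key] += value
        yield key, totals[key]


def final_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Drain the operator and keep the latest total per key."""
    latest: Dict[str, float] = {}
    for key, total in running_sum_by_key(events):
        latest[key] = total
    return latest


# A bounded "batch" input.
batch = [("a", 1.0), ("b", 2.0), ("a", 3.0)]


def live_source() -> Iterator[Event]:
    """Stand-in for an unbounded source (e.g. a message queue)."""
    yield from batch


# The same operator runs over both inputs without modification.
batch_result = final_totals(batch)
stream_result = final_totals(live_source())
```

The point of the sketch is that the query logic is written once; whether the input is a finite list or a generator that never terminates is a property of the source, not of the query.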
The second obstacle we need to overcome is within the intelligence layer, which will drive the detections, choices, and actions of the agents. To ensure an efficient and immediate response, it is crucial that the intelligence layer operates in real-time. This requires intelligence subsystems that can quickly process dynamic changes in data, operating conditions, or system behavior. Unfortunately, traditional AI/ML techniques relying on batch processing and training are insufficient in providing the required adaptivity to support real-time data products.
Despite the promising prospects of real-time machine learning, there are still multiple challenges that need to be overcome. One of the biggest hurdles is developing the technology to a point where it can integrate seamlessly into data engineering workflows and, eventually, data products.
So, what does real-time machine learning really mean? We see several key criteria for real-time ML systems:
- A real-time ML system must have a low prediction latency, as it should be able to process new data and make predictions over data streams in near real-time.
- Continuous learning is a key aspect of a real-time ML system, as it should be able to learn from data streams incrementally and perform model updates quickly. This is especially critical for applications where data drifts and regime changes are common.
- Real-time ML systems should exhibit self-guidance, assign confidences to their predictions, and have the ability to make justified decisions on the fly.
- Furthermore, real-time ML systems should have excellent robustness properties, not only during inference but also during training. Currently, training a typical ML system is a fragile, trial-and-error process that often involves some type of brute-force search over hyperparameters. Such approaches will need to be superseded by other methods in viable real-time ML systems.
- Human-in-the-loop (HITL) capabilities are crucial to cover cases where real-time ML systems do not have enough confidence to take certain actions, and should defer to humans. Moreover, HITL also facilitates efficient continuous learning. The field of reinforcement learning has a rich literature on various techniques one can utilize for this purpose.
- Finally, the cost of designing and deploying real-time ML systems is another critical aspect that must be taken into consideration in practice. This includes not only the cost of the hardware and software resources required to run the system, but also the cost of the expertise needed to develop, implement, and maintain it.
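Several of the criteria above (incremental learning, confidence-aware predictions, and deferring to a human) can be sketched together in a few lines of plain Python. This is a toy illustration under stated assumptions, not Synnada's framework: the class `OnlineLogisticModel`, the function `handle_event`, the `min_confidence` threshold of 0.8, and the `oracle` stand-in for a human reviewer are all hypothetical, and the learner is ordinary online logistic regression trained with stochastic gradient descent.

```python
import math
from typing import Callable, List, Tuple


class OnlineLogisticModel:
    """Binary classifier updated one example at a time (plain SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: List[float]) -> float:
        """Return the estimated probability that the label is 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: List[float], y: int) -> None:
        """Take a single gradient step on the log loss for (x, y)."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def handle_event(
    model: OnlineLogisticModel,
    x: List[float],
    ask_human: Callable[[List[float]], int],
    min_confidence: float = 0.8,
) -> Tuple[int, str]:
    """Act on one event: decide autonomously or defer to a human.

    When the model is not confident enough, the human's answer is used
    as the decision and also as a fresh training label.
    """
    p = model.predict_proba(x)
    confidence = max(p, 1.0 - p)
    if confidence >= min_confidence:
        return int(p >= 0.5), "model"
    label = ask_human(x)
    model.learn_one(x, label)
    return label, "human"


# Toy stream: the (hypothetical) reviewer labels positive readings as 1.
model = OnlineLogisticModel(n_features=1)
oracle = lambda x: int(x[0] > 0)
decisions = [handle_event(model, [v], oracle) for v in [1.0, -1.0] * 50]
```

In this toy run, the model starts with no knowledge and defers early events to the reviewer; as labels accumulate, its confidence rises and it begins to act on its own. Labels are collected only when the model is unsure, which is the sense in which HITL and continuous learning reinforce each other.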
Today, people try to utilize (or should we say force) ideas like AutoML from traditional offline ML in the real-time ML area because that is all they have. Such approaches do not work well, and they have given rise to a plethora of auxiliary ML products, such as model monitoring tools and MLOps platforms.
Imagine we use an AutoML system to continually update our models: how costly (and slow) would that be? Think about trying to collect new, relevant labels every time your model monitoring tool warns you that your model is going out of sync. If you had a working HITL system in the first place, there would be no need to do this.
At Synnada, we are working on a comprehensive real-time ML framework that meets all these criteria by incorporating either (1) novel research, or (2) best practices from the industry. You will not need to integrate multiple ML tools while building data products — this ML framework will seamlessly mesh with our data streaming technology so our users will start benefiting from the underlying synergies from day one without getting into any complicated integration work.
If you are interested, follow us so that we can keep you in the loop about our most recent ML R&D. These days, we are focusing on robust optimization techniques that will serve as the main workhorse of the full framework, and we plan to discuss our results soon.
As we progress towards developing intelligent agents/systems that can observe, decide, act, and continuously adapt, the final obstacle is to package and deliver these agents for practical use. Our goal must be to deliver a rich selection of useful agents that, over time, specialize to the environment of each end user. We believe such a goal can only be achieved through a product layer that allows a “test, learn, and improve” cycle of building and servicing. So, what is required from such a layer?
Composability is a key aspect for any framework that aims to facilitate development of intelligent agents. There is a very large universe of use cases we want to create agents for, and every use case requires some customization and/or specialization in one way or another. Therefore, it is essential to embrace a flexible and composable approach, so that users can create agents that fit best to their specific needs by combining elementary components. As an example, imagine that we are building an end-to-end DevOps agent to help us with initial triaging. It would likely need to carry out tasks such as analyzing logs, forecasting metrics, or responding to support inquiries, each of which requires a specific AI subsystem we would need to compose together.
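As a rough illustration of what composing elementary components could look like, consider the following Python sketch of the DevOps triage example. Everything in it is hypothetical: the component names (`analyze_logs`, `forecast_metric`, `triage`), the `compose` helper, the event fields, and the 0.9 CPU threshold are assumptions made for the example, and each component is a trivial stand-in for what would be a real AI subsystem.

```python
from typing import Callable, Dict

State = Dict[str, object]
Component = Callable[[State], State]


def analyze_logs(state: State) -> State:
    """Stand-in log analyzer: count lines that contain 'ERROR'."""
    logs = state.get("logs", [])
    return {"error_count": sum(1 for line in logs if "ERROR" in line)}


def forecast_metric(state: State) -> State:
    """Stand-in forecaster: extrapolate the last observed change."""
    history = state.get("cpu_history", [])
    if not history:
        return {"cpu_forecast": 0.0}
    if len(history) < 2:
        return {"cpu_forecast": history[-1]}
    return {"cpu_forecast": history[-1] + (history[-1] - history[-2])}


def triage(state: State) -> State:
    """Combine upstream findings into a priority for the on-call team."""
    severe = state["error_count"] > 0 and state["cpu_forecast"] > 0.9
    return {"priority": "high" if severe else "low"}


def compose(*components: Component) -> Component:
    """Chain components; each one sees everything produced before it."""
    def agent(event: State) -> State:
        state = dict(event)  # copy so the caller's event is not mutated
        for component in components:
            state.update(component(state))
        return state
    return agent


# Build an agent by picking and ordering the pieces a use case needs.
devops_agent = compose(analyze_logs, forecast_metric, triage)
report = devops_agent({
    "logs": ["INFO service started", "ERROR disk full"],
    "cpu_history": [0.70, 0.85],
})
```

Because each component only reads from and writes to a shared state, swapping the naive forecaster for a proper model, or adding a component that drafts a support reply, would not require changes to the other pieces.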
Another key requirement is an effective collaboration layer: Features such as comprehensive version control for data, models, and code; data management and organization functions accessible to non-technical users; support structures for ML engineers and data engineers; and cross-functional teamwork mechanisms are crucial for development, maintenance and effective operation of intelligent agents (and data systems in general). Moreover, this collaboration layer will also be responsible for facilitating intuitive interactions between AI agents and humans so that users can equip their agents with effective human-in-the-loop capabilities.
Overall, the development of advanced data/AI systems requires a compound product strategy that addresses collaboration, composability, and non-functional requirements such as auditability and control, explainability, governance, serviceability, deployment, and scaling. Each of these components must be given the necessary attention, designed for maximum efficiency, and delivered in a tightly-knit manner, as they are all essential for unlocking the full capabilities of a live system. Adding such features as an afterthought will, eventually, reduce the robustness and effectiveness of intelligent agents. Synnada understands the need for a full-scope solution and is planning to deploy a compound product strategy to meet the demands of the ever-evolving data and AI spaces.