Real-time data streams and stream processing are crossing into the mainstream – according to IDC, they will become the norm, not the exception.

The drivers are, by now, familiar: Cloud, IoT and 5G have increased the amount of data generated by – and flowing through – organizations. They have also accelerated the pace of business, with organizations rolling out new services and deploying software faster than ever. 

Spending on data analytics has been growing as a result – by around a third year-on-year across all sectors – as those in charge of operations attempt to make sense of this data. They want to make effective decisions in real time in response to changing events and market conditions. That shift has been accelerated by technology disruptors, large and small, driving a new normal of more intelligent applications and experiences.

We are therefore experiencing a renaissance in streaming technologies – from data-flow management to distributed messaging, stream processing and more.

Forrester’s Mike Gualtieri profiles the landscape here: “You can use streaming data platforms to create a faster digital enterprise… but to realize these benefits, you’ll first have to select from a diverse set of vendors that vary by size, functionality, geography, and vertical market focus.”

Bloor’s Daniel Howard goes deeper on what it takes to realize the promise they offer in analytics. “Streaming data… is data that is generated (and hence must be processed) continuously from one source or another. Streaming analytics solutions take streaming data and extract actionable insights from it (and possibly from non-streaming data as well), usually as it enters your system.”

This has huge appeal, according to Gartner, which expects half of major new business systems to feature some form of continuous intelligence, using real-time, contextual data to improve decision making.

The important word in both Howard’s definition and Gartner’s prediction is “continuous”, because it has implications for real-time analytics.

Real time? Nearly…

Organizations with real-time operations need analytics that deliver insights based on the latest data – from machine chatter to customer clicks – in a matter of seconds or milliseconds.

To be effective, these analytics must offer actionable intelligence. For example, a commerce cart must be capable of making recommendations to a shopper at the point of engagement based on past purchases, or be able to spot fraudulent activity. That means enriching streaming data with historic data typically held in legacy stores, such as relational databases or mainframes.  
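
As a minimal sketch of that enrichment step – with hypothetical event fields and table names, and SQLite standing in for the legacy relational store – it might look something like this:

```python
import sqlite3

# SQLite stands in for the legacy relational store of past purchases.
history = sqlite3.connect("purchases.db")

def enrich(click_event: dict) -> dict:
    """Join a live click event with the shopper's purchase history."""
    rows = history.execute(
        "SELECT product_id FROM purchases WHERE customer_id = ?",
        (click_event["customer_id"],),
    ).fetchall()
    # Attach the historic context so a downstream recommender or
    # fraud model can act at the point of engagement.
    return {**click_event, "past_products": [r[0] for r in rows]}
```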

It’s a process of capture, enrichment and analysis that should be continuous. Yet Kappa – a key architecture for streaming – doesn’t deliver continuous processing as it is commonly implemented, and that’s a problem for real-time analytics.

In Kappa, data is fed in through a distributed messaging system such as Apache Kafka. It is processed by a streaming engine that performs data extraction and adds reference data. The results are often then held in a database for query by users, applications or machine-learning models.
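
A stripped-down sketch of that flow – using kafka-python and SQLite purely as stand-ins for the message log and the serving database, with made-up topic and field names – could look like this:

```python
import json
import sqlite3
from kafka import KafkaConsumer  # pip install kafka-python

# Kafka stands in for the message log, SQLite for the serving database.
consumer = KafkaConsumer(
    "clicks",                                   # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
serving = sqlite3.connect("serving.db")
serving.execute(
    "CREATE TABLE IF NOT EXISTS enriched_clicks (customer_id TEXT, url TEXT, region TEXT)"
)

REFERENCE = {"cust-42": "EMEA"}                 # reference data added during processing

for msg in consumer:                            # read events off the log
    event = msg.value
    event["region"] = REFERENCE.get(event["customer_id"], "unknown")
    serving.execute(
        "INSERT INTO enriched_clicks VALUES (?, ?, ?)",
        (event["customer_id"], event["url"], event["region"]),
    )
    serving.commit()  # users, applications or ML models query this table later
```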

But this design throws up three obstacles to continuous processing.

First, Kappa is typically implemented with a relational or in-memory data model at its core. Streaming data – events such as web clicks and machine communications – is captured and written in batches for analysis. Joins between data sets take place in batches and intelligence is derived in aggregate. But batch is not real time: it is near-real time, and it serves analysis of snapshots, not the moment. This runs counter to the concept of continuous processing as expressed by Howard and Gartner.
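
A toy sketch makes the point – nothing is known about an event until its whole batch is flushed, so insight lags by up to a full batch (the field name here is an assumption):

```python
from statistics import mean

BATCH_SIZE = 1000
buffer = []

def on_event(event: dict) -> None:
    """Batch-style handling: insight appears only at batch boundaries."""
    buffer.append(event)
    if len(buffer) >= BATCH_SIZE:
        # Joins and aggregates run over the snapshot, then the buffer clears.
        print(f"batch average: {mean(e['value'] for e in buffer):.2f}")
        buffer.clear()  # events arriving next wait for the following batch
```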

Second, raw performance takes us further from continuous: traditional data platforms are built around disk drives, with data written to – and read from – disk. The latency of that round trip adds the underlying drag that comes with the territory of physical storage media.

Finally, there’s the manual overhead of enriching and analyzing data. As McKinsey notes in its report, Data-Driven Enterprise of 2025: “Data engineers often spend significant time manually exploring data sets, establishing relationships among them, and joining them together. They also frequently must refine data from its natural, unstructured state into a structured form using manual and bespoke processes that are time-consuming, not scalable and error prone.”

Ditch the batch in real time

Real-time analytics comes from continuous ingestion, enrichment and querying of data. Powering that process takes a compute and storage architecture capable of sub-millisecond performance – without hidden costs or a spaghetti of code.

This is why we expect the most advanced stream-processing engines to employ a memory-first architecture with integrated fast storage. This approach swaps stop-go batch processing for continuous flow, with the added benefit of a computational model that can crunch analytics in the moment.

Such engines combine storage, data processing and a query engine. Data is loaded into memory, where it is cleaned, joined with historic data and aggregated continuously – no batch. They pool the random-access memory of groups of servers, combined with fast SSD (or NVMe) storage, to continuously process and then store the data fed into their collective data pool. Processing runs in parallel to deliver sub-millisecond responses, with millions of complex transactions performed per second.
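
Compare the batch sketch above with a per-event version of the same job – a single-process toy that ignores the clustering and parallelism described here, with hypothetical field names, but it shows enrichment and aggregation happening the moment each event arrives:

```python
from collections import defaultdict

# Hypothetical reference data, pre-loaded into memory from a historic store.
CUSTOMER_SEGMENT = {"cust-42": "premium"}

# Running aggregates, updated on every event – no batch boundary.
running_totals = defaultdict(float)

def on_event(event: dict) -> None:
    """Continuous handling: enrich and aggregate as the event arrives."""
    segment = CUSTOMER_SEGMENT.get(event["customer_id"], "standard")
    running_totals[segment] += event["value"]
    # The totals are queryable immediately after each event,
    # not after the next batch job completes.
```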

It’s vital, too, to empower your people. Your team needs a language for writing sophisticated queries, so your continuous platform should treat streaming SQL as a first-class citizen.

SQL is a widely used and familiar data query language. Bringing it to streaming opens the door to everyday business developers who would rather not have to learn a language like Java. Streaming SQL also doubles down on the idea of continuous: results of queries are returned as new data arrives – not after a batch job completes. It lets teams filter, join and query different data sources at the speed of the stream – not after the fact.
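
By way of illustration only – this uses Apache Flink’s flavor of streaming SQL rather than any particular platform described here, and the topic, fields and connector settings are assumptions – a continuous query over a Kafka topic can be written like this:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Requires the Flink Kafka SQL connector jar to be available to the job.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE clicks (
        customer_id STRING,
        url STRING,
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# A continuous query: the counts keep updating as events stream in,
# rather than being produced by a scheduled batch job.
t_env.execute_sql(
    "SELECT customer_id, COUNT(*) AS click_count FROM clicks GROUP BY customer_id"
).print()
```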

We’re seeing a renaissance in streaming technologies, with more choices than ever for data infrastructure. But as more organizations take their operations real time, it’s vital that the analytics they come to depend upon can deliver the insight they want, the moment it’s needed. That will mean streaming built on a foundation of continuous processing – not blocks of batch.

To hear more about cloud native topics, join the Cloud Native Computing Foundation and the cloud native community at KubeCon + CloudNativeCon North America 2022 in Detroit (and virtual) from October 24-28.