Real-time data streams and stream processing are crossing into the mainstream – according to IDC, they will become the norm, not the exception.

The drivers are, by now, familiar: Cloud, IoT and 5G have increased the amount of data generated by – and flowing through – organizations. They have also accelerated the pace of business, with organizations rolling out new services and deploying software faster than ever. 

Spending on data analytics has been growing as a result – by around a third year-on-year across all sectors – as those in charge of operations attempt to make sense of this data. They want to make effective decisions in real time in response to changing events and market conditions. That shift has been accelerated by technology disruptors, large and small, driving a new normal of more intelligent applications and experiences.

We are therefore experiencing a renaissance in streaming technologies – from data-flow management to distributed messaging, stream processing and more.

Forrester’s Mike Gualtieri profiles the landscape here: “You can use streaming data platforms to create a faster digital enterprise… but to realize these benefits, you’ll first have to select from a diverse set of vendors that vary by size, functionality, geography, and vertical market focus.”

Bloor’s Daniel Howard goes deeper on what it takes to realize the promise they offer in analytics. “Streaming data… is data that is generated (and hence must be processed) continuously from one source or another. Streaming analytics solutions take streaming data and extract actionable insights from it (and possibly from non-streaming data as well), usually as it enters your system.”

This has huge appeal, according to Gartner, which expects half of major new business systems to feature some form of continuous intelligence, using real-time, contextual data to improve decision making.

The important word in both Howard’s definition and Gartner’s prediction is “continuous”, because it has implications for real-time analytics.

Real time? Nearly…

Organizations with real-time operations need analytics that deliver insights based on the latest data – from machine chatter to customer clicks – in a matter of seconds or milliseconds.

To be effective, these analytics must offer actionable intelligence. For example, a commerce cart must be capable of making recommendations to a shopper at the point of engagement based on past purchases, or be able to spot fraudulent activity. That means enriching streaming data with historic data typically held in legacy stores, such as relational databases or mainframes.  
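
As a minimal sketch of that enrichment step – with hypothetical event fields and table names, and SQLite standing in for the legacy relational store – it might look something like this:

```python
import sqlite3

# SQLite stands in for the legacy relational store of past purchases.
history = sqlite3.connect("purchases.db")

def enrich(click_event: dict) -> dict:
    """Join a live click event with the shopper's purchase history."""
    rows = history.execute(
        "SELECT product_id FROM purchases WHERE customer_id = ?",
        (click_event["customer_id"],),
    ).fetchall()
    # Attach the historic context so a downstream recommender or
    # fraud model can act at the point of engagement.
    return {**click_event, "past_products": [r[0] for r in rows]}
```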

It’s a process of capture, enrichment and analysis that should be continuous. Yet Kappa – a key architecture for streaming – doesn’t deliver continuous processing as it is commonly implemented, and that’s a problem for real-time analytics.

In Kappa, data is fed in through a distributed messaging system such as Apache Kafka. It is processed by a streaming engine that performs data extraction and adds reference data. The results are often then held in a database for query by users, applications or machine-learning models.
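
A stripped-down sketch of that flow – using kafka-python and SQLite purely as stand-ins for the message log and the serving database, with made-up topic and field names – could look like this:

```python
import json
import sqlite3
from kafka import KafkaConsumer  # pip install kafka-python

# Kafka stands in for the message log, SQLite for the serving database.
consumer = KafkaConsumer(
    "clicks",                                   # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
serving = sqlite3.connect("serving.db")
serving.execute(
    "CREATE TABLE IF NOT EXISTS enriched_clicks (customer_id TEXT, url TEXT, region TEXT)"
)

REFERENCE = {"cust-42": "EMEA"}                 # reference data added during processing

for msg in consumer:                            # read events off the log
    event = msg.value
    event["region"] = REFERENCE.get(event["customer_id"], "unknown")
    serving.execute(
        "INSERT INTO enriched_clicks VALUES (?, ?, ?)",
        (event["customer_id"], event["url"], event["region"]),
    )
    serving.commit()  # users, applications or ML models query this table later
```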

But this design throws up three obstacles to continuous processing.

First, Kappa is typically implemented with a relational or in-memory data model at its core. Streaming data – events such as web clicks and machine communications – is captured and written in batches for analysis. Joins between data sets take place in batches and intelligence is derived in aggregate. But batch is not real time: it is near-real time, and it serves analysis of snapshots, not the moment. This runs counter to the concept of continuous processing as expressed by Howard and Gartner.
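
A toy sketch makes the point – nothing is known about an event until its whole batch is flushed, so insight lags by up to a full batch (the field name here is an assumption):

```python
from statistics import mean

BATCH_SIZE = 1000
buffer = []

def on_event(event: dict) -> None:
    """Batch-style handling: insight appears only at batch boundaries."""
    buffer.append(event)
    if len(buffer) >= BATCH_SIZE:
        # Joins and aggregates run over the snapshot, then the buffer clears.
        print(f"batch average: {mean(e['value'] for e in buffer):.2f}")
        buffer.clear()  # events arriving next wait for the following batch
```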

Second, raw performance takes us further from continuous: traditional data platforms are built around disk drives, with data written to – and read from – disk. The latency of that round trip adds the underlying drag that comes with the territory of physical storage media.

Finally, there’s the manual overhead of enriching and analyzing data. As McKinsey notes in its report, Data-Driven Enterprise of 2025: “Data engineers often spend significant time manually exploring data sets, establishing relationships among them, and joining them together. They also frequently must refine data from its natural, unstructured state into a structured form using manual and bespoke processes that are time-consuming, not scalable and error prone.”

Ditch the batch in real time

Real-time analytics comes from continuous ingestion, enrichment and querying of data. Powering that process takes a compute and storage architecture capable of sub-millisecond performance – without hidden costs or a spaghetti of code.

This is why we expect the most advanced stream-processing engines to employ a memory-first architecture with integrated fast storage. This approach swaps stop-go batch processing for continuous flow, with the added benefit of a computational model that can crunch analytics in the moment.

Such engines combine storage, data processing and a query engine. Data is loaded into memory, where it is cleaned, joined with historic data and aggregated continuously – no batch. They pool the random-access memory of groups of servers, combined with fast SSD (or NVMe) storage, to continuously process and then store the data fed into their collective data pool. Processing runs in parallel to deliver sub-millisecond responses, with millions of complex transactions performed per second.
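
Compare the batch sketch above with a per-event version of the same job – a single-process toy that ignores the clustering and parallelism described here, with hypothetical field names, but it shows enrichment and aggregation happening the moment each event arrives:

```python
from collections import defaultdict

# Hypothetical reference data, pre-loaded into memory from a historic store.
CUSTOMER_SEGMENT = {"cust-42": "premium"}

# Running aggregates, updated on every event – no batch boundary.
running_totals = defaultdict(float)

def on_event(event: dict) -> None:
    """Continuous handling: enrich and aggregate as the event arrives."""
    segment = CUSTOMER_SEGMENT.get(event["customer_id"], "standard")
    running_totals[segment] += event["value"]
    # The totals are queryable immediately after each event,
    # not after the next batch job completes.
```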

It’s vital, too, to empower your people. Your team needs a language for writing sophisticated queries, so your continuous platform should treat streaming SQL as a first-class citizen.

SQL is a widely used and familiar data query language. Bringing it to streaming opens the door to everyday business developers who would rather not have to learn a language like Java. Streaming SQL also doubles down on the idea of continuous: results of queries are returned as new data arrives – not after a batch job completes. It lets teams filter, join and query different data sources at the speed of the stream – not after the fact.
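
By way of illustration only – this uses Apache Flink’s flavor of streaming SQL rather than any particular platform described here, and the topic, fields and connector settings are assumptions – a continuous query over a Kafka topic can be written like this:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Requires the Flink Kafka SQL connector jar to be available to the job.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE clicks (
        customer_id STRING,
        url STRING,
        ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clicks',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# A continuous query: the counts keep updating as events stream in,
# rather than being produced by a scheduled batch job.
t_env.execute_sql(
    "SELECT customer_id, COUNT(*) AS click_count FROM clicks GROUP BY customer_id"
).print()
```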

We’re seeing a renaissance in streaming technologies, with more choices than ever for data infrastructure. But as more organizations take their operations real time, it’s vital that the analytics they come to depend upon can deliver the insight they want, the moment it’s needed. That will mean streaming built on a foundation of continuous processing – not blocks of batch.

To hear more about cloud native topics, join the Cloud Native Computing Foundation and the cloud native community at KubeCon + CloudNativeCon North America 2022 in Detroit (and virtual) from October 24-28.