Over the past seven years, we’ve seen Kubernetes become the de facto platform for building modern applications. With this shift, application architectures have become increasingly distributed, dynamic, and modular. As a byproduct, logging data has exploded – depending on the company, anywhere from one terabyte to multiple petabytes of data can be generated each day.

While more data isn’t inherently bad, teams simply don’t have the tools or the budget to process and make sense of it all. It’s like a bathtub fed by a never-ending supply of water, but with only so much capacity to hold it. So DevOps and SRE teams are forced to predict and decide which data is “important” and worth analyzing. The rest gets “drained” into archives or a less active storage tier, where the team saves on cost at the expense of real-time visibility and analytics.

Again, no two organizations are the same, but it’s common for teams to neglect as much as 80% of their logging data. 

The impact of neglecting observability datasets

This practice of dropping datasets creates problems for DevOps and SRE teams, both from an operational standpoint and from a long-term viability perspective.

First, in terms of operations, teams are forced to undertake a lot of manual labor to get valuable insights from their data.

  • They must understand their datasets at an intimate level to index the “right” things. 
  • They likely have to structure their logs in a very purposeful and intentional way to make sense of the data (a structured-logging sketch follows this list).
  • They have to configure and refine logic at granular levels to monitor the behaviors they care about (to the best of their ability).
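
Structured logging is what that intentionality typically looks like in practice. Here is a minimal Python sketch (the service name and field names are hypothetical) that emits one JSON object per log line, so fields can be indexed directly instead of being regexed out of free-form messages:

```python
import json
import logging
import sys
import time

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line so fields are queryable downstream."""
    def format(self, record):
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "service": "checkout",  # hypothetical service name
            "msg": record.getMessage(),
        }
        # Pick up structured fields passed through logging's `extra` mechanism.
        for key in ("order_id", "duration_ms"):  # hypothetical field names
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Fields are explicit keys, not text to regex out of a message later.
logger.info("order placed", extra={"order_id": "A123", "duration_ms": 42})
```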

All of these operations take time, which is unfortunate since DevOps teams are already overtaxed: according to a recent survey, 83% of DevOps practitioners reported experiencing burnout.

When an issue does occur, teams must look for the proverbial “needle in a haystack” of logging data to resolve it, adding hours or days to the workflow. 

Second, from a long-term perspective, data growth is not slowing down. Let’s say you’re generating one terabyte of data each day now but are only able to analyze 200 gigabytes of it (just 20%). What happens in three years, when you’re generating even more?

Borrowing from my earlier analogy: the water source might get bigger, but your bathtub is likely to stay the same size.
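
To make that squeeze concrete, here is a back-of-the-envelope projection; the 35% annual growth rate is purely an assumption for illustration:

```python
# Back-of-the-envelope projection. The 35% annual growth rate is an
# assumption for illustration; analysis capacity is held flat.
daily_volume_tb = 1.0  # generated today
capacity_tb = 0.2      # what the team can actually analyze

for year in range(1, 4):
    daily_volume_tb *= 1.35
    coverage = capacity_tb / daily_volume_tb * 100
    print(f"year {year}: {daily_volume_tb:.2f} TB/day generated, {coverage:.0f}% analyzed")
```

Under those assumptions, coverage falls from 20% today to roughly 8% in year three, even though nothing about the team’s tooling got worse.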

Both of these challenges will be more extreme for teams that are new to Kubernetes. That’s because there is a sudden spike in data volume, plus many new layers of resources to reason about (clusters, pods, containers, etc.). To monitor the “right” datasets – and confidently drop the wrong ones – teams need to decipher this new environment, which can be challenging.

A new approach: Analyzing data at the source

DevOps and SRE teams can help their organizations solve these challenges, but it calls for a different approach to observability: one that allows them to analyze 100% of their data at any scale, without neglecting critical parts of it.

Currently, most observability pipeline vendors push only the choice of what to index and analyze upstream: teams filter at the source, then ship whatever survives to a central platform. Instead, teams can push the analytics themselves upstream to the data source, unlocking visibility into complete datasets without pushing the limits of their existing platforms. They can then stream the outputs to their observability platform of choice, saving their engineers hours of manual operations.
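
As a minimal sketch of what this can look like (assuming a hypothetical summary endpoint and log lines that begin with a level keyword), the agent below runs next to the workload, reduces raw lines to small per-window aggregates, and streams only those aggregates downstream:

```python
import json
import sys
import time
import urllib.request

# Hypothetical endpoint on the team's observability platform of choice.
SINK_URL = "https://observability.example.com/api/summaries"
WINDOW_SECONDS = 10

def flush(window_start, counts):
    """Ship a small per-window aggregate instead of every raw log line."""
    body = json.dumps({"window_start": window_start, "counts": counts}).encode()
    req = urllib.request.Request(
        SINK_URL, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)  # retries and error handling omitted in this sketch

window_start = time.time()
counts = {"INFO": 0, "WARN": 0, "ERROR": 0}

# Analyze every line as it arrives (e.g., piped from `kubectl logs -f ...`).
for line in sys.stdin:
    level = line.split(" ", 1)[0].strip()  # assumes level-first log lines
    if level in counts:
        counts[level] += 1
    if time.time() - window_start >= WINDOW_SECONDS:
        flush(window_start, counts)
        window_start = time.time()
        counts = dict.fromkeys(counts, 0)
```

The raw lines never leave the node; only compact summaries do, which is what keeps the downstream platform within its limits while still covering 100% of the data.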

To flip observability on its head like this, teams need a few core capabilities. It all starts with a deployment methodology that is as distributed as their Kubernetes environment – in practice, something like an analysis agent running on every node (a DaemonSet, in Kubernetes terms). From there, teams need:

  • Stream processing of data at the source, versus batch processing in a central platform
  • Federated machine learning to analyze datasets where they live (a toy sketch follows this list)
  • Intuitive visualizations to communicate service behavior to each stakeholder
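
For the federated piece, here is a toy FedAvg-style round with made-up latency numbers: each node computes a local update from data that never leaves the node, and a coordinator aggregates only the parameters:

```python
# A toy FedAvg-style round. Each node fits a local latency baseline from
# its own logs; only the parameters (mean, count) reach the coordinator.
# The latency samples below are made up for illustration.

node_samples = {
    "node-a": [120.0, 130.0, 125.0],  # ms, stays on node-a
    "node-b": [300.0, 280.0],         # ms, stays on node-b
}

def local_update(samples):
    """A node's 'model': mean latency plus the number of samples behind it."""
    return sum(samples) / len(samples), len(samples)

# Coordinator: weighted average of the parameters, never the raw samples.
updates = [local_update(s) for s in node_samples.values()]
total = sum(n for _, n in updates)
global_baseline = sum(mean * n for mean, n in updates) / total

print(f"global latency baseline: {global_baseline:.1f} ms")  # 191.0 ms here
# Each node can then alert locally when it drifts far from this baseline.
```

The same pattern extends to real models; the key property is that raw observability data stays at the source while only small parameter updates travel.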

Kubernetes environments have characteristics that make them advantageous for building modern applications – they’re scalable, modular, and dynamic. These same characteristics generate huge volumes of data that can be difficult to keep up with using traditional observability tools alone.

However, data explosion doesn’t have to be an Achilles heel for your DevOps or SRE team. By pushing analytics upstream to the data source, teams can understand the behavior of their applications and services before they index any data. This new approach to observability allows teams to gain better visibility today and stay ahead of exponential data growth in the years to come.