The answer, of course, is that it depends. But the Cloud Native Computing Foundation (CNCF) announced some major steps forward toward this goal at the KubeCon Europe conference last week in Amsterdam.
This was the biggest KubeCon ever, with about 13,500 attendees, an 8% increase over last year, reflecting CNCF’s extraordinary success in establishing Kubernetes as the standard for container orchestration.
During the conference keynote, CNCF announced that NVIDIA has joined CNCF as a Platinum Member, contributed software to key AI-related open source projects, and committed $4M in funding for AI workload testing and certification.
CNCF categorizes AI workloads by whether they train large language models (LLMs), use LLMs (i.e., inference processing), or run AI agents.
CNCF expects the majority of inference workloads to benefit from Kubernetes hosting, much as cloud native container workloads did 10 years ago.
Agentic workload standardization is now being undertaken by the Agentic AI Foundation, another Linux Foundation sub-foundation.
Kubernetes for Inference Processing
“OpenAI and ChatGPT is probably one of the fastest growing services of all time,” said Chris Aniszczyk, CNCF CTO. “And they were able to scale that using Kubernetes for a lot of the inference-based workloads.”
OpenAI has, in fact, published two case studies on the CNCF website discussing its use of Kubernetes for key workloads.
“A lot of classic LLM training is done on customized bare metal, Slurm, and PyTorch,” Aniszczyk continued. “This is the classic HPC ecosystem. But a lot of people are using Kubernetes more and more for inference, which I think it’s extremely well suited for.”
To support inference processing standardization, Red Hat is contributing llm-d to CNCF. llm-d is the inference engine developed by Neural Magic, a company Red Hat acquired last year.
“AI model training was developed largely by data scientists building their own specialized infrastructure,” said Brian Stevens, Red Hat SVP and CTO of AI. “The in-production scaling and operation of inference, however, is now becoming a CIO problem, and the language CIOs speak is Kubernetes.”
“Standard Kubernetes orchestration wasn’t designed for the highly stateful and dynamic demands of LLM inference,” Stevens continued. “The llm-d project provides the architectural layer needed to treat LLMs like any other scalable microservice.”
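To make the statefulness point concrete, here is a minimal, purely illustrative sketch (not llm-d’s actual code; all names are invented) of why LLM inference routing differs from stateless microservice load balancing: replicas accumulate prefix (KV) caches, so a cache-aware router that sends a request to the replica already holding the longest matching prompt prefix avoids recomputing work that a round-robin balancer would throw away.

```python
# Illustrative sketch of cache-aware routing for LLM inference.
# Hypothetical names; real systems route on KV-cache state, load, and more.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class CacheAwareRouter:
    def __init__(self, replicas):
        # Each replica remembers the prompts whose KV caches it still holds.
        self.caches = {r: [] for r in replicas}

    def route(self, prompt: str) -> str:
        # Prefer the replica with the longest cached prefix; break ties
        # toward the replica holding fewer cached prompts (lighter load).
        def score(replica):
            hits = self.caches[replica]
            best = max((shared_prefix_len(prompt, p) for p in hits), default=0)
            return (best, -len(hits))
        chosen = max(self.caches, key=score)
        self.caches[chosen].append(prompt)
        return chosen

router = CacheAwareRouter(["pod-a", "pod-b"])
first = router.route("System: you are helpful. User: hi")
# A follow-up sharing the same long prefix lands on the same pod,
# reusing its KV cache instead of recomputing it on another replica.
second = router.route("System: you are helpful. User: hi again")
```

A stateless load balancer would happily split these two requests across pods; routing on cache affinity is the kind of inference-specific scheduling logic the quote is alluding to.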
Open Source Will Solve the Hardest AI Problems
During her keynote presentation, Erin Boyd, Sr. Director at NVIDIA, confirmed NVIDIA’s support for the goal of establishing Kubernetes as the standard platform for AI applications.
“The future of AI is community driven and open,” she said. To that end, NVIDIA is donating the NVIDIA GPU driver, the KAI Scheduler, the AI Cluster Runtime (AICR), and Dynamo to CNCF and pledging $4M in GPU hardware for development and testing.
Boyd said NVIDIA has verified its configurations against the Kubernetes AI conformance test suite for inference, announced at the November 2025 KubeCon. She positioned that suite as a key driver of the AI standardization process, just as the Kubernetes conformance program was for establishing the Kubernetes standard.
Boyd noted that over the past ten years, Kubernetes evolved from simple container orchestration into the de facto programmable control plane for modern distributed infrastructure.
She sees AI workloads on GPUs following a similar path to standardization. “Because the hardest problems ahead are not just model problems, they’re infrastructure problems, scaling problems, interoperability problems, trust and transparency problems, and no single company can solve those alone.”
The open source community is “what made Kubernetes the foundation of modern infrastructure,” she added, “And it’s what will make AI the foundation of the next generation of compute.”
The Inference “Gold Rush”
During the KubeCon press conference, Jonathan Bryce, Executive Director of Cloud and Infrastructure, Linux Foundation, spoke about the “inference gold rush” underway.

Slide for press conference by CNCF
A CipherTalk report from February projects almost 20% year-over-year growth in the inference market, resulting in a total market of $225B by 2030, up from $106B in 2025.
Perhaps even more significantly, the report predicts that inference will represent 67% of all AI compute in 2026, up from 23% in 2023. As a result, valuations of inference companies are skyrocketing, with Baseten at $55B and Fireworks at $4B.
While 66% of gen AI workloads currently run on Kubernetes, he said, the Foundation’s market analysis estimates that the global AI economy could save $20–$48 billion per year by switching to open models.
In other words, the report says, without open models, consumers would spend between $350 million and $1.23 billion more than they currently do on LLM inference. The analysis provides a clear financial incentive for investing in open source for AI inference processing.
“Standards help organizations get the most out of their AI data,” Bryce added.
The Future of Agentic Workloads
CNCF identifies the third type of AI workload as agentic. To standardize AI agents, the Linux Foundation launched the Agentic AI Foundation (AAIF) in December with founding contributions from Anthropic (Model Context Protocol or MCP), Block (goose), and OpenAI (AGENTS.md).
“Agents are at a different layer of the stack,” said Aniszczyk. “They’re most likely going to be running on Kubernetes infrastructure, but how agents talk to each other, how they work with protocols such as MCP – we just consider that above the Kubernetes layer.”
The AAIF now has 170 members and sponsors the global MCP Dev Summit (upcoming April 2-3 in New York) for open source collaboration on future enhancements to MCP. Enhancement areas include identity, trust, privacy, observability, and security.
“The AAIF is also looking at things such as ecommerce, trying to figure out how to get agents to buy things,” Aniszczyk added. “All of this will most likely happen at the AAIF layer, but agents have to run on something and be operationalized, and that’s where CNCF comes in.”

Photo of exhibit floor by CNCF
Graph Context for Inference
Two database vendors offer products that store and serve graph data as context to improve the results of inference processing.
Neo4j is an established graph database vendor, and a pioneer in the space. But “AI has opened up a whole new set of use cases for graph technology,” said Stephen Chin, VP of DevRel at Neo4j.
For example, Neo4j stores the output of Microsoft’s GraphRAG solution to rapidly and accurately generate natural language summaries of entities and relationships found in the knowledge graph, Chin said.
And for another use case, Neo4j stores vectors in its databases. “Using a combination of vectors and graphs, we were able to show a dramatic improvement in the accuracy of the results,” Chin added. This knowledge graph approach overcomes performance and accuracy limitations of pure vector databases for inference processing, he said.
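The vectors-plus-graphs pattern Chin describes can be sketched in a few lines. This is a toy illustration with invented data, not Neo4j’s API: a vector search finds the semantically closest seed entities, then one hop of graph expansion pulls in related entities that pure similarity search would miss, and the combined set becomes the context handed to the LLM.

```python
# Illustrative hybrid retrieval: vector similarity plus graph expansion.
# Toy in-memory data; a real deployment would query a graph database.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy knowledge graph: node -> embedding, plus edges between related nodes.
embeddings = {
    "llm-d":      [0.9, 0.1, 0.0],
    "Kubernetes": [0.7, 0.3, 0.1],
    "Slurm":      [0.1, 0.9, 0.2],
}
edges = {
    "llm-d": ["Kubernetes"],   # e.g. llm-d RUNS_ON Kubernetes
    "Kubernetes": ["llm-d"],
    "Slurm": [],
}

def retrieve(query_vec, k=1):
    # Step 1: vector search finds the k closest seed entities.
    seeds = sorted(embeddings,
                   key=lambda n: cosine(query_vec, embeddings[n]),
                   reverse=True)[:k]
    # Step 2: graph expansion adds directly related entities, supplying
    # relationship context a pure vector lookup would not return.
    context = set(seeds)
    for s in seeds:
        context.update(edges[s])
    return context

ctx = retrieve([1.0, 0.0, 0.0])  # query vector closest to "llm-d"
```

The graph hop is what distinguishes this from a plain vector database lookup: related entities enter the context because of explicit relationships, not embedding proximity.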
SurrealDB is an open source, multi-model database that combines graph, relational, vector, time series, and key-value models in a single platform.
“Instead of using a specialist database, you can use one API and query across all different modes,” said Tobie Morgan Hitchcock, CEO and Founder. “And you can go back in time with the queries as well.”
Because the biggest challenge with gen AI is accuracy, you can “get the benefit of combining data models, reduce infrastructure cost, and dynamically generate knowledge graphs” to improve inference processing results, Hitchcock added.
SurrealDB is designed to accommodate constantly changing data and work at petabyte scale, Hitchcock said. “The knowledge graphs give you the ability to improve relevance,” he added.
Observability for AI
Observability vendors are adapting their products to AI, especially to agents, which require additional tracking and logging of the actions they take, the decisions they make, and the API calls they make (including chains of calls).
Sawmills, for example, offers a low-cost, high-scale telemetry solution for AI agents.
“The problem today is too much data,” said Ronit Belson, CEO and Co-Founder. “Think about tomorrow – and maybe we are already there – when AI agents are writing the code and there’s so much more telemetry data.”
And a second problem is the quality of the data, Belson added. “On the one hand, coding agents are going to generate much more telemetry data, and on the other hand, they don’t care about the quality of the data.”
“We look at production data and create a feedback loop that tells the coding agents how to do telemetry correctly so that the data quality is higher and more useful,” Belson added.
Chronosphere, now part of Palo Alto, is designed and built for Kubernetes observability. It is adapting, as Kubernetes is, to support inference and agentic AI workloads, said Martin Mao, Chronosphere co-founder and now SVP, GM of Observability at Palo Alto Networks.
Mao sees Chronosphere as complementary to Palo Alto’s platform. Training workloads are diminishing as the market shifts to inference and agentic workloads, he said. “In addition to observability tailored for those workloads, you also need an ID management system and security for agents. Palo Alto can now provide it all.”
Identity for Agents
Identity management is another area adapting to gen AI. Agents need identities the way APIs do, but they also need privileges and secrets.
Akeyless, for example, provides a one-stop shop for secrets management, said Shahar Inbar, Akeyless VP of Sales.
“We provide a secured identity to either a machine or to a human,” Inbar said. “And nowadays, machine identities in the organization are growing around a hundred times more quickly than human identities, especially because of agentic AI.”
Akeyless offers a solution to secure an AI agent’s connection between MCP and the database, Inbar added. The company also offers a Privileged Access Management (PAM) solution on top of it, which likewise supports agentic AI.
The Intellyx Take
The Linux Foundation is embarking on a significant effort to standardize gen AI through open source collaboration and reduce costs, focusing especially on the inference market through CNCF and the agentic market through AAIF.
Success in standardization always hinges on adoption, which is exactly what the conformance program drove for Kubernetes.
It will take time to discover whether the Linux Foundation will be as successful in establishing Kubernetes as the standard deployment environment for inference workloads and whether the AAIF can also succeed in standardizing the agentic workload level above Kubernetes.
But it seems like they are off to a good start. And they have the opportunity to leverage the extraordinary success of Kubernetes and the CNCF, as evidenced in no small part by the increasing attendance numbers at KubeCon.
Chronosphere is a former Intellyx customer. All images provided to press and analysts by CNCF.
