As organizations roll Kubernetes out across more teams and add more clusters, the growing complexity makes it hard to enforce standardization. Without standards, they risk wasting cloud resources and weakening reliability and security.

These issues are examined in a detailed Kubernetes report from Fairwinds, a provider of a cost optimization and policy enforcement platform, which found that organizations struggle to gain insight into what’s happening in their clusters. The Kubernetes Benchmark Report 2024 found improved efficiency and reliability compared with past years, but changes are still needed.

Kubernetes adoption continues to grow, and as it does, users are becoming more adept at managing the complexities of Kubernetes workload configuration. Joe Pelletier, VP of product at Fairwinds, said of the report: “By tracking the data over several years, we’ve been able to track significant improvements over time, particularly for organizations leveraging open source and proprietary software to identify Kubernetes misconfigurations and get actionable results to help them make improvements in terms of cost efficiency, reliability, and security.”

The biggest surprise in this year’s benchmark report, Pelletier said, is that “there are two types of organizations: those that have processes and controls around configuration, such as right-sizing their workloads, and those that do not.”

Among the key findings from examining 330,000 workloads in more than 100 organizations are that 37% of organizations were in need of right-sizing their containers, and that 65% of them “are missing liveness and/or readiness probes and many organizations still rely on cached images,” according to the report.

Right-sizing, the report noted, can mean increasing resources to improve reliability, or lowering resources to improve utilization and efficiency. In the past, the report noted, data on right-sizing was broken out into CPU and memory utilization. But now, by asking the simple question, “Does the container need to be right-sized or not,” the report found that 57% of organizations have 10% or fewer workloads that require right-sizing, but also found that 30% of organizations had half or more of their workloads in need of review.
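
To make the two directions of right-sizing concrete, here is a minimal Python sketch. It is not the report's methodology: the function name `right_sizing_advice` and the 50% and 90% utilization thresholds are assumptions chosen purely for illustration.

```python
def right_sizing_advice(requested, observed_peak, low=0.5, high=0.9):
    """Suggest a right-sizing direction for one resource (CPU or memory).

    `requested` and `observed_peak` must use the same unit, for example
    millicores. The `low` and `high` thresholds are hypothetical defaults,
    not values taken from the Fairwinds report.
    """
    if requested <= 0:
        raise ValueError("requested must be positive")
    utilization = observed_peak / requested
    if utilization > high:
        return "increase"  # close to the request: raise it for reliability
    if utilization < low:
        return "decrease"  # mostly idle: lower it for efficiency
    return "ok"


# A container requesting 1000 millicores but peaking at 200 is a candidate
# for a smaller request; one peaking at 480 out of 500 may need more headroom.
print(right_sizing_advice(1000, 200))
print(right_sizing_advice(500, 480))
```

In practice the observed peak would come from a metrics system, and appropriate thresholds vary by workload.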

Probes for liveness and readiness can determine whether or not a Kubernetes workload is functioning as it was meant to. When a liveness probe detects that a service is failing, Kubernetes automatically restarts the container to restore the service. But if a container does not have the probe, “a faulty or non-functioning pod will continue to run indefinitely,” the report said, which uses up resources and could cause application errors.
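
As a rough illustration of how a team might spot this gap, the Python sketch below scans a pod spec, represented as a plain dictionary, for containers that lack a `livenessProbe` or `readinessProbe` field. Those two field names are part of the Kubernetes container spec; the helper `missing_probes` and the sample spec are hypothetical, and this is not how Fairwinds gathered its data.

```python
def missing_probes(pod_spec):
    """Return {container name: [missing probe fields]} for a pod spec dict."""
    report = {}
    for container in pod_spec.get("containers", []):
        missing = [
            field
            for field in ("livenessProbe", "readinessProbe")
            if field not in container
        ]
        if missing:
            report[container["name"]] = missing
    return report


# Hypothetical pod spec: "web" defines only a liveness probe,
# "sidecar" defines neither probe.
example_spec = {
    "containers": [
        {
            "name": "web",
            "image": "example.com/web:1.0",
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        },
        {"name": "sidecar", "image": "example.com/sidecar:1.0"},
    ]
}

print(missing_probes(example_spec))
# {'web': ['readinessProbe'], 'sidecar': ['livenessProbe', 'readinessProbe']}
```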

Of organizations reporting the state of these problems, 69% have 11-50% of their workloads missing liveness probes, and 66% have 11-50% of their workloads missing readiness probes.

Further, the findings related to cached versions of Docker images indicate both reliability and security issues. According to the report, “By default, an image will be pulled if it isn’t already cached on the node attempting to run it. Using a cached version can cause variations in images that are running per node, or potentially introduce a security vulnerability because Kubernetes will attempt to use the cached version of an image without verifying where it came from.”
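
A similar check can flag containers that may run from a cached image. The Python sketch below assumes a simplified rule: any container whose `imagePullPolicy` is not explicitly set to `Always` is treated as relying on the node's cache. `imagePullPolicy` is a real Kubernetes container field, but the helper name and the sample spec are hypothetical, and the rule ignores the fact that Kubernetes chooses a default policy based on the image tag when the field is omitted.

```python
def containers_using_cache(pod_spec):
    """List containers that do not explicitly set imagePullPolicy to Always.

    Simplification: an unset policy is counted as cache-reliant, although
    Kubernetes derives the actual default from the image tag.
    """
    return [
        container["name"]
        for container in pod_spec.get("containers", [])
        if container.get("imagePullPolicy") != "Always"
    ]


# Hypothetical pod spec with one container per case.
example_pod = {
    "containers": [
        {"name": "api", "image": "example.com/api:2.3", "imagePullPolicy": "Always"},
        {"name": "worker", "image": "example.com/worker:2.3", "imagePullPolicy": "IfNotPresent"},
        {"name": "cron", "image": "example.com/cron:2.3"},
    ]
}

print(containers_using_cache(example_pod))
# ['worker', 'cron']
```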

As for cached images, 24% of organizations rely on them for more than 90% of their workloads. That can affect the reliability of their applications, Fairwinds wrote.

The report goes into greater detail about missing replicas, missing CPU limits, container security, and much more. The full report is available from Fairwinds.