Kubernetes has matured to the point where users need to consider not only how to get Kubernetes clusters spun up and deployed, but also how to maintain and scale them for the long term.

Kubernetes was the first project to graduate from the Cloud Native Computing Foundation (CNCF), and according to its Project Journey Report, released last August, the project has grown from just 15 contributing companies to more than 350, an increase of about 2,233%. Since its inception, over 2,000 companies have contributed code to the project at some point. At the individual level, Kubernetes has grown from 20 contributing developers at its start, to 400 when it was accepted into the CNCF, to more than 3,000 at the time of the report, nearly a 15,000% increase.

Kubernetes has moved past Day 0 and Day 1, and is now in the Day 2 phase for most companies. According to Tobi Knaup, co-CEO and co-founder of cloud-native management company D2IQ, Day 0 is the design and proof of concept phase, Day 1 is the installation and deployment phase, and Day 2 is when things like monitoring, maintenance and troubleshooting come into play. Day 2 is also when an application moves from just a development project to an actual strategic advantage for the business. 

According to Knaup, the process for moving to Day 2 Kubernetes operations is similar to the process for moving any technology out of a lab and into mainstream production. Organizations need to ensure that systems are built securely, that they meet compliance requirements if they’re in a regulated industry, and that deployments can be done in a repeatable way. Monitoring at scale is also needed.

Kubernetes is an open-source technology, and according to Kamesh Pemmaraju, author of Platform9’s report on key takeaways from Kubernetes adoption, this aspect can make it tricky for IT administrators to deal with. “Typically traditional IT teams are more used to working with the larger vendors’ more proprietary tools,” said Pemmaraju. For example, an IT administrator who might have been used to a new version of VMware every 12 to 18 months now has to deal with a new version of Kubernetes every two or three months.

A repeatable upgrade process can help IT administrators keep pace with a release schedule that is much faster than what they’re used to.
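Part of making upgrades repeatable is verifying, after every upgrade cycle, that the control plane and every node actually landed on the expected version. As a rough illustration (not something from the report), the Go sketch below uses the client-go library to print the API server version alongside each node’s kubelet version, so drift left over from a partially completed upgrade is easy to spot; the kubeconfig path and error handling are kept deliberately simple.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Report the control-plane (API server) version...
	serverVersion, err := clientset.Discovery().ServerVersion()
	if err != nil {
		panic(err)
	}
	fmt.Println("control plane:", serverVersion.GitVersion)

	// ...and the kubelet version on every node, so any skew stands out.
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		fmt.Printf("%s: kubelet %s\n", node.Name, node.Status.NodeInfo.KubeletVersion)
	}
}
```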

Observability is also important because of all the technologies that interact with Kubernetes. Kubernetes clusters are rarely run on their own; they’re usually deployed alongside a number of other technologies. According to Knaup, you need live telemetry data on every part of the system, and you need to be able to debug and diagnose problems and find their root cause. “Those are all concerns that actually Kubernetes itself doesn’t solve,” said Knaup. “So you have to assemble an entire stack of other open-source technologies in the cloud-native ecosystem, to build, for instance, an observability stack, or to build a strong security story.”

There are a number of tools that can help with Kubernetes observability, such as Prometheus, Jaeger, or Fluentd, just to name a few. Pemmaraju recommends IT administrators not only get training in Kubernetes, but also become familiar with what’s happening in the Kubernetes ecosystem. “It’s not just about Kubernetes, but the services around it, whether that’s networking, storage, monitoring, alerting. All of those are things you have to get up to speed with,” said Pemmaraju. 
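To make the Prometheus piece of that stack concrete, here is a minimal, illustrative sketch (not taken from either interview) of how a Go application might expose its own metrics for Prometheus to scrape, using the official client_golang library; the metric name, label, and port are placeholders.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is a hypothetical counter tracking handled requests by path.
var requestsTotal = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "myapp_http_requests_total",
		Help: "Total HTTP requests handled, labeled by path.",
	},
	[]string{"path"},
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Increment the counter for every request the app serves.
	requestsTotal.WithLabelValues(r.URL.Path).Inc()
	w.Write([]byte("ok\n"))
}

func main() {
	http.HandleFunc("/", handler)
	// Prometheus scrapes this endpoint; in a cluster it is typically
	// discovered via pod annotations or a ServiceMonitor.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Tracing and log collection would be layered on top of this with tools like Jaeger and Fluentd, which is the “assemble an entire stack” point Knaup makes above.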

Another consideration in Day 2 is scalability. When companies first get started with Kubernetes, they might have a few clusters running. But according to Knaup, Kubernetes use can spread quickly throughout an organization after those first few projects are deployed, so having the ability to scale is important. 

Often, Kubernetes gets adopted from the bottom up, meaning that teams adopt Kubernetes separately. Eventually, organizations have to consolidate all of that in a consistent way. 

Pemmaraju added that there are a few different dimensions of scalability to take into account. IT administrators can think about scalability in terms of how big a single cluster becomes, location, number of clusters, or number of physical nodes per cluster, to name a few. For example, a large enterprise likely has developer teams spread across the globe. One consideration for such an organization is whether it wants its clusters close to where those teams are or in a central location. If the clusters are distributed across the world, it has to think about how to manage all of them and how to apply uniform governance and security across them.

Another example Pemmaraju gave is about cluster size. A large development team could decide to have a 200-node Kubernetes cluster or 10 clusters of 20 nodes each. 

According to Pemmaraju, the larger the cluster, the more challenging it is to manage. A 200-node cluster will be much harder to troubleshoot if it fails than a smaller one. Pemmaraju is seeing a lot of companies choose to divide their clusters up into smaller ones, where it’s easier to isolate problems and to monitor.

But if you go too far in breaking up clusters, you might also run into issues in the other direction, Pemmaraju cautioned. “You just have to find the right balance between the size of the cluster and the number of locations. So these are all the considerations you have to think through as you start to scale,” Pemmaraju said.
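Whatever balance an organization strikes, administrators end up needing a uniform, at-a-glance view of how many clusters they run and how large each one is. As a purely illustrative sketch, the Go program below uses client-go to walk every context defined in a local kubeconfig and print each cluster’s node count; the contexts and credentials are whatever that kubeconfig already contains.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config) with all of its contexts.
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	rawConfig, err := rules.Load()
	if err != nil {
		panic(err)
	}

	// Each context typically points at a different cluster; count the
	// nodes in every one to get a simple fleet-wide inventory.
	for contextName := range rawConfig.Contexts {
		restConfig, err := clientcmd.NewNonInteractiveClientConfig(
			*rawConfig, contextName, &clientcmd.ConfigOverrides{}, rules).ClientConfig()
		if err != nil {
			fmt.Printf("%s: %v\n", contextName, err)
			continue
		}
		clientset, err := kubernetes.NewForConfig(restConfig)
		if err != nil {
			fmt.Printf("%s: %v\n", contextName, err)
			continue
		}
		nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			fmt.Printf("%s: %v\n", contextName, err)
			continue
		}
		fmt.Printf("%s: %d nodes\n", contextName, len(nodes.Items))
	}
}
```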

Kubernetes Operators help with automation

Once a company has addressed these overall concerns, what sort of tooling will help it in Day 2? According to Knaup, operators become essential at this stage. Kubernetes operators are tools that essentially automate the Day 2 operation of complex workloads. More specifically, according to OperatorHub, operators implement and automate common Day 1 activities like installation and configuration, and Day 2 activities like re-configuration, updates, backups, failovers, etc.
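Under the hood, an operator is a controller that watches some resource and continuously reconciles the cluster’s actual state with the desired state. The following is a minimal, hypothetical sketch of that reconcile loop using the controller-runtime library (the same machinery the Go-based Operator SDK builds on); a real operator would watch its own custom resource and take corrective action, while this one merely watches Deployments and logs what it sees.

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

// deploymentReconciler is a hypothetical reconciler; a real operator would
// manage its own custom resource and create or repair the workload it owns.
type deploymentReconciler struct {
	client.Client
}

// Reconcile is called whenever a watched Deployment changes. An operator
// compares desired and actual state here and converges the two; this sketch
// only reads the object and logs what it observed.
func (r *deploymentReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	var dep appsv1.Deployment
	if err := r.Get(ctx, req.NamespacedName, &dep); err != nil {
		// The object may have been deleted since the event was queued.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	logger.Info("observed deployment", "name", req.NamespacedName.String(), "readyReplicas", dep.Status.ReadyReplicas)
	return ctrl.Result{}, nil
}

func main() {
	ctrl.SetLogger(zap.New())

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// Wire the reconciler up so it is triggered for every Deployment event.
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}).
		Complete(&deploymentReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```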

OperatorHub is a community-sourced index of operators that are packaged for deployment to Kubernetes clusters. OperatorHub was launched by Red Hat, the creators of the Operator Framework. Amazon, Microsoft, and Google were also in the initial group supporting OperatorHub. 

The Operator Framework itself is a toolkit for managing operators. It consists of an SDK for building operators, Operator Lifecycle Manager for overseeing installation and updates, and Operator Metering for usage reporting.

GitOps emerges as a powerful methodology for developers to interact with Kubernetes

Another thing Knaup recommends companies look into as they move into Day 2 is GitOps. Because of the current global pandemic, KubeCon EU has been cancelled, but according to Knaup, about 20% of the talks scheduled were about GitOps. 

GitOps is a methodology that was started in 2017 at Weaveworks. According to Weaveworks, GitOps uses “Git as a single source of truth for declarative infrastructure and applications. With Git at the center of your delivery pipelines, developers can make pull requests to accelerate and simplify application deployments and operations tasks to Kubernetes.”

It applies the concepts of CI/CD to Kubernetes and allows developers to deploy cloud-native apps as quickly as possible, Knaup explained. “It’s really powerful, and what’s new about GitOps is it puts more power into developers’ hands,” said Knaup. “They can literally use Git, the version control system that they’re familiar with, to also deploy their applications.”
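Dedicated tools such as Flux (from Weaveworks) and Argo CD implement this pattern in production, but the core loop is simple enough to sketch. The hypothetical Go program below (purely illustrative, with a made-up repository path and sync interval) keeps pulling a config repository and applying its manifests, so whatever developers merge through pull requests becomes the state of the cluster.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"time"
)

func main() {
	// Local clone of the configuration repository; the manifests checked
	// into Git are the single source of truth for the cluster.
	const repoDir = "/srv/deploy-repo"

	for {
		// Pull whatever developers have merged via pull requests.
		pull := exec.Command("git", "-C", repoDir, "pull", "--ff-only")
		if out, err := pull.CombinedOutput(); err != nil {
			fmt.Printf("git pull failed: %v\n%s", err, out)
		} else {
			// Apply the declarative manifests so the cluster converges
			// on the state described in Git. Real GitOps tools add
			// diffing, health checks, and pruning on top of this.
			apply := exec.Command("kubectl", "apply", "-f", filepath.Join(repoDir, "manifests"))
			if out, err := apply.CombinedOutput(); err != nil {
				fmt.Printf("kubectl apply failed: %v\n%s", err, out)
			} else {
				fmt.Printf("%s", out)
			}
		}
		time.Sleep(1 * time.Minute)
	}
}
```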

Some additional benefits of GitOps, according to Weaveworks, are higher reliability, improved stability, consistency and standardization, and stronger security guarantees.