To adopt more effective operational practices, DevOps and site reliability engineering (SRE) teams need a culture change within the organization. Red Hat held its virtual summit this week, where it discussed how to reinvent IT Ops as SRE.
According to the company, that change can happen by automating processes and balancing the demand for speedy production against the need for predictable reliability. In addition, operations teams must shift from humans servicing requests to developers writing automation that services requests.
The role of SREs in an organization is to stay focused on making sure that the platforms and services being built are consistently reliable for customers, rather than on speeding up production cycles and pushing out as many features as possible, Red Hat explained.
The problem, however, is that multiple IT business processes—patching and provisioning servers, provisioning IPs, subnets, or firewall rules—still require a manual governance process around the technical work.
“DevOps teams have mostly evolved to use automation across the board while Ops teams still operate the traditional way with lots of tickets as well as lack of communication between Ops and Dev,” said Arash Dadras, a manager of the ANZ Open Innovation Lab and Architecture Practice at Red Hat Asia Pacific, who spoke at the summit.
One major change today is that infrastructure delivery is thought of more in terms of software delivery because of the move towards infrastructure as code, according to Dadras.
“Here the infrastructure team has to be behaving more and more like a dev team,” Dadras said.
However, the lack of automation is only part of the problem.
In the 2019 State of DevOps report, the elite performers were organizations where the leadership emphasized the autonomy of teams and encouraged an environment of learning.
“The bottom line is that the difference between elite performers and low performers is less about the technology, but rather about the people and the environment that they work in,” Dadras said.
The new model for solving the problem no longer focuses on changing what people believe by establishing principles. Instead, it puts people in an environment where things get done differently, which ultimately results in a culture change.
The Open Practice Library — a community-driven repository of practices and tools — is an example of how teams can discover and deliver experiments while building the team culture, according to Dadras.
“SRE is all about the balance between stability versus velocity and release, and at the end of the day needs to be measured by KPIs, SLIs and availability,” said Dadras. “If you don’t measure it, you can’t really know.”
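As a rough illustration of what measuring an SLI can look like, the sketch below computes an availability SLI from request counts and compares it against a target. It is not something Dadras presented: the `SliWindow` shape, the function names and the 99.9% target are all hypothetical, and the error budget calculation comes from common SRE practice rather than from the talk.

```python
from dataclasses import dataclass


@dataclass
class SliWindow:
    """Request counts for one measurement window (hypothetical data shape)."""
    total_requests: int
    good_requests: int


def availability(window: SliWindow) -> float:
    """Availability SLI: fraction of requests that were served successfully."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return window.good_requests / window.total_requests


def error_budget_remaining(window: SliWindow, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - slo_target) * window.total_requests
    if allowed_failures == 0:
        return 0.0  # a 100% target or an empty window leaves no budget
    actual_failures = window.total_requests - window.good_requests
    return 1.0 - actual_failures / allowed_failures


# Illustrative numbers only, not taken from the article.
window = SliWindow(total_requests=1_000_000, good_requests=999_500)
print(f"availability: {availability(window):.4%}")
print(f"budget left:  {error_budget_remaining(window, slo_target=0.999):.0%}")
```

A team that tracks a number like this can decide when to favor stability over velocity: once the remaining budget approaches zero, releases slow down until reliability recovers.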
During a session at the 2020 Red Hat Summit, the ANZ Bank network team revealed that it had reinvented itself as an Agile DevOps and SRE team by working with the ANZ Open Innovation Labs practice to introduce modern operational approaches.
“Before, we had a team that worked long and hard hours and constantly needed to go faster with a small team and a big network to look after. On top of that, we didn’t have a clear line of sight of what was keeping our teams busy,” said David Wasley, the Technology Area Lead of ANZ New Zealand. “Also, as a bank, we had the usual compliance and regulatory frameworks and controls that we had to manage with manual processes.”
With Open Labs, Agile practices were introduced. Sprints were timeboxed and stakeholders were shown working code at each demo.
Metrics-based process mapping (MBPM) allowed the teams to take an existing operational process, such as DNS provisioning, X-ray it, alter it where necessary and automate its parts. According to Wasley, DNS provisioning at ANZ Bank was whittled down from 6 days to 5 minutes.
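To make the idea of X-raying a process concrete, here is a minimal sketch of the kind of arithmetic a process map supports. Only the 6-day and 5-minute totals come from Wasley; the individual steps and their timings are invented for illustration, the `Step` type and helper names are hypothetical, and days are assumed to be 24-hour calendar days.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60  # assumption: calendar days, not working days


@dataclass
class Step:
    """One step of a mapped process (hypothetical structure)."""
    name: str
    lead_time_min: float     # elapsed time from request to completion
    process_time_min: float  # time someone actually spends working on it


def total_lead_time(steps: list[Step]) -> float:
    """End-to-end elapsed time across all steps, in minutes."""
    return sum(s.lead_time_min for s in steps)


def longest_wait(steps: list[Step]) -> Step:
    """Step with the most idle time, a natural first automation target."""
    return max(steps, key=lambda s: s.lead_time_min - s.process_time_min)


def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the process became."""
    return before_min / after_min


# Invented steps that happen to add up to the 6 days quoted above.
steps = [
    Step("raise ticket", 1 * MINUTES_PER_DAY, 10),
    Step("change approval", 3 * MINUTES_PER_DAY, 15),
    Step("apply DNS change", 2 * MINUTES_PER_DAY, 20),
]

print(total_lead_time(steps) / MINUTES_PER_DAY, "days end to end")
print("biggest wait:", longest_wait(steps).name)
print(f"{speedup(6 * MINUTES_PER_DAY, 5):.0f}x faster at 5 minutes")
```

Under those assumptions, going from 6 days to 5 minutes is a reduction of roughly three orders of magnitude, and most of the original elapsed time would have been waiting rather than work.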
Alongside training in the dev aspects, the engagement lead remained focused on building a culture of empathy and actively designing a safe and powerful environment.
“A lot of that was not just automation, but actually looking at how we were engaging with other parts of the organization,” Wasley said.
Later on, the teams applied the practices they had learned and automated compliance tasks even earlier than originally planned.
“The real change was probably about 20% the technology, whether that’s patterns, designs, or scripts, while 80% was a change around the people and the culture,” Wasley said.