Companies are growing their investments in the public cloud for many reasons: to innovate faster, to deliver better service to customers, and to support DevOps workflows and modern applications. Yet cost savings are still a driving incentive for many CIOs. The numbers are compelling: reported savings on infrastructure range from 10 percent to 30 percent or more, gained by avoiding capital costs, labor-intensive maintenance, and data center facility and energy costs, and by maximizing hardware utilization.

Unfortunately, it’s also easy to overspend in the cloud if you aren’t closely monitoring usage.  And because of the dynamic, distributed nature of IaaS, IT must continually monitor cloud resources to avoid issues affecting response time, security and availability. You can push a button to install a new server, but you can’t forget about it. In fact, IT spending and cost overruns ranked as the second greatest concern for companies running in the cloud, after security, according to the NetEnrich 2019 Cloud Adoption survey.

According to 36% of survey participants, IT executives worry about cloud-related costs because their people are spending an inordinate amount of time on regular maintenance tasks, such as patching systems. Nearly half also report that it’s expensive to recruit and hire staff with the proper cloud skills to design and manage these new environments. Cloud sprawl is an ongoing concern as well, resulting in big bills from hidden or underutilized instances.
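
One way to make cloud sprawl visible is to review utilization data on a schedule and flag instances that are barely used. The following is a minimal sketch, not a definitive implementation: the `InstanceUsage` record, its field names, and the 10 percent CPU threshold are all assumptions made for illustration, and real figures would come from whatever monitoring export your provider offers.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    """Hypothetical record; real data would come from a monitoring export."""
    name: str
    avg_cpu_percent: float  # average CPU over the review window
    monthly_cost: float     # what the instance costs per month


def find_underutilized(instances, cpu_threshold=10.0):
    """Return instances whose average CPU is below the threshold, costliest first."""
    idle = [i for i in instances if i.avg_cpu_percent < cpu_threshold]
    return sorted(idle, key=lambda i: i.monthly_cost, reverse=True)


def wasted_spend(instances, cpu_threshold=10.0):
    """Total monthly cost of the instances flagged as underutilized."""
    return sum(i.monthly_cost for i in find_underutilized(instances, cpu_threshold))
```

Sorting by cost puts the most expensive idle instances at the top of the list, so a reviewer can decide what to resize or shut down first.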

The good news is that AI-enhanced tools and services will soon take care of much of this grunt work – fixing problems automatically and optimizing virtual machines for the best performance and price. Azure Autoscale and AWS Auto Scaling are popular early examples of these self-healing technologies.

On the other hand, as new applications, technologies and user needs demand more from cloud infrastructure, management and monitoring needs will become even more complex. Before diving too deep into the cloud, CIOs and other IT leaders should prepare to mitigate the added costs they’ll likely encounter.

The maintenance drain
Cloud providers try to convince you that moving to the cloud will make your life as an IT manager exponentially easier while also saving you bundles. True, you no longer need to worry about procuring and managing hardware and bandwidth, and you can create a new environment within minutes. But application performance, security, and reliability are still on you. Your team still needs to patch the virtual machines, monitor storage usage for cost and performance, monitor events and traffic patterns, manage alerts, and troubleshoot and remediate issues. Is your application scaling properly and meeting SLAs? Is your infrastructure optimized for cost? There are plenty of tools to help in these efforts, both from the cloud provider and from the software community, but not everything is automated, and your people need to learn how to use these technologies effectively.

Tools: the good and the bad
As the cloud providers offer new resources and services, the IT department needs to stay current on them. These tools can help with security, SLAs and cost management, but they have their own challenges. Take Amazon EC2 Reserved Instances, for instance. The service lets a company commit to VM capacity for a one- or three-year term at a heavily discounted price. But if you are using a load-balancing service, you might find that some of your workloads end up on more expensive VMs that your reserved instances don’t cover. This detail might escape notice on a bill but will leave CIOs scratching their heads as to why they’re spending so much more.
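
To see how workloads that miss their reservation inflate a bill, it helps to work the arithmetic. The sketch below assumes a simplified billing model in which reserved hours are paid for whether or not they are used, and any hours that run on non-matching VMs are charged at the on-demand rate. The function name, parameters and rates are hypothetical and do not reflect actual AWS pricing.

```python
def blended_cost(hours_run, hours_on_reserved, reserved_hours_bought,
                 reserved_rate, on_demand_rate):
    """Estimate a month's compute bill under a simplified reservation model.

    hours_run: total VM hours the workload consumed
    hours_on_reserved: the portion of those hours that matched a reservation
    reserved_hours_bought: reserved hours committed to for the month
    """
    if not 0 <= hours_on_reserved <= min(hours_run, reserved_hours_bought):
        raise ValueError("hours_on_reserved must fit within hours run and hours bought")
    # The reservation is paid in full even when part of it sits unused.
    reserved_cost = reserved_hours_bought * reserved_rate
    # Hours that missed the reservation are billed at the on-demand rate.
    on_demand_cost = (hours_run - hours_on_reserved) * on_demand_rate
    return reserved_cost + on_demand_cost
```

With illustrative rates of $0.06 per reserved hour and $0.10 per on-demand hour, a 720-hour month that is fully covered costs about $43. If 200 of those hours land on VMs the reservation doesn’t cover, the bill rises to about $63, even though the workload itself did not grow.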

Invariably, IT departments will need to acquire and manage a lot of tools in the coming years to run systems in the cloud effectively, and that gets more complicated if you are using multiple cloud platforms. Most companies will need a mix of tools offered by the cloud provider, the open source community and commercial vendors. There is simply no one-size-fits-all toolset, and the cloud providers are generally behind the curve when it comes to delivering advanced monitoring and management technologies. Tool complexity is just a reality – at least for now.

Yet automation is what is making or breaking cloud success. It’s getting easier to tailor an environment to your specific needs; you can put certain expensive VMs on “snooze” outside of peak hours, for instance. Building and maintaining that automation takes skilled people, which leads to the next challenge: finding and keeping talent with cloud expertise.
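
The “snooze” idea mentioned above comes down to a small scheduling rule. Here is a minimal sketch that assumes peak hours of 8:00 to 18:00 on weekdays; the function names and the schedule are hypothetical, and the actual start and stop calls would go through your cloud provider’s own tooling.

```python
from datetime import datetime, time


def should_run(now, peak_start=time(8, 0), peak_end=time(18, 0),
               workdays=range(0, 5)):
    """True if a VM should be running at `now` (Monday is weekday 0)."""
    return now.weekday() in workdays and peak_start <= now.time() < peak_end


def desired_action(now, currently_running):
    """Return 'start', 'stop' or 'none' to bring the VM in line with the schedule."""
    wanted = should_run(now)
    if wanted and not currently_running:
        return "start"
    if not wanted and currently_running:
        return "stop"
    return "none"
```

A scheduled job could call `desired_action` every few minutes and act only when the answer is not `"none"`, which keeps the rule itself easy to test separately from any provider API.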

Nurture people
You don’t need to read the latest survey to know that IT talent, particularly people with skills in cloud architecture and DevOps, is hard to find. The big guns – Google, Amazon, Microsoft, Netflix, Facebook, Salesforce.com and so on – have had an edge in snapping up the hot engineers.

That doesn’t mean a small or midsize company that’s not on the radar can’t attract great people. But it’s important to retain the talent you do have and offer your staff the best training and growth opportunities you can. Developers and engineers love a challenge at work. They enjoy being creative, solving problems and learning new technologies. Consider how you can empower technical staff to make a difference in the company and work directly with business stakeholders. That is the goal after all – IT should ultimately help the business perform better. Great IT services help employees work smarter and can engender greater customer loyalty as well.

Cloud infrastructure has far more benefits than drawbacks for companies in every industry. To keep overhead down and maximize the business benefits of moving to the cloud, focus on these core tenets:

  • Plan your cloud journey well before the migration, gaining a thorough understanding of not only migration costs but also expected operations costs. Analyze the resources you expect to run in the cloud, and then map out the maintenance tasks needed to manage them according to your budget and SLAs.
  • Automate deployment and monitoring using native cloud tools. Where you can’t automate, create knowledge articles so that staff can work faster when troubleshooting and remediating issues.
  • Constantly optimize and refine strategies using cloud best practices. Tools like Azure Advisor can help guide the way.
  • Hire and retrain staff for skills in DevOps, Agile and continuous integration/continuous delivery (CI/CD): these are critical capabilities for running IT in the cloud.