During the past decade, companies have been building significant computing workloads on public and private cloud infrastructure or shifting workloads to the cloud. Gartner predicts that spending on public cloud services by end-users worldwide will reach $591 billion in 2023, a 43% increase from 2021. That’s a significant jump and suggests that many companies optimized their cloud migrations for speed and that managing costs and operational efficiency were likely secondary concerns.
Although Gartner forecasts worldwide IT spending to grow 2.4% in 2023, some analysts are cutting back their predictions, and many IT leaders are planning spending adjustments.
The initial race to build cloud capabilities is pivoting toward managing costs, optimizing infrastructure, and automating more operations. Following up on my recent article on seven ways to reduce costs with agile and devops, here are five recommendations for how IT teams can optimize their cloud stacks to reduce costs and improve operational efficiencies.
“Infrastructure has passed a complexity point at which manually deploying infrastructure and applications is an antipattern,” says Marko Anastasov, cofounder of Semaphore CI/CD. “Use infrastructure as code (IaC) tools like Terraform to set up your cloud infrastructure.”
Other IaC platforms and tools include AWS CloudFormation, Azure ARM templates, Red Hat Ansible, Progress Chef, Puppet, and Kubernetes. These platforms enable setting infrastructure standards (sometimes called patterns or templates) and then using code to manage the configuration and deployment. IaC eliminates the manual steps to build, configure, and deploy cloud infrastructure, including networks, computing, storage, and services.
Anastasov says, “Automation is the key to reducing costs and improving reliability. Using IaC increases visibility on what services you’re running in the cloud and lets you run automated cost analysis tools.”
My take: Using IaC is an important step, but organizations looking for efficiencies should standardize cloud architectures and reusable IaC patterns. There is a trade-off between providing devops teams full infrastructure flexibility versus gaining efficiencies from standardizing cloud stacks and infrastructures. But IT teams using IaC and automation can increase the number of supported infrastructure patterns.
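To make the link between IaC, standard patterns, and automated cost analysis concrete, here is a minimal Python sketch. It is not Terraform or any vendor's API. The `ServicePattern` class, the approved sizes, and the hourly prices are all hypothetical stand-ins, chosen only to show that once infrastructure is declared as data, checking it against standards and estimating its cost becomes a few lines of code.

```python
from dataclasses import dataclass

# Hypothetical hourly prices in USD (assumption for illustration only);
# real prices vary by provider, region, and instance family.
HOURLY_PRICE_USD = {"small": 0.05, "medium": 0.10, "large": 0.20}
HOURS_PER_MONTH = 730  # common approximation: 365 days * 24 hours / 12 months


@dataclass(frozen=True)
class ServicePattern:
    """A reusable, declared infrastructure pattern (hypothetical structure)."""
    name: str
    instance_size: str
    instance_count: int


def validate(pattern: ServicePattern) -> None:
    """Reject patterns that fall outside the approved standards."""
    if pattern.instance_size not in HOURLY_PRICE_USD:
        raise ValueError(f"{pattern.name}: unapproved size {pattern.instance_size!r}")
    if pattern.instance_count < 1:
        raise ValueError(f"{pattern.name}: instance_count must be at least 1")


def monthly_cost(pattern: ServicePattern) -> float:
    """Estimate the monthly cost in USD for one declared pattern."""
    validate(pattern)
    return round(
        HOURLY_PRICE_USD[pattern.instance_size] * pattern.instance_count * HOURS_PER_MONTH,
        2,
    )


# The declared stack: because it is data, it can be reviewed, diffed, and analyzed.
STACK = [
    ServicePattern("web", "small", 4),
    ServicePattern("api", "medium", 3),
    ServicePattern("batch", "large", 1),
]

cost_report = {p.name: monthly_cost(p) for p in STACK}
total_cost = round(sum(cost_report.values()), 2)
print(cost_report, total_cost)
```

In practice, the same idea applies to real IaC definitions: the declared state is the input, and cost or compliance checks run against it before anything is deployed.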
I covered CI/CD (continuous integration and continuous deployment), continuous testing, and other devops practices earlier. These are expected practices when developing cloud-native applications. Agile teams should also address these security risks in software development and increase devops observability. Consider these key devops practices for all applications.
What should devops teams do beyond these basics when developing apps and microservices where high usage is expected and consistent performance is a key requirement?
Arjun Chandar, CEO at IndustrialML, answers, “When designing a new cloud tech stack with features meant to be distributed across large numbers of clients, making design choices to improve concurrency is a great way to improve your customers’ experience. Using languages and frameworks suited to concurrency will reduce your headaches as you scale.”
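As one illustration of what designing for concurrency can look like, the sketch below uses Python's standard `asyncio` library to serve many simulated clients at once while capping the number of requests in flight. The `handle_client` and `serve_all` functions, the 10-millisecond delay, and the limit of 100 are assumptions made for the example, not anything Chandar describes.

```python
import asyncio


async def handle_client(client_id: int, limiter: asyncio.Semaphore) -> str:
    """Serve one simulated client; the sleep stands in for network or database I/O."""
    async with limiter:  # cap how many requests are in flight at once
        await asyncio.sleep(0.01)
        return f"client-{client_id}: ok"


async def serve_all(client_count: int, max_in_flight: int = 100) -> list[str]:
    """Handle all clients concurrently rather than one after another."""
    limiter = asyncio.Semaphore(max_in_flight)
    tasks = (handle_client(i, limiter) for i in range(client_count))
    # gather returns results in the same order the tasks were passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(serve_all(500))
print(len(results), results[0])
```

Because the waits overlap, 500 clients finish in roughly the time of a handful of sequential requests. The semaphore is the design choice worth noting: it keeps a traffic spike from overwhelming downstream services.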
My take: When developing new apps and services, the product manager and agile teams need to review what nonfunctional criteria are priorities. For some apps, it’s scalability and performance. For others, it might be reliability, flexibility, or meeting compliance requirements. Teams that recognize these priorities up front are better equipped to debate trade-offs when designing the architecture and developing the code.
More organizations are shifting from desktops and laptops to virtual desktop infrastructures (VDIs) running on the cloud. One market study values the VDI market at $16 billion with a compound annual growth rate (CAGR) of more than 20% through 2023.
“Enterprises are modernizing end-user computing with cloud PCs, a valuable addition to cloud computing strategies that deliver greater agility in uncertain times,” says Matthew Davidson, field CTO at Workspot. “With cloud capabilities and costs varying among hyperscalers, enterprises benefit from deploying cloud PCs across multiple cloud regions and clouds, enabling cost optimization by use case, an important innovation when budgets are tight.”
My take: Many organizations shifted to VDIs during the pandemic, and many rubber-stamped one-size-fits-all configurations. Although this solved an urgent problem and is an efficient way to manage IT resources, it may have delivered a poor user experience, especially for employees with higher-than-average computing needs. IT may find more holistic efficiencies by studying the impact of VDI technologies on employee productivity, identifying usage personas, and creating VDI deployment patterns optimized by persona.
Getting more workloads to the cloud is only the first step of a modernization journey. Providing an efficient and responsive Day 2 model to ensure reliable, efficient, and high-performing cloud stacks and workflows requires IT teams to improve operations iteratively.
Ming Gong, vice president of product at Blameless, recommends improving efficiencies with incident management practices. “We find a poorly defined incident management process to be both a hindrance to productivity and an obstacle to innovation,” he says. “Optimizing your incident management process to remove toil and reduce ambiguity will go a long way towards improving your IT ops efficiency.”
Incidents, outages, and poorly performing systems create downstream impacts that can be easy to measure in e-commerce and customer-facing systems but harder to quantify for many departmental workflow and operational systems. AIops platforms can help incident management teams reduce the mean time to resolve incidents and manage their service-level objectives. These are two best practices for reducing the cost and productivity impacts of incidents.
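Both measures are simple to compute once incidents are recorded with open and resolve times. The following Python sketch uses made-up incident data and a hypothetical 99.5% availability objective, and it assumes, for simplicity, that every minute of an incident counts as downtime.

```python
from datetime import datetime, timedelta

# Hypothetical incident log for one 30-day period: (opened, resolved) pairs.
INCIDENTS = [
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 3, 10, 30)),
    (datetime(2023, 1, 12, 14, 0), datetime(2023, 1, 12, 15, 30)),
    (datetime(2023, 1, 20, 9, 0), datetime(2023, 1, 20, 10, 0)),
]
PERIOD = timedelta(days=30)
SLO_TARGET = 0.995  # assumed availability objective, for illustration only


def mean_time_to_resolve(incidents) -> timedelta:
    """Average time from an incident being opened to being resolved."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def availability(incidents, period: timedelta) -> float:
    """Fraction of the period without an open incident.

    Simplifying assumption: incidents do not overlap and each one is a full outage.
    """
    downtime = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return 1 - downtime / period


mttr = mean_time_to_resolve(INCIDENTS)
measured = availability(INCIDENTS, PERIOD)
slo_met = measured >= SLO_TARGET
print(mttr, round(measured, 4), slo_met)
```

Tracking these numbers per system over time is what lets a team show whether changes to the incident process are actually paying off.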
IT ops teams deploy monitoring tools, observability practices, and AIops to cloud stacks, but monitoring virtual desktops and the user experience is also needed. Davidson says, “Companies should look for VDI solutions that offer comprehensive, global observability for their cloud PCs across public clouds in a single view. This powerful capability empowers IT teams to provide the highest reliability and availability for maximum productivity.”
My take: I believe that you can’t improve what you don’t measure, an idea often attributed to renowned management consultant Peter Drucker. Whether you’re trying to reduce costs, manage more cloud workflows, improve experiences, or increase reliability, I recommend putting observability, monitoring, and AIops at the forefront of your Day 2 models.
“In an economic downturn, enterprises should look at their existing tech stack and evaluate which IT initiatives can make the biggest impact with the smallest lift,” says Clear Skye CEO John Milburn.
Dan Ortma, global finops practice director at SoftwareOne, adds, “Recessionary fears and an overall priority on spend optimization are driving the growth of finops, a cloud financial management practice that brings together IT, finance, engineering, product developers, IT asset management, leadership, and others to align on cloud usage and spending goals.”
IT leaders know that rapidly innovating and deploying reliable applications requires a partnership between IT and financial disciplines. Putting financials before IT can lead to slow project plans and underfunded Day 2 operating models, often a recipe for accelerating technical debt. Instrumenting IT without financial disciplines can lead to inefficiencies and systems that underdeliver business impacts. Cloud finops is one approach to helping engineering, finance, technology, and business teams collaborate on data-driven spending decisions.
IT leaders should develop an architecture strategy that promotes developing platforms and reusing capabilities. Milburn suggests, “See what features or solutions exist within your platform to make the most out of your current investment. Not only does this save money, but it also reduces complications with new tech implementations.”
Anastasov shares this AI example. “Running AI workloads is expensive as it needs powerful GPU hardware. Say your application goes viral. That’s great until you get a gigantic bill at the end of the month that you can’t pay,” he says. He suggests that IT teams “only release a feature after having done a comprehensive cost analysis.”
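A pre-release cost analysis does not need to be elaborate to be useful. The back-of-the-envelope Python sketch below estimates a monthly GPU bill at different traffic levels and the traffic a given budget can absorb. Every number in it (two GPU-seconds per request, $3 per GPU-hour, a $5,000 budget) is a hypothetical assumption, and the model ignores idle capacity, storage, and network charges.

```python
def monthly_gpu_cost(requests_per_day: int, gpu_seconds_per_request: float,
                     gpu_price_per_hour: float, days: int = 30) -> float:
    """Estimate the monthly GPU bill in USD for a given level of traffic."""
    gpu_seconds = requests_per_day * gpu_seconds_per_request * days
    return gpu_seconds * gpu_price_per_hour / 3600


def max_daily_requests(monthly_budget: float, gpu_seconds_per_request: float,
                       gpu_price_per_hour: float, days: int = 30) -> int:
    """Largest daily request volume that stays within the monthly budget."""
    cost_per_request = gpu_seconds_per_request * gpu_price_per_hour / 3600
    # Small tolerance guards against floating-point rounding just below a whole number.
    return int(monthly_budget / (cost_per_request * days) + 1e-9)


# Hypothetical inputs for illustration only.
GPU_SECONDS_PER_REQUEST = 2
GPU_PRICE_PER_HOUR = 3.0
MONTHLY_BUDGET = 5000

expected_bill = monthly_gpu_cost(10_000, GPU_SECONDS_PER_REQUEST, GPU_PRICE_PER_HOUR)
viral_bill = monthly_gpu_cost(1_000_000, GPU_SECONDS_PER_REQUEST, GPU_PRICE_PER_HOUR)
traffic_ceiling = max_daily_requests(MONTHLY_BUDGET, GPU_SECONDS_PER_REQUEST, GPU_PRICE_PER_HOUR)
print(expected_bill, viral_bill, traffic_ceiling)
```

Comparing the expected bill with the "goes viral" bill, and knowing the traffic ceiling a budget implies, gives a team a concrete basis for rate limits or usage caps before launch.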
My take: IT teams should prioritize experimentation and manage innovation pipelines for developing new products, improving experiences, and building data-driven practices. Then, institute financial disciplines while planning pilots and production use cases, which helps reveal cost and efficiency considerations during development phases. For systems already in production, seeking cost and operational improvements is one way to fund tech debt reductions.