The fast pace of artificial intelligence (AI) development is creating voracious demand for cloud computing and storage as companies seek to process vast data training sets, gain high-performance capacity for inference workloads, and scale AI up and down to meet business demands. The public cloud infrastructure as a service (IaaS) segment is projected to grow 24.5% in 2025.
Hyperscalers provide advanced AI capabilities, and vendors are building solutions on top of hyperscale platforms. Enterprises are using large language models (LLMs) and domain-specific small models to enable various use cases, from empowering employees with role-based chatbots to delivering predictive intelligence and enabling agentic AI systems. C-suite leaders envision a future of new AI-powered business models to deliver highly personalized products and services and continuously optimize operations.
To discuss “The Future of Cloud Infrastructure and AI,” host Michelle Dawn Mooney welcomed guest Sid Rao to “The Hitchhiker’s Guide to IT,” Device42’s podcast. Rao is the CEO and Co-Founder of Positron Networks. The podcast is available for viewing here.
The Purpose of This Blog
This blog seeks to empower IT operations leaders and teams with new insights into cloud and AI trends so they can evolve their strategies and practices.
Rao has a 25-year background in building software for companies. After a 10-year stint managing cloud infrastructure at AWS, he started a scientific computing startup focused on the infrastructure challenges scientists face as they leverage AI to drive better research outcomes. The company is currently raising pre-seed funding.
Host Mooney asked Rao to share his perspective on cloud infrastructure growth trends. Rao said that cloud computing leaders have overemphasized the impact of generative AI on business with enterprises’ drive to create bots and agents. However, Rao said AI has broader impacts, as reflected in organizational spending. Companies are purchasing not just graphics processing unit (GPU)-based services for inference use cases and applications but also grid computing and diverse storage technologies and architectures. As a result, AI is “having a profound impact on the economics, infrastructure, security models, and types of services I see customers deploying in the cloud,” Rao stated.
Using Cloud Rebalancing to Optimize Price and Performance
Enterprises typically pay for single-tenant services with a database and EC2 instances. However, the high cost of AI and GPU services is causing companies to look at multi-tenant solutions to power AI applications and repatriate some cloud workloads to colocation or on-premises data centers. An IDC report in June 2024 found that 80% of survey respondents expected to do some compute and storage workload rebalancing over the coming 12 months.
Rao said that he is seeing an increasing number of hybrid cloud deployments, where companies are combining GPU-based hyperscaler deployments with control planes in the public cloud and data planes in private clouds.
“Historically, cloud computing has been the craze…. However, one of the things I’m noticing is that the trend is now moving a little bit in the opposite direction. I think that has to do with the fact that GPUs and the infrastructure required to power AI are very expensive. And the margin structure with some of these cloud services doesn’t work for leadership in these IT organizations,” said Rao. “A second factor driving this need for on-prem-type deployments and a hybrid cloud approach is storage. Leaders need a large amount of storage and care about data sovereignty and security. And the cost of storage over time in a cloud environment becomes challenging.”
How AI Is Optimizing Cloud Performance and Innovation
Rao stated that AI has three main impacts on cloud adoption.
- Driving new business value: Rao predicted that the cost of developing software applications would decrease by 3X-4X over the next two to three years. Developers will use AI tools and agents to accelerate the time it takes to develop and deploy applications, increasing team productivity and reducing costs.
“I can go to ChatGPT and say, ‘Hey, please deploy this React TypeScript application into my AWS account,’ and it will efficiently generate the instructions required to do that,” said Rao.
- Creating data privacy and security concerns: “The entire security and data model for AI, especially in a multi-tenant cloud environment, is a topic of intense debate,” stated Rao. While working at AWS, he said leaders at a large financial services institution, an AWS client, “believed that the weights in an LLM or GPT-style model were nothing more than a proxy for the customer data.”
The bank struggled to operationalize generative AI capabilities because it had to deploy its models as single-tenant solutions, each powered by its own GPU cluster, which didn’t scale economically. It couldn’t risk using multi-tenant GPU clusters, where customer data might get mixed or shared, causing regulatory violations.
- Using AI to power security defenses and attacks: Enterprises are using machine learning models to predict attack paths, identify attackers, and model threats. Meanwhile, malicious actors leverage AI to analyze data, perform reconnaissance on targets, enhance phishing campaigns and malware, and automate the execution of attacks. Enterprise leaders acknowledge that there is work to be done: just 2% said they have implemented cyber resilience actions across all key business areas, a Deloitte survey found.
“I think the real impact that we’re going to see is how to make AI easy for the everyday user to use and create models and for DevOps and for software development engineers to be able to leverage AI within their applications without breaking the bank,” said Rao.
AI Applications that Enhance Cloud Functionality and Security
Host Mooney asked Rao to discuss AI-powered tools enabling engineers to build new applications. Rao mentioned tools such as:
- GitHub Copilot, which enables developers to automate code production and accelerate the development of solutions.
- Amazon Q from AWS, which developers use to provision cloud infrastructure and deploy CloudFormation stacks within AWS.
These tools and other security and cloud agents make it easy for developers and DevOps engineers to deploy infrastructures and applications at scale to support their user and customer base.
Beyond developing code, agents can automate patching, upgrades, and pipeline deployment. However, these tools need oversight, as hallucinations can negatively impact service-level agreements.
“If your SLA is four nines, a 0.5% hallucination rate will potentially break your service level,” said Rao. “However, if you’re okay with a 1% failure rate, you can use it as a productivity-enhancing service where the process is automated, but you’ve got a human being making a decision.”
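Rao's point can be checked with simple error-budget arithmetic: a four-nines SLA leaves a 0.01% failure budget, so an automated step with a 0.5% hallucination rate would exceed it fifty times over. A minimal sketch (illustrative numbers only):

```python
# Error-budget arithmetic behind Rao's four-nines example (illustrative numbers).

def allowed_failure_rate(nines: int) -> float:
    """An SLA of N nines allows a failure rate of 10^-N (e.g. 99.99% -> 0.0001)."""
    return 10 ** -nines

sla_budget = allowed_failure_rate(4)  # 0.0001, i.e. 0.01% of requests may fail
hallucination_rate = 0.005            # the 0.5% rate from the quote

# How far over budget a fully automated pipeline would be:
overshoot = hallucination_rate / sla_budget
print(f"Budget: {sla_budget:.4%}, hallucinations: {hallucination_rate:.2%}, "
      f"overshoot: {overshoot:.0f}x")  # 50x over the four-nines budget
```

This is why Rao frames such tools as productivity enhancers with a human in the loop rather than fully autonomous replacements.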
Amazon used its Amazon Q GenAI assistant for software development to automate Java upgrades. As a result, the time to perform a Java upgrade plummeted from an average of 50 developer-days to a few hours, enabling Amazon to move more than 50% of its production Java systems to modern Java versions with less time and cost. In addition, Amazon’s developers shipped 79% of the auto-generated code reviews without making any changes.
Using AI for Resource Management
Rao said that using AI for resource management is a double-edged sword. “AI is going to drive efficient use of IT infrastructure and IT and cloud resources for applications that don’t require AI, but requiring AI will drive up infrastructure utilization with costly GPU-powered resources,” Rao said.
Enterprises are leveraging various models from different providers to accomplish performance and price objectives, complicating the automation of these processes. Rao said that consuming managed services through companies like OpenAI and Anthropic was straightforward because the companies are responsible for the underlying compute clusters. However, not every company wants a third party to host their sensitive data.
Another approach is to self-host open models such as Llama. However, with this approach, companies must purchase and manage their own hardware, including sourcing hard-to-get GPUs; enterprises may have to purchase reserved instances to guarantee GPU capacity. They must also store large data sets in a cloud environment that provides low-latency data access for model training. Typical object storage services, such as AWS S3 or Oracle blob storage, don’t work because they aren’t fast enough.
After the heady days of GenAI experimentation, companies are scrutinizing projected ROI for new initiatives. They want to see that the new efficiency gains of AI-powered applications are worth the price, said Rao.
Common Challenges with Cloud Security and Data Management
Rao said companies are grappling with heightened data sovereignty, privacy, and security issues as they seek to innovate, create the best models, and scale AI capabilities. GPTs can retrieve and present sensitive data to users, circumventing traditional security guardrails and creating compliance violations.
Machine learning engineers often don’t understand the scope of the challenge, arguing that models are just arrays of floating-point numbers, said Rao. However, these values produce tokens that map to vector databases containing sensitive information. In addition, models no longer just predict results based on historical data. They’re continuously learning and evolving and are being applied to more business areas. Companies that don’t change their security and threat models risk exposing vital intellectual property.
As a result, companies are applying end-to-end governance over data, ensuring that it provides lineage, provenance, classification, sovereignty, and compliance while ensuring quality and integrity. That means accounting for where training data comes from, how it’s used to fine-tune models, its management and tenancy structure, and where vector databases are stored and used.
Choosing the Right Technology Partners to Engineer AI Adaptability and Success
So, who should enterprises work with? “I believe that in the world of AI, it is way too early to pick a winner and a loser,” said Rao.
He added that it is tempting for leaders to choose companies like AWS because they have so much market share, run a hackathon with them, and then announce a collaboration to solve their challenges. “I’ve seen that happen a few times now, and it’s a mistake,” said Rao. “That’s because there is a lot of innovation at startups.” For example, Rao said that more than 100 foundation models are on the market, offering choices beyond OpenAI’s ChatGPT, Anthropic’s Claude, and DeepSeek AI.
Rao offered tips for how to pick providers:
- Create a clear policy mechanism for deciding when to use specific models for different data types and enforce them.
- Let your developers select the models they think will be most successful for their applications within these guardrails.
- Evaluate the user experience for new AI applications by measuring net promoter scores and other metrics to see which models are doing well in different environments.
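Rao's first two tips can be pictured as a simple routing layer: a policy table decides which models may touch which data classes, and developers choose freely within those guardrails. The sketch below is purely illustrative; all model names and data classifications are hypothetical examples, not the providers' actual product tiers.

```python
# Hypothetical policy guardrail: data classes map to the set of models
# approved to process them. Names here are examples, not real policy.
POLICY = {
    "public":    {"gpt-4o", "claude-3", "llama-3-self-hosted"},
    "internal":  {"claude-3", "llama-3-self-hosted"},
    "regulated": {"llama-3-self-hosted"},  # single-tenant, self-hosted only
}

def select_model(data_class: str, preferred: list[str]) -> str:
    """Return the developer's first preference that the policy allows."""
    allowed = POLICY.get(data_class, set())
    for model in preferred:
        if model in allowed:
            return model
    raise PermissionError(f"No preferred model is approved for {data_class!r} data")

# A developer's ranked choice, filtered through the guardrail:
print(select_model("regulated", ["gpt-4o", "llama-3-self-hosted"]))
# -> llama-3-self-hosted
```

The same routing point is also a natural place to log which model served which request, feeding the user-experience metrics Rao mentions in the third tip.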
“Don’t select a single partner. You’re going to pay a price for doing that. The cost of supporting multiple different models and partners is a lot lower than picking a single partner and losing innovation and capabilities that your competitors will be able to offer,” Rao said.
Integrating AI Successfully into Cloud Operations
So, how is AI improving cloud operations? Rao cited using AI to detect distributed denial of service (DDoS) attacks in security environments. “DDoS is an extremely complicated problem to solve because you’ve got traffic originating from all over the world — all kinds of IP addresses and originating networks. You’re trying to block illegitimate traffic from an attacker while allowing legitimate customer traffic to come through.”
Companies like Cloudflare, AWS, and Azure are now using GenAI models running against their web traffic logs to detect when traffic is coming from a DDoS attack source and then automatically block it. These models are very good at detecting automated traffic versus human traffic.
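The core idea can be illustrated with a toy heuristic: score each source IP from traffic-log features and flag outliers. Real providers use trained models over far richer signals (request timing, headers, network reputation); the threshold and log format below are made up for the sketch.

```python
# Toy illustration of log-driven DDoS filtering: flag source IPs whose
# request volume in a window is wildly out of line with normal traffic.
# The threshold is arbitrary; production systems learn these boundaries.
from collections import Counter

def suspect_ips(log: list[str], burst_threshold: int = 100) -> set[str]:
    """Flag IPs whose request count in the log window exceeds the threshold."""
    counts = Counter(log)
    return {ip for ip, n in counts.items() if n > burst_threshold}

# Simulated window: one IP floods while others browse normally.
window = ["10.0.0.1"] * 500 + ["192.0.2.7"] * 3 + ["198.51.100.9"] * 5
print(suspect_ips(window))  # {'10.0.0.1'}
```

A learned model replaces the fixed threshold with a classifier, which is what makes it effective at separating automated traffic from human traffic.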
Other examples are automated log scanning to determine failures and automated pipeline creation. Rao has not seen examples of AI agents replacing people and doesn’t anticipate they will until error rates are less than 1%, 0.1%, or even 0.0001%, depending on application requirements.
Predictions for How Cloud and AI Will Evolve Over the Coming Decade
Rao offered three forecasts for the future of cloud and AI.
- Falling costs will make AI more accessible: Rao anticipates the price of the GPUs required for advanced models to drop within five years, and that of the parallel matrix operations used to power GenAI models within 10. “We will build the silicon to make this a commodity operation, where it’s pennies an hour to run AI models,” Rao said.
- For processing, it’s cloud to edge and back again: While today, enterprises often choose edge processing for AI models, in the future, the ubiquity of fiber, fast 5G speeds, and falling bandwidth costs will make cloud processing the preferred choice for all models except those that are extremely latency-sensitive or have physical security requirements.
- AI agents won’t replace humans at scale: Rao says he doesn’t believe AI models will pervasively replace workers because of business liability issues. If a model fails, who will accept accountability, he asks. Is it the fault of the bot developer, the independent software vendor that used the bot, or the enterprise hosting an application with the bot? “Ultimately, you need accountability. Someone needs to own the problem. And while bots are good at responding, they’re not good at owning things. That’s a major drawback of AI,” stated Rao.
How AI Will Impact Enterprises’ Multi-Cloud and Hybrid Cloud Strategies
GPU availability issues and computing and AI model expenses have forced enterprises to adopt hybrid and multi-cloud strategies. This approach will drive the standardization of core APIs that power cloud services.
“Cloud is turning into an operating system for distributed applications, and you can’t have three APIs to open a file,” said Rao. “There needs to be a standard library. Terraform is trying to do that, but there are four different versions of Terraform for deploying applications. We will see some standardization because GPUs are necessary across all these environments.”
Recommendations for Aligning Cloud Strategies with AI Adoption
Rao offered recommendations for aligning cloud and AI strategies. He recommended:
- Experimenting but not committing to a single strategy. “There’s a hype cycle right now, but it will die. What will be left will be durable applications that generate business value,” said Rao.
- Understanding the economic value of investments. “Am I truly saving money? Because cash is king,” Rao exhorted. That’s especially true in environments where inflation, volatile customer demand, and fast-changing government policies create market and leader uncertainty.
- Having realistic expectations for automation. “Automation is the number-one focus for leaders, but they should test every assumption of the financial model driving their target application,” Rao said. He gave the example of using AI to review thousands of bills to identify cost savings opportunities. As a first step, companies must ensure that models can read the bills properly and evaluate data structures before testing other assumptions. “We often think that software is consistent. However, AI is not consistent software. So, you must test your assumptions before you can predict the cost savings or revenue you’ll drive by using AI within your business operations,” Rao said.
Want to learn more?