
Optimize VM placement with machine learning for reduced costs


Tintri VM Scale-out uses proven machine learning algorithms to optimize the distribution of VMs across multiple Tintri storage systems.

  • VM Scale-out uses machine learning to optimize VM placement across multiple storage systems.
  • An ensemble of predictive algorithms is used to accurately forecast the future space and performance requirements of each VM.
  • Following its recommendations optimizes VM placement, saving manual effort, maintaining performance, and increasing operating efficiency to reduce costs.

Tintri views autonomous operation, in which intelligent software eliminates or greatly simplifies infrastructure management, as an essential element of successful IT and of an enterprise cloud. The Tintri enterprise cloud platform is designed for autonomous operation, freeing your infrastructure, cloud, and DevOps teams from routine management so they can focus on higher-value tasks. Previous posts have discussed Tintri autonomous functions, including auto-QoS and VM Scale-out, and explained how autonomous operation is greatly simplified because Tintri storage operates at VM or container granularity. Traditional storage, by contrast, uses abstractions such as LUNs and volumes that don’t map directly to what’s happening on servers.

It’s probably no surprise that machine learning is central to autonomous operation. Machine learning is often used to devise complex models and algorithms that lend themselves to predictive analytics. Various machine learning approaches are central to Tintri’s enterprise cloud platform.

A series of posts in the coming weeks will look at these machine learning approaches in more detail. In this first post, we look at machine learning in VM Scale-out, our solution for optimizing VM placement across multiple storage systems.

VM management challenges

Optimizing VM placement remains a big challenge for most IT teams, and the challenge grows as individual VMs get larger and as faster development cycles require more and more VMs for development and testing. Attempts to solve this problem through automation have so far suffered from significant limitations:

  • Some solutions recommend a VM migration based on a one-time crossing of a capacity or IOPS threshold
  • LUN- or volume-level data leads to bad guesses and poor decisions about optimal VM placement
  • There is little visibility into the impact on performance or the time required to complete a migration
  • Problem VMs constantly get bounced back and forth between arrays without resolving performance issues

Tintri VM Scale-out overcomes these challenges:

  • Optimizes VM placement based on a complete picture of each VM’s storage capacity and performance needs
  • Gives you least-cost recommendations for maintaining optimal VM distribution, saving you time, bandwidth, and capacity
  • Learns every time you edit its recommendations and provides fine-grained control

Machine learning in Tintri VM Scale-out

VM Scale-out makes it easy to grow your storage footprint without ever having to worry about the placement of individual VMs, using the latest capacity and performance data to recommend the best placement for every VM. Once workloads are deployed, the storage system learns from historical data gathered from the environment.

Although our algorithms are computationally more intensive than threshold-based decision making, they are designed to be as efficient as possible so that you can respond quickly to changing conditions using the latest available data. The end goal of our modeling is to accurately predict the behavior of each VM in order to determine the optimal VM placement across multiple storage arrays while minimizing the potential for churn, where VMs are repeatedly migrated back and forth between arrays.

Computations are performed on each storage array without consuming excessive resources or interfering with other activities. Tintri Global Center (TGC) pulls the information together from individual arrays, showing you the big picture across your whole environment and recommending changes that optimize your environment for capacity and performance while minimizing the impact of migrations. Simply accepting a recommendation initiates the necessary actions.
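The post doesn't spell out how TGC turns per-array forecasts into migration recommendations. Purely to illustrate the idea of least-cost placement, here is a minimal greedy sketch in Python. Everything in it is an assumption: the `Array` class, the `recommend_moves` function, the 80% utilization target, and the choice to consider only predicted space (a real recommender would also weigh performance). The sketch treats the gigabytes moved as the migration cost and prefers the smallest VM that clears an overloaded array's excess.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """A hypothetical storage array holding VMs with forecast space needs (GB)."""
    name: str
    capacity_gb: float
    vms: dict = field(default_factory=dict)  # VM name -> predicted space in GB

    def used(self) -> float:
        return sum(self.vms.values())

    def utilization(self) -> float:
        return self.used() / self.capacity_gb


def recommend_moves(arrays, target=0.8):
    """Greedily propose (vm, source, destination) moves until no array exceeds `target`.

    Migration cost is approximated by the VM's size, so the sketch prefers the
    smallest VM that clears an array's excess. If no single VM is big enough,
    it moves the largest one and tries again.
    """
    moves = []
    while True:
        src = max(arrays, key=lambda a: a.utilization())
        if src.utilization() <= target:
            return moves  # every array is at or below the target
        excess = src.used() - target * src.capacity_gb
        big_enough = [vm for vm, gb in src.vms.items() if gb >= excess]
        if big_enough:
            vm = min(big_enough, key=src.vms.get)
        else:
            vm = max(src.vms, key=src.vms.get)
        size = src.vms[vm]
        # Only consider destinations that stay at or below the target after the move.
        dests = [a for a in arrays
                 if a is not src and a.used() + size <= target * a.capacity_gb]
        if not dests:
            return moves  # nowhere to put it; leave the rest for a human
        dst = min(dests, key=lambda a: a.utilization())
        dst.vms[vm] = src.vms.pop(vm)
        moves.append((vm, src.name, dst.name))


if __name__ == "__main__":
    a = Array("array-a", 100, {"vm1": 50, "vm2": 30, "vm3": 15})
    b = Array("array-b", 100, {"vm4": 20})
    print(recommend_moves([a, b]))  # [('vm3', 'array-a', 'array-b')]
```

In the example, array-a is at 95% of capacity. Moving the 15 GB VM is the cheapest single move that brings it down to the 80% target, so that is the only recommendation.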

[Figure: Scale-out Storage Platform]

Predicting future space usage

To forecast space usage, we settled on an ensemble of three predictors. Working with data from real customer environments, we found this approach to be more accurate than using any single prediction algorithm.
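The post says the predictors' outputs are combined but not how. One simple way to pool an ensemble, assumed here only for illustration, is to treat every predictor's sampled outcomes as draws from one shared distribution and read a low, median, and high value off the pooled set. The function names and the 5th/95th percentile bounds are hypothetical. Pooling equal-sized sample sets gives each predictor equal weight.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of `values` for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def ensemble_range(predictions, low_pct=5, high_pct=95):
    """Pool each predictor's sampled outcomes and summarize them as (low, median, high).

    `predictions` is a list with one list of forecast values per predictor.
    Supplying the same number of samples per predictor weights them equally.
    """
    pooled = [value for outcomes in predictions for value in outcomes]
    return (percentile(pooled, low_pct),
            percentile(pooled, 50),
            percentile(pooled, high_pct))
```

For example, `ensemble_range([[100, 102, 104], [101, 103, 105], [99, 106, 110]])` returns `(99, 103, 110)`.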

Two of the predictors use linear regression, one over the past week of history and one over the past month. Each fits a trend line to the data and also models how large the "error" (the scatter of observations around that line) is. This gives us a range of possible outcomes rather than a single number.
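Here is a minimal sketch of one regression predictor, assuming daily space-usage samples in GB. It fits an ordinary least-squares line, uses the standard deviation of the residuals as the size of the "error", and projects the line forward. The function name `linear_forecast` and the ±2 standard deviation band are illustrative assumptions, not Tintri's parameters.

```python
import statistics


def linear_forecast(history, horizon, k=2.0):
    """Project space usage `horizon` samples past the end of `history`.

    Fits y = slope * x + intercept by ordinary least squares and returns
    (low, expected, high), where the band is +/- k residual standard deviations.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The spread of observations around the trend line models the "error".
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, history)]
    sigma = statistics.pstdev(residuals)
    expected = slope * (n - 1 + horizon) + intercept
    return expected - k * sigma, expected, expected + k * sigma
```

Given a hypothetical list `usage` holding a month of daily samples, the week-based and month-based predictors would be `linear_forecast(usage[-7:], 7)` and `linear_forecast(usage[-30:], 7)`, each forecasting one week ahead.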

The third predictor is a Monte Carlo simulation. We model future behavior by assuming that it has the same distribution of changes (both positive and negative) as the preceding month. For each simulated time step, we pick a random point in the past month and add or subtract the change in space usage observed at that point. Running the simulation many times over a week of simulated time gives us a range of possible outcomes, which are combined with the results from the other two predictors in the final analysis.
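The Monte Carlo predictor can be sketched as a bootstrap over past step-to-step changes. Each simulated day replays a randomly chosen change from the preceding month, and many runs produce a distribution of end-of-week outcomes. Daily sampling, 1,000 runs, and the function name `monte_carlo_forecast` are assumptions made for this illustration.

```python
import random


def monte_carlo_forecast(history, steps=7, runs=1000, seed=None):
    """Simulate future space usage by replaying randomly chosen past changes.

    `history` holds consecutive space-usage samples (e.g. one per day for the
    preceding month). Each run starts from the latest sample and, for each of
    `steps` future time steps, adds one change drawn at random (with
    replacement) from the observed changes, positive or negative.
    Returns the sorted end-of-horizon values from all runs.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to observe a change")
    rng = random.Random(seed)
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    outcomes = []
    for _ in range(runs):
        level = history[-1]
        for _ in range(steps):
            level += rng.choice(changes)
        outcomes.append(level)
    return sorted(outcomes)
```

With the default 1,000 runs, `outcomes[49]` and `outcomes[949]` of the sorted result give rough 5th and 95th percentile bounds for the week ahead.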


Predicting future performance needs

Two predictors are used to forecast future performance needs. The first assumes that the next week will look like one of the previous four weeks. You can think of this as similar to “averaging” the behavior of the past four weeks, but it isn’t averaging, which would smooth out variability. Instead, each past week becomes one possible outcome, so the predictor generates a range of predictions.
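The following minimal sketch makes the difference from averaging concrete. It assumes hourly load samples (for example IOPS) and a four-week lookback. Each of the last four weeks becomes a candidate profile for next week, and the spread of their peaks is the predicted range. An hour-by-hour average is included only for contrast. All names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def past_weeks_forecast(hourly_load, weeks=4):
    """Return one candidate profile for next week per past week (oldest first)."""
    if len(hourly_load) < weeks * HOURS_PER_WEEK:
        raise ValueError("not enough history for the requested number of weeks")
    recent = hourly_load[-weeks * HOURS_PER_WEEK:]
    return [recent[i * HOURS_PER_WEEK:(i + 1) * HOURS_PER_WEEK]
            for i in range(weeks)]


def peak_range(candidates):
    """Range of predicted peak load across candidate weeks: (lowest peak, highest peak)."""
    peaks = [max(week) for week in candidates]
    return min(peaks), max(peaks)


def hourly_average(candidates):
    """For contrast only: averaging hour by hour smooths peaks that fall at different times."""
    return [sum(hour) / len(candidates) for hour in zip(*candidates)]
```

Consider four weeks at a steady 10 IOPS, each with a single 100 IOPS spike at a different hour. `peak_range` reports `(100.0, 100.0)`, while the peak of `hourly_average` is only 32.5. That understatement is the lost variability the paragraph warns about.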

The second predictor for performance fits the observed load to a log-normal distribution and uses that to generate a range of predictions. The two predictors are weighted and combined to produce a final analysis that predicts the likely future performance needs of each VM.
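Here is a minimal sketch of the log-normal predictor and the weighted combination, assuming positive load samples. It estimates the distribution's parameters from the logarithms of the samples, reads off a percentile, and blends that value with the same percentile from the past-weeks predictor. The post doesn't give the weights or the combination rule, so the 50/50 weighted average of percentiles and all function names here are assumptions.

```python
import math
import statistics


def fit_lognormal(samples):
    """Estimate (mu, sigma) of a log-normal distribution from positive samples."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_quantile(mu, sigma, p):
    """Value below which a fraction p (0 < p < 1) of the fitted load falls."""
    return math.exp(mu + sigma * statistics.NormalDist().inv_cdf(p))


def combined_quantile(samples, past_weeks_value, p, weight=0.5):
    """Weighted blend of the log-normal percentile and the past-weeks percentile."""
    mu, sigma = fit_lognormal(samples)
    return weight * lognormal_quantile(mu, sigma, p) + (1 - weight) * past_weeks_value
```

Fitting in log space keeps forecasts positive and captures the long right tail typical of bursty load. The 95th percentile of the fit sits well above its median whenever the samples vary.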

A smarter approach to scaling storage

At Tintri, our goal is to deliver the simplest management experience possible. By using a variety of machine learning algorithms to accurately predict future capacity, performance, and working set, VM Scale-out gives you back the time you would otherwise have to spend manually load balancing VMs. And it eliminates the churn that results from less sophisticated, threshold-based methods. As a result, Tintri storage operates autonomously and makes smart recommendations while also giving you the fine-grained control to address unique requirements.

Saurabh Modh / Sep 12, 2017

Saurabh is a Senior Product Manager at Tintri, responsible for driving VM scale-out and Tintri Global Center. Saurabh has spent his career driving and building system management products, storage p...