Tintri VM Scale-out uses proven machine learning algorithms to optimize the distribution of VMs across multiple Tintri storage systems.
Tintri views autonomous operation, in which intelligent software eliminates or greatly simplifies infrastructure management, as an essential element of successful IT and an enterprise cloud. The Tintri enterprise cloud platform is designed for autonomous operation, freeing your infrastructure, cloud, and DevOps teams from routine management so they can focus on higher-value tasks. Previous posts have discussed Tintri autonomous functions, including auto-QoS and VM Scale-out, and explained that autonomous operation is much simpler because Tintri storage operates at VM or container granularity. Traditional storage, by contrast, uses abstractions such as LUNs and volumes that don’t map directly to what’s happening on servers.
It’s probably no surprise that machine learning is central to autonomous operation. Machine learning is often used to devise complex models and algorithms that lend themselves to predictive analytics. Various machine learning approaches are central to Tintri’s enterprise cloud platform.
A series of blog posts in the coming weeks will look at these machine learning approaches in more detail. In this first post, we look at machine learning in VM Scale-out, our solution for optimizing VM placement across multiple storage systems.
Optimizing VM placement remains a big challenge for most IT teams, and the challenge grows as individual VMs increase in size and as faster development cycles require more and more VMs for development and testing. Attempts to solve this problem through automation have so far suffered from significant limitations.
Tintri VM Scale-out overcomes these challenges.
VM Scale-out makes it easy to grow your storage footprint without ever having to worry about the placement of individual VMs, using the latest capacity and performance data to recommend the best placement for every VM. Once workloads are deployed, the storage system learns from historical data gathered from the environment.
Although our algorithms are computationally more intensive than threshold-based decision making, they are designed to be as efficient as possible so that you can respond quickly to changing conditions using the latest available data. The end goal of our modeling is to predict the behavior of each VM accurately enough to determine the optimal VM placements across multiple storage arrays while minimizing the potential for churn.
Computations are performed on each storage array without consuming excessive resources or interfering with other activities. Tintri Global Center (TGC) pulls the information together from individual arrays, showing you the big picture across your whole environment and recommending changes that optimize your environment for capacity and performance while minimizing the impact of migrations. Simply accepting a recommendation initiates the necessary actions.
To forecast space usage, we settled on an ensemble of multiple predictors. Working with data from real customer environments, we found this approach to be more accurate than using any single prediction algorithm.
Two of the predictors use linear regression. One works from the past week of history and the other from the past month. Each fits a trend line to the data and also models how large the "error" (the scatter of the data around that line) is. Together, the trend and its error give us a range of possible outcomes.
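To make the trend-line idea concrete, here is a minimal sketch of a regression-based space forecast. It is an illustration under stated assumptions, not Tintri's implementation: the function name `trend_forecast`, the evenly spaced samples, and the choice of a band of two residual standard deviations are all hypothetical.

```python
import math

def trend_forecast(history, horizon, z=2.0):
    """Fit a least-squares line to `history` and project `horizon` steps ahead.

    Returns (low, expected, high). The band is +/- z residual standard
    deviations around the projected trend; z=2.0 is an assumed value.
    """
    n = len(history)
    if n < 3:
        raise ValueError("need at least three samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Size of the "error" around the line: residual standard deviation
    # with n - 2 degrees of freedom (two fitted parameters).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    expected = intercept + slope * (n - 1 + horizon)
    return expected - z * sigma, expected, expected + z * sigma

# Hypothetical daily space usage in GiB for the past week.
past_week = [100, 103, 103, 107, 108, 111, 111]
low, expected, high = trend_forecast(past_week, horizon=7)
print(f"one week ahead: {low:.1f} to {high:.1f} GiB (trend {expected:.1f})")
```

Running the same function on a month of samples would give the second regression predictor. A short window reacts faster to recent changes, while a long window is steadier.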
The third predictor is a Monte Carlo simulation. We model future behavior by assuming that it has the same distribution of changes (both positive and negative) as the preceding month. For each time step, we pick a random point in the past and apply the change in space usage observed at that point. Performing multiple runs of the simulation over a week of simulated time gives us a range of possible outcomes, which is combined with the results from the other two predictors in the final analysis.
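The resampling step can be sketched in a few lines. This is an illustrative sketch only: the function name `monte_carlo_space`, the number of runs, and the 5th/50th/95th percentile summary are assumptions, not details from the product.

```python
import random

def monte_carlo_space(history, steps, runs=1000, seed=None):
    """Simulate future space usage by replaying randomly chosen past changes.

    `history` holds evenly spaced usage samples (for example, the past month).
    Returns assumed summary points: the 5th, 50th, and 95th percentile of the
    simulated usage after `steps` time steps.
    """
    deltas = [b - a for a, b in zip(history, history[1:])]
    if not deltas:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        usage = history[-1]
        for _ in range(steps):
            # Pick a point in the past and apply the change seen there.
            # Usage cannot go below zero.
            usage = max(0.0, usage + rng.choice(deltas))
        finals.append(usage)
    finals.sort()

    def percentile(p):
        return finals[min(len(finals) - 1, int(p * len(finals)))]

    return percentile(0.05), percentile(0.50), percentile(0.95)

# Hypothetical daily usage in GiB with both growth and shrinkage.
past_month = [500, 504, 503, 510, 509, 515, 521, 520, 526, 530]
low, median, high = monte_carlo_space(past_month, steps=7, seed=7)
print(f"simulated usage in one week: {low:.0f} to {high:.0f} GiB")
```

Because each run replays real past changes, the simulation keeps the shape of the observed distribution, including occasional large jumps, without assuming a straight-line trend.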
Two predictors are used to forecast future performance needs. The first assumes that the next week will be the same as one of the previous weeks. You can think of this as similar to “averaging” the behavior of the past four weeks, except that averaging would lower variability, whereas this approach generates a range of predictions.
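The difference between treating each past week as a scenario and averaging the weeks can be shown with a small sketch. The helper names and the use of the weekly peak as the summary statistic are assumptions made for illustration.

```python
def weekly_scenarios(samples, samples_per_week, weeks=4):
    """Split the most recent `weeks` weeks of load samples into one list per week."""
    needed = samples_per_week * weeks
    if len(samples) < needed:
        raise ValueError("not enough history")
    recent = samples[-needed:]
    return [recent[i * samples_per_week:(i + 1) * samples_per_week]
            for i in range(weeks)]

def peak_range(samples, samples_per_week, weeks=4):
    """Range of peak load if next week repeats any one of the past weeks."""
    peaks = [max(week) for week in weekly_scenarios(samples, samples_per_week, weeks)]
    return min(peaks), max(peaks)

def averaged_peak(samples, samples_per_week, weeks=4):
    """Peak of the week-over-week average, shown only for contrast."""
    scenarios = weekly_scenarios(samples, samples_per_week, weeks)
    averaged = [sum(values) / weeks for values in zip(*scenarios)]
    return max(averaged)

# Hypothetical load samples, two per week to keep the example short.
load = [10, 50, 40, 10, 10, 10, 20, 30]
print(peak_range(load, samples_per_week=2))     # range across the four scenarios
print(averaged_peak(load, samples_per_week=2))  # averaging hides the busiest week
```

In this toy data the busiest week peaks at 50, but the averaged week peaks at only 25, which is why keeping the weeks as separate scenarios preserves the variability that matters for placement.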
The second predictor for performance fits the observed load to a log-normal distribution and uses that to generate a range of predictions. The two predictors are weighted and combined to produce a final analysis that predicts the likely future performance needs of each VM.
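A log-normal fit and a weighted blend of two predictors might look like the sketch below. The post does not give the weights or the quantiles used, so `weight_a`, the 5%/50%/95% points, and the function names are assumptions.

```python
import math
from statistics import NormalDist, fmean, pstdev

def lognormal_quantiles(load, probs=(0.05, 0.5, 0.95)):
    """Fit a log-normal distribution to positive load samples.

    The fit takes the mean and standard deviation of log(load); each
    quantile is exp(mu + sigma * z), where z is the standard normal quantile.
    """
    logs = [math.log(v) for v in load if v > 0]
    if len(logs) < 2:
        raise ValueError("need at least two positive samples")
    mu, sigma = fmean(logs), pstdev(logs)
    standard_normal = NormalDist()
    return [math.exp(mu + sigma * standard_normal.inv_cdf(p)) for p in probs]

def combine(pred_a, pred_b, weight_a=0.5):
    """Blend two predictions quantile by quantile; weight_a is an assumed value."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

# Hypothetical observed load (for example, IOPS) for one VM.
observed = [120, 150, 180, 400, 160, 140, 900, 170]
fitted = lognormal_quantiles(observed)
replayed = [130.0, 165.0, 650.0]  # placeholder output of a week-replay predictor
print(combine(replayed, fitted, weight_a=0.6))
```

A log-normal shape suits load data that is always positive and has a long tail of occasional bursts, so the upper quantile stretches further from the median than the lower one does.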
At Tintri, our goal is to deliver the simplest management experience possible. By using a variety of machine learning algorithms to accurately predict future capacity, performance, and working set, VM Scale-out gives you back the time you would otherwise have to spend manually load balancing VMs. And it eliminates the churn that results from less sophisticated, threshold-based methods. As a result, Tintri storage operates autonomously and makes smart recommendations while also giving you the fine-grained control to address unique requirements.
VM-level actions for infrastructure functions, including snapshots, replication, and QoS, give you precise control over protection and performance in production and accelerate test and development cycles.