The storage industry provides excellent visibility into capacity consumption and growth. Administrators can quickly see how much space is consumed and how quickly usage is growing. Answers to basic questions like “when will we need more space, and how much?” are relatively easy to address. If on Monday morning administrators arrive in the office to find a storage system short on capacity, existing tools can quickly find the culprit VMs and help develop an action plan.
Unfortunately, understanding the performance reserves of storage is a much trickier business. For example, I am writing this post on a laptop with a 500GB drive. While the drive is happily powering my browser and editor, I highly doubt it would support a 500GB database server. This might take a few such drives, but I don’t know whether it would be two drives, or twenty. Administrators often fall back on IOPS for lack of a better metric, but this is woefully insufficient. A 4KB random read to a 2TB database is not the same as a 64KB sequential write to a 10MB log; indeed on most systems these vary by orders of magnitude. For example, I am confident my laptop drive would support many more IOPS of log analysis than OLTP traffic.
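To put rough numbers on why a bare IOPS figure says so little, here is a back-of-the-envelope sketch for a single spinning drive. The 8 ms positioning time and 100 MB/s transfer rate are illustrative assumptions, not measurements of any particular drive, and `max_iops` is a hypothetical helper.

```python
def max_iops(io_size_kb, positioning_ms, transfer_mb_per_s):
    """Upper bound on IOPS for one drive serving one request at a time.

    positioning_ms is the assumed seek plus rotational delay per request
    (zero for a purely sequential stream); transfer_mb_per_s is the
    assumed media transfer rate.
    """
    transfer_ms = io_size_kb / 1024 / transfer_mb_per_s * 1000
    return 1000.0 / (positioning_ms + transfer_ms)


# Assumed drive characteristics, for illustration only.
random_4k = max_iops(4, positioning_ms=8.0, transfer_mb_per_s=100.0)
sequential_64k = max_iops(64, positioning_ms=0.0, transfer_mb_per_s=100.0)

print(f"4KB random:      {random_4k:7.0f} IOPS")
print(f"64KB sequential: {sequential_64k:7.0f} IOPS")
```

Even in this toy model the same drive delivers roughly an order of magnitude more sequential IOPS than random IOPS, while moving sixteen times as much data per request, so one "IOPS" number cannot describe both workloads.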
Once caching and tiering burst into the picture, things get even more complex. And when a storage system is overloaded, performance usually does not degrade gracefully — it typically drops off a cliff. This creates an environment where substantial over-provisioning for performance is necessary for routine operation.
A fundamental objective for Tintri VMstore was dramatically simplifying performance management. Enabling performance visibility and management similar to the existing paradigm for capacity was a core design goal. So in addition to relatively conventional capacity metrics, Tintri VMstore also includes an industry-first performance reserves fuel gauge.
Like capacity, performance reserves is a single metric, but it measures how much of the appliance's performance resources (CPU, SSD capacity, HDD IOPS, network bandwidth, etc.) are consumed rather than how much storage space is used. Tintri shows how much of the appliance performance is currently consumed, and how that has been changing over time. This allows administrators to use performance reserves in much the same way as capacity for monitoring, troubleshooting and planning.
Because Tintri VMstore is VM-aware, we also show the performance reserves consumed by each VM. How much of the performance is consumed by your mail server or your virtual desktops? Tintri VMstore directly displays this information for each VM and vDisk.
If the appliance is more heavily loaded on Monday than the previous Friday, administrators can instantly find the VMs whose performance reserves have increased, and start digging into what changed. A side panel of the UI shows the names of the VMs whose performance consumption has increased the most in the last few days.
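The Monday-versus-Friday comparison boils down to ranking VMs by how much their share of performance reserves grew. A minimal sketch, assuming two per-VM snapshots are available as plain dictionaries; the data layout, VM names, and `top_movers` helper are hypothetical and not a Tintri API:

```python
def top_movers(before, after, n=3):
    """Return the n VMs whose share of performance reserves grew the most.

    before and after map VM name -> percent of appliance performance
    reserves consumed (hypothetical layout). VMs that did not exist in
    the earlier snapshot are treated as having consumed 0 percent.
    """
    deltas = {vm: after[vm] - before.get(vm, 0.0) for vm in after}
    growing = [(vm, delta) for vm, delta in deltas.items() if delta > 0]
    growing.sort(key=lambda item: item[1], reverse=True)
    return growing[:n]


# Made-up snapshots, for illustration only.
friday = {"mail-01": 12.0, "vdi-pool": 20.0, "sql-02": 8.0}
monday = {"mail-01": 13.0, "vdi-pool": 21.0, "sql-02": 25.0, "build-07": 6.0}

for vm, delta in top_movers(friday, monday, n=2):
    print(f"{vm}: +{delta:.1f} points")
```

With these made-up numbers the list starts with `sql-02`, followed by the newly created `build-07`, which is where an administrator would start digging.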
This is also helpful in planning. For the first time it makes basic administrative questions straightforward. For example, how many more Exchange servers can be added to this appliance? Administrators simply compare the performance reserves consumed by an existing Exchange server with the reserves still available on the appliance.
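The planning arithmetic is simple enough to write down. The sketch below assumes performance reserves are expressed as a percentage of the appliance and that a new server will behave like the existing one. The 20 percent safety margin is an assumption of ours, added because overloaded storage tends to fall off a cliff rather than degrade gracefully; `additional_vms` is a hypothetical helper.

```python
import math


def additional_vms(used_pct, per_vm_pct, safety_margin_pct=20.0):
    """Estimate how many more VMs of a given kind fit on the appliance.

    used_pct:          percent of performance reserves consumed today
    per_vm_pct:        percent consumed by one representative VM
    safety_margin_pct: headroom deliberately left unused (an assumption)
    """
    if per_vm_pct <= 0:
        raise ValueError("per_vm_pct must be positive")
    available = 100.0 - safety_margin_pct - used_pct
    if available <= 0:
        return 0
    return math.floor(available / per_vm_pct)


# Appliance at 55 percent, each Exchange server consuming about 6 percent.
print(additional_vms(used_pct=55.0, per_vm_pct=6.0))
```

With the default margin, 25 percent of the appliance is left to hand out, which is room for four more servers at 6 percent each.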
Simplicity is Hard Work
Much like tools for capacity, Tintri’s single performance reserves metric does not eliminate every possible instance of troubleshooting. However, the performance reserves metric goes a long way toward removing the complexity from performance management, and is sufficient to answer many important questions.
Tintri VMstore employs a sophisticated set of algorithms to distill a single representative performance metric from a disparate set of resource usage statistics. The algorithms account for intermittent workloads, multiple bottleneck resources and interactions by drawing heavily on the custom VM-aware file system for deep data collection and analysis.
Working Set Analysis
In addition to IO rates and throughput, we track information about the working set of each VM. We keep track of the percentage of the blocks in flash at varying levels of hotness. So, for example, we can tell that 80 percent of the IO is going to 20 percent of the data for a particular VM, 90 percent of the IO to 30 percent of the data, etc.
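As an illustration of the kind of summary described here, the sketch below computes what share of a VM's IO goes to its hottest blocks. It assumes per-block access counts are available as a simple list, which is a simplification for the example and not a description of how the file system actually stores this information; `io_share_of_hottest` is a hypothetical helper.

```python
def io_share_of_hottest(access_counts, data_fraction):
    """Fraction of total IO that goes to the hottest data_fraction of blocks.

    access_counts holds one access count per block (hypothetical input).
    """
    if not 0.0 <= data_fraction <= 1.0:
        raise ValueError("data_fraction must be between 0 and 1")
    total = sum(access_counts)
    if total == 0:
        return 0.0
    hottest_first = sorted(access_counts, reverse=True)
    block_count = round(len(hottest_first) * data_fraction)
    return sum(hottest_first[:block_count]) / total


# Ten blocks with a skewed access pattern (made-up numbers).
counts = [50, 30, 10, 3, 2, 1, 1, 1, 1, 1]
print(io_share_of_hottest(counts, 0.2))
print(io_share_of_hottest(counts, 0.3))
```

For this made-up VM, 20 percent of the data receives 80 percent of the IO and 30 percent of the data receives 90 percent, matching the shape of the example above.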
We use the same information to estimate how much additional IO would result if we took some of the flash away from a given VM. This is useful for determining how much each VM is likely to benefit from each incremental amount of flash and disk IOPS and also for estimating how many additional VMs can be supported on the system as a whole.
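A rough sketch of that estimate, under two loud assumptions: flash always holds exactly the hottest blocks, and the recent access pattern predicts the future one. Both are idealizations chosen for the example, and `hit_ratio` and `extra_disk_iops` are hypothetical helpers rather than the production algorithm.

```python
def hit_ratio(access_counts, flash_blocks):
    """Share of IO served from flash if it holds the hottest flash_blocks blocks."""
    hottest_first = sorted(access_counts, reverse=True)
    total = sum(hottest_first)
    if total == 0:
        return 1.0
    flash_blocks = max(0, min(flash_blocks, len(hottest_first)))
    return sum(hottest_first[:flash_blocks]) / total


def extra_disk_iops(access_counts, iops, flash_now, flash_after):
    """Additional IOPS that would fall through to disk after resizing flash.

    A positive result means more disk IO; a negative result means the VM
    would benefit from the extra flash.
    """
    before = hit_ratio(access_counts, flash_now)
    after = hit_ratio(access_counts, flash_after)
    return iops * (before - after)


# A VM issuing 1000 IOPS over ten blocks (made-up numbers).
counts = [50, 30, 10, 3, 2, 1, 1, 1, 1, 1]
print(round(extra_disk_iops(counts, iops=1000, flash_now=3, flash_after=2)))
```

Taking one block of flash away from this VM pushes about 100 more IOPS to disk, while taking it from a VM with a flatter access pattern would cost less. Comparing that marginal cost across VMs is what makes the estimate useful for deciding who gets the flash.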
Combining Multiple Resources
Performance has many facets. Unlike capacity, where there is only one space resource that can be consumed, there are many performance resources in a storage system that can be exhausted. For example, processing IO requests consumes CPU, flash capacity, HDD IOPS, and network bandwidth. To fully account for this, Tintri VMstore combines the utilization measures from key resources into a single metric using a market scoring rule.
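To make the idea of collapsing several resources into one number concrete, here is a deliberately simple stand-in: report the utilization of whichever resource is closest to exhaustion. This is not the market scoring rule mentioned above. It is a plain bottleneck rule chosen for illustration, and the resource names and capacities are made up.

```python
def reserves_consumed(usage, capacity):
    """Collapse per-resource usage into (percent consumed, bottleneck name).

    Uses the most heavily utilized resource as the single figure. This is
    a simple bottleneck rule for illustration, not Tintri's algorithm.
    """
    utilization = {name: usage[name] / capacity[name] for name in capacity}
    bottleneck = max(utilization, key=utilization.get)
    return 100.0 * utilization[bottleneck], bottleneck


# Hypothetical appliance limits and current consumption.
capacity = {"cpu_cores": 16, "flash_gb": 2000, "hdd_iops": 1500, "net_mbps": 10000}
usage = {"cpu_cores": 4, "flash_gb": 1500, "hdd_iops": 600, "net_mbps": 2000}

percent, resource = reserves_consumed(usage, capacity)
print(f"{percent:.0f}% of performance reserves consumed (bottleneck: {resource})")
```

A bottleneck rule like this ignores interactions between resources and intermittent workloads, which is precisely why a more sophisticated combination is needed in practice.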
More Than a Management Layer
The performance reserves metric is tightly coupled to Tintri’s VM-aware file system. Although it is tempting to view the performance metrics as an add-on management layer, it is the underlying file system that provides the ability to capture VM-level metrics and manage the performance of multiple VMs and vDisks. In the future, we’ll provide even greater visibility and understanding of how VMs behave, and make it simpler to consistently and reliably monitor and allocate performance resources.