Storage IO Control, Storage DRS and Auto-Tiering Datastores, Part 2

Bill Hill

Guest Blogger, vExpert
DRS, QoS

This is the second post in a two-part series from Bill Hill, a vExpert and guest contributor to this blog. In his previous post, Bill looked at Storage IO Control, Storage DRS and auto-tiering, and how they relate to one another. Today, Bill dives a bit deeper into the subject.

What's up, then, with Auto-Tiering and sDRS?

Storage DRS relies on SIOC, specifically the SIOC workload injector, to provide the IO metrics it uses to detect workload imbalance. The workload injector characterizes datastore performance by requesting completely random blocks of data from the datastore and measuring how long it takes for that data to be returned (the response time). The injector has no knowledge of the tiers in the underlying storage array, so the resulting performance profile for the datastore can be essentially random, and inaccurate.

For example, if data spans an SSD-based aggregate and a SATA-based aggregate, the sampled blocks may be pulled entirely from the SATA tier OR entirely from the SSD tier. In either case, SIOC may not properly report on the performance of the environment, and too many, or too few, migrations may take place. Excessive migrations can cause significant performance degradation for the VM AND the underlying storage environment.
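To make the problem concrete, here is a toy Python sketch of an injector-style probe. The two-tier model and the latency numbers are illustrative assumptions, not VMware's implementation; the point is simply that a probe which times random block reads, without knowing which tier served each block, produces a datastore latency figure that swings with the sampling.

```python
import random

# Toy model (illustrative assumptions, not VMware's injector): two tiers with
# very different service times, and a probe that times reads of random blocks.
SSD_LATENCY_MS = 0.5
SATA_LATENCY_MS = 12.0

def injector_probe(ssd_fraction, samples=50):
    """Average response time over random block reads against a two-tier datastore."""
    reads = [SSD_LATENCY_MS if random.random() < ssd_fraction else SATA_LATENCY_MS
             for _ in range(samples)]
    return sum(reads) / samples

# The same datastore looks "fast" or "slow" depending on where the randomly
# sampled blocks happen to live, even though the array keeps the hot working
# set on SSD the whole time.
for fraction in (0.9, 0.5, 0.1):
    print(f"{fraction:.0%} of sampled blocks on SSD -> "
          f"{injector_probe(fraction):.1f} ms average response time")
```

Run it a few times and the "measured" latency for the same datastore varies by an order of magnitude, which is exactly the kind of noisy profile described above.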

Additionally, Storage DRS will not automatically execute migrations unless the performance issue persists for a significant amount of time (for example, 16 hours). Many storage arrays schedule their tiering actions on different intervals, so the underlying data layout and performance profile can change out from underneath Storage DRS and SIOC.

How to deal with this

There are a number of options available to deal with this issue. Auto-tiering provides so many benefits that removing it would be a significant loss. SIOC provides excellent prioritization in the event of contention, and removing it would be equally painful. Storage DRS, meanwhile, provides great functionality for moving VM data from one datastore to another in bulk fashion (all VMDKs or none). Auto-tiering partially overlaps with that functionality: while the data does not move to a new datastore, it does move dynamically to faster-performing tiers. So, with all of that said, the preferred way to deal with the potential issues introduced by combining Storage DRS with auto-tiering is to disable the IO metric functionality of Storage DRS, except perhaps for initial placement decisions.
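For reference, here is a hedged pyVmomi sketch of what that change looks like when scripted. The vCenter address, credentials and datastore cluster name are placeholders, and the property and task names reflect my reading of the vSphere SDK; the same setting is exposed in the vSphere Client as the I/O metric checkbox on the datastore cluster's Storage DRS settings.

```python
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Sketch only: placeholder host, credentials and names; certificate handling omitted.
si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="***")
content = si.RetrieveContent()

# Find the datastore cluster (a StoragePod managed object) by name.
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == "GoldDatastoreCluster")

# Show the current Storage DRS pod settings before changing anything.
cfg = pod.podStorageDrsEntry.storageDrsConfig.podConfig
print("IO load balancing enabled:", cfg.ioLoadBalanceEnabled)
print("Check interval (minutes):", cfg.loadBalanceInterval)

# Disable the IO metric half of Storage DRS; space utilization remains the
# only trigger for migration recommendations.
pod_spec = vim.storageDrs.PodConfigSpec(ioLoadBalanceEnabled=False)
spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)
content.storageResourceManager.ConfigureStorageDrsForPod_Task(pod=pod,
                                                              spec=spec,
                                                              modify=True)
```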

SIOC is a great feature that helps automate storage balancing tasks, as long as you don’t forget its limitations. For example, although SIOC helps ESX adjust queue depths to storage on a per-VM basis, with most storage systems, you lose track of the per-VM QoS metrics once the IO is queued in the storage system itself. Also, currently, you cannot set the SIOC latency threshold to less than 10ms, which is actually somewhat high when using SSD-based storage systems. Fortunately, VMware will undoubtedly continue to improve SIOC to work better with auto-tiered and SSD-based storage systems in the future.
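Along those lines, here is a small pyVmomi sketch for checking, and adjusting, the SIOC congestion threshold on a single datastore. The connection handling mirrors the previous sketch, the datastore name is a placeholder, and the property and task names are my best reading of the SDK, so treat this as a starting point rather than a recipe.

```python
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Sketch only: placeholder host, credentials and datastore name.
si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="***")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.Datastore], True)
ds = next(d for d in view.view if d.name == "Gold-DS-01")

# Current per-datastore SIOC state: enabled flag and congestion threshold (ms).
iorm = ds.iormConfiguration
print("SIOC enabled:", iorm.enabled, "| threshold:", iorm.congestionThreshold, "ms")

# Enable SIOC and set the threshold to the 10ms floor mentioned above; lower
# values are rejected, which is the limitation discussed in this post.
spec = vim.StorageResourceManager.IORMConfigSpec(enabled=True,
                                                 congestionThreshold=10)
content.storageResourceManager.ConfigureDatastoreIORM_Task(ds, spec)
```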

With the IO metric functionality of Storage DRS disabled, the only remaining trigger for a Storage DRS migration between datastores is a datastore running out of disk capacity. Performance is no longer considered as a criterion for migrations; auto-tiering ensures the best possible performance for the VM data, and all of the other functions of SIOC and Storage DRS remain available!

As with many complex system-level problems, the best solution often requires both software and hardware working together.