For many applications, either a solid-state drive (SSD) or a hard-disk drive (HDD) is sufficient, and the choice comes down to cost. When determining which way to go, buyers want to know: How do we calculate this cost? And how does this affect the efficiency of hybrid SSD/HDD storage systems?
Calculating SSD and HDD Costs
In deciding whether to run an application on SSD or HDD, the two major factors to consider are the capacity cost and the performance cost of the application. The capacity cost is the total cost of storing the application and its associated data sets on SSD or HDD. The performance cost is the cost of buying enough SSD or HDD to meet the performance needs of the application. Capacity and performance costs are calculated as follows:
- Capacity cost = GB needed X $ per GB
- Performance cost = IOPS needed X $ per IOPS
Since the cost per GB of HDD is far lower than that of SSD, and the cost per IOPS of SSD is far lower than that of HDD, SSD becomes the cheaper option when the capacity cost of storing an application on SSD is lower than the performance cost of serving the same application from HDD. Writing this out and rearranging the terms, we get:
- Capacity cost of SSD < performance cost of HDD
- GB needed X $ per GB of SSD < IOPS needed X $ per IOPS of HDD
- IOPS needed/GB needed > $ per GB of SSD/$ per IOPS of HDD
Today, multilevel cell (MLC) SSD costs about $2 per GB, while HDD costs about $2 per IOPS, so it is more cost-effective to use SSD than HDD if the application generates more than 1 IOPS per GB of data ($2 per GB for SSD, divided by $2 per IOPS for HDD). This break-even ratio is referred to as the threshold IO density (see figure 1, above).
To get a feel for what 1 IOPS/GB means, consider Microsoft sizing guidelines: An Exchange application might be expected to generate about 4 IOPS/GB, while SharePoint might generate about 2 IOPS/GB. These figures refer to the entire application, rather than just the working set of the application. Since both estimates exceed the 1 IOPS/GB threshold, it is more cost-effective to run these applications on SSD than on HDD.
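As a rough illustration of the comparison above, the following Python sketch computes the two costs and the threshold IO density. The function names and the 1,000 GB data set size are assumptions made for the example; the $2 prices and the Exchange and SharePoint IOPS/GB estimates come from the text. Like the rule of thumb itself, the sketch compares only the SSD capacity cost against the HDD performance cost.

```python
def capacity_cost(gb_needed: float, dollars_per_gb: float) -> float:
    """Capacity cost = GB needed x $ per GB."""
    return gb_needed * dollars_per_gb


def performance_cost(iops_needed: float, dollars_per_iops: float) -> float:
    """Performance cost = IOPS needed x $ per IOPS."""
    return iops_needed * dollars_per_iops


def threshold_io_density(ssd_dollars_per_gb: float,
                         hdd_dollars_per_iops: float) -> float:
    """IOPS/GB above which SSD capacity cost is below HDD performance cost."""
    return ssd_dollars_per_gb / hdd_dollars_per_iops


def cheaper_medium(gb_needed: float, iops_needed: float,
                   ssd_dollars_per_gb: float = 2.0,
                   hdd_dollars_per_iops: float = 2.0) -> str:
    """Return "SSD" if SSD capacity cost < HDD performance cost, else "HDD"."""
    ssd = capacity_cost(gb_needed, ssd_dollars_per_gb)
    hdd = performance_cost(iops_needed, hdd_dollars_per_iops)
    return "SSD" if ssd < hdd else "HDD"


# Hypothetical 1,000 GB data sets; the last workload is invented for contrast.
DATA_SET_GB = 1000
workloads = [("Exchange", 4.0), ("SharePoint", 2.0), ("cold archive", 0.1)]

print("threshold IO density:", threshold_io_density(2.0, 2.0), "IOPS/GB")
for name, iops_per_gb in workloads:
    iops_needed = iops_per_gb * DATA_SET_GB
    print(f"{name}: {cheaper_medium(DATA_SET_GB, iops_needed)}")
```

Changing the two price arguments is enough to re-run the comparison with current market prices.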
Note that if the storage system deduplicates and compresses data, the threshold IO density is even lower, perhaps only 0.5 IOPS/GB or 0.25 IOPS/GB, because each GB of SSD holds more application data. Furthermore, with a hybrid storage system that can store just the working set of the application on SSD, only the size of the working set needs to be considered when computing SSD costs, rather than the size of the entire application.
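One way to see where figures such as 0.5 or 0.25 IOPS/GB come from is to scale the SSD price by the data reduction ratio. The sketch below is an illustration under stated assumptions: `reduction_ratio` (for example 2.0 for 2:1 dedupe plus compression) and `working_set_fraction` are hypothetical parameter names, and the working-set adjustment assumes that essentially all IO is served from the working set held on SSD.

```python
import math


def effective_threshold(ssd_dollars_per_gb: float,
                        hdd_dollars_per_iops: float,
                        reduction_ratio: float = 1.0,
                        working_set_fraction: float = 1.0) -> float:
    """Threshold IO density (IOPS per logical GB of the whole application).

    reduction_ratio: logical GB stored per physical GB of SSD (assumed).
    working_set_fraction: share of the application kept on SSD (assumed).
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    if not 0 < working_set_fraction <= 1:
        raise ValueError("working_set_fraction must be in (0, 1]")
    # SSD cost per logical GB of application data after both effects.
    effective_ssd_cost = ssd_dollars_per_gb * working_set_fraction / reduction_ratio
    return effective_ssd_cost / hdd_dollars_per_iops


print(effective_threshold(2.0, 2.0))                       # no data reduction
print(effective_threshold(2.0, 2.0, reduction_ratio=2.0))  # 2:1 reduction
print(effective_threshold(2.0, 2.0, reduction_ratio=4.0))  # 4:1 reduction
# Hypothetical hybrid system: 2:1 reduction, 10% working set on SSD.
print(effective_threshold(2.0, 2.0, reduction_ratio=2.0, working_set_fraction=0.1))
```

With the $2 prices from the text, 2:1 and 4:1 reduction ratios give the 0.5 and 0.25 IOPS/GB thresholds mentioned above.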
Block size and access frequency
Another useful application of the threshold IO density metric is examining the effect of block size in deciding which data blocks to store on SSD vs. HDD. For example, if placement decisions are made based on a large block size such as 1 GB, then that GB of data must be accessed at least once per second before it is cost-effective to store it on SSD. However, if we use a much smaller granularity such as 8 KB, the 8 KB block of data need only be accessed once every day or two before it is cost-effective to store it on SSD. In other words, 1 IOPS per GB is roughly the same IO density as one IO every day and a half per 8 KB. This allows a system using small blocks to cache much more of the hot data in SSD than one that uses large blocks.
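The equivalence between 1 IOPS/GB and roughly one access every day or two per 8 KB block can be checked with a few lines of arithmetic. This sketch assumes binary units (1 GB = 2^30 bytes, 8 KB = 8,192 bytes); the function name is made up for the example.

```python
BYTES_PER_GB = 1024 ** 3   # assumption: binary gigabyte
SECONDS_PER_DAY = 86_400


def breakeven_interval_seconds(block_bytes: int,
                               threshold_iops_per_gb: float = 1.0) -> float:
    """Longest time between accesses at which a block still meets the threshold."""
    block_gb = block_bytes / BYTES_PER_GB
    # A block of this size must sustain this many IOs per second.
    required_iops = threshold_iops_per_gb * block_gb
    return 1.0 / required_iops


for label, size in [("1 GB", 1024 ** 3), ("8 KB", 8 * 1024)]:
    seconds = breakeven_interval_seconds(size)
    days = seconds / SECONDS_PER_DAY
    print(f"{label} block: one access every {seconds:,.0f} s ({days:.2f} days)")
```

Under these assumptions the 8 KB block breaks even at one access every 131,072 seconds, which is about a day and a half.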
A system that uses 1 GB blocks brings a lot of cold data into SSD along with the hot data, while a system that uses 8 KB blocks can be much more selective about what it stores on SSD. Because real applications have significant spatial locality, however, there is a point of diminishing returns. That is, a block size of 512 bytes instead of 8 KB is unlikely to significantly improve performance, and the metadata overhead for keeping track of such small blocks is prohibitive.
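To give a sense of scale for the metadata overhead, the sketch below counts how many blocks a system would have to track for a given SSD capacity. The 1 TB SSD size and the 16 bytes of metadata per block are purely illustrative assumptions, not figures from any particular product.

```python
TIB = 1024 ** 4            # assumption: a 1 TB (binary) SSD tier
ENTRY_BYTES = 16           # assumption: metadata bytes kept per tracked block


def tracked_blocks(ssd_bytes: int, block_bytes: int) -> int:
    """Number of blocks the system must keep metadata for."""
    return ssd_bytes // block_bytes


def metadata_bytes(ssd_bytes: int, block_bytes: int,
                   entry_bytes: int = ENTRY_BYTES) -> int:
    """Total metadata size under the assumed per-block entry size."""
    return tracked_blocks(ssd_bytes, block_bytes) * entry_bytes


for label, size in [("1 GB", 1024 ** 3), ("8 KB", 8 * 1024), ("512 B", 512)]:
    blocks = tracked_blocks(TIB, size)
    meta_gib = metadata_bytes(TIB, size) / 1024 ** 3
    print(f"{label:>5} blocks: {blocks:>13,} entries, about {meta_gib:.6g} GiB of metadata")
```

Whatever the real entry size is, moving from 8 KB to 512-byte blocks multiplies the number of tracked entries by 16.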
MLC-based SSD is already very cost-effective relative to HDD for mainstream applications. Furthermore, techniques such as dedupe, compression, hybrid file systems, and managing SSD using small block sizes greatly amplify these advantages. Finally, since some data is most cost-effectively stored on SSD while other data is most cost-effectively stored on HDD, a storage system that uses a combination of SSD and HDD will provide better value than one that is all SSD or all HDD.