How much data does your application generate?

One of the benefits of VM-aware storage is that application behavior is easily visible. This visibility extends to several dimensions: performance impact, and also data-generation impact.

Is your application generating more data this week than it did last week? Has something suddenly changed? Has a user decided to fill up their VM's local filesystem with incompressible contraband media content? Has somebody turned on guest-side disk or filesystem encryption, causing a big drop in data compressibility on the storage side?

It's easy to notice things like this when you have per-VM snapshots: just look at the size of the incremental snapshots (daily or hourly) to see exactly how much new data a VM generated in a given period.
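As a rough illustration of how those incremental sizes can surface a change in behavior, here is a minimal sketch that flags days whose snapshot size jumps well above the typical day. The function name `flag_growth_spikes`, the median baseline, and the 3x threshold are all hypothetical choices for illustration, not a Tintri feature; the input is assumed to be the post-compression size of each daily incremental snapshot in MB.

```python
from statistics import median


def flag_growth_spikes(daily_changed_mb, factor=3.0):
    """Return the indexes of days whose incremental snapshot size exceeds
    `factor` times the median day (hypothetical heuristic)."""
    if not daily_changed_mb:
        return []
    baseline = median(daily_changed_mb)
    # A day "spikes" if it generated far more new data than a typical day.
    return [i for i, mb in enumerate(daily_changed_mb) if mb > factor * baseline]


# Hypothetical post-compression sizes (MB) of seven daily incremental snapshots.
week = [410, 395, 420, 388, 402, 2950, 415]
print(flag_growth_spikes(week))  # [5]
```

A median baseline is used so that a single large day does not inflate the threshold it is compared against.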

In the field

Recently we had an interesting case in which a customer needed to replicate a 10TB virtual machine between two data centers approximately 1,000 miles apart, with approximately 80Mbps of network bandwidth available. How long would it take to replicate such a VM?

Calculating this answer was straightforward. First, we looked at the post-compression size of the VM. This is possible because the Tintri filesystem calculates the post-compression size of the VM's live data as well as of its snapshots. The compression ratio of the overall system doesn't matter here; it might differ significantly from the compression ratio of the data in any one particular VM.

And, because Tintri VM replication also compresses data in transit, the sum of the compressed sizes of the VM's snapshots indicates how much total data will have to cross the network. Tintri VM replication also uses deduplication, but in this case we assumed that none of the VM's data would already be present on the destination.
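The sizing rule above can be sketched as a small function: sum the post-compression sizes of all snapshots, then optionally discount a fraction assumed to be found already on the destination. The function name and the `dedup_fraction` parameter are hypothetical; setting it to 0.0 mirrors the no-deduplication assumption made in this case.

```python
def bytes_to_transfer(snapshot_compressed_mb, dedup_fraction=0.0):
    """Estimate the MB sent over the WAN for a first-time VM replication.

    snapshot_compressed_mb: post-compression size of each snapshot, in MB.
    dedup_fraction: hypothetical share of that data assumed to already exist
        on the destination; 0.0 means no deduplication savings at all.
    """
    if not 0.0 <= dedup_fraction <= 1.0:
        raise ValueError("dedup_fraction must be between 0 and 1")
    total_mb = sum(snapshot_compressed_mb)
    return total_mb * (1.0 - dedup_fraction)


# Hypothetical snapshot sizes (MB) that add up to roughly 8.5TB.
print(bytes_to_transfer([8_400_000, 40_000, 30_000, 30_000]))
```

Treating deduplication savings as zero makes the estimate an upper bound on the data sent: any matches found on the destination only shorten the transfer.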

Here is the data on this VM’s snapshots, taken directly from the VMstore UI on the customer’s replication source system:

The "Changed MB" column gives the space consumed by each snapshot on a post-compression basis; summed across all snapshots, it came to about 8.5TB. An 80Mbps link carries at most 10 MB/sec, and we used 9 MB/sec as the effective throughput. We estimated that the replication of these snapshots would take approximately 11 days. The math is pretty simple: 8.5TB divided by (9 MB/sec * 24 hours/day * 3600 seconds/hour) gives approximately 11 days.
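The same arithmetic, written out as a short sketch. The helper names are hypothetical, and the conversion assumes decimal units (1TB = 1,000,000 MB); using binary units (8.5 × 1024² MB) gives about 11.5 days, so either convention lands near the 11-day estimate.

```python
SECONDS_PER_DAY = 24 * 3600


def link_mb_per_s(mbps):
    """Raw line rate in MB/s: 8 bits per byte, no protocol overhead."""
    return mbps / 8


def replication_days(total_mb, throughput_mb_per_s):
    """Days needed to push total_mb through a link at a constant rate."""
    return total_mb / (throughput_mb_per_s * SECONDS_PER_DAY)


print(link_mb_per_s(80))                       # 10.0
print(f"{replication_days(8.5e6, 9):.1f} days")  # 10.9 days
```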

One more question

But what if the application in this VM were to generate a lot of additional new data while the existing snapshots were being replicated? If the application generated, say, 1TB of new data per day (this is not unheard of), we would never be able to catch up: at 9 MB/sec, the WAN link moves only about 0.78TB per day.
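The catch-up question reduces to comparing the link's daily capacity with the daily rate of new data. Below is a minimal sketch under a simplifying assumption: new data arrives at a steady rate and competes for the same link as the backlog. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 3600


def catch_up_days(backlog_mb, daily_new_mb, throughput_mb_per_s):
    """Days until the backlog is drained while new data keeps arriving.

    Returns None when new data arrives at least as fast as the link can
    send it, meaning replication never catches up.
    """
    capacity_mb_per_day = throughput_mb_per_s * SECONDS_PER_DAY
    net_drain = capacity_mb_per_day - daily_new_mb
    if net_drain <= 0:
        return None
    return backlog_mb / net_drain


print(catch_up_days(8.5e6, 1e6, 9))     # None: 1TB/day exceeds ~0.78TB/day
print(catch_up_days(8.5e6, 10_000, 9))  # modest growth barely moves the estimate
```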

In this case, however, there was nothing to worry about. We looked at the incremental snapshot sizes (the snapshots newer than the oldest one), and they were very modest: cumulatively under 1% of the size of the oldest and largest snapshot of the VM. This pattern is typical of many applications: a lot of apps just don't generate all that much new data, most of the time.

For this VM, that meant that, barring an unexpected and drastic change in application behavior, the incremental snapshots would replicate quickly once the large base snapshot finished, since snapshots replicate in order from oldest to newest. Ultimately, the VM finished replicating in 11 days, as predicted.
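To see why small incrementals barely affect the finish time, here is a sketch of the oldest-to-newest schedule over a single link at constant throughput. The snapshot sizes are hypothetical, chosen so that the incrementals total under 1% of the base, as in this case; the function name is also hypothetical.

```python
from itertools import accumulate

SECONDS_PER_DAY = 24 * 3600


def completion_days(snapshot_mb_oldest_first, throughput_mb_per_s):
    """Day on which each snapshot finishes replicating, assuming snapshots
    are sent one after another, oldest first, at a constant rate."""
    mb_per_day = throughput_mb_per_s * SECONDS_PER_DAY
    return [sent / mb_per_day for sent in accumulate(snapshot_mb_oldest_first)]


# Hypothetical sizes (MB): a large base snapshot, then small incrementals
# that together are under 1% of the base.
sizes = [8_440_000, 20_000, 15_000, 25_000]
for day in completion_days(sizes, 9):
    print(f"{day:.2f}")
```

The base snapshot accounts for nearly all of the schedule; the three incrementals together add less than a tenth of a day.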
