True Tintri Tales: Storage Latency, Resolved

Here at Tintri, the technical marketing lab is our production environment, which we rely on for important demos, new feature validation, education, and other critical activities. So we do our best to make sure it’s in tip-top shape, 24/7/365.

Along the same lines, we address outages and service degradation ASAP. So when a strange issue surfaced last week, we jumped right in.

On the case

While running some maintenance activities, we noticed the following:

The Tintri VMstore UI showed that one of the busier VMs was experiencing latency, roughly half of it caused by network delay. This looked odd for several reasons:

The total network throughput at the VMstore level was very low
We run a separate storage network, so non-storage traffic can’t add to its load
The environment was supposed to be quiet at that time of night

It was pretty obvious that this was storage traffic, but it wasn’t coming from one of the VMs in this VMstore.

Investigating

Almost instinctively, we checked vCenter. It showed the following:

Here we learned that a large number of packets were being sent from one host (as we had already assumed), but we still didn’t know which VMs were behind them. That alone wasn’t much help.

But we were so busy digging into the details that we almost forgot about the one place where we could see the complete picture: Tintri Global Center (TGC).

TGC consolidates data from all our VMstores, so it holds a complete view of the entire infrastructure.

And sure enough, we logged in and found the answer immediately.

The answer

The top graph shows the total throughput across all 549 VMs in our infrastructure, which are served by multiple Tintri VMstores. When we clicked a point in the graph (the red line), we saw a list of the VMs that contributed to the throughput at that point in time.
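To make the idea behind a consolidated view concrete, here is a minimal Python sketch, assuming per-VM throughput samples that are tagged with the VMstore they came from and a timestamp. The record layout and the functions `total_throughput` and `top_contributors` are hypothetical and only illustrate the concept; this is not how TGC is implemented. Clicking a point on the graph roughly corresponds to picking one timestamp and ranking the VMs that contributed at that moment.

```python
from collections import defaultdict

# Hypothetical record: (vmstore, vm_name, timestamp, throughput_mbps).
# Illustrative only; this is not the Tintri Global Center data model or API.
Sample = tuple[str, str, int, float]


def total_throughput(samples: list[Sample], timestamp: int) -> float:
    """Sum throughput over every VM on every VMstore at one point in time."""
    return sum(mbps for _store, _vm, ts, mbps in samples if ts == timestamp)


def top_contributors(samples: list[Sample], timestamp: int, n: int = 5) -> list[tuple[str, float]]:
    """Rank VMs by their throughput at `timestamp`, highest first, across all VMstores."""
    per_vm: dict[str, float] = defaultdict(float)
    for _store, vm, ts, mbps in samples:
        if ts == timestamp:
            per_vm[vm] += mbps
    return sorted(per_vm.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up samples from two hypothetical VMstores; the values are illustrative only.
    samples: list[Sample] = [
        ("vmstore-a", "vm-db01", 100, 12.0),
        ("vmstore-a", "vm-noisy", 100, 480.0),
        ("vmstore-b", "vm-web01", 100, 8.5),
        ("vmstore-b", "vm-web01", 200, 9.0),
    ]
    print(total_throughput(samples, 100))        # 500.5
    print(top_contributors(samples, 100, n=2))   # [('vm-noisy', 480.0), ('vm-db01', 12.0)]
```

Because samples from every VMstore land in one list, a noisy VM rises to the top of the ranking no matter which VMstore or host it lives on, which is exactly what a per-VMstore view cannot show.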

We could immediately see that VM “SHARE2012” was waaaay up there at the top.

As it turns out, someone was running Iometer on the “SHARE2012” VM with some extreme settings for testing purposes. This single VM generated load across the entire storage network, slowing down even VMs that were not on the same VMstore or ESXi host.

This was a classic case where a single-pane-of-glass view (an oft-misused term) proved invaluable in tracing a cross-system, cross-platform problem to its root cause.
