Bug 1466253

Summary: RHV-M reports wrong storage consumption on GlusterFS store
Product: Red Hat Enterprise Virtualization Manager
Reporter: Daniel Messer <dmesser>
Component: ovirt-engine-dashboard
Assignee: Gobinda Das <godas>
Status: CLOSED WORKSFORME
QA Contact: Pavel Stehlik <pstehlik>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.1.2
CC: dfediuck, dmesser, kdhananj, knarra, sabose
Target Milestone: ovirt-4.1.6
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-23 06:33:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  screenshot for storage gauge (flags: none)
  All the vms in the system (flags: none)
  dashboard_storage (flags: none)

Description Daniel Messer 2017-06-29 11:06:12 UTC
Description of problem:
The aggregate storage consumption reported by RHV-M does not add up to what is actually stored. When tracked down to the individual disks, it turns out that thick-provisioned disks of, e.g., 5 GiB are reported with an actual capacity of 37-38 GiB.
The dashboard page itself, on the other hand, reports only the real consumed capacity on the data store.

In my case:

8 VMs, each with a 10 GiB thin-provisioned disk and a 5 GiB thick-provisioned disk. Combined data store usage as per the RHV-M dashboard = 42 GiB, which is correct so far. However, when looking at the detailed values by clicking the storage "gauge" on the dashboard, I can see consumers (VMs) that alone report between 39 and 42 GiB of utilization. Looking at the individual disks makes the reason clear: the 5 GiB thick-provisioned disks are reported as 37-38 GiB each.
The GlusterFS mount is 3-way replicated. Even with replication taken into account, the values don't add up. On the Gluster mount the individual file sizes are reported correctly, but the directory total is wrong:

ll -ah d4c1b3a2-9074-46ea-aeba-907981c10652

total 35G
drwxr-xr-x.  2 vdsm kvm 4.0K Jun 29 00:43 .
drwxr-xr-x. 20 vdsm kvm 4.0K Jun 29 10:24 ..
-rw-rw----.  1 vdsm kvm 5.0G Jun 29 10:21 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
-rw-rw----.  1 vdsm kvm 1.0M Jun 29 00:32 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
-rw-r--r--.  1 vdsm kvm  271 Jun 29 00:43 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta

On the bricks, on the other hand, the reported size is way too small:

ll -ah /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/
total 4.9M
drwxr-xr-x.  2 vdsm kvm  149 Jun 29 00:43 .
drwxr-xr-x. 20 vdsm kvm 8.0K Jun 29 10:24 ..
-rw-rw----.  2 vdsm kvm 4.0M Jun 29 10:21 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
-rw-rw----.  2 vdsm kvm 1.0M Jun 29 00:32 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
-rw-r--r--.  2 vdsm kvm  271 Jun 29 00:43 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta

Note that these disks were created during a "Clone VM" operation.
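
For reference, the "total" line printed by ls sums the disk space actually allocated to the files, not their apparent sizes, which is why it can differ so much from the listed file sizes. A quick way to compare the two for each file is stat (a sketch; the paths are the ones from the listings above, %s is the apparent size in bytes and %b the number of allocated 512-byte blocks):

# stat -c '%n: apparent=%s bytes, allocated=%b blocks' d4c1b3a2-9074-46ea-aeba-907981c10652/*

And the same check against the brick copy:

# stat -c '%n: apparent=%s bytes, allocated=%b blocks' /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/*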

Version-Release number of selected component (if applicable):

RHHI with RHV-M 4.1.2

How reproducible:

Always.

Comment 1 Sahina Bose 2017-07-14 09:14:31 UTC
(In reply to Daniel Messer from comment #0)
> Description of problem:
> The aggregate storage consumption reported by RHV-M does not add up to
> what's really stored. When tracked down to the individual disks it turns out
> that for thick-provisioned disks which are e.g. 5GiB provisioned, the
> reported actual capacity is 37-38 GiB.
> On the other hand the dashboard page only reports the real consumed capacity
> on the datastore.
> 
> In my case:
> 
> 8 VMs, each with a 10GiB thin-provisioned disk and a 5 GiB thick-provisioned
> disk. Combined data store usage as per RHV-M dashboard = 42 GiB. So far so
> correct. When looking at the detailed values by clicking the storage "gauge"
> on the dashboard I can see the consumers which are VMs that have between 39
> and 42 GiB reported utilization alone. When looking at the individual disks
> it becomes clear the reason is that the 5 GiB thick provisioned disk are
> reported as 37-38 GiB each.
> The GlusterFS mount is 3-way replicated. Even with replication taken into
> account the values don't add up. When looking at the Gluster mount the
> capacities of the files are reported correctly for the files but the total
> directory capacity is wrong:
> 
> ll -ah d4c1b3a2-9074-46ea-aeba-907981c10652
> 
> total 35G
> drwxr-xr-x.  2 vdsm kvm 4.0K Jun 29 00:43 .
> drwxr-xr-x. 20 vdsm kvm 4.0K Jun 29 10:24 ..
> -rw-rw----.  1 vdsm kvm 5.0G Jun 29 10:21
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
> -rw-rw----.  1 vdsm kvm 1.0M Jun 29 00:32
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
> -rw-r--r--.  1 vdsm kvm  271 Jun 29 00:43
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta


Krutika, is there something shard-related in how the size is displayed?

> 
> On the bricks the reported size on the other hand is way to small:
> 
> ll -ah
> /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/
> d4c1b3a2-9074-46ea-aeba-907981c10652/
> total 4.9M
> drwxr-xr-x.  2 vdsm kvm  149 Jun 29 00:43 .
> drwxr-xr-x. 20 vdsm kvm 8.0K Jun 29 10:24 ..
> -rw-rw----.  2 vdsm kvm 4.0M Jun 29 10:21
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
> -rw-rw----.  2 vdsm kvm 1.0M Jun 29 00:32
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
> -rw-r--r--.  2 vdsm kvm  271 Jun 29 00:43
> 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta

This is expected, as only the first shard is created in this directory. All the other shards are in the .shard directory of the brick.
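
For completeness, the remaining shards can be inspected directly on the brick: they live under the hidden .shard directory at the brick root and are named <gfid>.<shard-number>, where <gfid> is the base file's trusted.gfid formatted as a UUID (a sketch; <gfid-as-uuid> below is a placeholder to be filled in from the first command):

# getfattr -n trusted.gfid -e hex /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
# ls -lh /gluster_bricks/vmstore/vmstore/.shard/ | grep <gfid-as-uuid>

Summing the sizes of those shard files (and multiplying by the replica count for the volume-wide view) should account for the space that does not show up in the image directory itself.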

> 
> Note that these disks were created during a "Clone VM" operation.
> 
> Version-Release number of selected component (if applicable):
> 
> RHHI with RHV-M 4.1.2
> 
> How reproducible:
> 
> Always.

Comment 4 Krutika Dhananjay 2017-07-17 06:47:43 UTC
This does look very similar to https://bugzilla.redhat.com/show_bug.cgi?id=1332861

Shard relies on the size and block count returned by the posix translator (which talks directly to the disks) for its aggregated size accounting. With XFS preallocation, the number of blocks allocated in anticipation of further writes may initially be more than what is actually required, and it is this block count, preallocation included, that shard would have used for its accounting. But XFS, unbeknownst to shard, releases these blocks at a later point. I'm not sure whether this difference can become this wide over time. Let me test that and get back, in case there are other bugs or problems contributing to it.
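
A quick way to observe this on a single brick file is to compare its apparent size with the space actually allocated to it (a sketch, using the base image path from the description; any shard under the brick's .shard directory can be checked the same way):

# du -h --apparent-size /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
# du -h /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d

With preallocation in play, the second number can be noticeably larger than the first shortly after writes and shrink again later, while shard's aggregated accounting is based on the block count it saw at write time.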

-Krutika

Comment 5 Krutika Dhananjay 2017-07-17 07:26:35 UTC
Hi Daniel,

Could you provide, from the brick, the xattrs of the example files you mentioned in the description of this bug?

# getfattr -d -m . -e hex <file-path-wrt-brick>


Specifically, I'm looking for the xattrs of the original image, .meta, and .lease files listed below:


ll -ah /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/
total 4.9M
drwxr-xr-x.  2 vdsm kvm  149 Jun 29 00:43 .
drwxr-xr-x. 20 vdsm kvm 8.0K Jun 29 10:24 ..
-rw-rw----.  2 vdsm kvm 4.0M Jun 29 10:21 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
-rw-rw----.  2 vdsm kvm 1.0M Jun 29 00:32 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
-rw-r--r--.  2 vdsm kvm  271 Jun 29 00:43 3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta
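
For example, with the paths from the listing above, that would be:

# getfattr -d -m . -e hex /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d
# getfattr -d -m . -e hex /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.lease
# getfattr -d -m . -e hex /gluster_bricks/vmstore/vmstore/b9c28b59-52bb-442a-a3a9-81f4035ee97a/images/d4c1b3a2-9074-46ea-aeba-907981c10652/3517fa85-42f9-4c0c-89fc-47b82d2e3a9d.meta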

Comment 6 Sahina Bose 2017-07-31 07:25:11 UTC
Kasturi, can you check if you can reproduce this?

Comment 8 RamaKasturi 2017-08-10 09:24:41 UTC
Hi sahina,

    As far as I understood the bug description, I tried to reproduce the scenario, and below is my analysis.

1) Created a VM with a 10 GiB thin-provisioned disk and installed the OS.
2) Once the VM came up, attached another disk of 5 GiB, preallocated / thick-provisioned.
3) Powered off the VM and used the Clone VM button to clone another 7 VMs.
4) Powered all the machines on.
5) Now when I click on the storage gauge in the dashboard, I see all the VMs present in the system listed with their current usage. Screenshot attached for the same.
6) With the 8 VMs created there are 9 VMs in the system in total, including the Hosted Engine, and all of them show up in the storage gauge.

One issue I observed is that the total usage of the data volume is 16.8 GiB, whereas the storage gauge reports it as only 14.0 GiB.
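
One way to cross-check the gauge value would be to compare it against what the data volume mount itself reports, e.g. (a sketch; the path is the data volume mount on my host):

# df -h /rhev/data-center/mnt/glusterSD/10.70.36.79:_data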

@sahina, please let me know in case I have missed something here.

Thanks
kasturi

Comment 9 RamaKasturi 2017-08-10 09:25:08 UTC
Created attachment 1311639 [details]
screenshot for storage gauge

Comment 10 RamaKasturi 2017-08-10 09:25:51 UTC
Created attachment 1311640 [details]
All the vms in the system

Comment 11 RamaKasturi 2017-08-10 09:28:13 UTC
Since my disks can accommodate 12.8 terabytes of space and I have used very little of that (~19 GB), the dashboard does not display much usage.

Attaching screenshot for the same.

Comment 12 RamaKasturi 2017-08-10 09:28:36 UTC
Created attachment 1311642 [details]
dashboard_storage

Comment 13 RamaKasturi 2017-08-10 11:47:24 UTC
From the mount point:
==============================================
[root@rhsqa-grafton1 ~]# ll -ah /rhev/data-center/mnt/glusterSD/10.70.36.79\:_data/9a86cceb-f5fa-42ce-a457-ff594bc80263/images/c3858a8e-9ab3-4bf0-9867-b1e560f899d5/
total 5.1G
drwxr-xr-x.  2 vdsm kvm 4.0K Aug  9 11:39 .
drwxr-xr-x. 20 vdsm kvm 4.0K Aug 10 14:39 ..
-rw-rw----.  1 vdsm kvm 5.0G Aug  9 11:38 bb8ba404-5ced-4248-87b6-0052f196e00b
-rw-rw----.  1 vdsm kvm 1.0M Aug  9 11:39 bb8ba404-5ced-4248-87b6-0052f196e00b.lease
-rw-r--r--.  1 vdsm kvm  325 Aug  9 11:39 bb8ba404-5ced-4248-87b6-0052f196e00b.meta

From the brick backend: (shard size here is 64M)
================================================

[root@rhsqa-grafton1 ~]# ll -ah /gluster_bricks/data/data/9a86cceb-f5fa-42ce-a457-ff594bc80263/images/c3858a8e-9ab3-4bf0-9867-b1e560f899d5/
total 66M
drwxr-xr-x.  2 vdsm kvm  165 Aug  9 11:39 .
drwxr-xr-x. 20 vdsm kvm 8.0K Aug 10 14:39 ..
-rw-rw----.  2 vdsm kvm  64M Aug  9 11:38 bb8ba404-5ced-4248-87b6-0052f196e00b
-rw-rw----.  2 vdsm kvm 1.0M Aug  9 11:39 bb8ba404-5ced-4248-87b6-0052f196e00b.lease
-rw-r--r--.  2 vdsm kvm  325 Aug  9 11:39 bb8ba404-5ced-4248-87b6-0052f196e00b.meta

Comment 14 Sahina Bose 2017-08-23 06:33:18 UTC
Closing this per comment 13, as we could not reproduce the issue. Please re-open if you face this issue again and can provide steps to recreate it.