Bug 1232238

Summary: [RHEV-RHGS] After self-heal operation, VM Image file loses the sparseness property
Product: [Community] GlusterFS
Reporter: Anuradha <atalur>
Component: replicate
Assignee: Anuradha <atalur>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, gluster-bugs, ravishankar, sasundar, smohan
Target Milestone: ---
Keywords: Reopened, TestBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1223677
Clones: 1235966 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:12:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1223677
Bug Blocks: 1223636, 1235966

Description Anuradha 2015-06-16 10:48:05 UTC
+++ This bug was initially created as a clone of Bug #1223677 +++

Description of problem:
-----------------------
The RHEV data domain was backed by a replica 3 gluster volume, and one of the nodes was down while an image file was being created.

After self-heal, the image file on the healed node no longer retained its sparseness.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.1 Nightly build ( glusterfs-3.7.0-2.el6rhs )

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
0. Create a 3 node trusted storage pool ( gluster cluster )
1. Create a replica 3 volume
2. Optimize the volume for virt-store usecase
3. Start the volume
4. Use this volume as a RHEV data domain
5. Interrupt the traffic between the hypervisors and one of the nodes in the Trusted Storage Pool ( gluster cluster ) [ iptables was used for this step ]
6. Create a new VM from RHEV and install RHEL 6.7 on that application VM
7. Restore the network between the hypervisor & the node in the gluster cluster
8. Initiate self-heal
9. Check the actual (allocated) size of the image file on all the nodes (a rough command-line sketch of the gluster-side steps follows)
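
A rough command-line sketch of the gluster-side portion of the steps above (the volume name, IPs and brick paths are taken from the cluster details in the comments below; the RHEV data domain, VM creation and guest installation happen on the RHEV side, and the exact iptables rule used in the original run is not recorded, so the one shown here is only an assumption):

# On 10.70.37.113: build the trusted storage pool and the replica 3 volume (steps 0, 1 and 3)
gluster peer probe 10.70.37.58
gluster peer probe 10.70.37.150
gluster volume create rhevstore replica 3 \
    10.70.37.113:/rhs/brick2/store \
    10.70.37.58:/rhs/brick2/store \
    10.70.37.150:/rhs/brick2/store
# Step 2: apply the virt-store options (see the reconfigured option list and the note after it below)
gluster volume start rhevstore

# On the hypervisor: cut traffic to one storage node, e.g. 10.70.37.58 (step 5)
iptables -A OUTPUT -d 10.70.37.58 -j DROP
# ... create the VM and install the guest OS from RHEV (step 6) ...
iptables -D OUTPUT -d 10.70.37.58 -j DROP

# Back on a storage node: trigger self-heal and compare sizes once it completes (steps 8 and 9)
gluster volume heal rhevstore
gluster volume heal rhevstore info
ls -lsah /rhs/brick2/store/<path-to-image-file>    # first column = allocated size, i.e. sparseness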

Actual results:
---------------
The VM image file on the node where the heal operation completed has blown up to its full size, losing its sparseness.

Expected results:
-----------------
The VM image file should remain sparse even after self-heal

--- Additional comment from SATHEESARAN on 2015-05-21 06:19:03 EDT ---

1. Cluster Information
-----------------------
RHGS Node1 - dhcp37-113.lab.eng.blr.redhat.com
RHGS Node2 - dhcp37-58.lab.eng.blr.redhat.com
RHGS Node3 - dhcp37-150.lab.eng.blr.redhat.com

2. Volume Information
---------------------
[root@dhcp37-113 ~]# gluster volume info rhevstore
                                                                                                                                                                                                               
Volume Name: rhevstore
Type: Replicate
Volume ID: 5f2a9457-3cd7-455f-8823-4b50272091e2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.37.113:/rhs/brick2/store
Brick2: 10.70.37.58:/rhs/brick2/store
Brick3: 10.70.37.150:/rhs/brick2/store
Options Reconfigured:
performance.write-behind: off
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
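
The reconfigured options above are what step 2 of the reproduction steps ("optimize the volume for virt-store usecase") refers to. On a fresh volume they can be set one by one with "gluster volume set", or (assuming the virt group profile shipped with glusterfs is present under /var/lib/glusterd/groups/) most of them in one shot; a sketch, not taken from the original setup transcript:

gluster volume set rhevstore group virt
gluster volume set rhevstore storage.owner-uid 36
gluster volume set rhevstore storage.owner-gid 36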

3. Volume status
-----------------
[root@dhcp37-113 ~]# gluster volume status rhevstore
Status of volume: rhevstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.113:/rhs/brick2/store        49153     0          Y       9898 
Brick 10.70.37.58:/rhs/brick2/store         49153     0          Y       7499 
Brick 10.70.37.150:/rhs/brick2/store        49153     0          Y       3119 
NFS Server on localhost                     2049      0          Y       9916 
Self-heal Daemon on localhost               N/A       N/A        Y       9925 
NFS Server on 10.70.37.150                  2049      0          Y       3137 
Self-heal Daemon on 10.70.37.150            N/A       N/A        Y       3145 
NFS Server on 10.70.37.58                   2049      0          Y       7517 
Self-heal Daemon on 10.70.37.58             N/A       N/A        Y       7525 
 
Task Status of Volume rhevstore
------------------------------------------------------------------------------
There are no active volume tasks

4. VM Image files as seen in the bricks

[root@dhcp37-113 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
2.9G -rw-rw----. 2 36 36 21G May 21 11:37 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

[root@dhcp37-150 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
2.9G -rw-rw----. 2 36 36 21G May 21 11:35 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

[root@dhcp37-58 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
21G -rw-rw----. 2 36 36 21G May 21 11:35 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

5. The node that has lost sparseness
-------------------------------------
dhcp37-58.lab.eng.blr.redhat.com which is subvolume -> rhevstore-client-1
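
In the ls -lsah listings above, the first column is the allocated size (blocks actually used on disk) and the size printed before the timestamp is the apparent size. On the healed node (10.70.37.58) both are 21G, i.e. the holes have been filled, while the other two bricks still show only 2.9G allocated. The same check can be done with du or stat; a sketch (the placeholder stands for the full image path shown above):

du -h --apparent-size <image-file>            # apparent size, 21G on every brick
du -h <image-file>                            # space actually allocated on disk
stat -c 'size=%s  allocated blocks=%b' <image-file>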

--- Additional comment from SATHEESARAN on 2015-06-11 21:46:16 EDT ---

This is a serious issue for the VM use case, where the expectation is to create a sparse image file, but self-heal breaks that, causing the image to occupy its full size. This would lead to admins complaining about wasted disk space.

I consider this issue a blocker for RHGS-3.1

Comment 1 Anand Avati 2015-06-16 11:19:08 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#1) for review on master by Anuradha Talur (atalur)

Comment 2 Anand Avati 2015-06-17 07:40:37 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#2) for review on master by Anuradha Talur (atalur)

Comment 3 Anand Avati 2015-06-25 12:13:32 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#3) for review on master by Anuradha Talur (atalur)
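
The review above is titled "cluster/afr : truncate all sinks files". The general idea suggested by that title is that extending a sink file with a truncate creates a hole (no blocks are allocated), whereas rewriting the missing range with zero-filled data allocates every block and destroys sparseness. A minimal illustration of that difference on a plain filesystem, not the patch itself (assumes GNU coreutils and a filesystem that supports sparse files):

truncate -s 1G sparse.img                       # apparent size 1G, almost no blocks allocated
dd if=/dev/zero of=full.img bs=1M count=1024    # 1G of zeroes written, 1G of blocks allocated
ls -lsah sparse.img full.img                    # compare the first column with the size field
du -h sparse.img full.img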

Comment 4 Nagaprasad Sathyanarayana 2015-10-25 15:00:28 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed. Hence this mainline BZ is being closed as well.

Comment 5 Niels de Vos 2016-06-16 13:12:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user