Bug 1232238

Summary: [RHEV-RHGS] After self-heal operation, VM Image file loses the sparseness property
Product: [Community] GlusterFS
Reporter: Anuradha <atalur>
Component: replicate
Assignee: Anuradha <atalur>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs, gluster-bugs, ravishankar, sasundar, smohan
Target Milestone: ---
Keywords: Reopened, TestBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1223677
Clones: 1235966 (view as bug list)
Environment:
Last Closed: 2016-06-16 13:12:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1223677
Bug Blocks: 1223636, 1235966

Description Anuradha 2015-06-16 10:48:05 UTC
+++ This bug was initially created as a clone of Bug #1223677 +++

Description of problem:
-----------------------
The RHEV data domain was backed by a replica 3 gluster volume, and one of the nodes was down while an image file was being created.

After self-heal, the image file on the healed node no longer retained its sparseness.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.1 Nightly build ( glusterfs-3.7.0-2.el6rhs )

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
0. Create a 3 node trusted storage pool ( gluster cluster )
1. Create a replica 3 volume
2. Optimize the volume for virt-store usecase
3. Start the volume
4. Use this volume as a RHEV data domain
5. Interrupt the traffic between the hypervisors and one of the nodes in the Trusted Storage Pool ( gluster cluster ) [ iptables was used for this step ]
6. Create a new VM from RHEV and install RHEL 6.7 on that application VM
7. Restore the network between the hypervisor & the node in the gluster cluster
8. Initiate self-heal
9. Check the actual (allocated) size of the image file on all the nodes (a rough command-line sketch of the gluster-side steps follows)
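
A rough command-line sketch of the gluster-side portion of the steps above (the volume name, IPs and brick paths are taken from the cluster details in the comments below; the RHEV data domain, VM creation and guest installation happen on the RHEV side, and the exact iptables rule used in the original run is not recorded, so the one shown here is only an assumption):

# On 10.70.37.113: build the trusted storage pool and the replica 3 volume (steps 0, 1 and 3)
gluster peer probe 10.70.37.58
gluster peer probe 10.70.37.150
gluster volume create rhevstore replica 3 \
    10.70.37.113:/rhs/brick2/store \
    10.70.37.58:/rhs/brick2/store \
    10.70.37.150:/rhs/brick2/store
# Step 2: apply the virt-store options (see the reconfigured option list and the note after it below)
gluster volume start rhevstore

# On the hypervisor: cut traffic to one storage node, e.g. 10.70.37.58 (step 5)
iptables -A OUTPUT -d 10.70.37.58 -j DROP
# ... create the VM and install the guest OS from RHEV (step 6) ...
iptables -D OUTPUT -d 10.70.37.58 -j DROP

# Back on a storage node: trigger self-heal and compare sizes once it completes (steps 8 and 9)
gluster volume heal rhevstore
gluster volume heal rhevstore info
ls -lsah /rhs/brick2/store/<path-to-image-file>    # first column = allocated size, i.e. sparseness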

Actual results:
---------------
The VM image file on the node where the heal operation completed has blown up to its full size, losing its sparseness.

Expected results:
-----------------
The VM image file should remain sparse even after self-heal

--- Additional comment from SATHEESARAN on 2015-05-21 06:19:03 EDT ---

1. Cluster Information
-----------------------
RHGS Node1 - dhcp37-113.lab.eng.blr.redhat.com
RHGS Node2 - dhcp37-58.lab.eng.blr.redhat.com
RHGS Node3 - dhcp37-150.lab.eng.blr.redhat.com

2. Volume Information
---------------------
[root@dhcp37-113 ~]# gluster volume info rhevstore
                                                                                                                                                                                                               
Volume Name: rhevstore
Type: Replicate
Volume ID: 5f2a9457-3cd7-455f-8823-4b50272091e2
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.37.113:/rhs/brick2/store
Brick2: 10.70.37.58:/rhs/brick2/store
Brick3: 10.70.37.150:/rhs/brick2/store
Options Reconfigured:
performance.write-behind: off
storage.owner-gid: 36
storage.owner-uid: 36
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
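
The reconfigured options above are what step 2 of the reproduction steps ("optimize the volume for virt-store usecase") refers to. On a fresh volume they can be set one by one with "gluster volume set", or (assuming the virt group profile shipped with glusterfs is present under /var/lib/glusterd/groups/) most of them in one shot; a sketch, not taken from the original setup transcript:

gluster volume set rhevstore group virt
gluster volume set rhevstore storage.owner-uid 36
gluster volume set rhevstore storage.owner-gid 36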

3. Volume status
-----------------
[root@dhcp37-113 ~]# gluster volume status rhevstore
Status of volume: rhevstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.113:/rhs/brick2/store        49153     0          Y       9898 
Brick 10.70.37.58:/rhs/brick2/store         49153     0          Y       7499 
Brick 10.70.37.150:/rhs/brick2/store        49153     0          Y       3119 
NFS Server on localhost                     2049      0          Y       9916 
Self-heal Daemon on localhost               N/A       N/A        Y       9925 
NFS Server on 10.70.37.150                  2049      0          Y       3137 
Self-heal Daemon on 10.70.37.150            N/A       N/A        Y       3145 
NFS Server on 10.70.37.58                   2049      0          Y       7517 
Self-heal Daemon on 10.70.37.58             N/A       N/A        Y       7525 
 
Task Status of Volume rhevstore
------------------------------------------------------------------------------
There are no active volume tasks

4. VM Image files as seen in the bricks

[root@dhcp37-113 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
2.9G -rw-rw----. 2 36 36 21G May 21 11:37 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

[root@dhcp37-150 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
2.9G -rw-rw----. 2 36 36 21G May 21 11:35 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

[root@dhcp37-58 ~]# ls /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03 -lsah
21G -rw-rw----. 2 36 36 21G May 21 11:35 /rhs/brick2/store/525e245f-3c49-454e-b075-809b5b7aea20/images/702d5ecd-fcf1-457f-b74c-6ad5db08ce26/01c14342-844a-40b0-ad08-ee44c0b08b03

5. The node that has lost sparseness
-------------------------------------
dhcp37-58.lab.eng.blr.redhat.com which is subvolume -> rhevstore-client-1
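
In the ls -lsah listings above, the first column is the allocated size (blocks actually used on disk) and the size printed before the timestamp is the apparent size. On the healed node (10.70.37.58) both are 21G, i.e. the holes have been filled, while the other two bricks still show only 2.9G allocated. The same check can be done with du or stat; a sketch (the placeholder stands for the full image path shown above):

du -h --apparent-size <image-file>            # apparent size, 21G on every brick
du -h <image-file>                            # space actually allocated on disk
stat -c 'size=%s  allocated blocks=%b' <image-file>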

--- Additional comment from SATHEESARAN on 2015-06-11 21:46:16 EDT ---

This is a serious issue for the VM use case, where the expectation is to create a sparse image file, but self-heal breaks that, causing the image to occupy its full size. This would lead to admins complaining about wasted disk space.

I consider this issue a blocker for RHGS-3.1

Comment 1 Anand Avati 2015-06-16 11:19:08 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#1) for review on master by Anuradha Talur (atalur)

Comment 2 Anand Avati 2015-06-17 07:40:37 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#2) for review on master by Anuradha Talur (atalur)

Comment 3 Anand Avati 2015-06-25 12:13:32 UTC
REVIEW: http://review.gluster.org/11252 (cluster/afr : truncate all sinks files) posted (#3) for review on master by Anuradha Talur (atalur)
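
The review above is titled "cluster/afr : truncate all sinks files". The general idea suggested by that title is that extending a sink file with a truncate creates a hole (no blocks are allocated), whereas rewriting the missing range with zero-filled data allocates every block and destroys sparseness. A minimal illustration of that difference on a plain filesystem, not the patch itself (assumes GNU coreutils and a filesystem that supports sparse files):

truncate -s 1G sparse.img                       # apparent size 1G, almost no blocks allocated
dd if=/dev/zero of=full.img bs=1M count=1024    # 1G of zeroes written, 1G of blocks allocated
ls -lsah sparse.img full.img                    # compare the first column with the size field
du -h sparse.img full.img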

Comment 4 Nagaprasad Sathyanarayana 2015-10-25 15:00:28 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed. Hence this mainline BZ is being closed as well.

Comment 5 Niels de Vos 2016-06-16 13:12:44 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user