Bug 1234848 - Disperse volume : heal fails during file truncates
Summary: Disperse volume : heal fails during file truncates
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Sunil Kumar Acharya
QA Contact: Matt Zywusko
URL:
Whiteboard:
Depends On:
Blocks: 1223636
 
Reported: 2015-06-23 11:38 UTC by Bhaskarakiran
Modified: 2017-04-06 06:00 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-06 06:00:52 UTC
Embargoed:



Description Bhaskarakiran 2015-06-23 11:38:03 UTC
Description of problem:
======================

If a file is truncated while a heal is in progress, the heal fails. The steps followed (a rough command sketch is given after the list):

1. Bring down one of the bricks
2. Create a 1GB file with random data
3. Bring the brick back up and trigger heal
4. While the heal is in progress, truncate the file on the client mount
5. Observe that the file size on the bricks does not match the file size on the mount
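
A rough command sketch of this flow (volume name, mount path, brick paths, and file name are placeholders, not taken from this report; the truncate size is illustrative):

kill -9 <brick-pid>                                       # 1. bring down one brick
dd if=/dev/urandom of=/mnt/vol/file1 bs=1M count=1024     # 2. ~1GB file of random data
gluster volume start <volname> force                      # 3. restart the killed brick
gluster volume heal <volname>                             #    and trigger a heal
truncate -s 700MB /mnt/vol/file1                          # 4. truncate while the heal runs
ls -l /mnt/vol/file1 /bricks/*/file1                      # 5. compare mount vs brick sizes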

Version-Release number of selected component (if applicable):
==============================================================
[root@transformers ~]# gluster --version
glusterfs 3.7.1 built on Jun 18 2015 12:26:09
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@transformers ~]# 


How reproducible:
=================

Often

Steps to Reproduce:
===================
As in description

Actual results:
===============
heal fails


Expected results:
=================
Heal should complete successfully even when the file is truncated while the heal is in progress.

Additional info:

Comment 4 Sunil Kumar Acharya 2017-02-03 18:45:08 UTC
We tried recreating the issue.

1. Created a disperse volume.
2. Killed one of the bricks.
3. Created a 1GB file (see the setup command sketch below).
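
A hedged sketch of the setup commands for steps 1-3; the volume name, host, brick paths, and file name come from the listings in this comment, while the 4+2 disperse/redundancy counts, the mount path, and the exact dd invocation are assumptions:

gluster volume create ec-1 disperse 6 redundancy 2 varada:/LAB/store/ec-{1..6} force
gluster volume start ec-1
mount -t glusterfs varada:/ec-1 /mnt/mount-1                          # assumed mount path
kill -9 <pid-of-the-ec-4-brick-process>                               # step 2: kill one brick
dd if=/dev/urandom of=/mnt/mount-1/testfile bs=1024 count=1000000     # step 3: 1024000000-byte file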

Mount point:

[root@varada mount-1]# ls -l
total 1000000
-rw-r--r--. 1 root root 1024000000 Feb  3 16:02 testfile
[root@varada mount-1]#

Bricks:

[root@varada mount-1]# ls -l /LAB/store/ec-*
/LAB/store/ec-1:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-2:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-3:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-4:
total 0

/LAB/store/ec-5:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-6:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile
[root@varada mount-1]#

4. The brick was brought back online.

5. The file was listed for healing and heal was initiated.

[root@varada mount-1]# gluster volume heal ec-1 info
Brick varada:/LAB/store/ec-1
/testfile 
Status: Connected
Number of entries: 1

Brick varada:/LAB/store/ec-2
/testfile 
Status: Connected
Number of entries: 1

Brick varada:/LAB/store/ec-3
/testfile 
Status: Connected
Number of entries: 1

Brick varada:/LAB/store/ec-4
Status: Connected
Number of entries: 0

Brick varada:/LAB/store/ec-5
/testfile 
Status: Connected
Number of entries: 1

Brick varada:/LAB/store/ec-6
/testfile 
Status: Connected
Number of entries: 1

[root@varada mount-1]# 
[root@varada mount-1]# gluster volume heal ec-1 
Launching heal operation to perform index self heal on volume ec-1 has been successful 
Use heal info commands to check status
[root@varada mount-1]# 
[root@varada mount-1]# ls -l /LAB/store/ec-*
/LAB/store/ec-1:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-2:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-3:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-4:
total 145412
-rw-r--r--. 2 root root 148897792 Feb  3 16:03 testfile

/LAB/store/ec-5:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile

/LAB/store/ec-6:
total 250008
-rw-r--r--. 2 root root 256000000 Feb  3 16:02 testfile
[root@varada mount-1]# 

6. Truncated the file.

[root@varada mount-1]# truncate -s 700MB testfile
[root@varada mount-1]# ls -l
total 683594
-rw-r--r--. 1 root root 700000000 Feb  3 16:04 testfile
[root@varada mount-1]# 

7. After some time, checked the file sizes on the bricks.

[root@varada mount-1]# ls -l /LAB/store/ec-*
/LAB/store/ec-1:
total 170908
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile

/LAB/store/ec-2:
total 170908
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile

/LAB/store/ec-3:
total 170908
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile

/LAB/store/ec-4:
total 170904
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile

/LAB/store/ec-5:
total 170908
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile

/LAB/store/ec-6:
total 170908
-rw-r--r--. 2 root root 175000064 Feb  3 16:04 testfile
[root@varada mount-1]# 


The file was the same size on all bricks. Note that the file size on the mount point does not match the per-brick file size: in a disperse volume each brick stores only an erasure-coded fragment of the file, so the bricks are not expected to hold the full file size.

The above tests were performed several times on both upstream and downstream 3.7.1. The behavior described above was observed consistently and is working as expected.
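
For reference, a rough size check, assuming a 4+2 disperse configuration with the default 512-byte fragment per brick (a 2048-byte stripe across the 4 data bricks); this assumption also matches the 256000000-byte per-brick size of the original 1024000000-byte file (1024000000 / 4):

$ echo $(( (700000000 + 2047) / 2048 * 512 ))    # bytes per data brick, rounded up to whole 2048-byte stripes
175000064

This is exactly the 175000064-byte per-brick size seen in step 7, i.e. the brick sizes after truncation are what erasure coding would produce for a 700000000-byte file.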

Comment 5 Sunil Kumar Acharya 2017-02-06 07:39:24 UTC
As detailed in my previous update, I am not able to recreate the issue. I have also discussed it with QA (Nag) and he is fine with the observation. 
Any suggestions?

Comment 6 Pranith Kumar K 2017-02-08 08:10:24 UTC
Sure, go ahead and close it as works for me.

