Description of problem:
Created a 1*(2+1) volume and created 10 files of different sizes on it. After the file creation, add-brick was done to make the volume 4*(2+1), and fix-layout and rebalance were triggered. One of the files is still stuck in the healing stage.

Version-Release number of selected component (if applicable):
[root@dhcp46-231 ~]# gluster --version
glusterfs 3.8.4 built on Sep 20 2016 07:17:14
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
2/2

Logs are placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. Create a 1*(2+1) volume and create 10 files of different sizes from the fuse mount (TESTVOL). (An approximate CLI sequence is sketched at the end of this report.)
2. Add bricks to make the volume 4*(2+1), i.e. 12 bricks in total.
3. Trigger fix-layout on the volume.
4. Check heal info on the volume.
5. One file still needs heal even though a clear source is available.

[root@dhcp46-231 ~]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 828a2940-f899-4711-ad73-5c2ffadaa36d
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick0/testvol
Brick2: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol
Brick3: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick0/testvol (arbiter)
Brick4: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick1/testvol
Brick5: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick1/testvol
Brick6: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick1/testvol (arbiter)
Brick7: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick2/testvol
Brick8: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick2/testvol
Brick9: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick2/testvol (arbiter)
Brick10: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick3/testvol
Brick11: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick3/testvol
Brick12: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick3/testvol (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
cluster.eager-lock: off
cluster.locking-scheme: granular
diagnostics.client-log-level: INFO

Actual results:
Heal is pending and does not complete.

Expected results:
Heal should complete.

Additional info:
Extended attributes of file7 as seen on each of the three nodes:

[root@dhcp46-231 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x0000004a0000000000000000
trusted.bit-rot.version=0x020000000000000057ea10560001eb8f
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d
######################################
[root@dhcp46-50 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000160000000000000000
trusted.afr.testvol-client-1=0x000000490000000000000000
trusted.bit-rot.version=0x030000000000000057f0d20a0000d059
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d
##############################################
[root@dhcp47-111 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x0000004a0000000000000000
trusted.bit-rot.version=0x020000000000000057e8cea7000e9e00
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d
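Approximate CLI sequence for the reproduction steps above. This is a sketch only, reconstructed from the steps and the volume info; the exact brick ordering, file-creation commands, and option settings are assumptions.

---------------------------------------------------------------
# 1x(2+1) arbiter volume on the three nodes listed in the volume info
gluster volume create testvol replica 3 arbiter 1 \
    dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick0/testvol \
    dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol \
    dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick0/testvol
gluster volume start testvol

# create 10 files of different sizes from the fuse mount, then expand
# the volume to 4x(2+1) by adding brick1..brick3 (shown for brick1 only)
gluster volume add-brick testvol \
    dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick1/testvol \
    dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick1/testvol \
    dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick1/testvol

# trigger fix-layout, then check heal status
gluster volume rebalance testvol fix-layout start
gluster volume heal testvol info
---------------------------------------------------------------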
From the initial observations on Karan's setup, afr_selfheal_data_do() is failing with ENOTCONN (even though all bricks are up and shd is connected to them), because of which the subsequent steps like afr_selfheal_undo_pending() are never executed and the heal never completes. I was not able to see at which point in afr_selfheal_data_do() the code fails because the values seem to be optimized out when debugging with gdb, and breakpoints don't work as expected. There are two places in the function that do return ENOTCONN, but neither of those conditions is true, and the gdb issue is not helping in pinpointing where the problem is. I have provided my dev VMs (with the latest downstream source) to Karan to see if the issue can be re-created on them.
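For reference, a minimal sketch of one way to get usable gdb output here, assuming a source build of the downstream tree; the CFLAGS/configure flags and the glustershd process name are standard glusterfs usage, not taken from this report:

---------------------------------------------------------------
# rebuild with debug symbols and optimization off so locals are not
# optimized out in gdb
CFLAGS="-ggdb3 -O0" ./configure --enable-debug
make -j4 && make install

# attach gdb to the running self-heal daemon and break on the failing path
gdb -p $(pgrep -f glustershd) -ex 'break afr_selfheal_data_do' -ex 'continue'
---------------------------------------------------------------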
So I managed to debug further on Karan's setup, and it was found that the sink brick (Brick2: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol) was 100% full:

---------------------------------------------------------------
[root@dhcp46-50 brick0]# df -hT|grep brick0
/dev/mapper/RHS_vg0-RHS_lv0 xfs   6.5G  6.5G   20K 100% /bricks/brick0

[root@dhcp46-50 ~]# cd /bricks/brick0/
[root@dhcp46-50 brick0]# ls
testvol
[root@dhcp46-50 brick0]# touch deleteme
touch: cannot touch ‘deleteme’: No space left on device
---------------------------------------------------------------

Because of this, posix_writev was doing short writes on this brick.

afr_selfheal_data_do ()
{
        for (off = 0; off < replies[source].poststat.ia_size; off += block) {
                if (AFR_COUNT (healed_sinks, priv->child_count) == 0) {
                        ret = -ENOTCONN;                              /* (1) */
                        goto out;
                }
                ret = afr_selfheal_data_block (healed_sinks, ...);    /* (2) */
        }
}

In the snippet above, (2) resets healed_sinks[2] to 0 because of the short writes, so (1) is hit in the next iteration and AFR returns ENOTCONN. Undo-pending therefore never happens and the heal attempts go on forever.

Karan, shall I close the BZ? Sorry for not catching this sooner.
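For anyone hitting a similar stuck heal, the condition is easy to confirm from the shell; the loop below simply re-runs the checks shown above across the three nodes of this setup (hostnames and brick path taken from this report):

---------------------------------------------------------------
# a data self-heal cannot make progress if the sink brick is out of space
for h in dhcp46-231 dhcp46-50 dhcp47-111; do
    ssh ${h}.lab.eng.blr.redhat.com 'hostname; df -hT /bricks/brick0'
done

# the entry stays listed here until the sink has free space again
gluster volume heal testvol info
---------------------------------------------------------------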
Ravi, yup, go ahead.