Bug 1381170

Summary: Files not able to heal even when a clear source is available
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Karan Sandha <ksandha>
Component: arbiter
Assignee: Ravishankar N <ravishankar>
Status: CLOSED NOTABUG
QA Contact: Karan Sandha <ksandha>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: ksandha, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-04 06:23:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Karan Sandha 2016-10-03 10:09:41 UTC
Description of problem:
Created a 1*(2+1) volume and created 10 files of different sizes on it. After the file creation, add-brick was done to make the volume 4*(2+1), and fix-layout and rebalance were triggered. One of the files is still stuck in the healing stage.


Version-Release number of selected component (if applicable):
[root@dhcp46-231 ~]# gluster --version
glusterfs 3.8.4 built on Sep 20 2016 07:17:14
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
2/2
Logs are placed at 
rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce (a rough CLI sketch follows the list):
1. Create a 1*(2+1) volume and create 10 files of different sizes from the fuse mount (TESTVOL).
2. Add bricks to make the volume 4*(2+1); total bricks: 12.
3. Trigger a fix-layout rebalance on the volume.
4. Now check heal info on the volume.
5. There is 1 file needing heal even though a clear source is available.
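
A rough CLI sketch of the above steps (hostnames and brick paths are taken from the volume info below; the mount point, file sizes and add-brick batching are my assumptions):

---------------------------------------------------------------
# 1. Create and start the 1 x (2+1) arbiter volume, then write 10 files
#    of different sizes from a fuse mount
gluster volume create testvol replica 3 arbiter 1 \
        dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick0/testvol \
        dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol \
        dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick0/testvol
gluster volume start testvol
mkdir -p /mnt/testvol
mount -t glusterfs dhcp46-231.lab.eng.blr.redhat.com:/testvol /mnt/testvol
for i in $(seq 1 10); do
        dd if=/dev/urandom of=/mnt/testvol/file$i bs=1M count=$((i * 100))
done

# 2. Add three more replica sets (brick1..brick3 on each node) to reach 4 x (2+1)
for b in brick1 brick2 brick3; do
        gluster volume add-brick testvol \
                dhcp46-231.lab.eng.blr.redhat.com:/bricks/$b/testvol \
                dhcp46-50.lab.eng.blr.redhat.com:/bricks/$b/testvol \
                dhcp47-111.lab.eng.blr.redhat.com:/bricks/$b/testvol
done

# 3. Trigger the fix-layout rebalance
gluster volume rebalance testvol fix-layout start

# 4. Check heal info
gluster volume heal testvol info
---------------------------------------------------------------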


[root@dhcp46-231 ~]# gluster volume info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 828a2940-f899-4711-ad73-5c2ffadaa36d
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick0/testvol
Brick2: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol
Brick3: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick0/testvol (arbiter)
Brick4: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick1/testvol
Brick5: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick1/testvol
Brick6: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick1/testvol (arbiter)
Brick7: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick2/testvol
Brick8: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick2/testvol
Brick9: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick2/testvol (arbiter)
Brick10: dhcp46-231.lab.eng.blr.redhat.com:/bricks/brick3/testvol
Brick11: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick3/testvol
Brick12: dhcp47-111.lab.eng.blr.redhat.com:/bricks/brick3/testvol (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
cluster.eager-lock: off
cluster.locking-scheme: granular
diagnostics.client-log-level: INFO



Actual results:
Heal is pending to be completed.

Expected results:
Heal should be completed.


Additional info:

[root@dhcp46-231 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x0000004a0000000000000000
trusted.bit-rot.version=0x020000000000000057ea10560001eb8f
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d


######################################
[root@dhcp46-50 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000160000000000000000
trusted.afr.testvol-client-1=0x000000490000000000000000
trusted.bit-rot.version=0x030000000000000057f0d20a0000d059
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d


##############################################


[root@dhcp47-111 testvol]# g file7
# file: file7
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.testvol-client-1=0x0000004a0000000000000000
trusted.bit-rot.version=0x020000000000000057e8cea7000e9e00
trusted.gfid=0x33a1cf65636744688dc285724a0a0e2d
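
In the dumps above, "g" appears to be a local alias for a hex getfattr dump; the plain command and (as I understand the AFR changelog xattr format) how to read the values:

---------------------------------------------------------------
# Dump the file's xattrs in hex directly on a brick (equivalent of the "g" alias)
getfattr -d -m . -e hex /bricks/brick0/testvol/file7

# Each trusted.afr.<vol>-client-N value is 12 bytes: three big-endian 32-bit
# counters for pending data, metadata and entry operations, in that order.
# For example, on Brick1 and on the arbiter:
#
#   trusted.afr.testvol-client-1=0x 0000004a 00000000 00000000
#                                   data     metadata entry
#
# i.e. 0x4a (74) data operations are still blamed on client-1 (Brick2), while
# Brick2 itself carries a non-zero trusted.afr.dirty -- so Brick2 is the sink
# and Brick1 plus the arbiter are the clean sources mentioned in the summary.
---------------------------------------------------------------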

Comment 2 Ravishankar N 2016-10-03 12:26:07 UTC
From initial observations on Karan's setup, afr_selfheal_data_do() is failing with ENOTCONN (even though all bricks are up and shd is connected to them), due to which subsequent functions like afr_selfheal_undo_pending() never run and the heal never completes.

I was not able to see at what point in afr_selfheal_data_do() the code fails, because the values appear to be optimized out when debugging with gdb and breakpoints don't work as expected. There are two places that do return ENOTCONN, but those conditions are not true in the function, and the gdb issue is not helping in finding where the problem is.
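
(For reference, one way around the optimized-out values, assuming glusterfs is rebuilt from source on the debug VMs, is to build without optimization before attaching gdb to the self-heal daemon; a rough sketch:)

---------------------------------------------------------------
# Rebuild with debug info and no optimization so locals in
# afr_selfheal_data_do() stay inspectable, then attach gdb to glustershd
./autogen.sh
CFLAGS="-ggdb3 -O0" ./configure
make -j && make install      # restart the volume/shd afterwards

gdb -p $(pgrep -f glustershd)
(gdb) break afr_selfheal_data_do
(gdb) continue
---------------------------------------------------------------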

I have provided my dev VMs (with the latest downstream source) to Karan to see if the issue can be re-created on them.

Comment 3 Ravishankar N 2016-10-04 05:01:01 UTC
So I managed to debug further on Karan's setup, and it was found that the sink brick (Brick2: dhcp46-50.lab.eng.blr.redhat.com:/bricks/brick0/testvol) was 100% full:

---------------------------------------------------------------
[root@dhcp46-50 brick0]# df -hT|grep brick0
/dev/mapper/RHS_vg0-RHS_lv0      xfs       6.5G  6.5G   20K 100% /bricks/brick0

[root@dhcp46-50 ~]# cd /bricks/brick0/
[root@dhcp46-50 brick0]# ls
testvol
[root@dhcp46-50 brick0]# 
[root@dhcp46-50 brick0]# touch deleteme
touch: cannot touch ‘deleteme’: No space left on device
---------------------------------------------------------------

Because of this, there were short writes by posix_writev on this brick.

afr_selfheal_data_do {

        for (off = 0; off < replies[source].poststat.ia_size; off += block) {

                /* (1) bail out if no sinks are left to heal */
                if (AFR_COUNT (healed_sinks, priv->child_count) == 0) {
                        ret = -ENOTCONN;
                        goto out;
                }

                /* (2) heal one block; a failed or short write removes that
                 * brick from healed_sinks */
                ret = afr_selfheal_data_block (healed_sinks);
        }
}

In the snippet above, (2) resets healed_sinks[2] to 0 due to short writes, because of which (1) is hit in the next iteration and afr returns ENOTCONN. So undo-pending never happens and the heal attempts go on forever.

Karan, shall I close the BZ? Sorry for not catching this sooner.

Comment 4 Karan Sandha 2016-10-04 06:22:04 UTC
Ravi, Yup go ahead.