Bug 861015

Summary: Self-heal daemon referring to null gfids
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: spandura
Component: glusterfs
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA
QA Contact: spandura
Severity: high
Docs Contact:
Priority: medium
Version: 2.0
CC: grajaiya, laurent.chouinard, rfortier, rhs-bugs, shaines, surs, vbellur
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.4rhs-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-09-23 22:33:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: glustershd log file (flags: none)

Description spandura 2012-09-27 10:14:40 UTC
Created attachment 617990 [details]
glustershd log file.

Description of problem:
-------------------------
The self-heal daemon log contains "inode link failed on the inode (00000000-0000-0000-0000-000000000000)" messages while the self-heal daemon is healing virtual machine images.

Version-Release number of selected component (if applicable):
------------------------------------------------------------
[root@rhs-client6 ~]# gluster --version
glusterfs 3.3.0rhsvirt1 built on Sep 25 2012 14:53:06

[root@rhs-client6 ~]# rpm -qa | grep gluster
glusterfs-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-rdma-3.3.0rhsvirt1-6.el6rhs.x86_64
vdsm-gluster-4.9.6-14.el6rhs.noarch
gluster-swift-plugin-1.0-5.noarch
gluster-swift-container-1.4.8-4.el6.noarch
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-fuse-3.3.0rhsvirt1-6.el6rhs.x86_64
glusterfs-geo-replication-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-proxy-1.4.8-4.el6.noarch
gluster-swift-account-1.4.8-4.el6.noarch
gluster-swift-doc-1.4.8-4.el6.noarch
glusterfs-server-3.3.0rhsvirt1-6.el6rhs.x86_64
gluster-swift-1.4.8-4.el6.noarch
gluster-swift-object-1.4.8-4.el6.noarch

Steps to Reproduce:
--------------------
1. Create a 2 x 2 distributed-replicate volume across 4 servers, with one brick on each server (an illustrative volume-create sketch follows step 5). Brick details:

brick1:-
rhs-client6.lab.eng.blr.redhat.com:/disk1 

brick2:-
rhs-client7.lab.eng.blr.redhat.com:/disk1

brick3:-
rhs-client8.lab.eng.blr.redhat.com:/disk1 

brick4:-
rhs-client9.lab.eng.blr.redhat.com:/disk1 

2. Power off hosts rhs-client7.lab.eng.blr.redhat.com and rhs-client9.lab.eng.blr.redhat.com.

3. Create new VMs from RHEVM.

4. Power on hosts rhs-client7.lab.eng.blr.redhat.com and rhs-client9.lab.eng.blr.redhat.com.

5. The self-heal daemon starts healing the newly created VMs onto bricks "brick2" and "brick4".
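
For reference, an illustrative sketch of the volume setup in step 1. The volume name is taken from the log prefixes below ("dist-rep-rhevh"); the exact command was not captured in this report, so treat the whole sketch as an assumption:

   # 2 x 2 distributed-replicate volume; with "replica 2", adjacent bricks form
   # the replica pairs (client6/client7 and client8/client9), matching brick1-brick4 above.
   gluster volume create dist-rep-rhevh replica 2 \
       rhs-client6.lab.eng.blr.redhat.com:/disk1 \
       rhs-client7.lab.eng.blr.redhat.com:/disk1 \
       rhs-client8.lab.eng.blr.redhat.com:/disk1 \
       rhs-client9.lab.eng.blr.redhat.com:/disk1
   gluster volume start dist-rep-rhevh
   gluster volume info dist-rep-rhevh   # should report "Number of Bricks: 2 x 2 = 4"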
  
Actual results:
---------------
[2012-09-26 11:52:00.358377] I [client-handshake.c:1614:select_server_supported_programs] 0-replicate-rhevh-client-1: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)
[2012-09-26 11:52:00.358754] I [client-handshake.c:1411:client_setvolume_cbk] 0-replicate-rhevh-client-1: Connected to 10.70.36.31:24012, attached to remote volume '/disk2'.
[2012-09-26 11:52:00.358792] I [client-handshake.c:1423:client_setvolume_cbk] 0-replicate-rhevh-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-09-26 11:52:00.361151] I [client-handshake.c:453:client_set_lk_version_cbk] 0-replicate-rhevh-client-1: Server lk version = 1
[2012-09-26 11:52:00.363141] I [client-handshake.c:1614:select_server_supported_programs] 0-dist-rep-rhevh-client-3: Using Program GlusterFS 3.3.0rhsvirt1, Num (1298437), Version (330)
[2012-09-26 11:52:00.363486] I [client-handshake.c:1411:client_setvolume_cbk] 0-dist-rep-rhevh-client-3: Connected to 10.70.36.33:24011, attached to remote volume '/disk1'.
[2012-09-26 11:52:00.363510] I [client-handshake.c:1423:client_setvolume_cbk] 0-dist-rep-rhevh-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2012-09-26 11:52:00.364092] I [client-handshake.c:453:client_set_lk_version_cbk] 0-dist-rep-rhevh-client-3: Server lk version = 1
[2012-09-26 11:54:06.497560] E [afr-self-heald.c:685:_link_inode_update_loc] 0-dist-rep-rhevh-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2012-09-26 11:54:06.498099] E [afr-self-heald.c:685:_link_inode_update_loc] 0-dist-rep-rhevh-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2012-09-26 11:54:06.498829] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:0c27dbce-46e8-4ad3-8ba8-7ad94ebb47ae> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.514905] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:6b498872-7387-47b2-a4d7-02eeb8c23c99> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.515234] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:17146af7-a6a0-4b59-8063-5572d76aa8e6> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.515537] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:c68681c4-7f15-43c8-aa99-a860482ab5a7> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.515860] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:5725b541-26a3-4e4f-aeca-0ed974c3209e> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.516152] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-dist-rep-rhevh-client-0: remote operation failed: No such file or directory. Path: <gfid:306d4115-c5d7-4ff2-a32c-fd2bae17016b> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path
[2012-09-26 11:54:06.516426] E [afr-self-heald.c:685:_link_inode_update_loc] 0-dist-rep-rhevh-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)

Comment 1 spandura 2012-09-27 10:17:16 UTC
Brick1 log messages at the time of the self-heal:
--------------------------------------------------


[2012-09-26 11:54:06.498683] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 80: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.514790] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 81: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.515144] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 82: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.515458] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 83: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.515766] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 84: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.516079] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 85: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.516873] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 88: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)
[2012-09-26 11:54:06.517170] I [server3_1-fops.c:823:server_getxattr_cbk] 0-dist-rep-rhevh-server: 89: GETXATTR (null) (glusterfs.gfid2path) ==> -1 (No such file or directory)

Comment 3 Pranith Kumar K 2012-10-16 04:36:24 UTC
Steps to recreate inode-link failures:
1) Create a replicate volume.
2) Bring one of the bricks down.
3) Create some files:
   for i in {1..10}; do dd if=/dev/zero of=$i bs=1M count=10; done
4) Execute "gluster volume heal <VOLNAME> info" (a combined sketch of these steps follows).
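
Putting the steps together, a minimal end-to-end sketch. Assumptions not stated in the comment: a replica-2 volume named "r2" (the name seen in the Comment 4 log line) on hosts server1/server2, bricks at /bricks/r2_0, a FUSE mount at /mnt/r2, and the brick taken offline by killing its glusterfsd process:

   # 1) Create and start a replica-2 volume, then mount it (names and paths are illustrative).
   gluster volume create r2 replica 2 server1:/bricks/r2_0 server2:/bricks/r2_0
   gluster volume start r2
   mount -t glusterfs server1:/r2 /mnt/r2

   # 2) Bring one of the bricks down (here: kill the brick process on server2).
   ssh server2 "pkill -f 'glusterfsd.*bricks/r2_0'"

   # 3) Create some files while that brick is offline.
   cd /mnt/r2
   for i in {1..10}; do dd if=/dev/zero of=$i bs=1M count=10; done

   # 4) Query heal info; the inode-link errors were seen while this request was serviced.
   gluster volume heal r2 info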

Comment 4 Pranith Kumar K 2012-10-16 10:32:05 UTC
Steps to recreate getxattr failures:
[2012-10-16 16:02:41.822320] W [client3_1-fops.c:1114:client3_1_getxattr_cbk] 0-r2-client-1: remote operation failed: No such file or directory. Path: <gfid:4d221ab3-cacd-4971-91bf-d37d71eefb2c> (00000000-0000-0000-0000-000000000000). Key: glusterfs.gfid2path

1) Create a replicate volume.
2) Bring one of the bricks down.
3) Create some files:
   for i in {1..10}; do dd if=/dev/zero of=$i bs=1M count=10; done
4) Delete the files from the mount point.
5) Execute "gluster volume heal <VOLNAME> info" (a short follow-up sketch is below).
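
Relative to the sketch in Comment 3, the only extra step is removing the files from the mount point before querying heal info (same assumed volume "r2" and mount /mnt/r2):

   # 4) Delete the files from the mount point while the brick is still down.
   rm -f /mnt/r2/{1..10}

   # 5) Query heal info; the getxattr warnings were logged at this point.
   gluster volume heal r2 info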

Comment 6 Vijay Bellur 2012-10-29 03:27:15 UTC
CHANGE: http://review.gluster.org/4097 (protocols: Suppress getxattr log when errno is ENOENT) merged in master by Vijay Bellur (vbellur)

Comment 7 Vijay Bellur 2013-01-22 05:48:43 UTC
CHANGE: http://review.gluster.org/4090 (cluster/afr: Link inode only on lookup) merged in master by Anand Avati (avati)

Comment 8 Vijay Bellur 2013-01-22 06:04:05 UTC
CHANGE: http://review.gluster.org/4399 (Tests: Added function to get pending heal count from heal-info) merged in master by Anand Avati (avati)

Comment 9 Vijay Bellur 2013-01-22 06:07:22 UTC
CHANGE: http://review.gluster.org/4400 (Tests: functions for shd statedump, child_up_status) merged in master by Anand Avati (avati)

Comment 10 Vijay Bellur 2013-01-22 06:08:20 UTC
CHANGE: http://review.gluster.org/4401 (self-heald basic tests) merged in master by Anand Avati (avati)

Comment 11 Vijay Bellur 2013-01-22 06:08:55 UTC
CHANGE: http://review.gluster.org/4402 (Test to check if inode-link failures appear) merged in master by Anand Avati (avati)

Comment 12 Vijay Bellur 2013-01-23 03:44:04 UTC
CHANGE: http://review.gluster.org/4098 (self-heald: Remove stale index even in heal info) merged in master by Anand Avati (avati)

Comment 13 Vijay Bellur 2013-01-23 03:44:19 UTC
CHANGE: http://review.gluster.org/4408 (Tests: Add utils to get index-path, index-count) merged in master by Anand Avati (avati)

Comment 14 Vijay Bellur 2013-01-23 03:44:32 UTC
CHANGE: http://review.gluster.org/4409 (Tests: Check that stale indices are removed on heal-info) merged in master by Anand Avati (avati)

Comment 16 spandura 2013-07-09 11:31:19 UTC
Verified the fix on the following build by executing the steps mentioned in Comment 4:


root@king [Jul-09-2013-17:00:10] >rpm -qa | grep glusterfs
glusterfs-fuse-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.12rhs.beta3-1.el6rhs.x86_64
glusterfs-devel-3.4.0.12rhs.beta3-1.el6rhs.x86_64
root@king [Jul-09-2013-17:00:15] >
root@king [Jul-09-2013-17:00:15] >
root@king [Jul-09-2013-17:00:17] >
root@king [Jul-09-2013-17:00:17] >gluster --version
glusterfs 3.4.0.12rhs.beta3 built on Jul  6 2013 14:35:18

Bug is fixed.

Comment 17 spandura 2013-07-23 12:36:50 UTC
Verified the bug on the following build with the steps specified in Comment 4:
=================================================================
root@king [Jul-23-2013-18:05:29] >rpm -qa | grep glusterfs-server
glusterfs-server-3.3.0.11rhs-1.el6rhs.x86_64
root@king [Jul-23-2013-18:05:38] >
root@king [Jul-23-2013-18:05:39] >gluster --version
glusterfs 3.3.0.11rhs built on Jul  3 2013 05:17:12

Bug is fixed on the above build too.

Comment 18 Scott Haines 2013-09-23 22:33:26 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html