Bug 1529488

Summary: entries not getting cleared post healing of softlinks (stale entries showing up in heal info)
Product: [Community] GlusterFS
Component: disperse
Version: mainline
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Reporter: Ravishankar N <ravishankar>
Assignee: Ravishankar N <ravishankar>
CC: aspandey, bugs, nchilaka, pkarampu, ravishankar
Keywords: Triaged
Fixed In Version: glusterfs-4.0.0
Clone Of: 1527309
Bug Depends On: 1527309
Bug Blocks: 1534842, 1534847, 1534848
Last Closed: 2018-03-15 11:24:00 UTC
Type: Bug

Description Ravishankar N 2017-12-28 10:59:27 UTC
+++ This bug was initially created as a clone of Bug #1527309 +++

Description of problem:
======================
On an EC volume, stale entries for softlinks are never cleared from heal info, even after healing completes:
[root@dhcp35-192 ecv]# gluster v heal ecv full
Launching heal operation to perform full self heal on volume ecv has been successful 
Use heal info commands to check status
[root@dhcp35-192 ecv]# gluster v heal ecv  info
Brick dhcp35-192.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-214.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-215.lab.eng.blr.redhat.com:/rhs/brick2/ecv
Status: Connected
Number of entries: 0


[root@dhcp35-214 ecv]# ls /rhs/brick2/ecv/var/ -lh
total 8.0K
drwxr-xr-x.  2 root root    6 Dec 19 12:45 adm
drwxr-xr-x.  5 root root   44 Dec 19 12:46 cache
drwxr-xr-x.  2 root root    6 Dec 19 12:46 crash
drwxr-xr-x.  3 root root   34 Dec 19 12:46 db
drwxr-xr-x.  3 root root   18 Dec 19 12:46 empty
drwxr-xr-x.  2 root root    6 Dec 19 12:46 games
drwxr-xr-x.  2 root root    6 Dec 19 12:46 gopher
drwxr-xr-x.  3 root root   18 Dec 19 12:46 kerberos
drwxr-xr-x. 26 root root 4.0K Dec 19 12:45 lib
drwxr-xr-x.  2 root root    6 Dec 19 12:46 local
lrwxrwxrwx.  2 root root   11 Dec 19 12:45 lock -> ../run/lock
drwxr-xr-x.  9 root root 4.0K Dec 19 12:45 log
lrwxrwxrwx.  2 root root   10 Dec 19 12:46 mail -> spool/mail
drwxr-xr-x.  2 root root    6 Dec 19 12:46 nis
drwxr-xr-x.  2 root root    6 Dec 19 12:46 opt
drwxr-xr-x.  2 root root    6 Dec 19 12:46 preserve
lrwxrwxrwx.  2 root root    6 Dec 19 12:45 run -> ../run
drwxr-xr-x. 10 root root  114 Dec 19 12:46 spool
drwxr-xr-t.  3 root root   85 Dec 19 12:45 tmp
drwxr-xr-x.  2 root root    6 Dec 19 12:46 yp
[root@dhcp35-214 ecv]# 
 

Version-Release number of selected component (if applicable):
[root@dhcp35-78 ~]# rpm -qa|grep gluster
glusterfs-rdma-3.12.2-1.el7rhgs.x86_64
glusterfs-server-3.12.2-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-1.el7rhgs.x86_64
glusterfs-libs-3.12.2-1.el7rhgs.x86_64
glusterfs-fuse-3.12.2-1.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-1.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-api-3.12.2-1.el7rhgs.x86_64
python2-gluster-3.12.2-1.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-1.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-1.el7.x86_64
glusterfs-cli-3.12.2-1.el7rhgs.x86_64
[root@dhcp35-78 ~]# 


How reproducible:
================
2/2

Steps to Reproduce:
1. Create a 4+2 EC volume.
2. Copy /var to the mount point.
3. From the backend, delete the var directory on one of the bricks.
4. Do an ls -lRt on the mount.
5. Issue a heal command to heal the files (see the sketch below).
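
A minimal shell sketch of these steps (the volume name ecv and brick path /rhs/brick2/ecv follow the outputs in this report; the host names host1..host6 and the mount point /mnt/ecv are assumptions):

# Hedged reproduction sketch; host{1..6} and /mnt/ecv are assumed names.
gluster volume create ecv disperse 6 redundancy 2 \
    host{1..6}:/rhs/brick2/ecv force          # step 1: 4+2 EC volume
gluster volume start ecv
mkdir -p /mnt/ecv && mount -t glusterfs host1:/ecv /mnt/ecv

cp -a /var /mnt/ecv/                          # step 2: copy /var onto the volume
rm -rf /rhs/brick2/ecv/var                    # step 3: delete var on ONE brick,
                                              #         directly on the backend
ls -lRt /mnt/ecv > /dev/null                  # step 4: trigger lookups from the mount
gluster volume heal ecv                       # step 5: heal; without the fix the
gluster volume heal ecv info                  #         three symlinks never clear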


Actual results:
=============
All files got healed except three entries, /var/run, /var/lock and /var/mail, which kept showing up in heal info irrespective of the number of times heal was triggered. All three were softlinks; the heal info output and the brick-side listing were identical to those shown in the description above.


--- Additional comment from Ashish Pandey on 2017-12-26 00:55:08 EST ---

upstream patch -
https://review.gluster.org/#/c/19070/

Comment 1 Worker Ant 2017-12-28 11:05:03 UTC
REVIEW: https://review.gluster.org/19070 (posix: delete stale gfid handles in nameless lookup) posted (#2) for review on master by Ravishankar N

Comment 2 Worker Ant 2018-01-16 03:45:32 UTC
COMMIT: https://review.gluster.org/19070 committed in master by "Ravishankar N" <ravishankar> with a commit message- posix: delete stale gfid handles in nameless lookup

..in order for self-heal of symlinks to work properly (see BZ for
details).

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1529488
Signed-off-by: Ravishankar N <ravishankar>

Comment 3 Shyamsundar 2018-03-15 11:24:00 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Ravishankar N 2018-07-13 10:18:36 UTC
Recording here what was happening without the fix, using the test from the description, so that it is easy to follow without going through all the review comments on the patch or trying it out again.

When we delete the symlink from the brick (but not the .glusterfs hardlink to it) and do a lookup from the mount, name heal creates the symlink with a new inode. The .glusterfs entry and the symlink are thus no longer hardlinks to each other, leaving the gfid handle stale.
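
This can be verified on the brick by comparing inode numbers and link counts, as in this hedged illustration (the brick path /d/backends/patchy2, the name SOFTLINK and the gfid are taken from the log below):

# A healthy file and its gfid handle are hardlinks: same inode, nlink = 2.
stat -c '%i %h %N' /d/backends/patchy2/SOFTLINK \
    /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
# After name heal recreates SOFTLINK, the two paths report different inodes
# and the old handle is left behind with nlink = 1, i.e. a stale gfid handle.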

This will cause metadata self-heal (setfattr) on the sink to fail:
-----------------------------------------------------------------------------
[2018-07-13 08:56:45.834709] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x8a39)[0x7f15a7b75a39] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.834784] E [MSGID: 113097] [posix-helpers.c:704:posix_istat] 0-patchy-posix: Failed to create handle path for 534ac265-b7f4-4a72-b621-6cc1c770b133/ [Stale file handle]
[2018-07-13 08:56:45.835132] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x2a18b)[0x7f15a7b9718b] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.835176] E [MSGID: 113091] [posix-inode-fd-ops.c:321:posix_setattr] 0-patchy-posix: Failed to create inode handle for path /SOFTLINK
[2018-07-13 08:56:45.835202] E [MSGID: 113018] [posix-inode-fd-ops.c:327:posix_setattr] 0-patchy-posix: setattr (lstat) on <null> failed
[2018-07-13 08:56:45.835300] I [MSGID: 115072] [server-rpc-fops_v2.c:1612:server4_setattr_cbk] 0-patchy-server: 13110: SETATTR /SOFTLINK (534ac265-b7f4-4a72-b621-6cc1c770b133), client: CTX_ID:b242a09f-a32b-4019-b42b-7b8830e458fc-GRAPH_ID:0-PID:15159-HOST:ravi3-PC_NAME:patchy-client-2-RECON_NO:-0, error-xlator: -

-----------------------------------------------------------------------------
v1 of the patch tried to fix the issue by deleting the stale .glusterfs entry during posix_symlink() (sent during self-heal).

From v2 onwards, the patch fixes it by deleting the stale gfid handle during (nameless) lookup instead.
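
The observable effect of the v2 fix, sketched with assumed paths (the stale handle is presumably recognised by its link count of 1 and unlinked when the nameless lookup hits posix):

stat /mnt/ecv/var/lock            # lookup from the mount; the stale handle is deleted
gluster volume heal ecv           # heal can now recreate the gfid hardlink
gluster volume heal ecv info      # /var/run, /var/lock and /var/mail drop out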