+++ This bug was initially created as a clone of Bug #1527309 +++

Description of problem:
======================
On an EC volume, stale entries for symlinks are not getting cleared at all, even after healing is complete.

[root@dhcp35-192 ecv]# gluster v heal ecv full
Launching heal operation to perform full self heal on volume ecv has been successful
Use heal info commands to check status

[root@dhcp35-192 ecv]# gluster v heal ecv info
Brick dhcp35-192.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run
/var/lock
/var/mail
Status: Connected
Number of entries: 3

Brick dhcp35-214.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run
/var/lock
/var/mail
Status: Connected
Number of entries: 3

Brick dhcp35-215.lab.eng.blr.redhat.com:/rhs/brick2/ecv
Status: Connected
Number of entries: 0

[root@dhcp35-214 ecv]# ls /rhs/brick2/ecv/var/ -lh
total 8.0K
drwxr-xr-x.  2 root root    6 Dec 19 12:45 adm
drwxr-xr-x.  5 root root   44 Dec 19 12:46 cache
drwxr-xr-x.  2 root root    6 Dec 19 12:46 crash
drwxr-xr-x.  3 root root   34 Dec 19 12:46 db
drwxr-xr-x.  3 root root   18 Dec 19 12:46 empty
drwxr-xr-x.  2 root root    6 Dec 19 12:46 games
drwxr-xr-x.  2 root root    6 Dec 19 12:46 gopher
drwxr-xr-x.  3 root root   18 Dec 19 12:46 kerberos
drwxr-xr-x. 26 root root 4.0K Dec 19 12:45 lib
drwxr-xr-x.  2 root root    6 Dec 19 12:46 local
lrwxrwxrwx.  2 root root   11 Dec 19 12:45 lock -> ../run/lock
drwxr-xr-x.  9 root root 4.0K Dec 19 12:45 log
lrwxrwxrwx.  2 root root   10 Dec 19 12:46 mail -> spool/mail
drwxr-xr-x.  2 root root    6 Dec 19 12:46 nis
drwxr-xr-x.  2 root root    6 Dec 19 12:46 opt
drwxr-xr-x.  2 root root    6 Dec 19 12:46 preserve
lrwxrwxrwx.  2 root root    6 Dec 19 12:45 run -> ../run
drwxr-xr-x. 10 root root  114 Dec 19 12:46 spool
drwxr-xr-t.  3 root root   85 Dec 19 12:45 tmp
drwxr-xr-x.  2 root root    6 Dec 19 12:46 yp
[root@dhcp35-214 ecv]#

Version-Release number of selected component (if applicable):
[root@dhcp35-78 ~]# rpm -qa|grep gluster
glusterfs-rdma-3.12.2-1.el7rhgs.x86_64
glusterfs-server-3.12.2-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-1.el7rhgs.x86_64
glusterfs-libs-3.12.2-1.el7rhgs.x86_64
glusterfs-fuse-3.12.2-1.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-1.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-api-3.12.2-1.el7rhgs.x86_64
python2-gluster-3.12.2-1.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-1.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-1.el7.x86_64
glusterfs-cli-3.12.2-1.el7rhgs.x86_64
[root@dhcp35-78 ~]#

How reproducible:
================
2/2

Steps to Reproduce:
1. Create a 4+2 EC volume.
2. Copy /var to the mount point.
3. From the backend, delete the var directory on one of the bricks.
4. Run ls -lRt on the mount.
5. Issue a heal command to heal the files.

Actual results:
=============
All files got healed except the three entries below, which kept showing up in heal info no matter how many times heal was triggered. All of these entries are symlinks.

[root@dhcp35-192 ecv]# gluster v heal ecv info
Brick dhcp35-192.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run
/var/lock
/var/mail
Status: Connected
Number of entries: 3

Brick dhcp35-214.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run
/var/lock
/var/mail
Status: Connected
Number of entries: 3

Brick dhcp35-215.lab.eng.blr.redhat.com:/rhs/brick2/ecv
Status: Connected
Number of entries: 0

[root@dhcp35-214 ecv]# ls /rhs/brick2/ecv/var/ -lh
(output identical to the brick listing shown in the description above)

--- Additional comment from Ashish Pandey on 2017-12-26 00:55:08 EST ---

upstream patch - https://review.gluster.org/#/c/19070/
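The stale state produced by steps 2 and 3 comes from how a brick stores a symlink: the gfid handle under .glusterfs is a hardlink to the symlink itself. The mechanics can be mimicked without gluster. This is an illustrative sketch assuming GNU coreutils, with a temp directory standing in for the brick and the gfid value borrowed from the logs in a later comment purely as an example:

```shell
set -e
brick=$(mktemp -d)                                   # stand-in for /rhs/brick2/ecv
gfid=534ac265-b7f4-4a72-b621-6cc1c770b133
mkdir -p "$brick/.glusterfs/53/4a"

ln -s spool/mail "$brick/mail"                       # the symlink as stored on the brick
ln -P "$brick/mail" "$brick/.glusterfs/53/4a/$gfid"  # gfid handle = hardlink to the symlink

stat -c %h "$brick/mail"                             # prints 2: name and handle share an inode
rm "$brick/mail"                                     # step 3: delete the name behind gluster's back
stat -c %h "$brick/.glusterfs/53/4a/$gfid"           # prints 1: the handle is now stale
```

Once a lookup from the mount (step 4) triggers name heal, the entry is recreated as a new inode, so the old handle and the new symlink no longer share an inode; that mismatch is what the setattr failures quoted in the later comment show.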
REVIEW: https://review.gluster.org/19070 (posix: delete stale gfid handles in nameless lookup) posted (#2) for review on master by Ravishankar N
COMMIT: https://review.gluster.org/19070 committed in master by "Ravishankar N" <ravishankar> with a commit message:

posix: delete stale gfid handles in nameless lookup

..in order for self-heal of symlinks to work properly (see BZ for details).

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1529488
Signed-off-by: Ravishankar N <ravishankar>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/
Just recording what was happening without the fix, using the test from the description, so that it is easier to follow without going through all the review comments on the patch or trying it out again.

When we delete the symlink from the brick (but not the .glusterfs hardlink to it) and do a lookup from the mount, name heal creates a new inode. The .glusterfs entry and the symlink are therefore no longer hardlinks to each other. This causes metadata self-heal (setfattr) on the sink to fail:

-----------------------------------------------------------------------------
[2018-07-13 08:56:45.834709] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x8a39)[0x7f15a7b75a39] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.834784] E [MSGID: 113097] [posix-helpers.c:704:posix_istat] 0-patchy-posix: Failed to create handle path for 534ac265-b7f4-4a72-b621-6cc1c770b133/ [Stale file handle]
[2018-07-13 08:56:45.835132] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x2a18b)[0x7f15a7b9718b] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.835176] E [MSGID: 113091] [posix-inode-fd-ops.c:321:posix_setattr] 0-patchy-posix: Failed to create inode handle for path /SOFTLINK
[2018-07-13 08:56:45.835202] E [MSGID: 113018] [posix-inode-fd-ops.c:327:posix_setattr] 0-patchy-posix: setattr (lstat) on <null> failed
[2018-07-13 08:56:45.835300] I [MSGID: 115072] [server-rpc-fops_v2.c:1612:server4_setattr_cbk] 0-patchy-server: 13110: SETATTR /SOFTLINK (534ac265-b7f4-4a72-b621-6cc1c770b133), client: CTX_ID:b242a09f-a32b-4019-b42b-7b8830e458fc-GRAPH_ID:0-PID:15159-HOST:ravi3-PC_NAME:patchy-client-2-RECON_NO:-0, error-xlator: -
-----------------------------------------------------------------------------

v1 of the patch tried to fix the issue by deleting the stale .glusterfs entry during posix_symlink() (sent during self-heal). v2 of the patch onwards fixes it by deleting the stale entry in lookup instead.
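The handle paths in these logs follow the brick's gfid layout: each handle lives under .glusterfs/<first two hex chars of the gfid>/<next two>/<full gfid>. A small helper illustrating just that naming convention (the function name is mine; nothing else is assumed):

```python
def gfid_handle_path(brick_root: str, gfid: str) -> str:
    """Build the .glusterfs handle path for a gfid: two directory levels
    taken from the first four hex characters, then the full gfid."""
    return f"{brick_root}/.glusterfs/{gfid[:2]}/{gfid[2:4]}/{gfid}"

# The gfid from the SETATTR failure above maps to exactly the path that
# posix_is_malformed_link complains about:
print(gfid_handle_path("/d/backends/patchy2",
                       "534ac265-b7f4-4a72-b621-6cc1c770b133"))
# → /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
```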