1529488 – entries not getting cleared post healing of softlinks (stale entries showing up in heal info)

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	disperse
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1527309
Blocks:	1534842 1534847 1534848

Reported:	2017-12-28 10:59 UTC by Ravishankar N
Modified:	2018-07-13 10:18 UTC
CC List:	5 users
Fixed In Version:	glusterfs-4.0.0
Clone Of:	1527309
Clones:	1534842 1534847 1534848
Environment:
Last Closed:	2018-03-15 11:24:00 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Ravishankar N 2017-12-28 10:59:27 UTC

+++ This bug was initially created as a clone of Bug #1527309 +++

Description of problem:
======================
On an EC (disperse) volume, stale entries for softlinks are never cleared from heal info, even after healing is complete:
[root@dhcp35-192 ecv]# gluster v heal ecv full
Launching heal operation to perform full self heal on volume ecv has been successful 
Use heal info commands to check status
[root@dhcp35-192 ecv]# gluster v heal ecv  info
Brick dhcp35-192.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-214.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-215.lab.eng.blr.redhat.com:/rhs/brick2/ecv
Status: Connected
Number of entries: 0


[root@dhcp35-214 ecv]# ls /rhs/brick2/ecv/var/ -lh
total 8.0K
drwxr-xr-x.  2 root root    6 Dec 19 12:45 adm
drwxr-xr-x.  5 root root   44 Dec 19 12:46 cache
drwxr-xr-x.  2 root root    6 Dec 19 12:46 crash
drwxr-xr-x.  3 root root   34 Dec 19 12:46 db
drwxr-xr-x.  3 root root   18 Dec 19 12:46 empty
drwxr-xr-x.  2 root root    6 Dec 19 12:46 games
drwxr-xr-x.  2 root root    6 Dec 19 12:46 gopher
drwxr-xr-x.  3 root root   18 Dec 19 12:46 kerberos
drwxr-xr-x. 26 root root 4.0K Dec 19 12:45 lib
drwxr-xr-x.  2 root root    6 Dec 19 12:46 local
lrwxrwxrwx.  2 root root   11 Dec 19 12:45 lock -> ../run/lock
drwxr-xr-x.  9 root root 4.0K Dec 19 12:45 log
lrwxrwxrwx.  2 root root   10 Dec 19 12:46 mail -> spool/mail
drwxr-xr-x.  2 root root    6 Dec 19 12:46 nis
drwxr-xr-x.  2 root root    6 Dec 19 12:46 opt
drwxr-xr-x.  2 root root    6 Dec 19 12:46 preserve
lrwxrwxrwx.  2 root root    6 Dec 19 12:45 run -> ../run
drwxr-xr-x. 10 root root  114 Dec 19 12:46 spool
drwxr-xr-t.  3 root root   85 Dec 19 12:45 tmp
drwxr-xr-x.  2 root root    6 Dec 19 12:46 yp
[root@dhcp35-214 ecv]# 
 

Version-Release number of selected component (if applicable):
[root@dhcp35-78 ~]# rpm -qa|grep gluster
glusterfs-rdma-3.12.2-1.el7rhgs.x86_64
glusterfs-server-3.12.2-1.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-3.12.2-1.el7rhgs.x86_64
glusterfs-libs-3.12.2-1.el7rhgs.x86_64
glusterfs-fuse-3.12.2-1.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-1.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-api-3.12.2-1.el7rhgs.x86_64
python2-gluster-3.12.2-1.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-1.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-1.el7.x86_64
glusterfs-cli-3.12.2-1.el7rhgs.x86_64
[root@dhcp35-78 ~]# 


How reproducible:
================
2/2

Steps to Reproduce:
1. Create a 4+2 EC volume.
2. Copy /var to the mount point.
3. On the backend, delete the var directory from one of the bricks.
4. Run ls -lRt on the mount.
5. Issue a heal command to heal the files.


Actual results:
=============
All files were healed except the 3 entries below, which kept showing up in heal info no matter how many times heal was triggered.
All 3 entries are softlinks.
[root@dhcp35-192 ecv]# gluster v heal ecv  info
Brick dhcp35-192.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-214.lab.eng.blr.redhat.com:/rhs/brick2/ecv
/var/run 
/var/lock 
/var/mail 
Status: Connected
Number of entries: 3

Brick dhcp35-215.lab.eng.blr.redhat.com:/rhs/brick2/ecv
Status: Connected
Number of entries: 0

[root@dhcp35-214 ecv]# ls /rhs/brick2/ecv/var/ -lh
total 8.0K
drwxr-xr-x.  2 root root    6 Dec 19 12:45 adm
drwxr-xr-x.  5 root root   44 Dec 19 12:46 cache
drwxr-xr-x.  2 root root    6 Dec 19 12:46 crash
drwxr-xr-x.  3 root root   34 Dec 19 12:46 db
drwxr-xr-x.  3 root root   18 Dec 19 12:46 empty
drwxr-xr-x.  2 root root    6 Dec 19 12:46 games
drwxr-xr-x.  2 root root    6 Dec 19 12:46 gopher
drwxr-xr-x.  3 root root   18 Dec 19 12:46 kerberos
drwxr-xr-x. 26 root root 4.0K Dec 19 12:45 lib
drwxr-xr-x.  2 root root    6 Dec 19 12:46 local
lrwxrwxrwx.  2 root root   11 Dec 19 12:45 lock -> ../run/lock
drwxr-xr-x.  9 root root 4.0K Dec 19 12:45 log
lrwxrwxrwx.  2 root root   10 Dec 19 12:46 mail -> spool/mail
drwxr-xr-x.  2 root root    6 Dec 19 12:46 nis
drwxr-xr-x.  2 root root    6 Dec 19 12:46 opt
drwxr-xr-x.  2 root root    6 Dec 19 12:46 preserve
lrwxrwxrwx.  2 root root    6 Dec 19 12:45 run -> ../run
drwxr-xr-x. 10 root root  114 Dec 19 12:46 spool
drwxr-xr-t.  3 root root   85 Dec 19 12:45 tmp
drwxr-xr-x.  2 root root    6 Dec 19 12:46 yp


--- Additional comment from Ashish Pandey on 2017-12-26 00:55:08 EST ---

upstream patch -
https://review.gluster.org/#/c/19070/

Comment 1 Worker Ant 2017-12-28 11:05:03 UTC

REVIEW: https://review.gluster.org/19070 (posix: delete stale gfid handles in nameless lookup) posted (#2) for review on master by Ravishankar N

Comment 2 Worker Ant 2018-01-16 03:45:32 UTC

COMMIT: https://review.gluster.org/19070 committed in master by "Ravishankar N" <ravishankar> with a commit message - posix: delete stale gfid handles in nameless lookup

..in order for self-heal of symlinks to work properly (see BZ for
details).

Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
BUG: 1529488
Signed-off-by: Ravishankar N <ravishankar>

Comment 3 Shyamsundar 2018-03-15 11:24:00 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-4.0.0, please open a new bug report.

glusterfs-4.0.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-March/000092.html
[2] https://www.gluster.org/pipermail/gluster-users/

Comment 4 Ravishankar N 2018-07-13 10:18:36 UTC

Just recording what was happening without the fix, using the test in the description, so that it is easier to follow without going through all the review comments in the patch or trying it out again.

When we delete the symlink from the brick (but not the .glusterfs hardlink to it) and do a lookup from the mount, name heal creates a new inode. Thus the .glusterfs entry and the symlink are no longer hardlinks to each other.

This will cause metadata self-heal (setfattr) on the sink to fail:
-----------------------------------------------------------------------------
[2018-07-13 08:56:45.834709] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x8a39)[0x7f15a7b75a39] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.834784] E [MSGID: 113097] [posix-helpers.c:704:posix_istat] 0-patchy-posix: Failed to create handle path for 534ac265-b7f4-4a72-b621-6cc1c770b133/ [Stale file handle]
[2018-07-13 08:56:45.835132] E [posix-handle.c:334:posix_is_malformed_link] (--> /usr/local/lib/libglusterfs.so.0(_gf_log_callingfn+0x1ee)[0x7f15b63622b0] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10b1a)[0x7f15a7b7db1a] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x10cb5)[0x7f15a7b7dcb5] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x1108b)[0x7f15a7b7e08b] (--> /usr/local/lib/glusterfs/4.2dev/xlator/storage/posix.so(+0x2a18b)[0x7f15a7b9718b] ))))) 0-patchy-posix: malformed internal link FILE for /d/backends/patchy2/.glusterfs/53/4a/534ac265-b7f4-4a72-b621-6cc1c770b133
[2018-07-13 08:56:45.835176] E [MSGID: 113091] [posix-inode-fd-ops.c:321:posix_setattr] 0-patchy-posix: Failed to create inode handle for path /SOFTLINK
[2018-07-13 08:56:45.835202] E [MSGID: 113018] [posix-inode-fd-ops.c:327:posix_setattr] 0-patchy-posix: setattr (lstat) on <null> failed
[2018-07-13 08:56:45.835300] I [MSGID: 115072] [server-rpc-fops_v2.c:1612:server4_setattr_cbk] 0-patchy-server: 13110: SETATTR /SOFTLINK (534ac265-b7f4-4a72-b621-6cc1c770b133), client: CTX_ID:b242a09f-a32b-4019-b42b-7b8830e458fc-GRAPH_ID:0-PID:15159-HOST:ravi3-PC_NAME:patchy-client-2-RECON_NO:-0, error-xlator: -

-----------------------------------------------------------------------------
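The broken hardlink relationship behind these errors can be reproduced on any local Linux filesystem, without Gluster. The following is only a minimal sketch: the temporary directory stands in for a brick, "gfid-handle" is a hypothetical stand-in for the .glusterfs/53/4a/<gfid> entry, and the delete-and-recreate step imitates the backend delete followed by name heal. None of this is Gluster code.

```python
import os
import tempfile


def same_inode(a, b):
    """Return True if both paths are hard links to one inode (symlinks are not followed)."""
    sa, sb = os.lstat(a), os.lstat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def simulate_backend_delete_and_name_heal(brick):
    name = os.path.join(brick, "SOFTLINK")
    # Hypothetical stand-in for the .glusterfs gfid handle of the symlink.
    handle = os.path.join(brick, "gfid-handle")

    os.symlink("FILE", name)
    # Hard-link the symlink itself rather than its (possibly missing) target.
    os.link(name, handle, follow_symlinks=False)
    before = same_inode(name, handle)

    os.unlink(name)           # backend delete of the name only; the handle stays
    os.symlink("FILE", name)  # name heal recreates the symlink as a new inode
    after = same_inode(name, handle)

    return before, after, os.lstat(handle).st_nlink


with tempfile.TemporaryDirectory() as brick:
    before, after, handle_nlink = simulate_backend_delete_and_name_heal(brick)
    print("hardlinked before delete:", before)
    print("hardlinked after name heal:", after)
    print("link count of old handle:", handle_nlink)
```

After the recreate step the old handle is left with a link count of 1, which is what makes it recognisable as stale.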
v1 of the patch tried to fix the issue by deleting the stale .glusterfs entry during posix_symlink() (sent during self-heal).

From v2 onwards, the patch fixes it by deleting the stale entry in lookup.
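The idea of dropping a stale handle at lookup time can be sketched roughly as follows. This is not the actual patch (which is C code in the posix xlator); it is an illustration under two assumptions: that a non-directory handle with a link count of 1 has no name on the brick pointing at it, and that handles resolving to a directory must be left alone. The function names and the flat file layout are hypothetical.

```python
import os
import stat
import tempfile


def resolves_to_directory(path):
    """Follow symlinks; a dangling link or missing path is not a directory."""
    try:
        return stat.S_ISDIR(os.stat(path).st_mode)
    except OSError:
        return False


def remove_if_stale(handle_path):
    """Unlink a handle that nothing else links to. Returns True if it was removed."""
    if resolves_to_directory(handle_path):
        return False  # assumption: directory handles are never treated as stale
    try:
        st = os.lstat(handle_path)
    except FileNotFoundError:
        return False
    if st.st_nlink == 1:
        os.unlink(handle_path)
        return True
    return False


with tempfile.TemporaryDirectory() as brick:
    # A handle left behind after its name was deleted: link count 1.
    stale = os.path.join(brick, "stale-handle")
    os.symlink("FILE", stale)

    # A healthy pair: the name and its handle share one inode, link count 2.
    live_name = os.path.join(brick, "SOFTLINK")
    live = os.path.join(brick, "live-handle")
    os.symlink("FILE", live_name)
    os.link(live_name, live, follow_symlinks=False)

    stale_removed = remove_if_stale(stale)
    live_removed = remove_if_stale(live)
    stale_exists = os.path.lexists(stale)
    live_exists = os.path.lexists(live)
    print(stale_removed, live_removed, stale_exists, live_exists)
```

Once the stale handle is gone, a later heal is free to create a fresh handle that really is hardlinked to the recreated symlink.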
