Bug 762613 (GLUSTER-881)

Summary: GlusterFS daemon hangs on replication of symlink (3.0.4)
Product: [Community] GlusterFS Reporter: Frank Enderle <frank.enderle>
Component: replicate Assignee: Anand Avati <aavati>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0.3 CC: amarts, anush, chrisw, elvanor, gluster-bugs, lakshmipathi, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: DNR CRM:
Verified Versions: Category: ---

Description Frank Enderle 2010-05-03 10:30:56 UTC
I have a two-node setup configured for replication (RAID 1) and observed the following behaviour.

0. The config files were created using glusterfs-volgen as described in the installation manual.

1. Set up a shared volume in RAID 1 mode.
2. Start glusterfsd on both nodes.
3. Mount the volume on both nodes, for example on /export.
4. Execute the following commands:
   cd /export
   touch file
5. Stop glusterfsd on one node (e.g. node two).
6. Execute the following commands:
   cd /export
   ln -s /export/file test   (an absolute path is required to provoke the failure)
7. Start glusterfsd again.
8. Wait until the clients connect to the newly started daemon.
9. Execute the following command (on either node):
   ls /export

The ls command hangs, and the newly started node logs the following debug messages:

[2010-05-03 12:18:09] D [server-resolve.c:238:resolve_path_deep] brick1: RESOLVE READLINK() seeking deep resolution of /test
[2010-05-03 12:18:09] D [server-protocol.c:2188:server_readlink_cbk] server-tcp: 13: READLINK /test (0) ==> -1 (No such file or directory)

Running umount -f /export on the newly started GlusterFS node, followed by mount /export, resolves the problem. However, the hang itself is not the expected behaviour.
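
Step 9 of the reproduction can be wrapped in a timeout so that a hung ls is reported instead of blocking the terminal. The following is only a sketch in Python: the helper names command_hangs and listing_hangs and the 10-second default are assumptions chosen for illustration, not part of GlusterFS.

```python
import subprocess


def command_hangs(cmd, timeout_s=10.0):
    """Return True if cmd does not finish within timeout_s seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout and then waits for it.
        # A process stuck in uninterruptible I/O may not exit promptly,
        # so this call can itself block in that case.
        return True
    # The command finished (with any exit status), so it did not hang.
    return False


def listing_hangs(path, timeout_s=10.0):
    """Run `ls path` (step 9 of the report) under a timeout."""
    return command_hangs(["ls", path], timeout_s)
```

Calling listing_hangs("/export") on the restarted node after step 8 would return True if the listing is still stuck after ten seconds.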

Comment 1 Jean-Noel Rivasseau 2010-09-28 10:31:56 UTC
I am pretty sure I am also running into this. I did not go through the same debugging session, but I realized it happened because some extended attributes were set on one node and not on the other.

E.g., you can reproduce it by creating the symlink while both daemons are running, then manually setting this attribute on it:

setfattr -h -n trusted.afr.node1 -v 0sAAAAAQAAAAEAAAAA symlink

I hope this gets solved soon; until it is, it is unsafe to use absolute symlinks.
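
To see what the value passed to setfattr above encodes, it can be decoded by hand. The 0s prefix is the setfattr/getfattr marker for base64. The sketch below assumes the usual replicate (AFR) changelog layout of three big-endian 32-bit pending counters in the order data, metadata, entry; that layout is not stated in this report, and decode_afr_changelog is a hypothetical helper name.

```python
import base64
import struct


def decode_afr_changelog(value):
    """Decode a setfattr-style value into (data, metadata, entry) counters.

    Assumes the value holds three big-endian unsigned 32-bit integers.
    """
    prefix, body = value[:2].lower(), value[2:]
    if prefix == "0s":      # base64, as used in the report
        raw = base64.b64decode(body)
    elif prefix == "0x":    # hexadecimal
        raw = bytes.fromhex(body)
    else:
        raise ValueError("expected a value prefixed with 0s (base64) or 0x (hex)")
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    return struct.unpack(">III", raw)


print(decode_afr_changelog("0sAAAAAQAAAAEAAAAA"))  # prints (1, 1, 0)
```

Under that assumption, the value marks one pending data operation and one pending metadata operation against node1, which would make the other node try to self-heal the symlink.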

Comment 2 Vijay Bellur 2010-09-29 10:17:50 UTC
PATCH: http://patches.gluster.com/patch/5069 in master (storage/posix: prevent chmod() from getting called on symlinks)
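
Going by the patch title, the fix avoids calling chmod() on symlinks. chmod() follows symlinks, so on a link whose absolute target does not exist it fails with "No such file or directory". The Python sketch below illustrates that guard only; it is not the GlusterFS code (which is C), and chmod_unless_symlink is a hypothetical name.

```python
import os
import stat
import tempfile


def chmod_unless_symlink(path, mode):
    """Apply mode to path but leave symlinks alone.

    Returns True if chmod was called, False if path is a symlink.
    """
    if stat.S_ISLNK(os.lstat(path).st_mode):
        # chmod() would follow the link and act on (or fail on) its target.
        return False
    os.chmod(path, mode)
    return True


with tempfile.TemporaryDirectory() as d:
    link = os.path.join(d, "test")
    # Absolute target that does not exist, like /export/file seen from a brick.
    os.symlink(os.path.join(d, "missing"), link)
    try:
        os.chmod(link, 0o644)
    except FileNotFoundError:
        print("plain chmod failed: link target does not exist")
    print("guarded chmod ran:", chmod_unless_symlink(link, 0o644))
```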

Comment 3 Amar Tumballi 2010-10-05 06:01:17 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week behind the GA release time, we would like you to test this particular bug in the 3.1.0RC releases and let us know if it's fixed.

Comment 4 Amar Tumballi 2010-10-05 08:36:40 UTC
I guess we should not be having any more issues with symlink self-healing. Avati, can you confirm and update the bug status?

Comment 5 Anand Avati 2010-12-05 06:41:27 UTC
PATCH: http://patches.gluster.com/patch/5808 in release-3.0 (storage/posix: prevent chmod() from getting called on symlinks)

Comment 6 Anand Avati 2010-12-06 10:20:48 UTC
PATCH: http://patches.gluster.com/patch/5817 in release-3.0 (check if the file is a symlink while doing utimes)
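
The same consideration applies to timestamps: a utimes() call follows symlinks, so it fails on a link whose target is missing. The Python sketch below shows one way to check for a symlink first and then update the link itself. It is an illustration under that reading of the patch title, not the GlusterFS implementation, and set_times is a hypothetical name.

```python
import os
import stat


def set_times(path, atime, mtime):
    """Set access and modification times without following a symlink.

    Returns False if path is a symlink and the platform cannot update
    a link's own timestamps, True otherwise.
    """
    is_link = stat.S_ISLNK(os.lstat(path).st_mode)
    if is_link and os.utime not in os.supports_follow_symlinks:
        # No way to touch the link itself here, so skip it entirely.
        return False
    # For a symlink, operate on the link; for anything else, behave as usual.
    os.utime(path, (atime, mtime), follow_symlinks=not is_link)
    return True
```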

Comment 7 Lakshmipathi G 2010-12-06 10:55:37 UTC
Verified; works with 3.0.7qa2.

Comment 8 Anand Avati 2010-12-07 03:18:54 UTC
PATCH: http://patches.gluster.com/patch/5818 in master (check whether the file is a symlink while doing utimes)

Comment 9 Amar Tumballi 2011-02-15 04:46:14 UTC
Internal enhancement; users need not be bothered.