Bug 762613 (GLUSTER-881) - GlusterFS daemon hangs on replication of symlink (3.0.4)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-881
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.0.3
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Anand Avati
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-03 10:30 UTC by Frank Enderle
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: DNR
CRM:
Verified Versions:



Description Frank Enderle 2010-05-03 10:30:56 UTC
I have a two-node setup configured for replication (RAID 1), and I observed the following behaviour.

0. The config files were created using glusterfs-volgen as described in the installation manual.

1. Set up a shared volume in RAID 1 mode.
2. Start glusterfsd on both nodes.
3. Mount the volume on both nodes at /export (for example).
4. Execute the following commands:
   cd /export
   touch file
5. Stop glusterfsd on one node (e.g. node two).
6. Execute the following commands:
   cd /export
   ln -s /export/file test   (an absolute path is required to provoke the failure)
7. Start glusterfsd again.
8. Wait until the clients connect to the newly started daemon.
9. Execute the following command (on any of the nodes):
   ls /export
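
The client-side commands from steps 4, 6 and 9 can be collected into a small script. This is only a sketch: the MOUNT variable is an assumption that is not part of the report, and it defaults to a scratch directory so the script is harmless when run outside a GlusterFS mount. Stopping and restarting glusterfsd (steps 5, 7 and 8) is left as comments because it happens on the other node.

```shell
#!/bin/sh
# Sketch of the client-side part of the reproduction (steps 4, 6 and 9).
# MOUNT is a hypothetical variable: set it to the real mount point
# (e.g. MOUNT=/export) to test against GlusterFS. By default a scratch
# directory is used, which only exercises the commands themselves.
MOUNT="${MOUNT:-$(mktemp -d)}"

cd "$MOUNT" || exit 1
touch file                      # step 4: create the regular file

# Step 5 happens here, outside this script: stop glusterfsd on one node.

ln -s "$MOUNT/file" test        # step 6: symlink with an absolute target

# Steps 7-8 happen here: restart glusterfsd and wait for clients to reconnect.

ls "$MOUNT"                     # step 9: the listing that hangs in the report
```

Against a real mount the script would have to be paused at the two commented points; on a scratch directory it simply creates the file and the absolute symlink and lists them.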

This command hangs, and the newly started node logs the following debug messages:

[2010-05-03 12:18:09] D [server-resolve.c:238:resolve_path_deep] brick1: RESOLVE READLINK() seeking deep resolution of /test
[2010-05-03 12:18:09] D [server-protocol.c:2188:server_readlink_cbk] server-tcp: 13: READLINK /test (0) ==> -1 (No such file or directory)

Running umount -f /export on the newly started glusterfs node, followed by mount /export, resolves the problem. However, this is not the expected behaviour.

Comment 1 Jean-Noel Rivasseau 2010-09-28 10:31:56 UTC
I am pretty sure I am also running into this. I did not do the same debugging session, but I realized this happened because some extended attributes were set on one node and not on the other.

E.g., you can reproduce it by creating the symlink while both daemons are running, then manually adding:

setfattr -h -n trusted.afr.node1 -v 0sAAAAAQAAAAEAAAAA symlink
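
For readers unfamiliar with the value syntax: the 0s prefix tells setfattr that the value is base64-encoded. The sketch below decodes the payload into hex bytes. Reading the 12 bytes as three big-endian 32-bit counters (data, metadata, entry) is an assumption about the replicate changelog layout, not something stated in this report.

```shell
# Decode the base64 payload (the part after the 0s prefix) into hex bytes.
value='AAAAAQAAAAEAAAAA'

# od prints one hex byte per column; tr and sed normalise the whitespace.
decoded=$(printf '%s' "$value" | base64 -d | od -An -tx1 \
    | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')

echo "$decoded"
# Assumed reading: bytes 1-4 = data, 5-8 = metadata, 9-12 = entry counter.
```

This should print 00 00 00 01 00 00 00 01 00 00 00 00, i.e. the first two 32-bit words are 1 and the third is 0.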

I hope this gets solved soon; until then, it is unsafe to use absolute symlinks.

Comment 2 Vijay Bellur 2010-09-29 10:17:50 UTC
PATCH: http://patches.gluster.com/patch/5069 in master (storage/posix: prevent chmod() from getting called on symlinks)
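
As general background to the patch title (this is an illustration of standard chmod behaviour, not code from the patch, and it makes no claim about the exact failure path inside GlusterFS): chmod on a symlink name follows the link and acts on the target, so it fails when the target does not exist. The directory and file names below are hypothetical.

```shell
# Scratch directory so nothing real is touched.
work=$(mktemp -d)
touch "$work/target"
chmod 600 "$work/target"
ln -s "$work/target" "$work/link"

# chmod on the link name follows the link and changes the target's mode.
chmod 644 "$work/link"
ls -l "$work/target"

# With a dangling link there is nothing to follow, so chmod fails.
ln -s "$work/missing" "$work/dangling"
if chmod 644 "$work/dangling" 2>/dev/null; then
    dangling_status=ok
else
    dangling_status=failed
fi
echo "chmod on dangling symlink: $dangling_status"
```

An absolute symlink whose target is missing on the machine that applies the mode change would hit the second case.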

Comment 3 Amar Tumballi 2010-10-05 06:01:17 UTC
Most of the self-heal (replicate-related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this particular bug with the 3.1.0 RC releases and let us know if it's fixed.

Comment 4 Amar Tumballi 2010-10-05 08:36:40 UTC
I guess we should not be having any more issues with symlink self-healing. Avati, can you confirm and update the bug status?

Comment 5 Anand Avati 2010-12-05 06:41:27 UTC
PATCH: http://patches.gluster.com/patch/5808 in release-3.0 (storage/posix: prevent chmod() from getting called on symlinks)

Comment 6 Anand Avati 2010-12-06 10:20:48 UTC
PATCH: http://patches.gluster.com/patch/5817 in release-3.0 (check if the file is a symlink while doing utimes)

Comment 7 Lakshmipathi G 2010-12-06 10:55:37 UTC
Verified, works with 3.0.7qa2.

Comment 8 Anand Avati 2010-12-07 03:18:54 UTC
PATCH: http://patches.gluster.com/patch/5818 in master (check whether the file is a symlink while doing utimes)

Comment 9 Amar Tumballi 2011-02-15 04:46:14 UTC
Internal enhancement; users need not be bothered.

