+++ This bug was initially created as a clone of Bug #859581 +++

Description of problem:
I was seeing errors self-healing "/". Upon checking the .glusterfs/00/00/00000000-0000-0000-0000-000000000001 stats, I discovered that some of my bricks had directories instead of symlinks. I replaced the directories with symlinks to ../../.. and set the gfid on those symlinks to 0x00000000000000000000000000000001 and healing was able to return to normal.

Version-Release number of selected component (if applicable):
3.3.0

How reproducible:
Unsure

Steps to Reproduce:
Sorry, there were a lot of things happening all at once so I'm not sure which one of them caused this to happen. I use replica 3 volumes, so that may be a variable in this.

--- Additional comment from Jeff Darcy on 2012-10-31 09:07:28 EDT ---

Something affecting self-heal like this would normally make it urgent, but it looks like chance/frequency of occurrence might be low so I'll step it down one notch.

--- Additional comment from Vijay Bellur on 2012-12-11 04:15:46 EST ---

Unable to reproduce this problem. Please feel free to re-open with more details (logs) if you happen to notice this problem again.

--- Additional comment from Joe Julian on 2013-03-05 15:22:59 EST ---

I've had two more reports of this problem in IRC. Still no repro though.

--- Additional comment from Xavier Hernandez on 2013-03-26 15:46:36 EDT ---

I've also suffered this problem. As per pranithk's request on IRC, I am posting some information from a bad directory.

root@server:/pool/c/.glusterfs/1d/c2# ls -l
drwx------ 2 root root 4.0K Mar 20 16:39 1dc2745b-4e1b-41a1-ba9d-59bceb06809c

root@server:/pool/c/.glusterfs/1d/c2# getfattr -m ".*" -e hex -d 1dc2745b-4e1b-41a1-ba9d-59bceb06809c

root@server:/pool/c/.glusterfs/1d/c2# ls -l 1dc2745b-4e1b-41a1-ba9d-59bceb06809c
lrwxrwxrwx 1 root root 55 Mar 20 16:39 BACKUP -> ../../1d/c2/1dc2745b-4e1b-41a1-ba9d-59bceb06809c/BACKUP

root@server:/pool/c/<path to real directory># getfattr -m ".*" -e hex -d .
# file: .
trusted.afr.vol01-client-4=0x000000000000000000000000
trusted.afr.vol01-client-5=0x000000000000000000000000
trusted.gfid=0x1dc2745b4e1b41a1ba9d59bceb06809c
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7

On another brick without problems:

aff395fe-2d22-49eb-afa1-85c6b70c600f -> ../../1d/c2/1dc2745b-4e1b-41a1-ba9d-59bceb06809c/BACKUP
1dc2745b-4e1b-41a1-ba9d-59bceb06809c -> ../../65/03/650342d0-58cf-48eb-927f-856698b9fff9/<parent directory of BACKUP>

Another example. It's not exactly the same case, and unfortunately I don't have the extended attributes of the real directory:

root@server:/pool/c/.glusterfs/cd/80# ls -l
drwx------ 2 root root 22 Mar 21 15:14 cd8019dd-880f-40d4-a18a-9a6e45ef0510

root@server:/pool/c/.glusterfs/cd/80# stat cd8019dd-880f-40d4-a18a-9a6e45ef0510
  File: `cd8019dd-880f-40d4-a18a-9a6e45ef0510'
  Size: 22  Blocks: 0  IO Block: 4096  directory
Device: 10302h/66306d  Inode: 2480359527  Links: 2
Access: (0700/drwx------)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2013-03-21 20:37:13.314935893 +0100
Modify: 2013-03-21 15:14:26.416891579 +0100
Change: 2013-03-21 15:14:26.416891579 +0100

root@server:/pool/c/.glusterfs/cd/80# getfattr -m ".*" -e hex -d cd8019dd-880f-40d4-a18a-9a6e45ef0510

root@server:/pool/c/.glusterfs/cd/80# ls -l cd8019dd-880f-40d4-a18a-9a6e45ef0510
total 0
lrwxrwxrwx 1 root root 58 Mar 19 16:45 T02-19_20 -> ../../f0/96/f0968474-b319-4073-a453-eafe3bd7e60f/T02-19_20

On another brick where there is no problem:

b731f16a-d361-4e83-9e03-11f04d51ee08 -> ../../f0/96/f0968474-b319-4073-a453-eafe3bd7e60f/T02-19_20

It seems that some kind of split-brain has been resolved incorrectly.

--- Additional comment from Xavier Hernandez on 2013-03-28 10:15:03 EDT ---

I have been able to reproduce the problem. I had to modify the contents of one brick directly. I'm not sure how/if these modifications can happen without direct access to the brick.

[root@glnas01 ~]# gluster volume create vol01 replica 2 glnas01:/bricks/b01 glnas02:/bricks/b01
Creation of volume vol01 has been successful. Please start the volume to access data.
[root@glnas01 ~]# gluster volume start vol01
Starting volume vol01 has been successful
[root@glnas01 ~]# mount -t glusterfs glnas01:/vol01 /vol01
[root@glnas01 ~]# mkdir -p /vol01/dir1/dir2
[root@glnas01 ~]# getfattr -m. -e hex -d /bricks/b01/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/b01/dir1
trusted.gfid=0x43e7a966ce8944e7ba8f2cb00fc0a16f

[root@glnas01 ~]# getfattr -m. -e hex -d /bricks/b01/dir1/dir2
getfattr: Removing leading '/' from absolute path names
# file: bricks/b01/dir1/dir2
trusted.gfid=0x923114807a9445819e1f38ae427a8b95

[root@glnas01 ~]# rm -f /bricks/b01/.glusterfs/43/e7/43e7a966-ce89-44e7-ba8f-2cb00fc0a16f
[root@glnas01 ~]# rmdir /bricks/b01/dir1/dir2
[root@glnas01 ~]# gluster volume heal vol01 full
Launching Heal operation on volume vol01 has been successful
Use heal info commands to check status
[root@glnas01 ~]# ls -l /bricks/b01/.glusterfs/43/e7
total 4
drwx------ 2 root root 4096 28 mar 14:53 43e7a966-ce89-44e7-ba8f-2cb00fc0a16f

The problem is caused by self-heal when it tries to regenerate dir2 while its gfid already exists inside .glusterfs and at least one of the parent gfids of dir2 does not exist.

In posix_handle_soft() newpath is built using MAKE_HANDLE_PATH(), which returns

/bricks/b01/.glusterfs/43/e7/43e7a966-ce89-44e7-ba8f-2cb00fc0a16f/dir2

instead of the expected

/bricks/b01/.glusterfs/92/31/92311480-7a94-4581-9e1f-38ae427a8b95

because this last symbolic link exists and MAKE_HANDLE_PATH() tries to resolve it. However, as 43e7a966-ce89-44e7-ba8f-2cb00fc0a16f does not exist, it can't resolve it any further.

After that, a call to posix_handle_mkdir_hashes() creates the last two levels of the dirname of the path, in this case 'e7' and '43e7a966-ce89-44e7-ba8f-2cb00fc0a16f'. The second of these is created as a real directory in the place where the gfid symlink of dir1 should be.
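
The path arithmetic in the analysis above can be sketched in a few lines of Python. This is only a lexical model of what the comment describes, not the GlusterFS C code: handle_abspath() and dereferenced_path() are hypothetical helper names, and the symlink target layout (../../xx/yy/<parent gfid>/<name>) is assumed from the listings earlier in this bug.

```python
import os
import uuid


def canonical_gfid(gfid: str) -> str:
    """Normalise a gfid given as 0x<32 hex digits> or as a dashed UUID."""
    hex_part = gfid[2:] if gfid.startswith("0x") else gfid
    return str(uuid.UUID(hex_part))


def handle_abspath(brick: str, gfid: str) -> str:
    """Path of the gfid entry itself: <brick>/.glusterfs/xx/yy/<gfid>."""
    g = canonical_gfid(gfid)
    return os.path.join(brick, ".glusterfs", g[0:2], g[2:4], g)


def dereferenced_path(brick: str, gfid: str, parent_gfid: str, name: str) -> str:
    """Where the path ends up if an existing directory handle is followed.

    Assumes the handle is a symlink to ../../xx/yy/<parent gfid>/<name>,
    and resolves that target lexically (nothing is read from disk).
    """
    p = canonical_gfid(parent_gfid)
    target = os.path.join("..", "..", p[0:2], p[2:4], p, name)
    handle_dir = os.path.dirname(handle_abspath(brick, gfid))
    return os.path.normpath(os.path.join(handle_dir, target))


if __name__ == "__main__":
    brick = "/bricks/b01"
    dir1 = "0x43e7a966ce8944e7ba8f2cb00fc0a16f"
    dir2 = "0x923114807a9445819e1f38ae427a8b95"
    print("expected :", handle_abspath(brick, dir2))
    print("resolved :", dereferenced_path(brick, dir2, dir1, "dir2"))
```

With the gfids from the reproduction, the first helper gives the expected path under 92/31 and the second gives the path under 43/e7 quoted in the analysis.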

--- Additional comment from Anand Avati on 2013-05-23 06:06:04 EDT ---

REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Bjoern Teipel on 2013-08-15 21:00:27 EDT ---

Hey, I have the same problem when glusterfs is killed by the kernel (OOM); I still think glusterfs has memory allocation issues.
It took me a long time to figure out the issues with the 00000000-0000-0000-0000-000000000001 link/directory.
This is a serious issue for me as well and caused a few hours of downtime to clean up the mess.

--- Additional comment from Anand Avati on 2014-01-17 07:41:46 EST ---

REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-01-20 05:26:35 EST ---

REVIEW: http://review.gluster.org/6736 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on release-3.5 by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-01-20 05:35:40 EST ---

REVIEW: http://review.gluster.org/6737 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on release-3.4 by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-01-23 03:26:36 EST ---

REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#3) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-01-23 04:28:07 EST ---

REVIEW: http://review.gluster.org/6736 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on release-3.5 by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-01-23 04:35:33 EST ---

REVIEW: http://review.gluster.org/6737 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on release-3.4 by Xavier Hernandez (xhernandez)

--- Additional comment from Anand Avati on 2014-05-02 12:34:51 EDT ---

COMMIT: http://review.gluster.org/5075 committed in master by Vijay Bellur (vbellur)
------
commit c7838fbd6afd876c922e1ec681bbbcf73be653e5
Author: Xavier Hernandez <xhernandez>
Date:   Thu May 23 11:13:25 2013 +0200

    storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()

    Whenever a new directory is created, its corresponding gfid file must also be created. This was done first calling MAKE_HANDLE_PATH() to get the path of the gfid file, then calling posix_handle_mkdir_hashes() to create the parent directories of the gfid, and finally creating the soft-link.

    In normal circumstances, the gfid we want to create won't exist and MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if the volume is damaged and a self-heal is running, it is possible that we try to create an already existing gfid. In this case, MAKE_HANDLE_PATH() will return a path to the directory instead of the path to the gfid.

    To solve this problem, every time a path to a gfid is needed, a call to MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH().
    Change-Id: Ic319cc38c170434db8e86e2f89f0b8c28c0d611a
    BUG: 859581
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/5075
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Vijay Bellur <vbellur>

--- Additional comment from Anand Avati on 2014-05-05 09:38:15 EDT ---

COMMIT: http://review.gluster.org/6736 committed in release-3.5 by Niels de Vos (ndevos)
------
commit b3fd7004a4a579c64ed29ee7eeb7e0fa57a3591f
Author: Xavier Hernandez <xhernandez>
Date:   Thu May 23 11:13:25 2013 +0200

    storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()

    Whenever a new directory is created, its corresponding gfid file must also be created. This was done first calling MAKE_HANDLE_PATH() to get the path of the gfid file, then calling posix_handle_mkdir_hashes() to create the parent directories of the gfid, and finally creating the soft-link.

    In normal circumstances, the gfid we want to create won't exist and MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if the volume is damaged and a self-heal is running, it is possible that we try to create an already existing gfid. In this case, MAKE_HANDLE_PATH() will return a path to the directory instead of the path to the gfid.

    To solve this problem, every time a path to a gfid is needed, a call to MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH().

    BUG: 859581
    Change-Id: I84405bf04562e647fc02445f45358e9451f9b479
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/6736
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Reviewed-by: Raghavendra Bhat <raghavendra>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Reviewed-by: Niels de Vos <ndevos>

--- Additional comment from Niels de Vos on 2014-05-05 23:44:07 EDT ---

Moving to POST, still waiting for the merging of http://review.gluster.org/6737.
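
The difference between the two sequences described in the commit message can be simulated in a scratch directory. The sketch below is a simplified model under stated assumptions, not the posix translator code: handle(), old_newpath() and mkdir_hashes() are hypothetical stand-ins for MAKE_HANDLE_ABSPATH(), the reported behaviour of MAKE_HANDLE_PATH(), and posix_handle_mkdir_hashes() respectively, and the damaged state mirrors the reproduction steps given earlier in this bug.

```python
import os
import tempfile

PARENT = "43e7a966-ce89-44e7-ba8f-2cb00fc0a16f"  # gfid of dir1 in the reproduction
CHILD = "92311480-7a94-4581-9e1f-38ae427a8b95"   # gfid of dir2 in the reproduction


def handle(brick, gfid):
    """Plain path of the gfid entry, never following symlinks."""
    return os.path.join(brick, ".glusterfs", gfid[:2], gfid[2:4], gfid)


def old_newpath(brick, gfid):
    """Pre-fix model: if the handle already exists as a symlink, follow it."""
    h = handle(brick, gfid)
    if os.path.islink(h):
        return os.path.normpath(os.path.join(os.path.dirname(h), os.readlink(h)))
    return h


def mkdir_hashes(path):
    """Create the last two levels of dirname(path) when they are missing."""
    leaf_dir = os.path.dirname(path)
    for d in (os.path.dirname(leaf_dir), leaf_dir):
        if not os.path.isdir(d):
            os.mkdir(d)


def make_damaged_brick(root):
    """dir2's handle symlink is present, dir1's handle has been removed."""
    brick = os.path.join(root, "b01")
    child = handle(brick, CHILD)
    os.makedirs(os.path.dirname(child))
    os.makedirs(os.path.dirname(handle(brick, PARENT)))
    os.symlink(os.path.join("..", "..", PARENT[:2], PARENT[2:4], PARENT, "dir2"), child)
    return brick


def demo():
    """Return (bogus dir after fixed sequence, bogus dir after old sequence)."""
    with tempfile.TemporaryDirectory() as root:
        brick = make_damaged_brick(root)
        parent_handle = handle(brick, PARENT)
        mkdir_hashes(handle(brick, CHILD))       # fixed sequence
        after_fix = os.path.isdir(parent_handle)
        mkdir_hashes(old_newpath(brick, CHILD))  # old sequence
        before_fix = os.path.isdir(parent_handle)
        return after_fix, before_fix


if __name__ == "__main__":
    print(demo())
```

In this model the fixed sequence leaves the place of dir1's gfid untouched, while the old sequence creates a real directory there, which is the state shown by the ls -l output in the reproduction.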
REVIEW: http://review.gluster.org/6737 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#3) for review on release-3.4 by Xavier Hernandez (xhernandez)

COMMIT: http://review.gluster.org/6737 committed in release-3.4 by Kaleb KEITHLEY (kkeithle)
------
commit 4f8f96c62b21185f27d8e76912a808af80e22608
Author: Xavier Hernandez <xhernandez>
Date:   Thu May 23 11:13:25 2013 +0200

    storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()

    Whenever a new directory is created, its corresponding gfid file must also be created. This was done first calling MAKE_HANDLE_PATH() to get the path of the gfid file, then calling posix_handle_mkdir_hashes() to create the parent directories of the gfid, and finally creating the soft-link.

    In normal circumstances, the gfid we want to create won't exist and MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if the volume is damaged and a self-heal is running, it is possible that we try to create an already existing gfid. In this case, MAKE_HANDLE_PATH() will return a path to the directory instead of the path to the gfid.

    To solve this problem, every time a path to a gfid is needed, a call to MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH().

    BUG: 1099955
    Change-Id: I5bcd2b3c38d172c75946f33519e057e76d960a24
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/6737
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
    Tested-by: Gluster Build System <jenkins.com>
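
For bricks that were damaged before this change was applied, the symptom from the original description (a real directory where a gfid symlink is expected) can be looked for with a read-only scan. The following is a rough sketch, not a supported tool: find_directory_handles() is a hypothetical name, and it assumes that entries under .glusterfs/xx/yy/ named like a gfid should never be real directories, which matches the listings in this bug. It only reports candidates and changes nothing on disk.

```python
import os
import re

HEX2 = re.compile(r"^[0-9a-f]{2}$")
GFID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def find_directory_handles(brick):
    """Yield gfid entries under <brick>/.glusterfs/xx/yy that are real directories."""
    base = os.path.join(brick, ".glusterfs")
    for xx in sorted(os.listdir(base)):
        level1 = os.path.join(base, xx)
        if not HEX2.match(xx) or not os.path.isdir(level1):
            continue  # skip anything that is not a two-hex-digit hash directory
        for yy in sorted(os.listdir(level1)):
            level2 = os.path.join(level1, yy)
            if not HEX2.match(yy) or not os.path.isdir(level2):
                continue
            for name in sorted(os.listdir(level2)):
                path = os.path.join(level2, name)
                # A symlink that points at a directory is fine; a real one is suspect.
                if GFID.match(name) and os.path.isdir(path) and not os.path.islink(path):
                    yield path


if __name__ == "__main__":
    import sys
    for suspect in find_directory_handles(sys.argv[1]):
        print(suspect)
```

Run against a brick path such as /bricks/b01, it prints one suspect path per line. Any repair (for example the manual symlink replacement mentioned in the description) is left to the administrator.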