Bug 859581 - self-heal process can sometimes create directories instead of symlinks for the root gfid file in .glusterfs
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Xavi Hernandez
Keywords: Reopened
Depends On:
Blocks: 867330 1055707 1066689 1099955
Reported: 2012-09-22 00:46 EDT by Joe Julian
Modified: 2014-06-24 07:02 EDT (History)

See Also:
Fixed In Version: glusterfs-3.5.1beta
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 867330 1055707 1066689 1099955
Environment:
Last Closed: 2014-06-24 07:02:54 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---


Attachments: None
Description Joe Julian 2012-09-22 00:46:06 EDT
Description of problem:
I was seeing errors self-healing "/". Upon checking the .glusterfs/00/00/00000000-0000-0000-0000-000000000001 stats, I discovered that some of my bricks had directories instead of symlinks. I replaced the directories with symlinks to ../../.. and set the gfid on those symlinks to 0x00000000000000000000000000000001 and healing was able to return to normal.
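
The check and manual repair described above can be sketched as a few shell helpers. This is only a sketch under stated assumptions, not an official recovery procedure: the function names gfid_handle_path, handle_kind and repair_root_handle are hypothetical, the brick path is whatever you pass in, and setting the old directory aside with mv (rather than deleting it) is an assumption made here for safety. repair_root_handle has to run as root on the brick server, because it uses setfattr -h (from the attr package) to set trusted.gfid on the symlink itself.

```shell
#!/bin/sh
# Gfid of the volume root, in dashed UUID form.
ROOT_GFID=00000000-0000-0000-0000-000000000001

# Build the .glusterfs handle path for a gfid given in dashed UUID form.
gfid_handle_path() (
    brick=$1
    gfid=$2
    h1=$(printf '%s' "$gfid" | cut -c1-2)   # first hash level
    h2=$(printf '%s' "$gfid" | cut -c3-4)   # second hash level
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$h1" "$h2" "$gfid"
)

# Report what currently sits at a handle path.
handle_kind() (
    if [ -L "$1" ]; then echo symlink
    elif [ -d "$1" ]; then echo directory
    elif [ -e "$1" ]; then echo other
    else echo missing
    fi
)

# Replace a bogus root-handle directory with the expected symlink.
# Hypothetical helper: needs root, and leaves the old directory next
# to the handle as "<handle>.bad" for later inspection (assumption).
repair_root_handle() (
    handle=$(gfid_handle_path "$1" "$ROOT_GFID")
    [ "$(handle_kind "$handle")" = directory ] || exit 0
    mv "$handle" "$handle.bad" &&
    ln -s ../../.. "$handle" &&
    setfattr -h -n trusted.gfid \
        -v 0x00000000000000000000000000000001 "$handle"
)
```

For example, handle_kind "$(gfid_handle_path /path/to/brick "$ROOT_GFID")" should print symlink on a healthy brick and directory on an affected one.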

Version-Release number of selected component (if applicable):
3.3.0

How reproducible:
Unsure

Steps to Reproduce:
Sorry, there were a lot of things happening all at once so I'm not sure which one of them caused this to happen.

I use replica 3 volumes, so that may be a factor in this.
Comment 1 Jeff Darcy 2012-10-31 09:07:28 EDT
Something affecting self-heal like this would normally make it urgent, but it looks like chance/frequency of occurrence might be low so I'll step it down one notch.
Comment 2 Vijay Bellur 2012-12-11 04:15:46 EST
Unable to reproduce this problem. Please feel free to re-open with more details (logs) if you happen to notice this problem again.
Comment 3 Joe Julian 2013-03-05 15:22:59 EST
I've had two more reports of this problem in IRC. Still no repro though.
Comment 4 Xavi Hernandez 2013-03-26 15:46:36 EDT
I've also suffered this problem.

At pranithk's request on IRC, here is some information from a bad directory.

root@server:/pool/c/.glusterfs/1d/c2# ls -l
drwx------ 2 root root 4.0K Mar 20 16:39 1dc2745b-4e1b-41a1-ba9d-59bceb06809c
root@server:/pool/c/.glusterfs/1d/c2# getfattr -m ".*" -e hex -d 1dc2745b-4e1b-41a1-ba9d-59bceb06809c
root@server:/pool/c/.glusterfs/1d/c2# ls -l 1dc2745b-4e1b-41a1-ba9d-59bceb06809c
lrwxrwxrwx 1 root root 55 Mar 20 16:39 BACKUP -> ../../1d/c2/1dc2745b-4e1b-41a1-ba9d-59bceb06809c/BACKUP
root@server:/pool/c/<path to real directory># getfattr -m ".*" -e hex -d .
# file: .
trusted.afr.vol01-client-4=0x000000000000000000000000
trusted.afr.vol01-client-5=0x000000000000000000000000
trusted.gfid=0x1dc2745b4e1b41a1ba9d59bceb06809c
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7

On another brick without problems:

aff395fe-2d22-49eb-afa1-85c6b70c600f -> ../../1d/c2/1dc2745b-4e1b-41a1-ba9d-59bceb06809c/BACKUP
1dc2745b-4e1b-41a1-ba9d-59bceb06809c -> ../../65/03/650342d0-58cf-48eb-927f-856698b9fff9/<parent directory of BACKUP>

Another example. It's not exactly the same case, but unfortunately I don't have the extended attributes of the real directory:

root@server:/pool/c/.glusterfs/cd/80# ls -l
drwx------ 2 root root   22 Mar 21 15:14 cd8019dd-880f-40d4-a18a-9a6e45ef0510
root@server:/pool/c/.glusterfs/cd/80# stat cd8019dd-880f-40d4-a18a-9a6e45ef0510
  File: `cd8019dd-880f-40d4-a18a-9a6e45ef0510'
  Size: 22              Blocks: 0          IO Block: 4096   directory
Device: 10302h/66306d   Inode: 2480359527  Links: 2
Access: (0700/drwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-03-21 20:37:13.314935893 +0100
Modify: 2013-03-21 15:14:26.416891579 +0100
Change: 2013-03-21 15:14:26.416891579 +0100
root@server:/pool/c/.glusterfs/cd/80# getfattr -m ".*" -e hex -d cd8019dd-880f-40d4-a18a-9a6e45ef0510
root@server:/pool/c/.glusterfs/cd/80# ls -l cd8019dd-880f-40d4-a18a-9a6e45ef0510
total 0
lrwxrwxrwx 1 root root 58 Mar 19 16:45 T02-19_20 -> ../../f0/96/f0968474-b319-4073-a453-eafe3bd7e60f/T02-19_20

On another brick where there is no problem:

b731f16a-d361-4e83-9e03-11f04d51ee08 -> ../../f0/96/f0968474-b319-4073-a453-eafe3bd7e60f/T02-19_20

It seems that some kind of split-brain has been resolved incorrectly.
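
To look for other affected handles on a brick, something like the following could be used. It is a sketch only: find_bad_handles is a hypothetical name, it relies on the -mindepth/-maxdepth options available in GNU and BSD find, and it assumes that a UUID-named entry two hash levels below .glusterfs should never be a real directory (in the listings above, healthy bricks show symlinks there).

```shell
#!/bin/sh
# List gfid handles that are real directories instead of symlinks.
# find does not follow symlinks by default, so healthy directory
# handles (symlinks) are not reported.
find_bad_handles() (
    brick=$1
    find "$brick/.glusterfs" -mindepth 3 -maxdepth 3 -type d \
        -path "$brick/.glusterfs/[0-9a-f][0-9a-f]/[0-9a-f][0-9a-f]/*" \
        -name '????????-????-????-????-????????????'
)
```

Running find_bad_handles /pool/c on the brick shown above would be expected to list entries such as the cd8019dd-880f-40d4-a18a-9a6e45ef0510 directory.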
Comment 5 Xavi Hernandez 2013-03-28 10:15:03 EDT
I have been able to reproduce the problem, but I had to modify the contents of one brick directly. I'm not sure how, or whether, these modifications can happen without direct access to the brick.

[root@glnas01 ~]# gluster volume create vol01 replica 2 glnas01:/bricks/b01 glnas02:/bricks/b01
Creation of volume vol01 has been successful. Please start the volume to access data.
[root@glnas01 ~]# gluster volume start vol01
Starting volume vol01 has been successful
[root@glnas01 ~]# mount -t glusterfs glnas01:/vol01 /vol01
[root@glnas01 ~]# mkdir -p /vol01/dir1/dir2
[root@glnas01 ~]# getfattr -m. -e hex -d /bricks/b01/dir1
getfattr: Removing leading '/' from absolute path names
# file: bricks/b01/dir1
trusted.gfid=0x43e7a966ce8944e7ba8f2cb00fc0a16f

[root@glnas01 ~]# getfattr -m. -e hex -d /bricks/b01/dir1/dir2
getfattr: Removing leading '/' from absolute path names
# file: bricks/b01/dir1/dir2
trusted.gfid=0x923114807a9445819e1f38ae427a8b95

[root@glnas01 ~]# rm -f /bricks/b01/.glusterfs/43/e7/43e7a966-ce89-44e7-ba8f-2cb00fc0a16f
[root@glnas01 ~]# rmdir /bricks/b01/dir1/dir2
[root@glnas01 ~]# gluster volume heal vol01 full
Launching Heal operation on volume vol01 has been successful
Use heal info commands to check status
[root@glnas01 ~]# ls -l /bricks/b01/.glusterfs/43/e7
total 4
drwx------ 2 root root 4096 28 mar 14:53 43e7a966-ce89-44e7-ba8f-2cb00fc0a16f

The problem is caused by self-heal when it tries to regenerate dir2 while its gfid already exists inside .glusterfs and at least one of the parent gfids of dir2 does not exist.

In posix_handle_soft(), newpath is built using MAKE_HANDLE_PATH(), which returns /bricks/b01/.glusterfs/43/e7/43e7a966-ce89-44e7-ba8f-2cb00fc0a16f/dir2 instead of the expected /bricks/b01/.glusterfs/92/31/92311480-7a94-4581-9e1f-38ae427a8b95. This happens because the latter symbolic link already exists and MAKE_HANDLE_PATH() tries to resolve it. However, since 43e7a966-ce89-44e7-ba8f-2cb00fc0a16f does not exist, the resolution cannot complete.

After that, a call to posix_handle_mkdir_hashes() creates the last two levels of the dirname of the path, in this case 'e7' and '43e7a966-ce89-44e7-ba8f-2cb00fc0a16f'.
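
The sequence above can be modelled in a scratch directory with plain shell. This is not GlusterFS code: simulate_bad_mkdir is a hypothetical function, readlink stands in for the symlink resolution done when the handle path is built, and mkdir -p stands in for posix_handle_mkdir_hashes(). The gfids are the ones from the reproduction above.

```shell
#!/bin/sh
# Model of the failure in a scratch "brick"; run it on a temporary
# directory only, never on a real brick.
simulate_bad_mkdir() (
    brick=$1
    parent=43e7a966-ce89-44e7-ba8f-2cb00fc0a16f   # gfid of dir1, handle removed
    child=92311480-7a94-4581-9e1f-38ae427a8b95    # gfid of dir2, handle still present
    hdir=$brick/.glusterfs

    # State after the manual damage: dir2's handle exists, dir1's does not.
    mkdir -p "$hdir/92/31"
    ln -s "../../43/e7/$parent/dir2" "$hdir/92/31/$child"

    # Dereferencing step: follow the existing symlink one level.
    target=$(readlink "$hdir/92/31/$child")
    newpath=$hdir/${target#../../}

    # Creating the parent levels of newpath makes 'e7' and, wrongly,
    # a real directory named after dir1's gfid.
    mkdir -p "$(dirname "$newpath")"
    printf '%s\n' "$newpath"
)
```

After calling simulate_bad_mkdir on an empty temporary directory, .glusterfs/43/e7/43e7a966-ce89-44e7-ba8f-2cb00fc0a16f exists there as a directory rather than a symlink, which matches the listing at the end of the reproduction.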
Comment 6 Anand Avati 2013-05-23 06:06:04 EDT
REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on master by Xavier Hernandez (xhernandez@datalab.es)
Comment 7 Bjoern Teipel 2013-08-15 21:00:27 EDT
I see the same problem when glusterfs is killed by the kernel's OOM killer. I still think glusterfs has memory allocation issues.
It took me a long time to figure out the issue with the 00000000-0000-0000-0000-000000000001 link/directory.

This is a serious issue for me as well; it caused a few hours of downtime to clean up the mess.
Comment 8 Anand Avati 2014-01-17 07:41:46 EST
REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on master by Xavier Hernandez (xhernandez@datalab.es)
Comment 9 Anand Avati 2014-01-20 05:26:35 EST
REVIEW: http://review.gluster.org/6736 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on release-3.5 by Xavier Hernandez (xhernandez@datalab.es)
Comment 10 Anand Avati 2014-01-20 05:35:40 EST
REVIEW: http://review.gluster.org/6737 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#1) for review on release-3.4 by Xavier Hernandez (xhernandez@datalab.es)
Comment 11 Anand Avati 2014-01-23 03:26:36 EST
REVIEW: http://review.gluster.org/5075 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#3) for review on master by Xavier Hernandez (xhernandez@datalab.es)
Comment 12 Anand Avati 2014-01-23 04:28:07 EST
REVIEW: http://review.gluster.org/6736 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on release-3.5 by Xavier Hernandez (xhernandez@datalab.es)
Comment 13 Anand Avati 2014-01-23 04:35:33 EST
REVIEW: http://review.gluster.org/6737 (storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()) posted (#2) for review on release-3.4 by Xavier Hernandez (xhernandez@datalab.es)
Comment 14 Anand Avati 2014-05-02 12:34:51 EDT
COMMIT: http://review.gluster.org/5075 committed in master by Vijay Bellur (vbellur@redhat.com) 
------
commit c7838fbd6afd876c922e1ec681bbbcf73be653e5
Author: Xavier Hernandez <xhernandez@datalab.es>
Date:   Thu May 23 11:13:25 2013 +0200

    storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()
    
    Whenever a new directory is created, its corresponding gfid file must
    also be created. This was done first calling MAKE_HANDLE_PATH() to get
    the path of the gfid file, then calling posix_handle_mkdir_hashes() to
    create the parent directories of the gfid, and finally creating the
    soft-link.
    
    In normal circumstances, the gfid we want to create won't exist and
    MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if
    the volume is damaged and a self-heal is running, it is possible that we
    try to create an already existing gfid. In this case, MAKE_HANDLE_PATH()
    will return a path to the directory instead of the path to the gfid.
    
    To solve this problem, every time a path to a gfid is needed, a call to
    MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH().
    
    Change-Id: Ic319cc38c170434db8e86e2f89f0b8c28c0d611a
    BUG: 859581
    Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
    Reviewed-on: http://review.gluster.org/5075
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
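
The idea behind the fix can be illustrated with a small shell model. It is not the patch itself: fixed_handle_mkdir is a hypothetical function that mimics what the commit message describes for MAKE_HANDLE_ABSPATH(), namely building the handle path from the brick path and the gfid alone, without following any existing symlink, and mkdir -p stands in for posix_handle_mkdir_hashes().

```shell
#!/bin/sh
# Build the handle path purely from the gfid, then create only the
# two hash directories that contain it. Nothing is dereferenced, so
# an existing (possibly dangling) handle cannot redirect the mkdir.
fixed_handle_mkdir() (
    brick=$1
    gfid=$2
    h1=$(printf '%s' "$gfid" | cut -c1-2)
    h2=$(printf '%s' "$gfid" | cut -c3-4)
    newpath=$brick/.glusterfs/$h1/$h2/$gfid
    mkdir -p "$(dirname "$newpath")"
    printf '%s\n' "$newpath"
)
```

With this approach the parent directories created are always the two hash levels of the gfid being handled, so no directory named after another gfid can appear.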
Comment 15 Anand Avati 2014-05-05 09:38:15 EDT
COMMIT: http://review.gluster.org/6736 committed in release-3.5 by Niels de Vos (ndevos@redhat.com) 
------
commit b3fd7004a4a579c64ed29ee7eeb7e0fa57a3591f
Author: Xavier Hernandez <xhernandez@datalab.es>
Date:   Thu May 23 11:13:25 2013 +0200

    storage/posix: do not dereference gfid symlinks before posix_handle_mkdir_hashes()
    
    Whenever a new directory is created, its corresponding gfid file must
    also be created. This was done first calling MAKE_HANDLE_PATH() to get
    the path of the gfid file, then calling posix_handle_mkdir_hashes() to
    create the parent directories of the gfid, and finally creating the
    soft-link.
    
    In normal circumstances, the gfid we want to create won't exist and
    MAKE_HANDLE_PATH() will return a simple path to the new gfid. However if
    the volume is damaged and a self-heal is running, it is possible that we
    try to create an already existing gfid. In this case, MAKE_HANDLE_PATH()
    will return a path to the directory instead of the path to the gfid.
    
    To solve this problem, every time a path to a gfid is needed, a call to
    MAKE_HANDLE_ABSPATH() is made instead of the call to MAKE_HANDLE_PATH().
    
    BUG: 859581
    Change-Id: I84405bf04562e647fc02445f45358e9451f9b479
    Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
    Reviewed-on: http://review.gluster.org/6736
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
Comment 16 Niels de Vos 2014-05-05 23:44:07 EDT
Moving to POST, still waiting for the merging of http://review.gluster.org/6737.
Comment 17 Niels de Vos 2014-05-25 05:06:50 EDT
The first (and last?) Beta for GlusterFS 3.5.1 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.1beta release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040377.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
Comment 18 Xavi Hernandez 2014-05-26 05:33:49 EDT
The problem seems to be fixed in this version.
Comment 19 Niels de Vos 2014-06-24 07:02:54 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.1, please reopen this bug report.

glusterfs-3.5.1 has been announced on the Gluster Users mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-June/040723.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
