Mounting with FUSE happens as follows:

1. The mount(2) syscall is invoked and completes. At this point the fs is not yet usable; all calls into it block.
2. An mtab entry is added via a contrived invocation of mount(8).
3. The handshake with the kernel (INIT message) takes place. After this the fs goes live, and the calls into the fs that blocked between 1. and 3. can complete.

As mount(8) is implemented, step 2. involves a readlink(2) against the mountpoint. This is not usually a problem, but certain configurations of the Linux audit system imply that some preliminary checks are done against the readlink'd path, in practice in the form of getxattr(2) calls. These call into the file system, and hence block at this stage. So the mount(8) process in 2. can't complete, and we get stuck in step 2.

* Original Fedora bugreport: https://bugzilla.redhat.com/show_bug.cgi?id=493565
* fuse-devel@ thread on it: http://thread.gmane.org/gmane.linux.file-systems/36651
* libfuse fix:
  http://fuse.cvs.sourceforge.net/viewvc/fuse/fuse/lib/mount_util.c?view=log#rev1.14
  http://git.gluster.com/?p=users/csaba/fuse.git;a=commit;h=d6bc53b3d50776c6da4b6e221029c0d2e40f4db7
  This fix makes use of a mount(8) option crafted on demand, which makes mount(8) skip the call to readlink(2).
* The respective commit to util-linux-ng: http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=commit;h=v2.17-rc1-12-g45fc569
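The ordering problem in steps 1. to 3. can be shown with a toy model. This is only a sketch using Python threads, not real FUSE or mount(8) code: `mount_synchronously`, `audit_touches_fs` and `helper_timeout` are names made up for the illustration, and the timeout exists only so that the demonstration terminates (the real wait has none).

```python
import threading


def mount_synchronously(audit_touches_fs, helper_timeout=0.5):
    """Toy model of steps 1-3; returns True if the INIT handshake is reached."""
    fs_alive = threading.Event()     # set once the INIT handshake is done
    helper_done = threading.Event()  # set once the mtab helper has exited

    def mtab_helper():
        # Stands in for mount(8). With the problematic audit setup, the
        # getxattr(2) on the mountpoint blocks until the fs answers requests.
        if audit_touches_fs:
            fs_alive.wait()
        helper_done.set()

    # Step 1: mount(2) has returned; the fs exists but answers nothing yet.
    # Step 2: run the mtab helper and wait for it to finish.
    helper = threading.Thread(target=mtab_helper)
    helper.start()
    if not helper_done.wait(helper_timeout):
        # Stuck in step 2: the helper waits for the fs, we wait for the helper.
        fs_alive.set()  # demo cleanup only, so the helper thread can exit
        helper.join()
        return False

    # Step 3: INIT handshake; the fs goes live.
    fs_alive.set()
    helper.join()
    return True
```

With `audit_touches_fs=True` the function never gets to step 3 on its own, which mirrors the hang described above; with `audit_touches_fs=False` the same ordering works.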
This is the type of bug I saw in Storage Platform at times during mount after a reboot. Since I had pressed the three-finger salute, it made glusterfs crash in io-cache, which is fixed. But the issue was exactly what you mention here. At least someone reproduced it elsewhere :).
(In reply to comment #1)
> This is type of bug i saw in storage platform at times during mount after
> a reboot

How can that be? Are you sure it's the same bug? I mean, you say "at times", and "mount after a reboot"... With the kind of audit configuration which triggers it, it should occur quite deterministically.

Csaba
PATCH: http://patches.gluster.com/patch/2638 in master (fuse: add mtab entry asynchronously)
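Going only by the patch title, the idea is that the mtab helper is started but not waited for before the handshake. The following is a toy model of that ordering with Python threads, not the code of the patch; `mount_with_async_mtab` and the event strings are invented for the illustration.

```python
import threading


def mount_with_async_mtab():
    """Toy model: start the mtab helper, but do the handshake without waiting."""
    fs_alive = threading.Event()  # set once the INIT handshake is done
    events = []                   # order in which things complete

    def mtab_helper():
        # Stands in for mount(8) blocking on the mountpoint until the fs
        # answers requests (the audit-triggered getxattr(2) case).
        fs_alive.wait()
        events.append("mtab entry added")

    helper = threading.Thread(target=mtab_helper)
    helper.start()                    # mtab update runs in the background

    events.append("INIT handshake")   # handshake is no longer held up
    fs_alive.set()                    # fs goes live, blocked calls proceed

    helper.join()                     # the helper can now finish
    return events
```

In this model the helper still blocks on the mountpoint, but that no longer matters: the handshake does not depend on it, and once the fs is live the helper completes.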
This happened in Storage Platform. I have an audit config for Fedora 11 which seems to be similar to what Fedora refers to in their bug tracker. The hang is exactly at "waitpid()" in the fuse_mnt_add_mount() call. Check #364, which is fixed for a segfault, but there is another problem in the same backtrace, which to me seems quite similar to what you tried to fix in this patch, or not?
(In reply to comment #4)
> Check #364, which is fixed for a segfault, but there is another problem in the
> same backtrace, which for me seems quite similar to what you tried to fix in
> this
> patch or no?

Apparently it seems so, I was just wondering about reproducibility.
Reproducing it is really tough sometimes; perhaps you can reproduce this in 1 in 5 reboots using a Fedora 11 virtual image. But how does your patch fix this issue? If you don't update the mtab and just give a failure, how do we know the volume is mounted and in use? Does "/proc/mounts" have this value? Should we catch this error?
(In reply to comment #6)
> Reproducing it is really tough some times, perhaps you can reproduce this in 5
> in 1 reboots using a Fedora 11 virtual image.

Well, it's not worth the time to reproduce it (for testing I just took a modified mount(8) binary which explicitly calls into the fs at that certain point). I just don't see how it is possible that, once a system is configured so that it's affected by the bug, it shows up only sporadically and not all the time.

> But how does you patch fixes this issue? now if you don't update the mtab and
> just
> give a failure how do we understand the volume is mounted and under use?. Does
> "/proc/mounts" has this value?. Should we catch this error?

It's unlikely that the mtab update would fail. The cases of a non-existing mtab, a symlink'd mtab, and an mtab on a ro mount are checked in advance, and in those cases no mtab update is even attempted. So the mtab update could fail only if there is some error with the underlying fs, or the disk is full, etc. In this case:

- the appropriate info is still there in /proc/mounts
- the appropriate error msg is in the logs
- this doesn't directly affect glusterfs functionality
- however the system is quite likely pretty much f*cked up anyway

So then it's a system administration problem and not a Glusterfs issue.

And wrt the other fix, Miklos + Karel Zak's mount(8) option hackery... I guess Fedora adopts that mount option and the libfuse patch to fix this on their behalf, but what if someone independently configures the system in a similar way (like you in storage platform... well OK, that's semi-independent)? I don't wanna give them a choice to have their whining heard :P
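The advance checks mentioned above (non-existing mtab, symlink'd mtab, mtab on a ro mount) could look roughly like this. It is a minimal sketch in Python rather than the C code of the patch; the function name `mtab_needs_update` and the default path are assumptions made for the illustration.

```python
import os


def mtab_needs_update(mtab_path="/etc/mtab"):
    """Return True only if trying to update the mtab makes sense."""
    # Non-existing mtab: nothing to update.
    if not os.path.lexists(mtab_path):
        return False
    # Symlink'd mtab (e.g. pointing into /proc): it is not ours to edit.
    if os.path.islink(mtab_path):
        return False
    # Not writable, which is what a read-only mount looks like from here.
    # Simplification: this also catches plain permission problems.
    if not os.access(mtab_path, os.W_OK):
        return False
    return True
```

If such a check says no, the update is skipped up front, so the remaining failure cases are the ones listed above (errors in the underlying fs, disk full and the like).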
(In reply to comment #0)

Some fixups on the urls included in the original problem description.

> * Original Fedora bugreport:
> https://bugzilla.redhat.com/0

Somehow this url is garbled, correct one is

https://bugzilla.redhat.com/show_bug.cgi?id=493565

> * libfuse fix:
> http://fuse.cvs.sourceforge.net/viewvc/fuse/fuse/lib/mount_util.c?view=log#rev1.14
> http://git.gluster.com/?p=users/csaba/fuse.git;a=commit;h=d6bc53b3d50776c6da4b6e221029c0d2e40f4db7

Both these repos are gone, the current valid url is:

http://fuse.git.sourceforge.net/git/gitweb.cgi?p=fuse/fuse;a=commitdiff;h=4c3d9b195
(In reply to comment #8)
> (In reply to comment #0)
>
> Some fixups on the urls included in the original problem description.
>
> > * Original Fedora bugreport:
> > https://bugzilla.redhat.com/0
>
> Somehow this url is garbled, correct one is
>
> https://bugzilla.redhat.com/0

?? wtf...

"https://bugzilla.redhat.com/show_bug.cgi?id=493565"

ie. "https://bugzilla.redhat.com/", bug id 493565
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #0)
> >
> > Some fixups on the urls included in the original problem description.
> >
> > > * Original Fedora bugreport:
> > > https://bugzilla.redhat.com/0
> >
> > Somehow this url is garbled, correct one is
> >
> > https://bugzilla.redhat.com/0
>
> ?? wtf...
>
> "https://bugzilla.redhat.com/0"
>
> ie. "https://bugzilla.redhat.com/", bug id 493565

Oh, I guess the poor thing tries to resolve the path in the foreign bugzilla url as a link to a local bug entry, which then fails gloriously.