While testing frequent dovecot index corruption on gluster, we have noted that rename() is not being handled atomically by gluster. Attached are two test programs, reader.c and writer.c, and a gluster config to reproduce this bug.

Usage:

gcc reader.c -o reader -Wall
gcc writer.c -o writer -Wall
touch dovecot.index
./writer &
./reader

The reader should keep running without printing anything. On local filesystems it does. But when running on glusterfs, it fails:

# ./reader
open(dovecot.index): No such file or directory

This should never happen. Apparently glusterfs doesn't handle rename() atomically in all situations.
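The attached programs are not reproduced inline here, but a minimal sketch of what writer.c and reader.c do (file names, data and buffer sizes below are only illustrative) is: the writer keeps renaming a freshly written temp file over dovecot.index, while the reader keeps opening and reading dovecot.index by name. With an atomic rename() the reader should never see ENOENT.

/* writer.c (sketch): repeatedly rename a freshly written temp file over dovecot.index */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    for (;;) {
        int fd = open("dovecot.index.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open(dovecot.index.tmp)"); return 1; }
        if (write(fd, "data\n", 5) != 5) { perror("write"); return 1; }
        close(fd);
        /* on a POSIX filesystem this atomically replaces dovecot.index */
        if (rename("dovecot.index.tmp", "dovecot.index") < 0) {
            perror("rename");
            return 1;
        }
    }
}

/* reader.c (sketch): repeatedly open and read dovecot.index by name;
 * should run silently forever if rename() is atomic */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[128];

    for (;;) {
        int fd = open("dovecot.index", O_RDONLY);
        if (fd < 0) { perror("open(dovecot.index)"); return 1; }
        if (read(fd, buf, sizeof(buf)) < 0) { perror("read"); return 1; }
        close(fd);
    }
}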
PATCH: http://patches.gluster.com/patch/3528 in master (mount/fuse: Handle setting entry-timeout to 0.)
PATCH: http://patches.gluster.com/patch/3526 in release-3.0 (glusterfsd: Handle setting entry-timeout to 0.)
PATCH: http://patches.gluster.com/patch/3527 in release-3.0 (mount/fuse: Handle setting entry-timeout to 0.)
PATCH: http://patches.gluster.com/patch/3536 in master (glusterfsd: Handle setting entry-timeout to 0)
Most of the self-heal (replicate related) bugs are now fixed in the 3.1.0 branch. As we are just a week away from the GA release, we would like you to test this particular bug against the 3.1.0 RC releases and let us know if it's fixed.
Please update the status of this bug, as it has been more than 6 months since it was filed (bug id < 2000). Please resolve it with the proper resolution if it is no longer valid. If it is still valid but not critical, move it to 'enhancement' severity.
Someone in the QA team, can you confirm whether this issue is fixed now (maybe in master/3.2.1)? Also, please make sure to include the provided scripts in our sanity suite.
(In reply to comment #10)
> can you confirm whether this issue is fixed now ?

I was about to file a duplicate bug for this problem, found in both 3.1.3 and 3.2.1. So, NO, this is not fixed in 3.2.1.

In fact, I can add a bit more characterization of the problem. We are using replicated volumes and two test programs. One program renames new files onto old files. The other program opens and reads (only) the old files by name. The open/read test program gets frequent ENOENT returns from both open(2) and from read(2) following a successful open, but only when run on the same node as the rename test program. When run from another node, the open/read test program never returns ENOENT.

In fact, we have a dandy test case where one instance of the rename test program is launched on each node hosting a brick, renaming a disjoint set of files. One instance of the open/read test program is launched on each of the same nodes, but scanning the entire set of files being renamed. In other words, all instances of the open/read program are scanning the same total set of files, and each instance of the rename program is touching a non-overlapping set of files (in different directories, in fact). Each instance of the open/read program gets ENOENT errors for exactly and only the files being touched by the rename program running on the same node.

Under our operating conditions, we can generate multiple spurious ENOENT returns per second.
Hi, there is a workaround for this issue, which is related to renames followed by stat/open (along with all the patches already committed): http://patches.gluster.com/patch/3632/ This prevents queuing of rename/stat/open in io-threads. Please let us know if this patch fixes the issue. This fix is a workaround for this specific case, and hence will not be available in the releases.
Created attachment 545
(In reply to comment #12) > Please let us know if this patch fixes the issue. We applied the patch to 3.2.1 and gave it a try. Unfortunately, that doesn't seem to have any impact on the problem we're seeing. I've attached our test programs, with instructions.
(In reply to comment #14) > We applied the patch to 3.2.1 and gave it a try. Unfortunately, > that doesn't seem to have any impact on the problem we're seeing. I've been reminded of something in our test configuration that may be relevant to reproducing the problem. Our bricks are all on USB flash devices, which are _slow_ and therefore could have a significant impact on any timing-related bugs. Assuming the timing is relevant, the dm-delay device mapper target (DM_DELAY Kconfig option) can be used to increase read and write latency for selected devices. That would be especially useful if you put timing-stressed regression tests in your QA suite.
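For illustration, a delayed device can be set up along these lines (the device name, filesystem, mount point and 100/200 ms delays below are only examples, not part of our actual setup):

# add 100 ms to reads and 200 ms to writes on /dev/sdb1
echo "0 $(blockdev --getsz /dev/sdb1) delay /dev/sdb1 0 100 /dev/sdb1 0 200" | dmsetup create slow-brick
mkfs.xfs /dev/mapper/slow-brick
mount /dev/mapper/slow-brick /bricks/slow

A brick can then be created on the delayed mount and the reader/writer test rerun against it.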
(In reply to comment #15) > (In reply to comment #14) > > We applied the patch to 3.2.1 and gave it a try. Unfortunately, > > that doesn't seem to have any impact on the problem we're seeing. > > I've been reminded of something in our test configuration that may be > relevant to reproducing the problem. Our bricks are all on USB flash > devices, which are _slow_ and therefore could have a significant > impact on any timing-related bugs. > > Assuming the timing is relevant, the dm-delay device mapper target > (DM_DELAY Kconfig option) can be used to increase read and write > latency for selected devices. That would be especially useful if > you put timing-stressed regression tests in your QA suite. Thanks Stuart. We will test with this and get back to you.. Avati
We just need to run tests checking for rename failures. In our preliminary testing, we couldn't reproduce the issue. We can't take this bug as a 'blocker' for the 3.3.0 release.
(In reply to comment #17)

Just confirming that the problem has not gone away spontaneously for us, and that this is a critical problem preventing use of gluster for our application. Please let me know if there is any additional information that would be helpful in reproducing this problem. In our environment, we can reproduce it multiple times a second.

Also, our suspicions point to either the fuse module or the gluster-fuse interface. This "smells" like a bad cached entry on the node doing the rename. The repositories and all remote nodes have no difficulty seeing the results of the rename. Only the node doing the rename is affected.
We just upgraded the backing store for the gluster repository on our test cluster from USB flash drives to SSDs, and we are still seeing multiple occurrences per second. So it is not necessary to have incredibly slow devices to reproduce this problem. Same test programs as previously submitted. About 80% of the errors are ENOENT on open(2), while the rest are ENOENT on read(2) after a successful open(2) (which remains wonderfully bizarre to me).
This is still an issue with glusterfs-3.5.2. The steps in comment #3 result in the following error: open(dovecot.index): No such file or directory This error is hit in just a few seconds while running on a single-brick volume. Mounting with '-o entry-timeout=0' does not make a noticeable difference.
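For clarity, the entry-timeout option above is passed at mount time, along these lines (server and volume names are placeholders):

mount -t glusterfs -o entry-timeout=0 server:/testvol /mnt/glustervol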
Several changes are needed for the test program to work:

1. FUSE needs to send an open intent along with the lookup, if the lookup is being performed for an open().

2. Gluster needs to somehow use the intent flag, and keep the server-side inode "alive" (as if already opened) for some time, so that a future actual open is guaranteed to succeed even if the file gets unlink'ed or rename'd over.

3. The server-side resolver must serialize over dentries to fix the lookup/open + unlink race.

1 and 2 are somewhat related; 3 fixes a separate issue. All fixes are needed for this bug.
Guys, any news on this? I still have this issue: a "file not found" error while renaming. Thanks.
Hi, I am also having this issue, running GlusterFS 3.6.2. I'm using a test setup similar to the ones other users have described, and I get problems trying to read during a rename - usually a "Stale file handle" message. Can we raise the priority on this? Needing rename() to behave atomically is a common use case, and the current behaviour breaks POSIX compliance.
Hi all, we also detected problems with rename. After testing our own file system, it turned out to be a problem in FUSE itself, i.e. you can reproduce this bug on the basic fusexmp file system, or any other FUSE-based file system (sshfs for example).
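For anyone who wants to check this on another FUSE file system, one possible way (host and paths below are just placeholders) is to run the reader/writer test from comment #3 on an sshfs mount:

sshfs user@host:/some/dir /mnt/sshfs
cd /mnt/sshfs
touch dovecot.index
./writer &
./reader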
This bug is getting closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.