Problem: Two applications running on different glusterfs mount points (with the same configuration) write to and read from the same file in a synchronized manner (app 1 waits for app 2 to read the data it has written before writing again). With all the read-caching translators enabled (at least io-cache and quick-read), app 2 should still be able to read the updated data written by app 1.

Solution:
1. direct-io-mode should be enabled in the kernel.
2. The read-caching translators can have their cache-timeout value set to 0, so that validation is done on every read and the data is read from the backend if the file is found to have changed. However, neither io-cache nor quick-read supports nanosecond resolution of mtime, and this needs to be supported.

regards,
Raghavendra
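To illustrate why point 2 also needs nanosecond mtime, here is a minimal Python sketch. It is not the actual io-cache or quick-read code; `Mtime`, `CacheEntry`, `needs_revalidation` and `cache_still_valid` are hypothetical names. With a cache-timeout of 0 every read revalidates, but if validation compares whole seconds only, two writes within the same second look identical and stale data is served.

```python
"""Hypothetical sketch of mtime-based cache validation (not GlusterFS code)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Mtime:
    sec: int   # whole seconds, like st_mtime
    nsec: int  # nanosecond part, like st_mtim.tv_nsec


@dataclass
class CacheEntry:
    data: bytes
    mtime: Mtime          # mtime of the file when the data was cached
    validated_at: float   # time of the last validation, in seconds


def needs_revalidation(entry: CacheEntry, cache_timeout: float, now: float) -> bool:
    # With cache_timeout == 0 this is always true, i.e. every read revalidates.
    return now - entry.validated_at >= cache_timeout


def cache_still_valid(entry: CacheEntry, backend: Mtime, use_nsec: bool = True) -> bool:
    if use_nsec:
        # Compare seconds and nanoseconds: any change in mtime invalidates.
        return entry.mtime == backend
    # Seconds-only comparison: misses writes within the same second.
    return entry.mtime.sec == backend.sec
```

For example, a cached entry with mtime (100 s, 1000 ns) and a backend mtime of (100 s, 5000 ns) is wrongly considered valid by the seconds-only check, while the nanosecond-aware check correctly drops the cache.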
PATCH: http://patches.gluster.com/patch/3100 in master (fuse: change behavior of direct io mode.)
PATCH: http://patches.gluster.com/patch/3101 in master (performance/io-cache: make use of nano second resolution of mtime during cache validation.)
PATCH: http://patches.gluster.com/patch/3102 in master (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)
PATCH: http://patches.gluster.com/patch/3107 in release-2.0 (fuse: change behavior of direct io mode.)
PATCH: http://patches.gluster.com/patch/3108 in release-2.0 (core/protocol.h: add nanosecond resolution handling while converting to/from gf_stat_t and stat.)
PATCH: http://patches.gluster.com/patch/3109 in release-2.0 (performance/io-cache: make use of nano second resolution of mtime during cache validation.)
PATCH: http://patches.gluster.com/patch/3110 in release-2.0 (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)
PATCH: http://patches.gluster.com/patch/3103 in release-3.0 (fuse: change behavior of direct io mode.)
PATCH: http://patches.gluster.com/patch/3104 in release-3.0 (performance/io-cache: make use of nano second resolution of mtime during cache validation.)
PATCH: http://patches.gluster.com/patch/3105 in release-3.0 (core/protocol.h: add nanosecond resolution handling while converting to/from gf_stat_t and stat.)
PATCH: http://patches.gluster.com/patch/3106 in release-3.0 (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)
PATCH: http://patches.gluster.com/patch/3133 in master (protocol: fix endianness for nanosecond field in stat structure)
PATCH: http://patches.gluster.com/patch/3131 in release-2.0 (protocol: fix endianness for nanosecond field in stat structure)
PATCH: http://patches.gluster.com/patch/3132 in release-3.0 (protocol: fix endianness for nanosecond field in stat structure)
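The protocol.h and endianness patches above concern carrying the nanosecond part of mtime over the wire. The following Python sketch shows the general failure mode only; the two-field layout (seconds, then nanoseconds, both 32-bit big-endian) is an assumption and not the real `gf_stat_t` layout, and `buggy_encode_mtime` merely models a field that skipped byte-order conversion.

```python
import struct

# Hypothetical wire layout (NOT the real gf_stat_t): two unsigned 32-bit
# fields, seconds then nanoseconds, both in network (big-endian) order.
WIRE = struct.Struct("!II")


def encode_mtime(sec, nsec):
    # Both fields converted to network byte order.
    return WIRE.pack(sec, nsec)


def decode_mtime(buf):
    # Returns (sec, nsec) decoded from network byte order.
    return WIRE.unpack(buf)


def buggy_encode_mtime(sec, nsec):
    # Models the bug class: seconds are converted to network order,
    # but the nanosecond field is written little-endian (host order on x86).
    return struct.pack("!I", sec) + struct.pack("<I", nsec)
```

Decoding a buggy encoding of (100, 1) yields a nanosecond value of 16777216, so two peers would disagree about mtime and caches would be invalidated (or kept) for the wrong reasons.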
One more command-line change is needed by Raghu.
(In reply to comment #1)
> PATCH: http://patches.gluster.com/patch/3100 in master (fuse: change behavior
> of direct io mode.)

This patch introduces a regression. With a simple fuse+posix configuration:

# strace ./a.out
execve("./a.out", ["./a.out"], [/* 25 vars */]) = -1 EFAULT (Bad address)

It works fine at the previous commit:

commit 7cb8982cbbe8298cd1bdd35055f7d3818f4a136f
Author: Amar Tumballi <amar>
Date: Wed Apr 7 04:19:48 2010 +0000
    backword compatibility with 3.0.x releases - restored

but breaks if you apply:

commit 9c2bc1bc61af888192bde18170b113b4f6f8b4ca
Author: Anand Avati <avati>
Date: Mon Apr 5 13:35:45 2010 +0000
    fuse: change behavior of direct io mode.

Interestingly, this only affects the 'master' branch. The test works fine even with this patch on 3.0.4.
In the default mode, only writes should have direct-io enabled.
*** Bug 929 has been marked as a duplicate of this bug. ***
(In reply to comment #16)
> With a simple fuse+posix configuration:
>
> # strace ./a.out
> execve("./a.out", ["./a.out"], [/* 25 vars */]) = -1 EFAULT (Bad address)
[...]
> but breaks if you apply:
>
> commit 9c2bc1bc61af888192bde18170b113b4f6f8b4ca

I tried both a simple execution and a kernel "make menuconfig" with 9c2bc1bc61. Both succeed here, on 2.6.32. What system do you have?
(In reply to comment #19) root@booradley:/home/raghu# uname -a Linux booradley 2.6.24.5-smp #2 SMP Wed Apr 30 13:41:38 CDT 2008 i686 Intel(R) Pentium(R) Dual CPU T2330 @ 1.60GHz GenuineIntel GNU/Linux
Starting glusterfs with --direct-io-mode=off works around this problem.
*** Bug 1029 has been marked as a duplicate of this bug. ***
PATCH: http://patches.gluster.com/patch/4348 in master (fuse: introduce pre-test micro-framework, check for execve-over-direct-IO)
Patch http://patches.gluster.com/patch/4348 results in a segfault on my laptop. Below is the backtrace:

Program received signal SIGSEGV, Segmentation fault.
0xb7f2796d in inode_ref (inode=0xffffffff) at ../../../libglusterfs/src/inode.c:476
476             table = inode->table;
(gdb) bt
#0  0xb7f2796d in inode_ref (inode=0xffffffff) at ../../../libglusterfs/src/inode.c:476
#1  0xb71e897e in fuse_ino_to_inode (ino=4294967295, fuse=0x8072538) at ../../../../../xlators/mount/fuse/src/fuse-helpers.c:178
#2  0xb71e8b33 in fuse_loc_fill (loc=0x8143f34, state=0x8143f28, ino=4294967295, par=0, name=0x0) at ../../../../../xlators/mount/fuse/src/fuse-helpers.c:230
#3  0xb71f3f15 in fuse_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:1633
#4  0xb71fbf55 in fuse_std_fallback (this=0x8072538, finh=0x8143e68, msg=0x8143e90) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3489
#5  0xb71fc27f in fuse_pre_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3572
#6  0xb71fb6af in fuse_thread_proc (data=0x8072538) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3195
#7  0xb7ec7383 in start_thread () from /lib/libpthread.so.0
#8  0xb7e4c05e in clone () from /lib/libc.so.6
(gdb) f 5
#5  0xb71fc27f in fuse_pre_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3572
3572            FUSE_PRE_TEST_TEST;
(gdb) p fuse->nodeid
No symbol "fuse" in current context.
(gdb) p finh->nodeid
$6 = 4294967295
(gdb) p (uint64_t) -1
$7 = 18446744073709551615

As we can see, finh->nodeid is not equal to the value set in fuse_pre_lookup, so it falls back to fuse_std_fallback.
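The two gdb values differ only in width: 4294967295 is -1 truncated to 32 bits, while the pre-test compares against the 64-bit (uint64_t) -1. The Python sketch below reproduces that mismatch. The helper names are hypothetical, and the idea that the value passed through an unsigned 32-bit type (the affected machine is i686) is an assumption, not a confirmed root cause.

```python
U64_MAX = (1 << 64) - 1  # (uint64_t) -1, printed by gdb as $7
U32_MAX = (1 << 32) - 1  # 4294967295, printed by gdb for finh->nodeid


def as_uint64(value):
    # Reduce an integer modulo 2**64, like a C cast to uint64_t.
    return value & U64_MAX


def as_uint32(value):
    # Reduce an integer modulo 2**32, like a C cast to a 32-bit unsigned type.
    return value & U32_MAX


sentinel = as_uint64(-1)            # what the pre-test expects
seen = as_uint64(as_uint32(-1))     # -1 squeezed through 32 bits, then widened
```

Widening back to 64 bits zero-extends the truncated value instead of restoring it, so the equality test fails and the request takes the fallback path with an invalid nodeid.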
*** Bug 1515 has been marked as a duplicate of this bug. ***
PATCH: http://patches.gluster.com/patch/4292 in master (mount/fuse: By default enable direct-io only for fds not openened with O_RDONLY.)
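The default policy named in this patch (and in the earlier comment that only writes should use direct-io) can be sketched as a single predicate over the open flags. This Python sketch is hypothetical: `use_direct_io` and its "on"/"off"/"default" values are illustrative names and are not the glusterfs option parser.

```python
import os

# Access-mode mask built locally; on Linux this equals O_ACCMODE (3).
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def use_direct_io(open_flags, direct_io_mode="default"):
    """Hypothetical policy: 'on'/'off' force the mode; 'default' uses
    direct I/O only for fds that are not opened read-only."""
    if direct_io_mode == "on":
        return True
    if direct_io_mode == "off":
        return False
    # Read-only opens stay buffered (keeps execve working through the page
    # cache); opens with write access go direct.
    return (open_flags & ACCMODE) != os.O_RDONLY
```

Masking with the access-mode bits first matters: flags such as O_CREAT or O_APPEND must not change the read-only/writable decision.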
I'm reopening this, as our current, defective approach to this problem is the culprit behind the recently spotted "ping vs tail" issue (i.e. in one shell we run ping with its output redirected to a file, in another shell we follow that file with "tail -f", and all we get is zeros).

Miklos pointed this out, but we didn't listen carefully (most of the blame is mine, of course): with the current FUSE kernel code, accessing a file in direct mode and in buffered mode at the same time leads to inconsistencies. That's why he doesn't want to expose the O_DIRECT open flag to users; instead he lets the fs daemon control direct/buffered mode, and the daemon is expected to be careful about it. Our chosen I/O policy for the "big writes not available" case (buffered mode for read-only opens, to support execve, and direct mode in all other cases, to get beyond page-sized I/O) directly violates this principle. That's why we get those zeros. (And the reason we don't see the ping vs tail issue on recent kernels is that those kernels support big writes, in which case our policy is sane; it is not some mysterious kernel fix, as I had guessed.)

I think the most user-friendly solution would be to record in the inode which type of open (direct or buffered) it received first, if any, and make subsequent opens use the same mode (clearing the flag on the final close). For the initial open we could keep the policy we have now. Is that feasible, locking-wise etc.?
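The proposal above can be sketched as a per-inode mode flag protected by a lock, set by the first open and cleared by the last release. This is a hypothetical Python model (`Inode`, `initial_policy`, "direct"/"buffered" are illustrative names), not the fuse-bridge implementation that was later merged.

```python
import os
import threading

# Access-mode mask built locally; on Linux this equals O_ACCMODE (3).
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


def initial_policy(flags):
    # Policy for the first open only: direct unless read-only.
    return "buffered" if (flags & ACCMODE) == os.O_RDONLY else "direct"


class Inode:
    def __init__(self):
        self._lock = threading.Lock()  # serializes open/release on this inode
        self._open_count = 0
        self._mode = None              # None means no open fds

    def open(self, flags):
        with self._lock:
            if self._open_count == 0:
                # First open decides the mode for every later open.
                self._mode = initial_policy(flags)
            self._open_count += 1
            return self._mode

    def release(self):
        with self._lock:
            if self._open_count == 0:
                raise RuntimeError("release without a matching open")
            self._open_count -= 1
            if self._open_count == 0:
                self._mode = None      # final close clears the flag
```

In the "ping vs tail" scenario, whichever process opens the file first fixes the mode, so the writer and the reader never mix direct and buffered access on the same inode.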
Please update the status of this bug, as it's been more than 6 months since it was filed (bug id < 2000). Please resolve it with a proper resolution if it's not valid anymore. If it's still valid and not critical, move it to 'enhancement' severity.
(In reply to comment #28)
> Please update the status of this bug as its been more than 6months since its
> filed (bug id < 2000)
>
> Please resolve it with proper resolution if its not valid anymore. If its still
> valid and not critical, move it to 'enhancement' severity.

It is not an enhancement, nor a moribund Bugzilla entry; it is a valid bug of normal severity which can and should be fixed. What actually happened is that we committed a fix for it some time ago, and then it recently turned out that the fix was buggy. My response was to reopen this old bug; it should now be kept on the near-term to-do list. Was it a bad idea to reopen it? I'm setting it to normal for now; if you confirm that the ancient bz id annoys you, I will close it and file a new entry for the currently observable irregularity.
Du, please see comment #27 for the current issue and the proposed fix.
Amar, why did you set it to "enhancement"? The reason this is open is a perfectly reproducible irregular behavior caused by a completely understood error in our code, with a plan for fixing it. If, by any chance, the reason is that from a release engineering POV it's annoying to see a "normal" bug without a milestone assigned, then (given this is a genuine *bug*) setting it to "enhancement" is not the right way to make that annoyance go away. In that case, the proper treatment is to assign a milestone.
Done. Sorry for the confusion.
(In reply to comment #32)
> Done.
>
> Sorry for confusion.

In fact, I have to apologize: in retrospect, it was a bad idea to overload this old bug report with the issue we are dealing with now, and that was the root of all the confusion.
We are planning to keep the 3.4.x branch as an "internal enhancements" release without any features, so we are moving these bugs to the 3.4.0 target milestone.
*** Bug 3780 has been marked as a duplicate of this bug. ***
Patch @ review.gluster.com/20 fixes the Cadence application problem of corrupted output file.
*** Bug 3800 has been marked as a duplicate of this bug. ***
CHANGE: http://review.gluster.com/55 (When an fd is being opened, it inherits direct-io-mode characterstics) merged in master by Anand Avati (avati)
Csaba, Can I mark this bug as resolved/fixed? regards, Raghavendra.
(In reply to comment #39)
> Csaba,
>
> Can I mark this bug as resolved/fixed?

It can be resolved; it fixes one of my issues.
(In reply to comment #39)
> Csaba,
>
> Can I mark this bug as resolved/fixed?

Yes, as soon as the fix is committed to the 3.{1,2} branches. Not only because that seems theoretically correct, but also because the customer need that has now brought this issue into focus concerns 3.1, AFAIK.
ON_QA for upstream. Not sure if we will backport the fix to 3.{1,2}.x branches.
3.3.0qa43 does not fix the symptoms I was seeing in bug 765512.
Hi Joe, could you please send us a fuse-dump (the messages exchanged between the FUSE kernel module and glusterfs) and an strace of the application? You can use the --dump-fuse option of glusterfs to get the fuse-dump. Also, is there a simple test case we can use to reproduce the bug locally? regards, Raghavendra.
*** Bug 811919 has been marked as a duplicate of this bug. ***
CHANGE: http://review.gluster.com/3531 (mount/fuse: use correct fdctx to inherit direct-io-values from.) merged in master by Anand Avati (avati)
Already in master and release-3.3. Please upgrade to 3.3.0.
*** Bug 844837 has been marked as a duplicate of this bug. ***