Bug 762533 (GLUSTER-801) - Direct io-mode support and related changes in caching translators.
Summary: Direct io-mode support and related changes in caching translators.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-801
Product: GlusterFS
Classification: Community
Component: fuse
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact: Anush Shetty
URL: None
Whiteboard: None
Duplicates: GLUSTER-929 GLUSTER-1029 GLUSTER-1515 GLUSTER-3780 765532 844837
Depends On:
Blocks:
Reported: 2010-04-05 07:56 UTC by Raghavendra G
Modified: 2013-07-24 17:43 UTC (History)
14 users (show)

Fixed In Version: glusterfs-3.4.0
Clone Of:
Environment:
Last Closed: 2013-07-24 17:43:47 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Raghavendra G 2010-04-05 07:56:07 UTC
Problem: Two applications running on different glusterfs mount points (with the same configuration) write to and read from the same file in a synchronous manner (app 1 waits for app 2 to read the data it has written before writing again). With all the read-caching translators enabled (at least io-cache and quick-read), app 2 should be able to read the updated data written by app 1.

Solution:
1. direct-io-mode should be enabled in kernel.
2. Read-caching translators can have their cache-timeout value set to 0, so that validation is done on every read and data is read from the backend if the file is found to have changed. However, neither io-cache nor quick-read supports nanosecond resolution in mtime; this needs to be supported.

regards,
Raghavendra
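
The need for nanosecond resolution in point 2 can be illustrated with a minimal sketch. It is not the io-cache or quick-read code; the struct and function names below are hypothetical. It only shows why a seconds-only mtime comparison can miss a write that lands in the same second as the previous validation.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical record of the mtime seen when a cache entry was filled. */
struct cached_mtime {
    time_t sec;  /* whole seconds */
    long nsec;   /* nanosecond remainder, 0..999999999 */
};

/* Seconds-only validation: cannot detect a change within the same second. */
static bool cache_is_valid_sec_only(const struct cached_mtime *cached,
                                    const struct timespec *current)
{
    return cached->sec == current->tv_sec;
}

/* Nanosecond-aware validation: both fields must match for the cache to be kept. */
static bool cache_is_valid(const struct cached_mtime *cached,
                           const struct timespec *current)
{
    return cached->sec == current->tv_sec &&
           cached->nsec == current->tv_nsec;
}
```

With a cache-timeout of 0, a check like cache_is_valid() would run on every read, so a write by app 1 that changes only the nanosecond part of mtime would still cause app 2 to refetch from the backend.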

Comment 1 Anand Avati 2010-04-08 07:08:03 UTC
PATCH: http://patches.gluster.com/patch/3100 in master (fuse: change behavior of direct io mode.)

Comment 2 Anand Avati 2010-04-08 07:08:07 UTC
PATCH: http://patches.gluster.com/patch/3101 in master (performance/io-cache: make use of nano second resolution of mtime during cache validation.)

Comment 3 Anand Avati 2010-04-08 07:08:11 UTC
PATCH: http://patches.gluster.com/patch/3102 in master (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)

Comment 4 Anand Avati 2010-04-08 07:08:38 UTC
PATCH: http://patches.gluster.com/patch/3107 in release-2.0 (fuse: change behavior of direct io mode.)

Comment 5 Anand Avati 2010-04-08 07:08:42 UTC
PATCH: http://patches.gluster.com/patch/3108 in release-2.0 (core/protocol.h: add nanosecond resolution handling while converting to/from gf_stat_t and stat.)

Comment 6 Anand Avati 2010-04-08 07:08:45 UTC
PATCH: http://patches.gluster.com/patch/3109 in release-2.0 (performance/io-cache: make use of nano second resolution of mtime during cache validation.)

Comment 7 Anand Avati 2010-04-08 07:08:49 UTC
PATCH: http://patches.gluster.com/patch/3110 in release-2.0 (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)

Comment 8 Anand Avati 2010-04-08 07:09:00 UTC
PATCH: http://patches.gluster.com/patch/3103 in release-3.0 (fuse: change behavior of direct io mode.)

Comment 9 Anand Avati 2010-04-08 07:09:04 UTC
PATCH: http://patches.gluster.com/patch/3104 in release-3.0 (performance/io-cache: make use of nano second resolution of mtime during cache validation.)

Comment 10 Anand Avati 2010-04-08 07:09:07 UTC
PATCH: http://patches.gluster.com/patch/3105 in release-3.0 (core/protocol.h: add nanosecond resolution handling while converting to/from gf_stat_t and stat.)

Comment 11 Anand Avati 2010-04-08 07:09:11 UTC
PATCH: http://patches.gluster.com/patch/3106 in release-3.0 (performance/quick-read: make use of nanosecond resolution of mtime to decide whether to keep cache or not.)

Comment 12 Anand Avati 2010-04-08 16:57:09 UTC
PATCH: http://patches.gluster.com/patch/3133 in master (protocol: fix endianness for nanosecond field in stat structure)

Comment 13 Anand Avati 2010-04-08 16:57:12 UTC
PATCH: http://patches.gluster.com/patch/3131 in release-2.0 (protocol: fix endianness for nanosecond field in stat structure)

Comment 14 Anand Avati 2010-04-08 16:57:24 UTC
PATCH: http://patches.gluster.com/patch/3132 in release-3.0 (protocol: fix endianness for nanosecond field in stat structure)
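
For readers unfamiliar with the class of problem these three patches address, here is a minimal sketch of byte-order handling for a nanosecond field sent over the wire. The struct wire_time layout and the function names are assumptions for illustration only; they are not the actual gf_stat_t definition or the patched code.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical on-wire timestamp; fields are stored in network byte order. */
struct wire_time {
    uint32_t sec;
    uint32_t nsec;
};

/* Convert host-order values to wire format. The nanosecond field needs the
 * same conversion as the seconds field. */
static struct wire_time wire_time_encode(uint32_t sec, uint32_t nsec)
{
    struct wire_time w;
    w.sec = htonl(sec);
    w.nsec = htonl(nsec);
    return w;
}

/* Convert wire format back to host-order values. */
static void wire_time_decode(const struct wire_time *w,
                             uint32_t *sec, uint32_t *nsec)
{
    *sec = ntohl(w->sec);
    *nsec = ntohl(w->nsec);
}
```

If one side skipped the conversion for the nanosecond field, peers with different endianness would disagree on mtime, and cache validation based on it would misfire.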

Comment 15 Amar Tumballi 2010-04-20 08:16:28 UTC
One more command-line change is needed by Raghu.

Comment 16 Vikas Gorur 2010-04-28 20:15:55 UTC
(In reply to comment #1)
> PATCH: http://patches.gluster.com/patch/3100 in master (fuse: change behavior
> of direct io mode.)

This patch introduces a regression.

With a simple fuse+posix configuration:

# strace ./a.out
execve("./a.out", ["./a.out"], [/* 25 vars */]) = -1 EFAULT (Bad address)

It works fine at the previous commit:

commit 7cb8982cbbe8298cd1bdd35055f7d3818f4a136f
Author: Amar Tumballi <amar>
Date:   Wed Apr 7 04:19:48 2010 +0000

    backword compatibility with 3.0.x releases - restored

but breaks if you apply:

commit 9c2bc1bc61af888192bde18170b113b4f6f8b4ca
Author: Anand Avati <avati>
Date:   Mon Apr 5 13:35:45 2010 +0000

    fuse: change behavior of direct io mode.

Interestingly, this only affects the 'master' branch. The test works fine even with this patch on 3.0.4.

Comment 17 Amar Tumballi 2010-05-04 08:33:13 UTC
In default mode, only writes should have 'direct-io' enabled.

Comment 18 Vikas Gorur 2010-05-14 13:43:08 UTC
*** Bug 929 has been marked as a duplicate of this bug. ***

Comment 19 Csaba Henk 2010-05-17 12:20:26 UTC
(In reply to comment #16)
> With a simple fuse+posix configuration:
> 
> # strace ./a.out
> execve("./a.out", ["./a.out"], [/* 25 vars */]) = -1 EFAULT (Bad address)
[...]
> but breaks if you apply:
> 
> commit 9c2bc1bc61af888192bde18170b113b4f6f8b4ca

Tried both simple execution and kernel make menuconfig with 9c2bc1bc61. It succeeds here, on 2.6.32. What system do you have?

Comment 20 Raghavendra G 2010-05-17 14:42:39 UTC
(In reply to comment #19)

root@booradley:/home/raghu# uname -a
Linux booradley 2.6.24.5-smp #2 SMP Wed Apr 30 13:41:38 CDT 2008 i686 Intel(R) Pentium(R) Dual  CPU  T2330  @ 1.60GHz GenuineIntel GNU/Linux

Comment 21 Raghavendra G 2010-08-20 03:56:22 UTC
Starting glusterfs with --direct-io-mode=off will solve this problem.

Comment 22 Raghavendra G 2010-08-20 03:59:23 UTC
*** Bug 1029 has been marked as a duplicate of this bug. ***

Comment 23 Vijay Bellur 2010-09-02 13:50:14 UTC
PATCH: http://patches.gluster.com/patch/4348 in master (fuse: introduce pre-test micro-framework, check for execve-over-direct-IO)

Comment 24 Raghavendra G 2010-09-03 01:35:43 UTC
Patch http://patches.gluster.com/patch/4348 results in a segfault on my laptop. The backtrace is below.

Program received signal SIGSEGV, Segmentation fault.
0xb7f2796d in inode_ref (inode=0xffffffff) at ../../../libglusterfs/src/inode.c:476
476             table = inode->table;
(gdb) bt
#0  0xb7f2796d in inode_ref (inode=0xffffffff) at ../../../libglusterfs/src/inode.c:476
#1  0xb71e897e in fuse_ino_to_inode (ino=4294967295, fuse=0x8072538)
    at ../../../../../xlators/mount/fuse/src/fuse-helpers.c:178
#2  0xb71e8b33 in fuse_loc_fill (loc=0x8143f34, state=0x8143f28, ino=4294967295, par=0, name=0x0)
    at ../../../../../xlators/mount/fuse/src/fuse-helpers.c:230
#3  0xb71f3f15 in fuse_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90)
    at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:1633
#4  0xb71fbf55 in fuse_std_fallback (this=0x8072538, finh=0x8143e68, msg=0x8143e90)
    at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3489
#5  0xb71fc27f in fuse_pre_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90)
    at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3572
#6  0xb71fb6af in fuse_thread_proc (data=0x8072538) at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3195
#7  0xb7ec7383 in start_thread () from /lib/libpthread.so.0
#8  0xb7e4c05e in clone () from /lib/libc.so.6
(gdb) f 5
#5  0xb71fc27f in fuse_pre_open (this=0x8072538, finh=0x8143e68, msg=0x8143e90)
    at ../../../../../xlators/mount/fuse/src/fuse-bridge.c:3572
3572            FUSE_PRE_TEST_TEST;
(gdb) p fuse->nodeid
No symbol "fuse" in current context.
(gdb) p finh->nodeid
$6 = 4294967295
(gdb) p (uint64_t) -1
$7 = 18446744073709551615

As we can see, finh->nodeid is not equal to the value set in fuse_pre_lookup; hence it falls back to fuse_std_fallback.
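
The two values printed by gdb above differ exactly as a 64-bit all-ones value differs from its 32-bit truncation. The sketch below reproduces that arithmetic. That the nodeid passed through a 32-bit type somewhere on this i686 system is an assumption, not something established in this report, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel, matching the (uint64_t) -1 value printed in gdb. */
#define PRETEST_NODEID ((uint64_t)-1)

/* Returns true only for the full 64-bit sentinel. */
static bool matches_sentinel(uint64_t nodeid)
{
    return nodeid == PRETEST_NODEID;
}

/* Models a value that passes through a 32-bit type and is widened again. */
static uint64_t through_32bit(uint64_t nodeid)
{
    return (uint64_t)(uint32_t)nodeid;
}
```

After the round trip the sentinel becomes 4294967295, which no longer compares equal to 18446744073709551615, so a check like matches_sentinel() fails and the request is treated as an ordinary one.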

Comment 25 Vijay Bellur 2010-09-03 13:56:37 UTC
*** Bug 1515 has been marked as a duplicate of this bug. ***

Comment 26 Vijay Bellur 2010-09-21 08:02:42 UTC
PATCH: http://patches.gluster.com/patch/4292 in master (mount/fuse: By default enable direct-io only for fds not openened with O_RDONLY.)
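
A minimal sketch of the default policy named in this patch title, assuming it is decided from the open flags alone. The function name is hypothetical and this is not the fuse-bridge code.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Returns true if direct I/O would be enabled by default for these open flags:
 * every access mode except read-only. */
static bool default_direct_io(int open_flags)
{
    /* O_RDONLY is typically 0, so it cannot be tested with a plain bit-and;
     * the access mode has to be masked out with O_ACCMODE first. */
    return (open_flags & O_ACCMODE) != O_RDONLY;
}
```

Read-only opens stay buffered, which is what allows execve to work on the mount, while O_WRONLY and O_RDWR opens get direct I/O.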

Comment 27 Csaba Henk 2011-03-22 17:02:35 UTC
I am reopening this, as our current, defective approach to this problem is the culprit for the recently spotted "ping vs tail" issue (i.e. in one shell we run a ping, redirected to a file, in another shell we track that file with "tail -f", and all we get is zeros).

Miklos pointed this out, but we didn't listen carefully (most of the blame is mine, of course): with the current FUSE kernel code, accessing a file in direct mode and buffered mode at the same time leads to inconsistencies (that's why he doesn't want to expose the O_DIRECT open flag to users; rather, he lets the fs daemon control direct/buffered mode, and the daemon is expected to be careful about it).

In contrast, our chosen I/O policy for the "big writes not available" case -- i.e. buffered mode for read-only opens (to support execve) and direct mode in other cases (to get beyond page-sized I/O) -- directly violates this principle. That's why we get those zeros. (And the reason for not seeing the ping vs tail issue on recent kernels is that those kernels support big writes, in which case we have a sane policy; it is not some mystical kernel fix, as I had guessed.)

I think the most user-friendly solution would be to store a flag in the inode recording what type of open (direct/buffered) it got the first time (if there was one), and to make subsequent opens be of the same kind (clearing the flag on final close). For the initial open we could just use the same policy we have now.

Is that feasible, locking-wise, etc.?
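
To make the proposal and the locking question concrete, here is a minimal sketch under stated assumptions: a per-inode state guarded by a mutex, with the first open fixing the mode and later opens inheriting it until the final close. All type and function names are hypothetical; this is not the glusterfs inode or fd context code.

```c
#include <pthread.h>

enum io_mode { IO_BUFFERED, IO_DIRECT };

/* Hypothetical per-inode state; mode is meaningful only while open_count > 0. */
struct inode_io_state {
    pthread_mutex_t lock;
    unsigned open_count;
    enum io_mode mode;
};

#define INODE_IO_STATE_INIT { PTHREAD_MUTEX_INITIALIZER, 0, IO_BUFFERED }

/* Called on open. 'wanted' is what the existing policy would choose; it is
 * honored only for the first open, later opens inherit the stored mode. */
static enum io_mode inode_io_open(struct inode_io_state *st, enum io_mode wanted)
{
    enum io_mode granted;

    pthread_mutex_lock(&st->lock);
    if (st->open_count == 0)
        st->mode = wanted;
    st->open_count++;
    granted = st->mode;
    pthread_mutex_unlock(&st->lock);
    return granted;
}

/* Called on release. Once the count reaches zero, the next open chooses afresh. */
static void inode_io_release(struct inode_io_state *st)
{
    pthread_mutex_lock(&st->lock);
    if (st->open_count > 0)
        st->open_count--;
    pthread_mutex_unlock(&st->lock);
}
```

In this sketch the only shared state is a counter and a mode, both touched under one short-lived lock, so a file is never open in direct and buffered mode at the same time.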

Comment 28 Amar Tumballi 2011-04-25 09:32:52 UTC
Please update the status of this bug, as it's been more than 6 months since it was filed (bug id < 2000).

Please resolve it with the proper resolution if it's not valid anymore. If it's still valid and not critical, move it to 'enhancement' severity.

Comment 29 Csaba Henk 2011-04-25 10:36:50 UTC
(In reply to comment #28)
> Please update the status of this bug, as it's been more than 6 months since
> it was filed (bug id < 2000).
> 
> Please resolve it with the proper resolution if it's not valid anymore. If
> it's still valid and not critical, move it to 'enhancement' severity.

It is neither an enhancement nor a moribund bugzilla entry; it is a valid bug of normal severity which can and should be fixed.

What actually happened is that we committed a fix for it some time ago. Then it recently turned out that the fix was buggy. My response was to reopen this old bug; it should now be kept on the near-future to-do list. Was it a bad idea to reopen?

Setting it to normal now; if you confirm that the ancient bz id annoys you, I will close it and file a new entry for the currently observable irregularity.

Comment 30 Csaba Henk 2011-04-28 10:26:44 UTC
Du, pls see comment #27 on the current issue and the proposed fix.

Comment 31 Csaba Henk 2011-07-05 18:36:14 UTC
Amar, why did you set it to "enhancement"? The reason for this being open is a perfectly reproducible irregular behavior caused by a completely understood error in our code, with a plan for fixing it.

If, by any chance, the reason is that from a release engineering POV, it's annoying to see a "normal" bug w/o a milestone being assigned to... then -- given this is a genuine *bug* -- setting it to "enhancement" is not the right way to make that annoyance go away.

In that case, the proper treatment for the annoyance is to assign a milestone.

Comment 32 Amar Tumballi 2011-07-06 01:51:14 UTC
Done.

Sorry for confusion.

Comment 33 Csaba Henk 2011-07-06 13:49:40 UTC
(In reply to comment #32)
> Done.
> 
> Sorry for confusion.

In fact, I have to apologize... In retrospect, it was a bad idea to overload this old bug report with the issue we are dealing with now -- that was the root of all the confusion.

Comment 34 Amar Tumballi 2011-09-27 05:49:50 UTC
Planning to keep the 3.4.x branch as an "internal enhancements" release without any features, so moving these bugs to the 3.4.0 target milestone.

Comment 35 Csaba Henk 2011-11-03 21:24:51 UTC
*** Bug 3780 has been marked as a duplicate of this bug. ***

Comment 36 Harshavardhana 2011-11-23 20:58:00 UTC
Patch @ review.gluster.com/20 fixes the Cadence application problem of corrupted output file.

Comment 37 Harshavardhana 2011-11-23 21:19:36 UTC
*** Bug 3800 has been marked as a duplicate of this bug. ***

Comment 38 Anand Avati 2011-12-02 03:02:45 UTC
CHANGE: http://review.gluster.com/55 (When an fd is being opened, it inherits direct-io-mode characterstics) merged in master by Anand Avati (avati)

Comment 39 Raghavendra G 2011-12-02 05:48:32 UTC
Csaba,

Can I mark this bug as resolved/fixed?

regards,
Raghavendra.

Comment 40 Harshavardhana 2011-12-02 05:53:34 UTC
(In reply to comment #39)
> Csaba,
> 
> Can I mark this bug as resolved/fixed?

It can be resolved, fixes one of my issues.

Comment 41 Csaba Henk 2011-12-02 06:44:23 UTC
(In reply to comment #39)
> Csaba,
> 
> Can I mark this bug as resolved/fixed?

Yes, as soon as the fix gets committed to the 3.{1,2} branches. Not only does that seem to be theoretically correct, but the customer need that has now put this issue into focus concerns 3.1, AFAIK.

Comment 42 Amar Tumballi 2012-02-20 06:52:50 UTC
ON_QA for upstream. Not sure if we will backport the fix to 3.{1,2}.x branches.

Comment 43 Joe Julian 2012-05-27 17:11:00 UTC
3.3.0qa43 does not fix the symptoms I was seeing in bug 765512.

Comment 44 Raghavendra G 2012-06-05 12:52:42 UTC
Hi Joe,

Can you please send us fuse-dump (msgs exchanged b/w fuse-kernel module and glusterfs) and strace of the application? You can use --dump-fuse option of glusterfs for getting fuse-dump. Also, is there any simple test-case which we can use to reproduce the bug locally?

regards,
Raghavendra.

Comment 45 Raghavendra G 2012-06-06 12:28:04 UTC
*** Bug 811919 has been marked as a duplicate of this bug. ***

Comment 46 Anand Avati 2012-06-06 18:00:50 UTC
CHANGE: http://review.gluster.com/3531 (mount/fuse: use correct fdctx to inherit direct-io-values from.) merged in master by Anand Avati (avati)

Comment 47 Amar Tumballi 2012-07-11 11:38:47 UTC
Already in master and release-3.3. Please upgrade to 3.3.0.

Comment 48 Niels de Vos 2012-08-01 07:25:25 UTC
*** Bug 844837 has been marked as a duplicate of this bug. ***

