Bug 1455301 - gluster-block is not working as expected when shard is enabled
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: sharding
Version: mainline
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1454313
Blocks: 1456225
Reported: 2017-05-24 17:24 UTC by Pranith Kumar K
Modified: 2017-09-05 17:32 UTC

Fixed In Version: glusterfs-3.12.0
Doc Text:
Clone Of: 1454313
Environment:
Last Closed: 2017-09-05 17:32:07 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Comment 1 Pranith Kumar K 2017-05-24 17:25:56 UTC
Description of problem:
Because gluster-block stores its metadata on the same volume as the data, and metadata updates are multi-client writes, gluster-block create hangs and goes into a loop before it dies.
The reason is that the actual file size and the file size seen on the mount differ, so gluster-block cannot tell whether the operation succeeded.

[root@localhost block-meta]# ls -l /brick1/
block-meta/  block-store/ .glusterfs/  .shard/      .trashcan/   
[root@localhost block-meta]# ls -l /brick1/block-meta/1
-rw-------. 2 root root 52304 May 20 19:36 /brick1/block-meta/1 <<<---- true size.
[root@localhost block-meta]# ls -l 1
-rw-------. 1 root root 101 May 20 19:36 1 <<----- has truncated size.

When a file is opened with O_APPEND, the offset passed to a write is ignored and the write buffer is always appended to the end of the file. Shard, however, does not ignore the offset when the fd has O_APPEND. As a result the size stays stuck at 101 bytes, because that is the largest write issued on the file:

Thread 2 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200005391c, 
    this=0x61f00001a4c0, fd=0x61100000b21c, vector=0x60800000cee0, count=1, offset=0, 
    flags=0, iobref=0x60d00001d7c0, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
Missing separate debuginfos, use: dnf debuginfo-install json-c-0.12-7.fc24.x86_64 libacl-2.2.52-11.fc24.x86_64 libattr-2.4.47-16.fc24.x86_64 libstdc++-6.2.1-2.fc25.x86_64 sssd-client-1.14.2-1.fc25.x86_64
(gdb) dis 1
(gdb) c
Continuing.
[Switching to Thread 0x7fffe565a700 (LWP 9037)]

Thread 10 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x612000053c1c, 
    cookie=0x61200005391c, this=0x61f0000196c0, op_ret=101, op_errno=0, 
    prebuf=0x61b00001a68c, postbuf=0x61b00001a6fc, xdata=0x611000052d9c) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$1 = 101
(gdb) en 1
(gdb) c
Continuing.

Thread 10 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200002841c, 
    this=0x61f00001a4c0, fd=0x61100003cf9c, vector=0x608000020be0, count=1, offset=0, 
    flags=0, iobref=0x60d00003d530, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
(gdb) c
Continuing.
[Switching to Thread 0x7fffe0f08700 (LWP 9038)]

Thread 11 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x61200002871c, 
    cookie=0x61200002841c, this=0x61f0000196c0, op_ret=21, op_errno=0, prebuf=0x61b00000cd8c, 
    postbuf=0x61b00000cdfc, xdata=0x611000064bdc) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$2 = 101
(gdb) c
Continuing.
[New Thread 0x7fffe04e8700 (LWP 9040)]
[New Thread 0x7fffdfcc4700 (LWP 9041)]
[New Thread 0x7fffdf490700 (LWP 9042)]

Thread 11 "gluster-blockd" hit Breakpoint 1, shard_writev (frame=0x61200003dd1c, 
    this=0x61f00001a4c0, fd=0x61100009479c, vector=0x608000032a60, count=1, offset=0, 
    flags=0, iobref=0x60d00006c800, xdata=0x0) at shard.c:4827
4827	        shard_common_inode_write_begin (frame, this, GF_FOP_WRITE, fd, vector,
(gdb) c
Continuing.
[Switching to Thread 0x7fffe565a700 (LWP 9037)]

Thread 10 "gluster-blockd" hit Breakpoint 2, trace_writev_cbk (frame=0x61200003e01c, 
    cookie=0x61200003dd1c, this=0x61f0000196c0, op_ret=33, op_errno=0, prebuf=0x61b00002b78c, 
    postbuf=0x61b00002b7fc, xdata=0x61100007e5dc) at trace.c:232
232	        char         preopstr[4096]  = {0, };
(gdb) p postbuf.ia_size
$3 = 101
(gdb) q
A debugging session is active.

	Inferior 1 [process 9024] will be killed.
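
The size arithmetic seen in the trace above can be modelled with a minimal sketch. This is not GlusterFS code: the function names are hypothetical, the write lengths 101, 21 and 33 are the op_ret values from the trace, and starting from an empty file is an assumption. It only contrasts "honor the offset" with "append at end of file" for writes that all arrive with offset=0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: file size after a write that honors the
 * caller-supplied offset (the behavior described for shard before the fix). */
static size_t size_if_offset_honored(size_t cur_size, size_t offset, size_t len)
{
    size_t end = offset + len;
    return end > cur_size ? end : cur_size;
}

/* Hypothetical model: file size after a write on an O_APPEND fd,
 * where the offset is ignored and the data lands at end of file. */
static size_t size_if_appended(size_t cur_size, size_t len)
{
    return cur_size + len;
}

/* Replay a sequence of write lengths, all issued with offset 0,
 * starting from an empty file (assumption). A non-zero `append`
 * selects O_APPEND semantics. Returns the resulting file size. */
static size_t replay_writes(const size_t *lens, size_t n, int append)
{
    assert(lens != NULL || n == 0);
    size_t size = 0;
    for (size_t i = 0; i < n; i++) {
        size = append ? size_if_appended(size, lens[i])
                      : size_if_offset_honored(size, 0, lens[i]);
    }
    return size;
}
```

With the lengths {101, 21, 33}, the offset-honoring model never grows past the largest single write, which matches the ia_size of 101 printed after every callback, while the append model keeps growing with each write.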

After applying the fix, block creation succeeds and the file size on the brick matches the size on the mount:

[root@localhost r3]# gluster-block create r3/12 ha 3 192.168.122.61,192.168.122.123,192.168.122.113 1GiB
IQN: iqn.2016-12.org.gluster-block:1aef8052-2547-482e-9316-e41ba0e4b289
PORTAL(S):  192.168.122.61:3260 192.168.122.123:3260 192.168.122.113:3260
RESULT: SUCCESS
[root@localhost r3]# ls -l /brick1/block-meta/12
-rw-------. 2 root root 315 May 24 22:52 /brick1/block-meta/12
[root@localhost r3]# ls -l /mnt/block-meta/12
-rw-------. 1 root root 315 May 24 22:52 /mnt/block-meta/12

Comment 2 Worker Ant 2017-05-24 17:29:24 UTC
REVIEW: https://review.gluster.org/17387 (features/shard: Handle offset in appending writes) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Worker Ant 2017-05-25 16:49:30 UTC
REVIEW: https://review.gluster.org/17387 (features/shard: Handle offset in appending writes) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 4 Worker Ant 2017-05-25 16:53:48 UTC
REVIEW: https://review.gluster.org/17387 (features/shard: Handle offset in appending writes) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 5 Worker Ant 2017-05-26 09:46:57 UTC
REVIEW: https://review.gluster.org/17387 (features/shard: Handle offset in appending writes) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 6 Worker Ant 2017-05-27 16:00:54 UTC
COMMIT: https://review.gluster.org/17387 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit bea02e26a3967a6e679e30fbb77ecfeff1e71f37
Author: Pranith Kumar K <pkarampu>
Date:   Wed May 24 22:30:29 2017 +0530

    features/shard: Handle offset in appending writes
    
    When a file is opened with append, all writes are appended at the end of file
    irrespective of the offset given in the write syscall. This needs to be
    considered in shard size update function and also for choosing which shard to
    write to.
    
    At the moment shard piggybacks on queuing from write-behind
    xlator for ordering of the operations. So if write-behind is disabled and
    two parallel appending-writes come both of which can increase the file size
    beyond shard-size the file will be corrupted.
    
    BUG: 1455301
    Change-Id: I9007e6a39098ab0b5d5386367bd07eb5f89cb09e
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: https://review.gluster.org/17387
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Krutika Dhananjay <kdhananj>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
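
The two places the commit message names (the size update and the choice of shard) can be illustrated with a small sketch. This is not the code from the patch: the helper names and the 4 MiB shard size used in the usage note are hypothetical, and the sketch deliberately ignores the ordering problem with parallel appending writes that the commit message warns about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the offset at which a write actually lands.
 * For an appending write the requested offset is ignored and the
 * current file size is used instead. */
static uint64_t effective_offset(uint64_t file_size, uint64_t req_offset,
                                 int is_append)
{
    return is_append ? file_size : req_offset;
}

/* Hypothetical helper: index of the shard that holds the byte at
 * `offset`, for a fixed shard size in bytes. */
static uint64_t shard_index(uint64_t offset, uint64_t shard_size)
{
    assert(shard_size > 0);
    return offset / shard_size;
}

/* Hypothetical helper: file size after the write completes, computed
 * from the effective offset rather than the requested one. */
static uint64_t size_after_write(uint64_t file_size, uint64_t req_offset,
                                 uint64_t len, int is_append)
{
    uint64_t end = effective_offset(file_size, req_offset, is_append) + len;
    return end > file_size ? end : file_size;
}
```

For example, with an illustrative shard size of 4194304 bytes and a file of 4194314 bytes, an appending write that arrives with offset 0 maps to shard 1 under this model, whereas using the requested offset would pick shard 0.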

Comment 7 Shyamsundar 2017-09-05 17:32:07 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

