Bug 1324809 - arbiter volume write performance is bad.
Summary: arbiter volume write performance is bad.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: arbiter
Version: 3.7.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1324004
Blocks:
Reported: 2016-04-07 10:49 UTC by Ravishankar N
Modified: 2016-04-19 07:13 UTC
CC: 1 user

Fixed In Version: glusterfs-3.7.11
Doc Type: Bug Fix
Doc Text:
Clone Of: 1324004
Environment:
Last Closed: 2016-04-19 07:13:37 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Ravishankar N 2016-04-07 10:49:03 UTC
+++ This bug was initially created as a clone of Bug #1324004 +++

Reported by Robert Rauch @ https://bugzilla.redhat.com/show_bug.cgi?id=1309462#c50 and Russell Purinton @ http://www.spinics.net/lists/gluster-users/msg26311.html

Replica-3:
0:root@vm2 glusterfs$ gluster v create testvol replica 3  127.0.0.2:/bricks/brick{1..3} force
volume create: testvol: success: please start the volume to access data
0:root@vm2 glusterfs$ gluster v start testvol
volume start: testvol: success
0:root@vm2 glusterfs$ mount -t glusterfs 127.0.0.2:testvol /mnt/fuse_mnt
0:root@vm2 glusterfs$ cd /mnt/fuse_mnt/
0:root@vm2 fuse_mnt$ dd if=/dev/zero of=file bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.87984 s, 55.8 MB/s


Arbiter:
0:root@vm2 ~$ gluster v create testvol replica 3  arbiter 1 127.0.0.2:/bricks/brick{1..3} force
volume create: testvol: success: please start the volume to access data
0:root@vm2 ~$ gluster v start testvol
volume start: testvol: success
0:root@vm2 ~$ mount -t glusterfs 127.0.0.2:testvol /mnt/fuse_mnt
0:root@vm2 ~$ cd /mnt/fuse_mnt/
0:root@vm2 fuse_mnt$ dd if=/dev/zero of=file bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 7.51857 s, 13.9 MB/s

--- Additional comment from Vijay Bellur on 2016-04-05 06:51:37 EDT ---

REVIEW: http://review.gluster.org/13906 (arbiter: write performance improvement) posted (#1) for review on master by Ravishankar N (ravishankar@redhat.com)

--- Additional comment from Ravishankar N on 2016-04-05 20:46:41 EDT ---

Note: With the patch applied, here is the throughput I get:

Arbiter:
0:root@vm2 ~$ gluster v create testvol replica 3  arbiter 1 127.0.0.2:/bricks/brick{1..3} force
volume create: testvol: success: please start the volume to access data
0:root@vm2 ~$ gluster v start testvol
volume start: testvol: success
0:root@vm2 ~$ mount -t glusterfs 127.0.0.2:testvol /mnt/fuse_mnt
0:root@vm2 ~$ dd if=/dev/zero of=/mnt/fuse_mnt/file bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.25445 s, 83.6 MB/s

--- Additional comment from Vijay Bellur on 2016-04-07 06:38:07 EDT ---

REVIEW: http://review.gluster.org/13906 (arbiter: write performance improvement) posted (#2) for review on master by Ravishankar N (ravishankar@redhat.com)

Comment 1 Vijay Bellur 2016-04-07 10:51:56 UTC
REVIEW: http://review.gluster.org/13925 (arbiter: write performance improvement) posted (#1) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)

Comment 2 Vijay Bellur 2016-04-09 06:41:42 UTC
REVIEW: http://review.gluster.org/13925 (arbiter: write performance improvement) posted (#2) for review on release-3.7 by Ravishankar N (ravishankar@redhat.com)

Comment 3 Vijay Bellur 2016-04-11 12:04:12 UTC
COMMIT: http://review.gluster.org/13925 committed in release-3.7 by Kaushal M (kaushal@redhat.com) 
------
commit c9c2c08d34003f49bc3a509757a135665fb20518
Author: Ravishankar N <ravishankar@redhat.com>
Date:   Tue Apr 5 15:16:52 2016 +0530

    arbiter: write performance improvement
    
    Backport of: http://review.gluster.org/#/c/13906
    
    Problem: The throughput for a 'dd' workload was much less for arbiter
    configuration when compared to normal replica-3 volume. There were 2
    issues:
    
    i) arbiter_writev was using the request dict as the response dict while
    unwinding, leading to incorrect GLUSTERFS_WRITE_IS_APPEND and
    GLUSTERFS_OPEN_FD_COUNT values (=4), which triggered immediate post-ops
    because is_afr_delayed_changelog_post_op_needed() failed due to the
    afr_are_multiple_fds_opened() check.
    
    ii) The arbiter code in afr was setting local->transaction.{start and len} = 0
    to take full-file locks. This meant that even for simultaneous but
    non-overlapping writevs, afr_transaction_eager_lock_init() never
    happened, because afr_locals_overlap() always stays true. Consequently
    is_afr_delayed_changelog_post_op_needed() failed due to
    local->delayed_post_op not being set.
    
    Fix:
    i) Send appropriate response dict values in arbiter_writev.
    ii) Modify flock params instead of local->transaction.{start and len} to
    take full file locks in the transaction.
    
    Also changed _fill_writev_xdata() in posix to fill rsp_xdata for
    whatever key is requested.
    
    Change-Id: I1c5fc5e98aba49ade540bb441a022e65b753432a
    BUG: 1324809
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Reported-by: Robert Rauch <robert.rauch@gns-systems.de>
    Reported-by: Russel Purinton <russell.purinton@gmail.com>
    Reviewed-on: http://review.gluster.org/13925
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>

Comment 4 Kaushal 2016-04-19 07:13:37 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.11, please open a new bug report.

glusterfs-3.7.11 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-April/026321.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
