Bug 1635979 - Writes taking very long time leading to system hogging
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Duplicates: 1635977 (view as bug list)
Depends On: 1591208 1625961
Blocks: 1635975 1635977
 
Reported: 2018-10-04 07:19 UTC by Pranith Kumar K
Modified: 2018-11-29 15:25 UTC
CC: 6 users

Fixed In Version: glusterfs-4.1.6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1625961
Environment:
Last Closed: 2018-11-29 15:25:05 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Comment 1 Pranith Kumar K 2018-10-04 07:25:12 UTC
I/O load-generating script:
==================================
#!/usr/bin/perl
# Generates a continuous stream of small writes: seeks relative to the
# current position by an ever-growing offset and writes a short record,
# forever. The target filename is taken from the command line.
use strict;
use warnings;

my $offset = 0;
open( my $h, '>', $ARGV[0] ) or die $!;

while (1) {
    seek( $h, $offset, 1 );      # whence 1 = SEEK_CUR
    print $h "offset:$offset ";
    $offset = $offset + 1;
}
==================================

1. Create a 1x3 replica volume and mount it on a FUSE client.
2. Run the Perl script above on a file on the mount where the I/O should happen (it writes in an infinite loop), e.g.: script.pl <some_filename>
3. Reboot one of the brick nodes.

The machine becomes sluggish and all operations slow to a crawl.

Comment 2 Worker Ant 2018-10-04 07:31:28 UTC
REVIEW: https://review.gluster.org/21339 (cluster/afr: Batch writes in same lock even when multiple fds are open) posted (#1) for review on release-4.1 by Pranith Kumar Karampuri

Comment 3 Worker Ant 2018-10-05 14:39:21 UTC
COMMIT: https://review.gluster.org/21339 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana@redhat.com> with a commit message- cluster/afr: Batch writes in same lock even when multiple fds are open

Problem:
When eager-lock is disabled because multiple fds are open on the file
and application writes land on conflicting regions, the number of locks
grows very quickly. Nearly all CPU time is then spent locking and
unlocking, traversing the huge queues in the locks xlator to grant locks.

Fix:
Reduce the number of locks in transit by bundling the writes into the
same lock, and disable delayed piggy-backing when we learn that multiple
fds are open on the file. This reduces the size of the queues in the
locks xlator and also reduces the number of network calls such as
inodelk/fxattrop.

Please note that this problem can still happen if eager-lock is
disabled, as the writes will then not be bundled in the same lock.
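
The batching idea behind the fix can be illustrated with a small, hypothetical sketch (plain Python threading, not the actual AFR code): instead of taking and releasing a lock once per write, the writer drains every queued write under a single lock acquisition, so the number of lock operations stays far below the number of writes.

```python
import threading
from collections import deque

class BatchingWriter:
    """Toy model of batching writes under one lock acquisition.

    Hypothetical illustration of the idea only; not GlusterFS code.
    """
    def __init__(self):
        self.lock = threading.Lock()
        self.queue = deque()
        self.lock_acquisitions = 0   # how many times the lock was taken
        self.writes_applied = 0      # how many writes were serviced

    def submit(self, data):
        # Queue a pending write without touching the lock.
        self.queue.append(data)

    def flush(self):
        # One lock acquisition services every queued write (the "batch"),
        # rather than one lock/unlock round-trip per write.
        with self.lock:
            self.lock_acquisitions += 1
            while self.queue:
                self.queue.popleft()
                self.writes_applied += 1

w = BatchingWriter()
for i in range(1000):
    w.submit(f"offset:{i} ")
w.flush()
print(w.writes_applied, w.lock_acquisitions)  # 1000 writes, 1 lock acquisition
```

In the unbatched case each of the 1000 writes would pay its own lock/unlock (and, in the real system, its own inodelk/fxattrop network round-trips), which is what lets the lock queues grow unboundedly under the reproducer's workload.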

fixes bz#1635979
Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>

Comment 4 Pranith Kumar K 2018-10-22 14:06:48 UTC
*** Bug 1635977 has been marked as a duplicate of this bug. ***

Comment 5 Shyamsundar 2018-11-29 15:25:05 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-4.1.6, please open a new bug report.

glusterfs-4.1.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/
