Bug 1635979

Summary: Writes taking a very long time, leading to system hogging
Product: [Community] GlusterFS
Component: replicate
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: unspecified
Status: CLOSED CURRENTRELEASE
Reporter: Pranith Kumar K <pkarampu>
Assignee: bugs <bugs>
CC: bugs, nchilaka, pasik, rhs-bugs, sankarshan, storage-qa-internal
Type: Bug
Fixed In Version: glusterfs-4.1.6
Clone Of: 1625961
Last Closed: 2018-11-29 15:25:05 UTC
Bug Depends On: 1591208, 1625961
Bug Blocks: 1635975, 1635977

Comment 1 Pranith Kumar K 2018-10-04 07:25:12 UTC
I/O load generating script:
==================================
#!/usr/bin/perl
use strict;
use warnings;

# Open the file named by the first command-line argument for writing
# (truncating it first).
open( my $h, '>', $ARGV[0] ) or die $!;

my $offset = 0;
while (1) {
    # Seek forward from the current position (whence 1 == SEEK_CUR),
    # then issue another small write, forever.
    seek( $h, $offset, 1 );
    print $h "offset:$offset ";
    $offset = $offset + 1;
}
==================================

Steps to reproduce:
1. Create a 1x3 replica volume and mount it on a FUSE client.
2. Run the Perl script above against a file on the mount where the I/O should happen (it loops forever issuing writes), e.g.: script.pl <some_filename>
3. Now reboot a brick.

The machine becomes slow and all operations become extremely slow.
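
For reference, a minimal shell sketch of the setup above; the volume name, hostnames, brick paths, and mount point are assumptions, not taken from this report:
==================================
# Assumed names: volume "testvol", hosts host1..host3, brick path
# /bricks/b1, mount point /mnt/testvol.
gluster volume create testvol replica 3 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1
gluster volume start testvol
mount -t glusterfs host1:/testvol /mnt/testvol

# Drive the I/O load with the script above, then reboot one brick node.
perl script.pl /mnt/testvol/io-load.txt
==================================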

Comment 2 Worker Ant 2018-10-04 07:31:28 UTC
REVIEW: https://review.gluster.org/21339 (cluster/afr: Batch writes in same lock even when multiple fds are open) posted (#1) for review on release-4.1 by Pranith Kumar Karampuri

Comment 3 Worker Ant 2018-10-05 14:39:21 UTC
COMMIT: https://review.gluster.org/21339 committed in release-4.1 by "Shyamsundar Ranganathan" <srangana> with commit message: cluster/afr: Batch writes in same lock even when multiple fds are open

Problem:
When eager-lock is disabled because multiple fds are open and application
writes arrive on conflicting regions, the number of locks grows very fast,
and all the CPU time ends up being spent just locking and unlocking,
traversing huge queues in the locks xlator to grant locks.

Fix:
Reduce the number of locks in transit by bundling the writes into the same
lock, and disable delayed piggy-back when we learn that multiple fds are
open on the file. This reduces the size of the queues in the locks xlator
and also reduces the number of network calls such as inodelk/fxattrop.

Please note that this problem can still occur if eager-lock is disabled,
as the writes will then not be bundled in the same lock.

fixes bz#1635979
Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91
Signed-off-by: Pranith Kumar K <pkarampu>
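
For context, eager-lock is a per-volume AFR option that can be inspected and toggled through the gluster CLI. A minimal sketch follows; the volume name "testvol" is an assumption:
==================================
# Show the current eager-lock setting (on by default for replica volumes).
gluster volume get testvol cluster.eager-lock

# Re-enable it explicitly so that writes can be bundled under a single lock.
gluster volume set testvol cluster.eager-lock on
==================================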

Comment 4 Pranith Kumar K 2018-10-22 14:06:48 UTC
*** Bug 1635977 has been marked as a duplicate of this bug. ***

Comment 5 Shyamsundar 2018-11-29 15:25:05 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-4.1.6, please open a new bug report.

glusterfs-4.1.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-November/000116.html
[2] https://www.gluster.org/pipermail/gluster-users/