849526 – High write operations over NFS causes client mount lockup

Summary: High write operations over NFS causes client mount lockup

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	nfs
Sub Component:
Version:	3.3.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	rjoseph
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-20 05:22 UTC by Lance Albertson
Modified:	2013-08-28 11:39 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-08-28 11:39:48 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
strace output of glusterfsd running at 100% (24.50 KB, application/x-gzip) 2012-08-20 05:22 UTC, Lance Albertson
nfs profiling output (2.11 KB, text/plain) 2012-08-21 17:22 UTC, Lance Albertson

Description Lance Albertson 2012-08-20 05:22:15 UTC

Created attachment 605574
strace output of glusterfsd running at 100%

Description of problem:

I've noticed that when I try to do a large write operation (such as rsyncing 200M-1G files), the client's NFS mount eventually locks up. This volume is mounted via NFS and is a replica between two nodes. When I looked at the nodes, I found the glusterfsd process for the brick on the replica machine running at 100% CPU.
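To confirm which glusterfsd process is pinned at 100%, CPU usage can be sampled from procfs. The sketch below is an illustration and not part of GlusterFS: it assumes a Linux host and reads the utime and stime fields of /proc/<pid>/stat twice, one interval apart. The function name process_cpu_percent is hypothetical.

```python
import os
import time


def process_cpu_percent(pid: int, interval: float = 1.0) -> float:
    """Return CPU usage of `pid` over `interval` seconds, as a percentage
    of one core (one fully busy thread reads as roughly 100)."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")

    def cpu_ticks() -> int:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        # Field 2 (comm) may contain spaces, so split after the last ')'.
        fields = stat.rsplit(")", 1)[1].split()
        # fields[0] is field 3 (state); utime and stime are fields 14
        # and 15 of /proc/<pid>/stat, i.e. indices 11 and 12 here.
        return int(fields[11]) + int(fields[12])

    start_ticks = cpu_ticks()
    start_time = time.monotonic()
    time.sleep(interval)
    used_ticks = cpu_ticks() - start_ticks
    elapsed = time.monotonic() - start_time
    return 100.0 * used_ticks / ticks_per_sec / elapsed
```

A value near 100 means one core is saturated; a multithreaded process such as glusterfsd can report more than 100 if several threads are busy.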

So far my only workaround has been to restart the volume.

I have tried several tweaks: increasing the cache-size for the volume, adjusting sysctl settings, and disabling transparent huge pages.

Version-Release number of selected component (if applicable):

gluster version 3.3.0

CentOS 6.3 x86_64

How reproducible:

Set up a two-node Gluster cluster running CentOS 6.3. Create a replica volume and mount it over NFS on another machine. Attempt to rsync files 300M-1G in size and wait.
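The reproduction steps can be partly scripted. This is a minimal sketch that assumes the test files are first generated in a local staging directory and then rsynced to the NFS mount; write_test_files and both paths in the usage note are hypothetical. Random data is used so the files stay incompressible in case compression is enabled anywhere along the path.

```python
import os


def write_test_files(directory: str, sizes_bytes: list[int],
                     chunk_size: int = 1 << 20) -> list[str]:
    """Create one file per entry in `sizes_bytes` under `directory`,
    writing random data in `chunk_size` pieces to bound memory use."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for i, size in enumerate(sizes_bytes):
        path = os.path.join(directory, f"testfile_{i:03d}.bin")
        with open(path, "wb") as f:
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
        paths.append(path)
    return paths
```

For example, write_test_files("/tmp/gluster-repro", [300 << 20, 600 << 20, 1 << 30]) produces files from 300 MiB to 1 GiB, which can then be copied with `rsync -av /tmp/gluster-repro/ /mnt/glustervol/` (mount point hypothetical).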
  
Actual results:

High write rates on the NFS mount cause the mount to lock up.

Expected results:

The NFS mount should not lock up completely during heavy writes.

Additional info:

I have tried to replicate this on another cluster that is set up nearly identically to this one. However, I couldn't replicate it there, and the primary difference is that the other cluster has four times more memory (16G vs. 64G).

I have also noticed that if this goes untouched (such as overnight), the glusterfsd process leaks memory and eventually drives the machine out of memory (OOM). I have attached strace output of the process running at 100%.
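To document the overnight leak, the resident memory of the glusterfsd process can be logged at intervals. A minimal sketch, assuming Linux procfs, where VmRSS in /proc/<pid>/status is reported in kB; read_rss_kib and sample_rss are hypothetical helper names.

```python
import time


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of `pid` in KiB from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like: "VmRSS:     123456 kB"
                return int(line.split()[1])
    # Kernel threads, for example, have no VmRSS line.
    raise ValueError(f"VmRSS not found for pid {pid}")


def sample_rss(pid: int, samples: int,
               interval: float) -> list[tuple[float, int]]:
    """Take `samples` readings `interval` seconds apart and return
    (monotonic timestamp, RSS in KiB) pairs."""
    readings = []
    for i in range(samples):
        readings.append((time.monotonic(), read_rss_kib(pid)))
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

An RSS series that keeps climbing over hours under the rsync load, without leveling off, is the pattern that would back up the leak report.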

I have talked with Joe Julian on IRC about it over the last week and finally decided to file a bug. I can't find an existing bug that matches this exactly and has a fix that works.

Comment 1 Krishna Srinivas 2012-08-20 08:28:38 UTC

Lance, thanks for testing it on the 64G machines and confirming that it can not be reproduced on this setup.
Is the NFS client machine different from the storage machines? 
Are there any messages in "dmesg" (on clients/servers)?
When it is at 100%, can you run "/opt/glusterfs/sbin/gluster vol prof <volname> start nfs" and, after a minute, "/opt/glusterfs/sbin/gluster vol prof <volname> info nfs" and give us the results?
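The two profiling commands in this comment can be wrapped so the one-minute wait is not timed by hand. A sketch that reuses the gluster path and abbreviated subcommands verbatim from the comment; whether the trailing nfs argument is accepted depends on the installed GlusterFS version, and profile_commands and collect_nfs_profile are hypothetical names.

```python
import subprocess
import time

# Path and abbreviated subcommands copied from the comment above.
GLUSTER = "/opt/glusterfs/sbin/gluster"


def profile_commands(volname: str) -> tuple[list[str], list[str]]:
    """Build the 'start' and 'info' profiling commands for `volname`."""
    start = [GLUSTER, "vol", "prof", volname, "start", "nfs"]
    info = [GLUSTER, "vol", "prof", volname, "info", "nfs"]
    return start, info


def collect_nfs_profile(volname: str, wait_seconds: float = 60.0) -> str:
    """Start profiling, wait `wait_seconds`, then return the info output."""
    start, info = profile_commands(volname)
    subprocess.run(start, check=True)
    time.sleep(wait_seconds)
    result = subprocess.run(info, check=True, capture_output=True, text=True)
    return result.stdout
```

Running collect_nfs_profile while the rsync is in progress and glusterfsd is at 100% captures the same data the comment asks for.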

Is the overnight memleak problem seen on 64G machines too?

Comment 2 Joe Julian 2012-08-20 14:43:52 UTC

To clarify, he's saying that the overnight memleak happens on the machine whose process is at 100%.

Comment 3 Lance Albertson 2012-08-21 17:22:17 UTC

Created attachment 605993
nfs profiling output

(In reply to comment #1)
> Lance, thanks for testing it on the 64G machines and confirming that it can
> not be reproduced on this setup.

I'm going to continue doing more tests on the 64G machines to verify this. I did a couple of tests but I probably should do some more.

> Is the NFS client machine different from the storage machines?

Yes, the NFS client machines are virtual machines running inside of KVM.

> are there any messages in "dmesg"? (on clients/servers)

Early on there were some messages about huge pages, but I have not seen them since the tweaks I made to vm.vfs_cache_pressure and vm.swappiness. During this last test I saw no dmesg output on either of the storage nodes or the client.

> When it is on 100%, can you do "/opt/glusterfs/sbin/gluster vol prof
> <volname> start nfs" and after a minute do "/opt/glusterfs/sbin/gluster vol
> prof <volname> info nfs" and give us the results?

See attached file. This was run during an rsync, while the process was at 100%.

> Is the overnight memleak problem seen on 64G machines too?

No; however, they don't have the type of workload that would typically trigger this. These machines have also yet to show the 100% CPU problem.

Comment 4 Eco 2012-08-30 19:35:52 UTC

Spoke to Lance; he also tested setting vm.dirty_background_ratio and vm.dirty_ratio to lower values (< 10), and the issue still occurred.
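Before comparing the two machines, it may help to record the vm.* tunables mentioned in this thread on both the 16G and the 64G hosts. A minimal sketch that reads them straight from /proc/sys/vm (the same values `sysctl` reports); read_vm_tunables is a hypothetical name, and root is a parameter only so the function can be pointed at a saved copy.

```python
from pathlib import Path

# The vm.* tunables discussed in comments 3 and 4.
VM_TUNABLES = (
    "vfs_cache_pressure",
    "swappiness",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_vm_tunables(root: str = "/proc/sys/vm") -> dict[str, int]:
    """Return {"vm.<name>": value} for each tunable read under `root`."""
    values = {}
    for name in VM_TUNABLES:
        values[f"vm.{name}"] = int(Path(root, name).read_text().strip())
    return values
```

Diffing the two dictionaries makes it easy to confirm that the lowered dirty ratios were actually in effect on the machine that still failed.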

Lance, is it possible to attach sosreports from the server where the failure occurs and from one of the 64GB machines where the issue isn't observed? I don't think the sos command is installed on CentOS by default, but it should be available in the default yum repos.

Comment 5 Krishna Srinivas 2012-10-22 10:19:54 UTC

Are there any more updates on this bug?

The profile output shows around 4K lookups, which is not enough to cause 100% CPU. It looks like glusterfsd goes into an infinite loop. When this happens, can you attach gdb to it and run "backtrace"? Then we will know exactly where it enters the infinite loop.
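The requested backtrace can be collected non-interactively with gdb's batch mode. A sketch, assuming gdb is installed and the caller may ptrace the process (usually root). It uses "thread apply all bt" rather than a plain "backtrace" because glusterfsd is multithreaded and the spinning thread may not be the one gdb selects first; that choice is an assumption, not part of the original request. gdb_backtrace_cmd and capture_backtrace are hypothetical names.

```python
import subprocess


def gdb_backtrace_cmd(pid: int) -> list[str]:
    """Build a batch-mode gdb command that attaches to `pid` and prints a
    backtrace of every thread; gdb detaches when the batch run ends."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "thread apply all bt",
    ]


def capture_backtrace(pid: int, timeout: float = 120.0) -> str:
    """Run gdb against `pid` and return its stdout and stderr combined."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

Capturing two or three backtraces a few seconds apart shows whether one thread stays in the same frames, which is what an infinite loop would look like.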
