Red Hat Bugzilla – Bug 849526
High write operations over NFS cause client mount lockup
Last modified: 2013-08-28 07:39:48 EDT
Created attachment 605574 [details]
strace output of glusterfsd running at 100%
Description of problem:
I've noticed that when I try to do a large write operation (such as rsyncing 200M-1G files), the client's NFS mount eventually locks up. This volume is mounted via NFS and is a replica between two nodes. When I looked at the nodes, I found the glusterfsd process for the brick on the replica machine running at 100%.
So far my only workaround has been to restart the volume.
I have attempted several different tweaks, from increasing the cache-size for the volume, to sysctl changes, to disabling transparent huge pages.
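For reference, the tweaks described above might look roughly like this (a sketch only; the exact values and the volume name "testvol" are assumptions, not what was actually used):

```shell
# sysctl tweaks (example values, not recommendations):
sysctl -w vm.vfs_cache_pressure=200
sysctl -w vm.swappiness=10

# Disable transparent huge pages; on RHEL/CentOS 6 the path may be
# /sys/kernel/mm/redhat_transparent_hugepage/enabled instead:
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Increase the Gluster volume cache size (volume name is hypothetical):
gluster volume set testvol performance.cache-size 512MB
```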
Version-Release number of selected component (if applicable):
gluster version 3.3.0
CentOS 6.3 x86_64
Steps to Reproduce:
Set up a two-node Gluster cluster running CentOS 6.3. Create a replica volume and mount it via NFS on another machine. Attempt to rsync files 300M-1G in size and wait.
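The reproduction steps above could be sketched as follows (hostnames, brick paths, and the volume name are hypothetical; Gluster's built-in NFS server speaks NFSv3 over TCP, hence the mount options):

```shell
# On node1, form the cluster and create a two-way replica volume:
gluster peer probe node2
gluster volume create testvol replica 2 \
    node1:/export/brick1 node2:/export/brick1
gluster volume start testvol

# On the client machine, mount the volume over NFS:
mount -t nfs -o vers=3,tcp node1:/testvol /mnt/testvol

# Generate sustained large writes and wait for the lockup:
rsync -av /path/to/large/files/ /mnt/testvol/
```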
Actual results:
High write rates on the NFS mount cause the mount to lock up.
Expected results:
The NFS mount should not lock up completely during heavy writes.
Additional info:
I have tried to replicate this on another cluster that is set up nearly identically to this one. However, I couldn't replicate it there; the primary difference is that the other cluster has four times more memory (64G vs. 16G).
I have also noticed that if this goes untouched (such as overnight), the glusterfsd process leaks memory and eventually OOMs the machine. I have attached strace output of the process running at 100%.
I have talked with Joe Julian on IRC about this over the last week and finally decided to file a bug for it. I can't find an existing bug that matches this exactly and has a fix that works.
Lance, thanks for testing it on the 64G machines and confirming that it can not be reproduced on this setup.
Is the NFS client machine different from the storage machines?
Are there any messages in "dmesg"? (on clients/servers)
When it is at 100%, can you do "/opt/glusterfs/sbin/gluster vol prof <volname> start nfs", and after a minute do "/opt/glusterfs/sbin/gluster vol prof <volname> info nfs", and give us the results?
Is the overnight memleak problem seen on 64G machines too?
To clarify, he's saying that the overnight memleak happens on the machine whose process is at 100%.
Created attachment 605993 [details]
nfs profiling output
(In reply to comment #1)
> Lance, thanks for testing it on the 64G machines and confirming that it can
> not be reproduced on this setup.
I'm going to continue doing more tests on the 64G machines to verify this. I did a couple of tests but I probably should do some more.
> Is the NFS client machine different from the storage machines?
Yes, the NFS client machines are virtual machines running inside of KVM.
> are there any messages in "dmesg"? (on clients/servers)
Early on there were some messages about huge pages, but I have not seen them since the tweaks I made to vm.vfs_cache_pressure and vm.swappiness. During this last test I saw no dmesg output on either storage node or on the client.
> When it is on 100%, can you do "/opt/glusterfs/sbin/gluster vol prof
> <volname> start nfs" and after a minute do "/opt/glusterfs/sbin/gluster vol
> prof <volname> info nfs" and give us the results?
See the attached file. This was run during an rsync, while the process was at 100%.
> Is the overnight memleak problem seen on 64G machines too?
No. However, they don't have the same type of workload that would typically trigger this. Also, these machines have yet to show the 100% CPU problem.
Spoke to Lance; he also tested setting vm.dirty_background_ratio and vm.dirty_ratio to lower values (< 10), but the issue still occurred.
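For the record, lowering those thresholds would have been something like the following (exact values below 10 are assumed; the comment reflects the reported outcome):

```shell
# Force earlier/smaller writeback flushes by shrinking the dirty-page
# thresholds (percent of RAM). Tested with values < 10; the lockup
# still occurred, so dirty-page buildup alone does not explain it.
sysctl -w vm.dirty_background_ratio=2
sysctl -w vm.dirty_ratio=5
```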
Lance, is it possible to attach sosreports from the server where the failure occurs and from one of the 64GB machines where the issue isn't observed? I don't think the sos command is installed on CentOS by default, but it should be available in the default yum repos.
Are there any more updates on this bug?
The profile output indicates around 4K lookups, which is not enough to cause 100% CPU. It looks like glusterfsd goes into an infinite loop. When this happens, can you attach gdb to it and run "backtrace"? Then we will know exactly where it enters the infinite loop.
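A non-interactive way to capture that backtrace might be (a sketch; the PID must be the brick process actually spinning at 100%, and glusterfs-debuginfo should be installed so the frames have symbol names):

```shell
# Find the glusterfsd brick processes (there may be several; use top or
# "ps -eo pid,pcpu,cmd" to pick the one at 100% CPU):
ps -eo pid,pcpu,cmd | grep [g]lusterfsd

# Attach gdb in batch mode and dump backtraces for all threads,
# then detach without disturbing the process further:
gdb -batch -ex "thread apply all bt" -p <pid> > /tmp/glusterfsd-bt.txt
```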