Red Hat Bugzilla – Bug 1305205
NFS mount hangs on a tiered volume
Last modified: 2017-10-11 03:37:45 EDT
Description of problem:
On a 16-node setup, 3 clients were used to run IO (dd, linux untar and mkdir, one from each client). Within a day, one of the clients hit a throttling issue. The default limit on outstanding RPC requests is 16.
volume info :
[root@rhs-client17 ~]# gluster v info ec_tier
Volume Name: ec_tier
Volume ID: 84855431-e6cf-41e9-9cfc-7a735f2685ed
Number of Bricks: 44
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Cold Tier Type : Distributed-Disperse
Number of Bricks: 3 x (8 + 4) = 36
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create an EC volume (8+4) and attach a dist-rep tier volume
2. NFS-mount the volume on the client and run the IO (dd and linux untar).
Actual results:
Client mount hangs.
Expected results:
IO should not hang.
This issue does not seem to be caused by throttling itself, but most likely arises only when throttling is enabled. I have asked QE to confirm this by reproducing the issue with the throttling limit set to zero.
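For reference, a minimal sketch of how the limit could be set to zero on this volume. It assumes the throttle in question is the `nfs.outstanding-rpc-limit` volume option (its default of 16 matches the description above). The `run` and `set_rpc_limit` helpers are hypothetical names used only for illustration, and commands are printed rather than executed unless `RUN=1` is set.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Assumption: the throttle is the nfs.outstanding-rpc-limit volume option.
# A value of 0 removes the per-client limit on outstanding RPC requests.
set_rpc_limit() {
    vol=$1
    limit=$2
    run gluster volume set "$vol" nfs.outstanding-rpc-limit "$limit"
}

# Example: set_rpc_limit ec_tier 0
```

With no limit the server queues every incoming request, so memory growth on the NFS process is worth watching during such a run.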
I tried 2 scenarios with throttling set to 0.
1. Serial IO from 2 clients (dd and linux untar)
The memory usage of the NFS process shot up to 7GB (on a 16GB node) within 30 minutes and stays there, even though there are not many requests from clients. I am seeing "NFS server not responding, still trying" and "OK" messages.
2. Parallel IO from 1 client (dd and linux untar)
The memory consumption went up to 2GB and has stayed constant there.
No crash was seen in either case, but one is likely to be hit if the IO from the clients runs smoothly.
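The memory growth described above could be tracked with something like the following sketch, which samples the resident set size of a process from `/proc`. The function names are hypothetical, and the pid file path in the usage comment is an assumption; the pid of the NFS server can also be read from `gluster volume status ec_tier nfs`.

```shell
# Print the resident set size (VmRSS) of a process in kB, read from /proc.
rss_kb() {
    while read -r key value unit; do
        if [ "$key" = "VmRSS:" ]; then
            echo "$value"
            return 0
        fi
    done < "/proc/$1/status"
    return 1
}

# Print "<epoch seconds> <rss in kB>" every $2 seconds, $3 times.
sample_rss() {
    pid=$1
    interval=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%s) $(rss_kb "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}

# Example (pid file location is an assumption):
# sample_rss "$(cat /var/lib/glusterd/nfs/run/nfs.pid)" 60 30
```

Sampling once a minute for 30 minutes would cover the window in which the first scenario reached 7GB.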
This confirms that the issue is not related to throttling. I will take a packet trace and statedumps of the processes and update further.
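A rough sketch of how that data might be collected is shown below. It assumes gluster-NFS is listening on the standard NFS port 2049, so MOUNT and NLM traffic on other ports is not captured. The helper names and the output file are hypothetical, and commands are only printed unless `RUN=1` is set.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Request a statedump of the gluster-NFS server process for a volume.
nfs_statedump() {
    run gluster volume statedump "$1" nfs
}

# Capture full packets on NFS port 2049, on all interfaces, into a pcap file.
nfs_pkt_trace() {
    run tcpdump -i any -s 0 -w "$1" port 2049
}

# Example:
# nfs_statedump ec_tier
# nfs_pkt_trace /tmp/nfs.pcap
```

Taking two statedumps a few minutes apart makes it easier to see which allocations keep growing.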
The workaround for this issue is to restart the gluster-NFS server.