Description of problem:
Ganesha mount hangs during the read_large fs-sanity test on a tiered volume.

Version-Release number of selected component (if applicable):
3.7.9-1

How reproducible:
Once

Steps to Reproduce:
1. Configure nfs-ganesha on the cluster nodes.
2. Create a tiered volume and mount it with NFS version 4.
3. Execute the fs-sanity test suite on the mount point.
4. Observe that during the read_large test, the ganesha mount hangs with "server not responding" messages.
5. The ganesha service is still active and running on the node serving the mount:

[root@dhcp37-180 tmp]# service nfs-ganesha status
Redirecting to /bin/systemctl status nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: active (running) since Sat 2016-04-02 02:27:15 IST; 12h ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
 Main PID: 21795 (ganesha.nfsd)

6. showmount gives the error below on the same node:

[root@dhcp37-180 tmp]# showmount -e localhost
rpc mount export: RPC: Timed out

7. cd /mnt and df hang on the client.

Actual results:
Ganesha mount hangs during the read_large fs-sanity test on a tiered volume.

Expected results:
The read_large test should pass and the mount should not hang.

Additional info:
sosreport, ganesha log and packet trace are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1323423
From the packet trace, I do not see any NFS calls, but I can see lots of gluster traffic. Could you please run the read_large test alone and check the behaviour, along with a packet trace? Thanks!
Please provide the results of executing the read_large test alone.
Also, please provide access to the setup if you hit the issue again. I used the following steps to reproduce the issue:
1. Create a replicated volume (1x2) and start it.
2. Export the volume using ganesha.
3. Mount the volume using NFS v4.
4. Create a directory dir and, inside it, a large file named "src" using dd.
5. Attach a tier and wait for its completion.
6. cat src > dst (which is what read_large does).

I did not get a hang when I performed the above steps in sequential order. I hit a hang only twice in my entire runs, when I performed steps 4 and 5 (and possibly 6) together. (I did not debug the hang at that point because I was analyzing BZ1323424, and the hang did not recur when I tried to reproduce the issue again.)
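For reference, the reproduction steps above can be sketched as a shell transcript. This is a hedged sketch, not a verified script: the host names "server1"/"server2", the volume name "tiervol", the brick paths, and the file size are placeholders I introduced, and the 3.7-era gluster CLI forms (attach-tier, the ganesha.enable volume option) are assumed; all commands require a running Gluster/Ganesha cluster.

```shell
# Illustrative sketch of the repro; names and paths are placeholders.

# 1. Create and start a 1x2 replicated volume
gluster volume create tiervol replica 2 \
    server1:/bricks/b1 server2:/bricks/b2
gluster volume start tiervol

# 2. Export the volume via NFS-Ganesha (assumes ganesha is already set up)
gluster volume set tiervol ganesha.enable on

# 3. Mount with NFS version 4 on the client
mount -t nfs -o vers=4 server1:/tiervol /mnt

# 4. Create a directory and a large source file with dd
mkdir /mnt/dir
dd if=/dev/zero of=/mnt/dir/src bs=1M count=4096

# 5. Attach a hot tier and wait for the attach to complete
gluster volume attach-tier tiervol replica 2 \
    server1:/bricks/hot1 server2:/bricks/hot2

# 6. What read_large effectively does
cat /mnt/dir/src > /mnt/dir/dst
```

The hang was seen only when steps 4 and 5 (and possibly 6) overlapped, so to reproduce it the dd and attach-tier would need to run concurrently rather than one after the other as shown.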
Executed the read_large test individually around 10 times but did not hit the hang. Will keep an eye on this bug during the testing cycle and update Bugzilla accordingly.