Bug 991443
| Summary: | nfs: dd blocked for more than 120 seconds. nfs server not responding | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura |
| Component: | glusterfs | Assignee: | santosh pradhan <spradhan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | spandura |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.1 | CC: | rhs-bugs, shaines, spandura, vagarwal, vbellur |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-09-24 15:53:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: spandura 2013-08-02 12:09:36 UTC
1. Not able to reproduce the issue in my setup (I followed the steps mentioned above).

2. There could be multiple reasons why dd blocked for 120 seconds. If flushing the caches on the server side takes more than 120 seconds, the client will see these traces in syslog. Either the cache size is high (VM tunables), the disk performance is slow (it takes longer to write the data), or there is a network issue congesting the NFS traffic (`nfsstat -rc` will confirm whether there are excessive retransmissions, and `netstat -s` will tell if there is any network issue). Because there will be a lot of network packets in flight (dd on both FUSE and NFS), the server has to be able to handle the network pressure.

Will work with Shwetha to reproduce the issue in my setup. Again, I would like to know whether the dd failed. If not, it's not an issue.

Hi Shwetha,

I guess the server (where the NFS server runs) might have low memory (~4 GB) and slow disk I/O. As you are doing I/O (writes) from both the FUSE and NFS mount points, the data is not getting flushed as fast as we would like, and back pressure builds up on the NFS client. If flushing the caches on the server takes more than 120 seconds, all subsequent NFS write ops fall back to synchronous behaviour, which makes performance even worse and dumps the stack traces in syslog, as you saw.

On the server side, you also need to check the `netstat -s` output for too much pressure in the TCP/IP stack, which would show up as packets pruned, packets dropped, or data lost. On the client side, `nfsstat -v3` would show RPC retransmissions, which means the server is not writing the data to disk fast enough and is causing the issue.

I would suggest you run the NFS and FUSE I/O separately to narrow down the issue. Let me know if you still see it. I would also like to know whether the dd on the NFS mount point failed, because both mount points write data to the same bricks/disks.

Thanks,
Santosh

This has been lying in the "needinfo" state for a long time. Can I have an update, please?

Executed the case on build: glusterfs 3.4.0.33rhs built on Sep 8 2013 13:20:26. In this run, executed the case only on the NFS mount.

Test case:

1. Create a 1 x 2 replicate volume with 2 storage nodes and 1 brick per storage node. Set background-self-heal-count to 0, data-self-heal to "off", and the self-heal-daemon to off.
2. Create 4 NFS mounts.
3. From each NFS mount, cd into the mounted directory and execute `dd if=/dev/urandom of=test bs=1M count=10240`.
4. While the dd on the mount points is in progress, kill all the gluster processes on storage_node2.
5. Delete the brick directory on storage_node2 and recreate it.
6. After a while, with dd still in progress, start glusterd on storage_node2 (this will start the brick and NFS processes).

Unable to recreate the issue.

Let's close the issue then.
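
The client- and server-side checks suggested in the comments above are standard commands; here is a minimal sketch of a diagnostic pass, assuming a RHEL-style toolchain. The grep patterns are illustrative additions, not part of the original report:

```bash
# On the NFS client: high "retrans" counts in the RPC client statistics
# mean the server is not acknowledging writes fast enough.
nfsstat -rc

# On the server: look for TCP/IP back-pressure symptoms in the stack counters.
netstat -s | grep -iE 'prune|drop|lost|retrans'

# A task "blocked for more than 120 seconds" only logs a warning; check
# whether the dd actually failed or merely stalled.
dmesg | grep -i 'blocked for more than 120 seconds'
```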
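The "size of the cache is high (VM tunables)" remark points at the kernel's dirty-page writeback thresholds. Below is a hedged sketch of inspecting and lowering them; the values are examples chosen for illustration, not tuning advice from this bug:

```bash
# Show the current dirty-page thresholds; large ratios on a slow disk mean
# long, bursty flushes that can stall writers past the 120-second watchdog.
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Illustrative only: start background writeback earlier and cap total dirty
# memory lower, trading peak throughput for shorter flush stalls.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
```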
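For reference, the verification steps above can be transcribed as a rough shell sketch. The volume name (testvol), brick path (/bricks/brick1), host names, and mount point are assumptions for illustration; the volume options are the gluster keys for the settings named in step 1:

```bash
# Step 1: 1 x 2 replicate volume, one brick per storage node, self-heal off.
gluster volume create testvol replica 2 \
    storage_node1:/bricks/brick1 storage_node2:/bricks/brick1
gluster volume set testvol cluster.background-self-heal-count 0
gluster volume set testvol cluster.data-self-heal off
gluster volume set testvol cluster.self-heal-daemon off
gluster volume start testvol

# Steps 2-3: on each of the four NFS clients (gluster NFS serves NFSv3).
mount -t nfs -o vers=3 storage_node1:/testvol /mnt/nfs
cd /mnt/nfs && dd if=/dev/urandom of=test bs=1M count=10240 &

# Steps 4-6: on storage_node2, while the dd is still running.
pkill gluster                                      # kill all gluster processes
rm -rf /bricks/brick1 && mkdir -p /bricks/brick1   # recreate the brick directory
service glusterd start                             # brings brick and NFS back up
```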