Bug 808079
| Summary: | Intermittent failures of bonnie and iozone on nfs | | |
| --- | --- | --- | --- |
| Product: | [Community] GlusterFS | Reporter: | shylesh <shmohan> |
| Component: | nfs | Assignee: | Rajesh <rajesh> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | pre-release | CC: | gluster-bugs, vagarwal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-04-25 05:39:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
shylesh
2012-03-29 14:04:44 UTC
I ran into this issue again. I could get the kernel stack trace of the gNFS process (/proc/<pid>/stack):

```
[root@QA-49 ~]# cat /proc/16733/stack
[<ffffffffa03f9f24>] nfs_wait_bit_killable+0x24/0x40 [nfs]
[<ffffffffa0408f49>] nfs_commit_inode+0xa9/0x250 [nfs]
[<ffffffffa03f60c6>] nfs_release_page+0x86/0xa0 [nfs]
[<ffffffff8110fe60>] try_to_release_page+0x30/0x60
[<ffffffff8112a2a1>] shrink_page_list.clone.0+0x4f1/0x5c0
[<ffffffff8112a66b>] shrink_inactive_list+0x2fb/0x740
[<ffffffff8112b37f>] shrink_zone+0x38f/0x520
[<ffffffff8112b60e>] do_try_to_free_pages+0xfe/0x520
[<ffffffff8112bc1d>] try_to_free_pages+0x9d/0x130
[<ffffffff81123b9d>] __alloc_pages_nodemask+0x40d/0x940
[<ffffffff81158c7a>] alloc_pages_vma+0x9a/0x150
[<ffffffff81171bb5>] do_huge_pmd_anonymous_page+0x145/0x370
[<ffffffff8113c52a>] handle_mm_fault+0x25a/0x2b0
[<ffffffff81042b39>] __do_page_fault+0x139/0x480
[<ffffffff814f248e>] do_page_fault+0x3e/0xa0
[<ffffffff814ef845>] page_fault+0x25/0x30
[<ffffffff814260c9>] skb_copy_datagram_iovec+0x159/0x2c0
[<ffffffff81472235>] tcp_recvmsg+0xca5/0xe90
[<ffffffff8141c1b9>] sock_common_recvmsg+0x39/0x50
[<ffffffff8141bf51>] sock_aio_read+0x181/0x190
[<ffffffff8117619b>] do_sync_readv_writev+0xfb/0x140
[<ffffffff8117722f>] do_readv_writev+0xcf/0x1f0
[<ffffffff81177563>] vfs_readv+0x43/0x60
[<ffffffff81177691>] sys_readv+0x51/0xb0
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
```

From the above backtrace, it looks like a readv call reading network data took a page fault; servicing that fault pushed the kernel into memory reclaim, which blocked in the NFS client writeback path (nfs_commit_inode), leaving the process in the "D" (uninterruptible sleep) state:

```
[root@QA-49 ~]# ps uaxww | grep nfs
root 16733 0.9 10.0 537752 206564 ? Dsl Mar28 17:46 /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /etc/glusterd/nfs/run/nfs.pid -l /usr/local/var/log/glusterfs/nfs.log -S /tmp/77970b6a4863ead1be3b3b44b26f2dc5.socket
```

Since the process is in "D" state, it cannot be attached with gdb, strace, or the like. I'll try to run the test again and see if the issue reproduces with the same backtrace.

---

When the client and server are on the same host, this deadlock is known to occur: memory reclaim inside the gNFS server process ends up waiting on NFS writeback that only that same process can complete. Can you reproduce this with the two of them on separate machines? If the same problem is seen, I request the following (a collection sketch follows this comment):

1. relevant logs (bricks, gNFS)
2. stack traces of the bonnie++, gNFS, and brick processes
3. free -m and top output

---

With the client and server on different hosts, this issue is not seen.
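A minimal collection sketch for the three items requested above. It assumes a source install under /usr/local (matching the ps output in this report), that brick processes run as glusterfsd, and that brick logs live in the usual bricks/ subdirectory; the output path and script structure are illustrative, not part of the original report.

```sh
#!/bin/sh
# Sketch: gather the diagnostics requested in the comment above.
# Run as root on the server hosting gNFS and the bricks (and on the
# client for bonnie++). Log paths assume a source install under
# /usr/local; the bricks/ log directory is an assumed default.

OUT=/tmp/bz808079-diagnostics
mkdir -p "$OUT"

# 1. Relevant logs (gNFS log path taken from the ps output above).
cp /usr/local/var/log/glusterfs/nfs.log      "$OUT"/ 2>/dev/null
cp /usr/local/var/log/glusterfs/bricks/*.log "$OUT"/ 2>/dev/null

# 2. Kernel stacks of the bonnie++, gNFS, and brick processes.
#    /proc/<pid>/stack works even for tasks stuck in "D" state,
#    where gdb and strace cannot attach.
for name in bonnie++ glusterfs glusterfsd; do
    for pid in $(pgrep -x "$name"); do
        echo "=== $name (pid $pid) ===" >> "$OUT"/stacks.txt
        cat "/proc/$pid/stack"          >> "$OUT"/stacks.txt
    done
done

# 3. Memory-pressure snapshot.
free -m     > "$OUT"/free.txt
top -b -n 1 > "$OUT"/top.txt

echo "diagnostics written to $OUT"
```

Capturing these on both a single-host setup (where the hang reproduces) and a two-host setup (where it does not) would let the stacks and memory snapshots be compared directly.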