Bug 616833
| Summary: | task nfsd:2351 blocked for more than 120 seconds | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Scott Leerssen <scott> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.5 | CC: | bfields, jacekm, jlayton, rwheeler, sprabhu, steved, tom, trepancito, wnefal+redhatbugzilla |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-07-22 14:49:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Scott Leerssen, 2010-07-21 13:57:14 UTC)
The message "INFO: task python:10025 blocked for more than 120 seconds." that is printed when code has been waiting on a semaphore/mutex for a long time is relatively recent and was only introduced in RHEL 5.5.

> Our management software uses a lot of NFS mounts to access its storage, and
> sometimes that storage is local, so the NFS connections occur over loopback to
> the same server (don't ask).

This is a known problematic configuration and there is really no fix for it.
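For context, such a mount is just an ordinary NFS mount whose server happens to be the local machine; the paths and export names below are made up purely for illustration:

```sh
# Hypothetical loopback NFS mount: the export lives on this same server.
#   mount -t nfs localhost:/export/appdata /opt/app/data

# One way to spot existing loopback NFS mounts (illustrative, not exhaustive):
mount -t nfs | grep -E "^(localhost|127\.0\.0\.1|$(hostname)):"
```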
----------------[snip]----------------
INFO: task nfsd:2357 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd D 0000E308 2192 2357 1 2358 2356 (L-TLB)
f53c89b8 00000046 9ad3cfdd 0000e308 f53c7aa0 f7dab000 003c0ad4 0000000a
f53c7aa0 9ad94097 0000e308 000570ba 00000000 f53c7bac c1807200 cdc46ac0
00000100 0000e308 eca3e13c 00000000 1dc27d48 c042c811 c1807244 1dc27d48
Call Trace:
[<c042c811>] getnstimeofday+0x30/0xb6
[<c061d16e>] io_schedule+0x36/0x59
[<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs]
[<c061d345>] __wait_on_bit+0x33/0x58
[<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs]
[<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs]
[<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a
[<c0436094>] wake_bit_function+0x0/0x3c
[<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs]
[<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs]
[<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs]
[<f8c6d7aa>] nfs_release_page+0x1e/0x40 [nfs]
[<f8c6d78c>] nfs_release_page+0x0/0x40 [nfs]
[<c04772d2>] try_to_release_page+0x34/0x46
[<c04608a6>] shrink_inactive_list+0x4a4/0x7d2
[<c0460cca>] shrink_zone+0xf6/0x15b
[<c046169f>] try_to_free_pages+0x15b/0x26c
[<c045d41a>] __alloc_pages+0x1ce/0x2cf
[<c0458ddd>] grab_cache_page_write_begin+0x5c/0x91
[<f887d9e6>] ext3_write_begin+0x55/0x19e [ext3]
[<c0459f12>] generic_file_buffered_write+0x101/0x58b
[<c05bde3e>] memcpy_toiovec+0x27/0x4a
[<c05be51c>] skb_copy_datagram_iovec+0x108/0x1ca
[<c045a842>] __generic_file_aio_write_nolock+0x4a6/0x52a
[<c048c57b>] iput+0x3d/0x66
[<c048b550>] d_alloc_anon+0x17/0xd3
[<f8adf350>] find_exported_dentry+0x79/0x483 [exportfs]
[<c045a9f6>] __generic_file_write_nolock+0x86/0x9a
[<c0436067>] autoremove_wake_function+0x0/0x2d
[<c0436067>] autoremove_wake_function+0x0/0x2d
[<c061d420>] mutex_lock+0xb/0x19
[<c045aa41>] generic_file_writev+0x37/0x96
[<c045aa0a>] generic_file_writev+0x0/0x96
[<c0476061>] do_readv_writev+0x149/0x247
[<c0475944>] do_sync_write+0x0/0xf1
[<c0430e43>] set_current_groups+0x15a/0x166
[<c0476196>] vfs_writev+0x37/0x43
[<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd]
[<c04746be>] __dentry_open+0xea/0x1ab
[<c04747c4>] dentry_open+0x45/0x4b
[<f8bf38cb>] nfsd_write+0x96/0xab [nfsd]
[<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd]
[<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd]
[<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc]
[<f8bf068c>] nfsd+0x17e/0x286 [nfsd]
[<f8bf050e>] nfsd+0x0/0x286 [nfsd]
[<c0405c53>] kernel_thread_helper+0x7/0x10
nfsd needs memory in order to do its work. When the system is flush with dirty NFS pages and low on free memory, the VM subsystem will prefer to flush out those dirty NFS pages in order to free memory so it can give it to nfsd. But there's a chicken-and-egg problem: those writes can't complete without memory, and you can't get memory until the writes have completed.
The best I can offer is a "don't do that" -- use bind mounts instead.
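As a rough sketch of that suggestion (the directory names here are invented for illustration): when the "remote" filesystem is actually local, a bind mount exposes the same directory at the second path without going through NFS at all:

```sh
# Loopback NFS mount being replaced (illustrative paths):
#   mount -t nfs localhost:/export/appdata /opt/app/data

# Bind-mount the underlying directory directly instead:
mount --bind /export/appdata /opt/app/data

# Persistent equivalent in /etc/fstab:
#   /export/appdata  /opt/app/data  none  bind  0 0
```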
The other thing you can do is to try to tune the VM subsystem so that it writes out dirty NFS pages more aggressively, but even that won't save you if you're dirtying pages fast enough.
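A minimal sketch of that kind of tuning, assuming the standard dirty-page writeback sysctls (the values are illustrative, not recommendations, and this does not eliminate the deadlock described above):

```sh
# Start background writeback sooner and cap the dirty-page fraction lower
# (illustrative values; tune for the workload):
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
# Treat dirty pages as old enough to write back after ~10 seconds:
sysctl -w vm.dirty_expire_centisecs=1000

# Add the same keys to /etc/sysctl.conf to make the settings persistent.
```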
I'm going to go ahead and close this as WONTFIX. Please reopen if you want to discuss it further.
(In reply to comment #2)
> > Our management software uses a lot of NFS mounts to access its storage and
> > sometimes that storage is local, so the NFS connections occur over loopback to
> > the same server (don't ask).
>
> This is a known problematic configuration and there is really no fix for it.

If this operation, mounting NFS over loopback, leads to unreliable behavior, then why does RHEL permit it? Why is it not rejected at the kernel level?

I have the same problem on RHEL 5.5. I can't understand why there is no solution for this problem.