Bug 616833
| Summary: | task nfsd:2351 blocked for more than 120 seconds | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Scott Leerssen <scott> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 5.5 | CC: | bfields, jacekm, jlayton, rwheeler, sprabhu, steved, tom, trepancito, wnefal+redhatbugzilla |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-07-22 14:49:40 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Scott Leerssen, 2010-07-21 13:57:14 UTC)
The message "INFO: task python:10025 blocked for more than 120 seconds." that is printed when code has been waiting on a semaphore/mutex for a long time is relatively recent and was only introduced in RHEL 5.5.

> Our management software uses a lot of NFS mounts to access its storage, and
> sometimes that storage is local, so the NFS connections occur over loopback to
> the same server (don't ask).

This is a known problematic configuration and there is really no fix for it.
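For context, such a mount is just an ordinary NFS mount whose server happens to be the local machine; the paths and export names below are made up purely for illustration:

```sh
# Hypothetical loopback NFS mount: the export lives on this same server.
#   mount -t nfs localhost:/export/appdata /opt/app/data

# One way to spot existing loopback NFS mounts (illustrative, not exhaustive):
mount -t nfs | grep -E "^(localhost|127\.0\.0\.1|$(hostname)):"
```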
----------------[snip]----------------
INFO: task nfsd:2357 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd D 0000E308 2192 2357 1 2358 2356 (L-TLB)
f53c89b8 00000046 9ad3cfdd 0000e308 f53c7aa0 f7dab000 003c0ad4 0000000a
f53c7aa0 9ad94097 0000e308 000570ba 00000000 f53c7bac c1807200 cdc46ac0
00000100 0000e308 eca3e13c 00000000 1dc27d48 c042c811 c1807244 1dc27d48
Call Trace:
[<c042c811>] getnstimeofday+0x30/0xb6
[<c061d16e>] io_schedule+0x36/0x59
[<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs]
[<c061d345>] __wait_on_bit+0x33/0x58
[<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs]
[<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs]
[<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a
[<c0436094>] wake_bit_function+0x0/0x3c
[<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs]
[<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs]
[<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs]
[<f8c6d7aa>] nfs_release_page+0x1e/0x40 [nfs]
[<f8c6d78c>] nfs_release_page+0x0/0x40 [nfs]
[<c04772d2>] try_to_release_page+0x34/0x46
[<c04608a6>] shrink_inactive_list+0x4a4/0x7d2
[<c0460cca>] shrink_zone+0xf6/0x15b
[<c046169f>] try_to_free_pages+0x15b/0x26c
[<c045d41a>] __alloc_pages+0x1ce/0x2cf
[<c0458ddd>] grab_cache_page_write_begin+0x5c/0x91
[<f887d9e6>] ext3_write_begin+0x55/0x19e [ext3]
[<c0459f12>] generic_file_buffered_write+0x101/0x58b
[<c05bde3e>] memcpy_toiovec+0x27/0x4a
[<c05be51c>] skb_copy_datagram_iovec+0x108/0x1ca
[<c045a842>] __generic_file_aio_write_nolock+0x4a6/0x52a
[<c048c57b>] iput+0x3d/0x66
[<c048b550>] d_alloc_anon+0x17/0xd3
[<f8adf350>] find_exported_dentry+0x79/0x483 [exportfs]
[<c045a9f6>] __generic_file_write_nolock+0x86/0x9a
[<c0436067>] autoremove_wake_function+0x0/0x2d
[<c0436067>] autoremove_wake_function+0x0/0x2d
[<c061d420>] mutex_lock+0xb/0x19
[<c045aa41>] generic_file_writev+0x37/0x96
[<c045aa0a>] generic_file_writev+0x0/0x96
[<c0476061>] do_readv_writev+0x149/0x247
[<c0475944>] do_sync_write+0x0/0xf1
[<c0430e43>] set_current_groups+0x15a/0x166
[<c0476196>] vfs_writev+0x37/0x43
[<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd]
[<c04746be>] __dentry_open+0xea/0x1ab
[<c04747c4>] dentry_open+0x45/0x4b
[<f8bf38cb>] nfsd_write+0x96/0xab [nfsd]
[<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd]
[<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd]
[<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc]
[<f8bf068c>] nfsd+0x17e/0x286 [nfsd]
[<f8bf050e>] nfsd+0x0/0x286 [nfsd]
[<c0405c53>] kernel_thread_helper+0x7/0x10
nfsd needs memory in order to do its work. When the system is flush with dirty NFS pages and low on free memory, the VM subsystem will prefer to flush out those dirty NFS pages in order to free memory so it can give it to nfsd. But there's a chicken-and-egg problem: those writes can't complete without memory, and you can't get memory until the writes have completed.
The best I can offer is a "don't do that" -- use bind mounts instead.
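As a rough sketch of that suggestion (the directory names here are invented for illustration): when the "remote" filesystem is actually local, a bind mount exposes the same directory at the second path without going through NFS at all:

```sh
# Loopback NFS mount being replaced (illustrative paths):
#   mount -t nfs localhost:/export/appdata /opt/app/data

# Bind-mount the underlying directory directly instead:
mount --bind /export/appdata /opt/app/data

# Persistent equivalent in /etc/fstab:
#   /export/appdata  /opt/app/data  none  bind  0 0
```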
The other thing you can do is to try to tune the VM subsystem so that it writes out dirty NFS pages more aggressively, but even that won't save you if you're dirtying pages fast enough.
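A minimal sketch of that kind of tuning, assuming the standard dirty-page writeback sysctls (the values are illustrative, not recommendations, and this does not eliminate the deadlock described above):

```sh
# Start background writeback sooner and cap the dirty-page fraction lower
# (illustrative values; tune for the workload):
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10
# Treat dirty pages as old enough to write back after ~10 seconds:
sysctl -w vm.dirty_expire_centisecs=1000

# Add the same keys to /etc/sysctl.conf to make the settings persistent.
```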
I'm going to go ahead and close this as WONTFIX. Please reopen if you want to discuss it further.
(In reply to comment #2)
> > Our management software uses a lot of NFS mounts to access its storage and
> > sometimes that storage is local, so the NFS connections occur over loopback to
> > the same server (don't ask).
>
> This is a known problematic configuration and there is really no fix for it.

If this operation, mounting NFS over loopback, leads to unreliable behavior, then why does RHEL permit it? Why is it not rejected at the kernel level?

I have the same problem on RHEL 5.5. I can't understand why there is no solution for this problem.