Description of problem: We are seeing NFS tasks being blocked indefinitely in the kernel after updating to 5.5. 5.4 did not seem to exhibit the same behavior. Version-Release number of selected component (if applicable): Linux devmws5.racemi.com 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:07:06 EDT 2010 i686 i686 i386 GNU/Linux How reproducible: Intermittent, but usually occurs after moderate load. Steps to Reproduce: 1. 2. 3. Actual results: NFS connections block and never free. Expected results: No blocked NFS connections. Additional info: Our management software uses a lot of NFS mounts to access it's storage and sometimes that storage is local, so the NFS connections occur over loopback to the same server (don't ask). These connections seem to exacerbate the issue. Stack trace messages in dmesg: nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying INFO: task kswapd0:94 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kswapd0 D 0000E308 2732 94 7 95 93 (L-TLB) f7f6ad68 00000046 91b3668c 0000e308 f0ccedcc 00000282 c042d7d5 0000000a f7f6f000 91b7e77b 0000e308 000480ef 00000000 f7f6f10c c1807200 cdc46ac0 00000086 c0694b80 c1206140 f7cce260 1dc27c16 c042c811 c18f4c3c 1dc27c16 Call Trace: [<c042d7d5>] __mod_timer+0xda/0xe4 [<c042c811>] getnstimeofday+0x30/0xb6 [<c061d16e>] io_schedule+0x36/0x59 [<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs] [<c061d345>] __wait_on_bit+0x33/0x58 [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a [<c0436094>] wake_bit_function+0x0/0x3c [<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs] [<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs] [<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs] [<f8c6d7aa>] nfs_release_page+0x1e/0x40 [nfs] [<f8c6d78c>] nfs_release_page+0x0/0x40 [nfs] [<c04772d2>] try_to_release_page+0x34/0x46 [<c04608a6>] shrink_inactive_list+0x4a4/0x7d2 [<c045ff03>] isolate_lru_pages+0x73/0x8c [<c04602f7>] shrink_active_list+0x3db/0x3e3 [<f8c6bc9e>] nfs_access_cache_shrinker+0x20/0x18f [nfs] [<c049fd36>] mb_cache_shrink_fn+0x1e/0xb7 [<c0460cca>] shrink_zone+0xf6/0x15b [<c0461171>] kswapd+0x28a/0x3ab [<c0436067>] autoremove_wake_function+0x0/0x2d [<c0460ee7>] kswapd+0x0/0x3ab [<c0435fa3>] kthread+0xc0/0xed [<c0435ee3>] kthread+0x0/0xed [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2351 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2351 1 2352 2349 (L-TLB) f5324dcc 00000046 9ae7646c 0000e308 00000040 d8662998 f8895460 0000000a c1ae8000 9ae87a0c 0000e308 000115a0 00000000 c1ae810c c1807200 cdc46ac0 c1ae8000 00000000 00000000 00000002 c1ae8000 c0436067 c1807244 f533eaa0 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2352 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2352 1 2353 2351 (L-TLB) f5255dcc 00000046 9aeede7c 0000e308 00000040 d8662998 f8895460 0000000a f5180550 9af0004a 0000e308 000121ce 00000000 f518065c c1807200 cdc46ac0 f5180550 00000000 00000000 00000002 f5180550 c0436067 c1807244 f53c7550 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2353 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2044 2353 1 2354 2352 (L-TLB) f522ddcc 00000046 9ae3f9a9 0000e308 00000040 d8662998 f8895460 0000000a f5180000 9ae5090d 0000e308 00010f64 00000000 f518010c c1807200 cdc46ac0 f5180000 00000000 00000000 00000002 f5180000 c0436067 c1807244 c1ae8000 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2354 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2120 2354 1 2355 2353 (L-TLB) f533fdcc 00000046 9aeadf21 0000e308 00000040 d8662998 f8895460 0000000a f533eaa0 9aebeea8 0000e308 00010f87 00000000 f533ebac c1807200 cdc46ac0 f533eaa0 00000000 00000000 00000002 f533eaa0 c0436067 c1807244 f5180550 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2355 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2120 2355 1 2356 2354 (L-TLB) f53addcc 00000046 9ae09cb4 0000e308 00000040 d8662998 f8895460 0000000a f533e550 9ae1ac7c 0000e308 00010fc8 00000000 f533e65c c1807200 cdc46ac0 f533e550 00000000 00000000 00000002 f533e550 c0436067 c1807244 f5180000 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2356 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2356 1 2357 2355 (L-TLB) f53badcc 00000046 9adc8f7f 0000e308 00000040 d8662998 f8895460 0000000a f533e000 9ade03cb 0000e308 0001744c 00000000 f533e10c c1807200 cdc46ac0 f533e000 00000000 00000000 00000000 f533e000 c0436067 c1807244 f533e550 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c041eef4>] enqueue_task+0x29/0x39 [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2357 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2357 1 2358 2356 (L-TLB) f53c89b8 00000046 9ad3cfdd 0000e308 f53c7aa0 f7dab000 003c0ad4 0000000a f53c7aa0 9ad94097 0000e308 000570ba 00000000 f53c7bac c1807200 cdc46ac0 00000100 0000e308 eca3e13c 00000000 1dc27d48 c042c811 c1807244 1dc27d48 Call Trace: [<c042c811>] getnstimeofday+0x30/0xb6 [<c061d16e>] io_schedule+0x36/0x59 [<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs] [<c061d345>] __wait_on_bit+0x33/0x58 [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a [<c0436094>] wake_bit_function+0x0/0x3c [<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs] [<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs] [<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs] [<f8c6d7aa>] nfs_release_page+0x1e/0x40 [nfs] [<f8c6d78c>] nfs_release_page+0x0/0x40 [nfs] [<c04772d2>] try_to_release_page+0x34/0x46 [<c04608a6>] shrink_inactive_list+0x4a4/0x7d2 [<c0460cca>] shrink_zone+0xf6/0x15b [<c046169f>] try_to_free_pages+0x15b/0x26c [<c045d41a>] __alloc_pages+0x1ce/0x2cf [<c0458ddd>] grab_cache_page_write_begin+0x5c/0x91 [<f887d9e6>] ext3_write_begin+0x55/0x19e [ext3] [<c0459f12>] generic_file_buffered_write+0x101/0x58b [<c05bde3e>] memcpy_toiovec+0x27/0x4a [<c05be51c>] skb_copy_datagram_iovec+0x108/0x1ca [<c045a842>] __generic_file_aio_write_nolock+0x4a6/0x52a [<c048c57b>] iput+0x3d/0x66 [<c048b550>] d_alloc_anon+0x17/0xd3 [<f8adf350>] find_exported_dentry+0x79/0x483 [exportfs] [<c045a9f6>] __generic_file_write_nolock+0x86/0x9a [<c0436067>] autoremove_wake_function+0x0/0x2d [<c0436067>] autoremove_wake_function+0x0/0x2d [<c061d420>] mutex_lock+0xb/0x19 [<c045aa41>] generic_file_writev+0x37/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task nfsd:2358 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2358 1 2361 2357 (L-TLB) f53d4dcc 00000046 9af0f9cb 0000e308 00000040 d8662998 f8895460 0000000a f53c7550 9af26865 0000e308 00016e9a 00000000 f53c765c c1807200 cdc46ac0 f53c7550 00000000 00000000 00000002 f53c7550 c0436067 f53d4db0 f53d4db0 Call Trace: [<c0436067>] autoremove_wake_function+0x0/0x2d [<c061d545>] __mutex_lock_slowpath+0x4d/0x7c [<c061d583>] .text.lock.mutex+0xf/0x14 [<f8bf619e>] expkey_match+0x49/0x58 [nfsd] [<c045aa32>] generic_file_writev+0x28/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 ======================= INFO: task python:10025 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. python D 0000E308 1936 10025 2665 (NOTLB) c9897eb8 00000086 9c5a6bec 0000e308 00000000 00000000 00000002 00000009 c8138aa0 9c9a2284 0000e308 003fb698 00000000 c8138bac c1807200 cdc46ac0 00000000 c891acd4 00000000 f14f2604 1dc27d84 c042c811 c1807244 1dc27d84 Call Trace: [<c042c811>] getnstimeofday+0x30/0xb6 [<c061d16e>] io_schedule+0x36/0x59 [<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs] [<c061d345>] __wait_on_bit+0x33/0x58 [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a [<c0436094>] wake_bit_function+0x0/0x3c [<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs] [<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs] [<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs] [<f8c6d6f7>] nfs_do_fsync+0x19/0x39 [nfs] [<f8c6de9d>] nfs_file_flush+0x67/0x87 [nfs] [<c04744fa>] filp_close+0x2f/0x54 [<c0475752>] sys_close+0x71/0xa0 [<c0404f17>] syscall_call+0x7/0xb ======================= nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying nfs: server 10.5.1.1 not responding, still trying rpc-srv/tcp: nfsd: got error -104 when sending 140 bytes - shutting down socket nfsd: last server has exited nfsd: unexporting all filesystems NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period
The message INFO: task python:10025 blocked for more than 120 seconds. printed out when the code has been waiting on a semaphore/mutex for a long time is relatively recent and was only introduced in RHEL 5.5.
> Our management software uses a lot of NFS mounts to access it's storage and > sometimes that storage is local, so the NFS connections occur over loopback to > the same server (don't ask). This is a known problematic configuration and there is really no fix for it. ----------------[snip]---------------- INFO: task nfsd:2357 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D 0000E308 2192 2357 1 2358 2356 (L-TLB) f53c89b8 00000046 9ad3cfdd 0000e308 f53c7aa0 f7dab000 003c0ad4 0000000a f53c7aa0 9ad94097 0000e308 000570ba 00000000 f53c7bac c1807200 cdc46ac0 00000100 0000e308 eca3e13c 00000000 1dc27d48 c042c811 c1807244 1dc27d48 Call Trace: [<c042c811>] getnstimeofday+0x30/0xb6 [<c061d16e>] io_schedule+0x36/0x59 [<f8c725d1>] nfs_wait_bit_uninterruptible+0x5/0x8 [nfs] [<c061d345>] __wait_on_bit+0x33/0x58 [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<f8c725cc>] nfs_wait_bit_uninterruptible+0x0/0x8 [nfs] [<c061d3cc>] out_of_line_wait_on_bit+0x62/0x6a [<c0436094>] wake_bit_function+0x0/0x3c [<f8c725c5>] nfs_wait_on_request+0x1b/0x22 [nfs] [<f8c758b8>] nfs_wait_on_requests_locked+0x61/0xa5 [nfs] [<f8c766a4>] nfs_sync_inode_wait+0x4c/0x1ab [nfs] [<f8c6d7aa>] nfs_release_page+0x1e/0x40 [nfs] [<f8c6d78c>] nfs_release_page+0x0/0x40 [nfs] [<c04772d2>] try_to_release_page+0x34/0x46 [<c04608a6>] shrink_inactive_list+0x4a4/0x7d2 [<c0460cca>] shrink_zone+0xf6/0x15b [<c046169f>] try_to_free_pages+0x15b/0x26c [<c045d41a>] __alloc_pages+0x1ce/0x2cf [<c0458ddd>] grab_cache_page_write_begin+0x5c/0x91 [<f887d9e6>] ext3_write_begin+0x55/0x19e [ext3] [<c0459f12>] generic_file_buffered_write+0x101/0x58b [<c05bde3e>] memcpy_toiovec+0x27/0x4a [<c05be51c>] skb_copy_datagram_iovec+0x108/0x1ca [<c045a842>] __generic_file_aio_write_nolock+0x4a6/0x52a [<c048c57b>] iput+0x3d/0x66 [<c048b550>] d_alloc_anon+0x17/0xd3 [<f8adf350>] find_exported_dentry+0x79/0x483 [exportfs] [<c045a9f6>] __generic_file_write_nolock+0x86/0x9a [<c0436067>] autoremove_wake_function+0x0/0x2d [<c0436067>] autoremove_wake_function+0x0/0x2d [<c061d420>] mutex_lock+0xb/0x19 [<c045aa41>] generic_file_writev+0x37/0x96 [<c045aa0a>] generic_file_writev+0x0/0x96 [<c0476061>] do_readv_writev+0x149/0x247 [<c0475944>] do_sync_write+0x0/0xf1 [<c0430e43>] set_current_groups+0x15a/0x166 [<c0476196>] vfs_writev+0x37/0x43 [<f8bf30a9>] nfsd_vfs_write+0xca/0x28a [nfsd] [<c04746be>] __dentry_open+0xea/0x1ab [<c04747c4>] dentry_open+0x45/0x4b [<f8bf38cb>] nfsd_write+0x96/0xab [nfsd] [<f8bf95a9>] nfsd3_proc_write+0xd1/0xeb [nfsd] [<f8bf01a4>] nfsd_dispatch+0xbb/0x1a9 [nfsd] [<f8b0e689>] svc_process+0x3c8/0x633 [sunrpc] [<f8bf068c>] nfsd+0x17e/0x286 [nfsd] [<f8bf050e>] nfsd+0x0/0x286 [nfsd] [<c0405c53>] kernel_thread_helper+0x7/0x10 nfsd needs memory in order to do its work. When the system is flush with dirty NFS pages and low on free memory, the VM subsystem will prefer to flush out those dirty NFS pages in order to free memory so it can give it to nfsd. But there's a chicken and egg problem...those writes can't complete without memory and you can't get memory until the writes are completed. The best I can offer is a "don't do that" -- use bind mounts instead. The other thing you can do is to try and tune the VM subsystem such that it writes out dirty nfs pages more aggressively, but even that won't save you if you're dirtying pages fast enough. I'm going to go ahead and close this as WONTFIX. Please reopen if you want to discuss it further.
(In reply to comment #2) > > Our management software uses a lot of NFS mounts to access it's storage and > > sometimes that storage is local, so the NFS connections occur over loopback to > > the same server (don't ask). > > This is a known problematic configuration and there is really no fix for it. If this operation, mounting NFS over loopback, leads to unreliable behavior, then why does RHEL permit it? Why is it not rejected at the kernel level?
I have the same problem, with a RHEL 5.5 I can`t understand why don't have a solution for this problem....