Description of problem:
I am mounting several directories using NFSv4 (system authentication) from an NFS server (NetApp NAS filer), and I am experiencing frequent system hangs when the following is printed on the system console:

2011-03-20T20:07:52.657919+01:00 pisces kernel: INFO: task rpciod/6:833 blocked for more than 120 seconds.
2011-03-20T20:07:52.658055+01:00 pisces kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2011-03-20T20:07:52.670631+01:00 pisces kernel: rpciod/6 D ffff8102eb520050 0 833 307 834 832 (L-TLB)
2011-03-20T20:07:52.677520+01:00 pisces kernel: ffff81087ce75c40 0000000000000046 0000000000000006 ffffffff800fb1de
2011-03-20T20:07:52.683742+01:00 pisces kernel: ffff8102f1ab02f0 000000000000000a ffff81087de7d7e0 ffff81087c612820
2011-03-20T20:07:52.689972+01:00 pisces kernel: 000e2aa8f01d9b39 000000000000098e ffff81087de7d9c8 0000000600000001
2011-03-20T20:07:52.690047+01:00 pisces kernel: Call Trace:
2011-03-20T20:07:52.695671+01:00 pisces kernel: [<ffffffff800fb1de>] ep_poll_callback+0x64/0x8b
2011-03-20T20:07:52.695739+01:00 pisces kernel: [<ffffffff8002e283>] __wake_up+0x38/0x4f
2011-03-20T20:07:52.704515+01:00 pisces kernel: [<ffffffff880be996>] :nfs:nfs_idmap_id+0x189/0x209
2011-03-20T20:07:52.704596+01:00 pisces kernel: [<ffffffff8008e40a>] default_wake_function+0x0/0xe
2011-03-20T20:07:52.714428+01:00 pisces kernel: [<ffffffff880b97fc>] :nfs:decode_getfattr+0x73c/0xc5e
2011-03-20T20:07:52.719541+01:00 pisces kernel: [<ffffffff880ba9b4>] :nfs:nfs4_xdr_dec_write+0x0/0xbd
2011-03-20T20:07:52.724834+01:00 pisces kernel: [<ffffffff8803dca0>] :sunrpc:rpc_async_schedule+0x0/0x5
2011-03-20T20:07:52.730049+01:00 pisces kernel: [<ffffffff880baa67>] :nfs:nfs4_xdr_dec_write+0xb3/0xbd
2011-03-20T20:07:52.735071+01:00 pisces kernel: [<ffffffff8803890d>] :sunrpc:call_decode+0x4b7/0x509
2011-03-20T20:07:52.735137+01:00 pisces kernel: [<ffffffff8803dae4>] :sunrpc:__rpc_execute+0x92/0x24e
2011-03-20T20:07:52.744531+01:00 pisces kernel: [<ffffffff8004d7d0>] run_workqueue+0x99/0xf6
2011-03-20T20:07:52.744608+01:00 pisces kernel: [<ffffffff8004a018>] worker_thread+0x0/0x122
2011-03-20T20:07:52.753842+01:00 pisces kernel: [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
2011-03-20T20:07:52.753908+01:00 pisces kernel: [<ffffffff8004a108>] worker_thread+0xf0/0x122
2011-03-20T20:07:52.763107+01:00 pisces kernel: [<ffffffff8008e40a>] default_wake_function+0x0/0xe
2011-03-20T20:07:52.763175+01:00 pisces kernel: [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
2011-03-20T20:07:52.773137+01:00 pisces kernel: [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
2011-03-20T20:07:52.773203+01:00 pisces kernel: [<ffffffff80032996>] kthread+0xfe/0x132
2011-03-20T20:07:52.780948+01:00 pisces kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
2011-03-20T20:07:52.781016+01:00 pisces kernel: [<ffffffff800a269c>] keventd_create_kthread+0x0/0xc4
2011-03-20T20:07:52.789760+01:00 pisces kernel: [<ffffffff80032898>] kthread+0x0/0x132
2011-03-20T20:07:52.789817+01:00 pisces kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
2011-03-20T20:07:52.794055+01:00 pisces kernel:
2011-03-20T20:07:52.799621+01:00 pisces kernel: INFO: task adclient:3918 blocked for more than 120 seconds.
2011-03-20T20:07:52.799690+01:00 pisces kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2011-03-20T20:07:52.812675+01:00 pisces kernel: adclient D ffff810001036b20 0 3918 1 3919 3917 (NOTLB)
2011-03-20T20:07:52.819828+01:00 pisces kernel: ffff8108775efbb8 0000000000000086 ffff810697135148 ffff81087cf9dc00
2011-03-20T20:07:52.826198+01:00 pisces kernel: 0000000000000286 000000000000000a ffff81087b85b100 ffff81011de58080
2011-03-20T20:07:52.832659+01:00 pisces kernel: 000e2aafbbd86a3d 00000000000067a8 ffff81087b85b2e8 0000000600000286
2011-03-20T20:07:52.832777+01:00 pisces kernel: Call Trace:
2011-03-20T20:07:52.838891+01:00 pisces kernel: [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
2011-03-20T20:07:52.844801+01:00 pisces kernel: [<ffffffff880a6941>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
2011-03-20T20:07:52.844864+01:00 pisces kernel: [<ffffffff800637ca>] io_schedule+0x3f/0x67
2011-03-20T20:07:52.854822+01:00 pisces kernel: [<ffffffff880a694a>] :nfs:nfs_wait_bit_uninterruptible+0x9/0xd
2011-03-20T20:07:52.854925+01:00 pisces kernel: [<ffffffff800639f6>] __wait_on_bit+0x40/0x6e
2011-03-20T20:07:52.859120+01:00 pisces kernel: [<ffffffff8009a446>] recalc_sigpending+0xe/0x25
2011-03-20T20:07:52.875885+01:00 pisces kernel: [<ffffffff880a6941>] :nfs:nfs_wait_bit_uninterruptible+0x0/0xd
2011-03-20T20:07:52.875980+01:00 pisces kernel: [<ffffffff80063a90>] out_of_line_wait_on_bit+0x6c/0x78
2011-03-20T20:07:52.885654+01:00 pisces kernel: [<ffffffff800a28e2>] wake_bit_function+0x0/0x23
...

The system becomes unresponsive, and the only remedy is to reboot it.

How reproducible:
There is no clear way to reproduce the error.

Additional info:
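For reference, the "blocked for more than 120 seconds" watchdog mentioned in the traces above is controlled by a kernel tunable. A minimal sketch of how to inspect and adjust it (the values shown are only examples, and writing to /proc requires root):

```shell
# Show the current hung-task timeout (120 seconds by default)
cat /proc/sys/kernel/hung_task_timeout_secs

# Silence the warnings entirely, as the kernel message itself suggests
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

# Or make the setting persistent across reboots via sysctl
echo "kernel.hung_task_timeout_secs = 0" >> /etc/sysctl.conf
sysctl -p
```

Note that this only silences the reporting; the underlying D-state hang remains.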
I believe I am experiencing the same behavior as http://www.spinics.net/lists/linux-nfs/msg13022.html The NFS client is also diskless in my case, but it uses NFSv3 to mount the root FS, so that should not cause the idmapper to hang.
The problem has also been described here: https://bugzilla.redhat.com/show_bug.cgi?id=483365 I wonder why the fix was declined?
Created attachment 487391 [details] diskless machine pisces log (2 occurrences)
Created attachment 487392 [details] diskless machine sagitta log (1 occurrence)
Attached are logs from the affected machines (3 occurrences in total). Note: the 'adclient' process is a daemon for the NSS library; it provides username/UID mappings, so it is indirectly used by rpc.idmapd. Unfortunately, no system vmcore is available.
I am able to replicate the problem on RHEL-6 (kernel Linux data 2.6.32-71.el6.x86_64) too. In fact, RHEL-6 in a similar configuration hangs almost immediately. The SysRq trace is attached. Could you tell me whether we are experiencing a similar issue, or is this one completely different?
Created attachment 496471 [details] /var/log/messages from a diskless system running RHEL-6
The /var/log/messages file containing the SysRq traces.
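For anyone trying to collect the same data: the SysRq traces in the attached log can be captured on a hung system roughly as follows (requires root, and the magic SysRq key must be enabled; 'w' is only available on newer kernels such as the RHEL-6 2.6.32 one):

```shell
# Make sure the magic SysRq key is enabled
echo 1 > /proc/sys/kernel/sysrq

# Dump the stack of every task (including D-state ones) to the kernel log
echo t > /proc/sysrq-trigger

# On newer kernels, dump only blocked (uninterruptible) tasks
echo w > /proc/sysrq-trigger

# The traces land in the kernel ring buffer and /var/log/messages
dmesg | tail
```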
Hi Ondrej, Red Hat's Bugzilla is not really meant to be used to bypass the normal Red Hat support channel. You should talk with your Red Hat service people and open a ticket so they can help gather information and prioritize the response. If you are not a Red Hat customer, please report this to the upstream NFS developer list as well. Thanks!
Hi Ric, I am a Red Hat customer, and I opened a service case (#00441148) a few months ago. There has been no satisfactory update yet. This case was opened so that other people can see the problem. Ondrej
Ondrej, bz 609252 deals with the issue you reported on RHEL 5. Sachin Prabhu
There's little info to go on here, but this is almost certainly a duplicate of bug 609252. Marking it as such... *** This bug has been marked as a duplicate of bug 609252 ***