Bug 653286 - [5.6][REG]for some uses of 'nfsservctl' system call, the kernel crashes.
[5.6][REG]for some uses of 'nfsservctl' system call, the kernel crashes.
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.6
All Linux
urgent Severity urgent
: rc
: ---
Assigned To: Jeff Layton
yanfu,wang
: Regression, ZStream
Depends On:
Blocks: 590060 688156 707606 714729
  Show dependency treegraph
 
Reported: 2010-11-15 03:18 EST by Masayoshi Yamazaki
Modified: 2012-02-09 07:12 EST (History)
20 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Under certain circumstances, a crash in the kernel could occur due to a race condition in the lockd_down function, which did not wait for the lockd process to come down. With this update, the lockd_down function has been fixed, and the kernel no longer crashes.
Story Points: ---
Clone Of:
: 709190 (view as bug list)
Environment:
Last Closed: 2011-07-21 05:52:55 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
potential patch (1.14 KB, patch)
2010-11-26 18:19 EST, Frank Ch. Eigler
no flags Details | Diff
potential patch #2 (12.99 KB, patch)
2010-12-15 16:52 EST, Frank Ch. Eigler
no flags Details | Diff
patch -- make lockd_down wait indefinitely for lockd to come down (3.04 KB, patch)
2011-03-10 15:26 EST, Jeff Layton
no flags Details | Diff

  None (edit)
Comment 1 Frank Ch. Eigler 2010-11-19 21:07:00 EST
Can you describe the hardware on which this occurs, specifically the number and type of processors/cores?

We've been tracking an intermittent occurrence of what may be the same problem, and would appreciate as much information as possible to allow us to reproduce the problem more reliably.
Comment 3 Frank Ch. Eigler 2010-11-26 13:50:00 EST
I have so far been unable to reproduce a crash with the same version of systemtap & kernel.  I'll continue to investigate.
Comment 4 Frank Ch. Eigler 2010-11-26 18:19:54 EST
Created attachment 463163 [details]
potential patch

Please apply this trial patch to your copy of the systemtap runtime:

# cd /usr/share/systemtap
# patch -p1 < foo.patch

Then rerun your test case (with --poison-cache one time, in order
to invalidate prior cached versions of the .ko's with the old runtime):

# stap --poison-cache test.stp -c ./test

Add a   -DDEBUG_TRANS and/or -DSTAP_DEBUG_PRINTK  if the problem recurs, 
to assist the next round of debugging.
Comment 10 Frank Ch. Eigler 2010-11-30 13:09:34 EST
Another item to test is whether the crash occurs near startup vs. shutdown.
Please try 

  stap test.stp -c 'sleep 100'
Comment 12 Frank Ch. Eigler 2010-12-01 23:50:08 EST
These tests should be run on whatever machine displays the original problem.
Another few variants to test, with the patched runtime:

stap test.stp -c 'sleep 2'
stap test.stp -c '/bin/true'   # to represent 0 time
Comment 16 Frank Ch. Eigler 2010-12-02 12:51:18 EST
Can you describe the NFS client/server setup on the machine where this problem occurs, such as NFS daemons running, exports/mounts configured, which of them connected?
Comment 17 Frank Ch. Eigler 2010-12-02 20:43:15 EST
It would be helpful to compare the probes generated from the previous (working?) systemtap-1.1 version and the new 1.3 version.  Could you run

stap -k -p4 ....

on the same machine, both using the previous systemtap and the current one,
then pack up the two /tmp/stap* directories into bugzilla attachments?
Comment 19 Eric Bachalo 2010-12-07 10:37:48 EST
At this point any fix will likely not make rhel 5.6 release timeframe but issue will continue to be worked and this issue will move to z-stream so that fix will be made available for rhel 5.6 when completed.
Comment 20 Frank Ch. Eigler 2010-12-07 16:16:00 EST
It appears that the residual crash may be related to the delayed_work
instance used by runtime/transport/transport.c, associated with the
procfs .ctl file.
Comment 26 Frank Ch. Eigler 2010-12-15 16:52:45 EST
Created attachment 468977 [details]
potential patch #2

Elements of this patch have been committed to upstream systemtap, and found to help the problem.  Please test it and let us know.
Comment 28 Frank Ch. Eigler 2010-12-16 20:47:37 EST
OK, thanks for testing.
Can you attach a new console panic log to this bug?

Can you describe the workload and NFS configuration of the machine, at the time the test script is run?  Do you run the script in isolation (with no other scripts running near the same time)?
Comment 31 Frank Ch. Eigler 2011-01-04 09:41:21 EST
We have never been able to reproduce this problem without a highly concurrent
execution of systemtap, so if indeed the newest patch showed the problem still,
we don't have a fix yet.

(I believe QA would have a similar difficulty in verifying a fix.)

I'm trying to work out some other monitoring script/tool to run on the
affected machine at the same time as this problem triggers, to gather more
information.  (It would be ideal to get remote logon access to it.)
Comment 32 Frank Ch. Eigler 2011-01-04 14:12:08 EST
Does kdump work on the affected machine(s), so that we could collect a crash image?
Comment 34 Frank Ch. Eigler 2011-01-21 13:04:44 EST
Since local reproduction of this exact problem has still not been successful,
I'm hoping to supply you with another timer-event-monitoring script to run
in the background, while the problematic script run is attempted.  The hope
is that this secondary script would save some output in at least the kernel
crash buffer, that we could later try to extract.

One possible timer-monitoring-script could be something like:

stap -t -v -e '
probe kernel.trace("timer*") { 
      log($$name . " " . $$parms$$ . " " . symdata($timer->function))
}' -d kernel -o LOGFILE -c 'SHELL SCRIPT RUNNING OTHER STAP SCRIPT'
Comment 36 Frank Ch. Eigler 2011-01-25 07:20:42 EST
The log file would be interesting for a non-crashing case, but mainly for
comparing with a crashing case.  Is the kernel panic not reproducible
with the timer-monitoring script also running?
Comment 55 Frank Ch. Eigler 2011-02-19 12:21:46 EST
I am continuing to analyze this problem on another RH internal rhts machine
(sun-x4200-01.rhts).  I found an interesting combination of load that triggers
what looks like the same problem.

In this scenario, the testpro/test program is run in a tight shell loop, for
a couple of minutes.  No apparent problem.  Then, in a separate window, I
started a pretty vanilla systemtap script:

     probe timer.s(1) { println (%{ jiffies %}) }

A minute or two later, the same timer-list corruption BUG_ON showed up.
It appears that there is a problem with the nfslock's and systemtap's
frequent kernel timer operations, but only in unison (!).

Then, a further test, with the test binary being run in an even tighter
loop 

    # while true; do date; ./testpro/test; done

showed a kernel crash, with no systemtap activity whatsoever.  This
is finally evidence that, while systemtap's presence seems to amplify the
problem, it is not actually responsible.

See /var/crash/2011-02-19-12:12 on that machine, whose RHTS reservation
is now extended till about Feb. 22.  I'll archive the vmcore
somewhere else too.

I'm reassigning this bug to the kernel.
Comment 60 Frank Ch. Eigler 2011-03-08 23:13:04 EST
Changing the summary line, to remove the possibly-distracting mention of systemtap.
Comment 63 Jeff Layton 2011-03-10 11:30:54 EST
I've been able to reproduce this too by running the reproducer in a tight loop:

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at lib/list_debug.c:31
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 1 
Modules linked in: nfsd exportfs nfs_acl autofs4 hidp rfcomm l2cap bluetooth rpcsec_gss_krb5 auth_rpcgss testmgr_cipher testmgr aead crypto_blkcipher crypto_algapi des lockd sunrpc ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy virtio_net ata_piix libata i2c_piix4 scsi_mod i2c_core pcspkr virtio_balloon tpm_tis tpm serio_raw tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 25099, comm: lockd Tainted: GF     2.6.18-247.el5.jtltest.136debug #1
RIP: 0010:[<ffffffff80165c28>]  [<ffffffff80165c28>] __list_add+0x49/0x65
RSP: 0000:ffff8100081a5ee0  EFLAGS: 00010086
RAX: 0000000000000058 RBX: ffff81003ffb8000 RCX: 0000000000000058
RDX: ffff810007a8cec0 RSI: 0000000000000000 RDI: ffffffff803327e0
RBP: ffffffff885b50a0 R08: 0000000000000002 R09: ffffffff80017f85
R10: ffffffff80017f85 R11: 0000000000000000 R12: ffff81003ffb8000
R13: 000000010595e254 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8100028cf4c8(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000036df6d4670 CR3: 0000000012166000 CR4: 00000000000006e0
Process lockd (pid: 25099, threadinfo ffff8100081a4000, task ffff810007a8cec0)
Stack:  ffff81003106fa48 ffffffff8001df6d ffff81003106fa48 0000000000000286
 000000010595e254 ffff81003df2d6d0 ffff81003106fa48 0000000000000003
 ffffffff8859e37e ffffffff8859e493 ffff81003df2d6d0 0000000000000000
Call Trace:
 [<ffffffff8001df6d>] __mod_timer+0xf2/0x10e
 [<ffffffff8859e37e>] :lockd:lockd+0x0/0x2c6
 [<ffffffff8859e493>] :lockd:lockd+0x115/0x2c6
 [<ffffffff80061079>] child_rip+0xa/0x11
 [<ffffffff800606a8>] restore_args+0x0/0x30
 [<ffffffff800d9697>] zone_statistics+0x3e/0x6d
 [<ffffffff8859e37e>] :lockd:lockd+0x0/0x2c6
 [<ffffffff8006106f>] child_rip+0x0/0x11

-------------------------------[snip]--------------------------------

Turning up nfsd_debug and nlm_debug and running the reproducer shows this:

nfsd: creating service: port 2049 tcp 0 udp 0
nfsd: allocating 32 readahead buffers.
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
NFS locking service started (ver 0.5).
lockd: shutting down host module
lockd: nuking all hosts...
lockd: host garbage collection
lockd: nlmsvc_mark_resources
nfsd: freeing readahead buffers.
nfsd: last server has exited
nfsd: unexporting all filesystems

...so it looks like even though we're not asking to start any threads, we still end up starting up and then shutting down lockd anyway. That should probably be fixed, but doesn't seem immediately harmful.

What's less clear is how we're corrupting the timer list. lockd does set a timer to handle the grace period so perhaps the problem is in how that is being handled.

That said, this is a legacy interface and we isn't generally used under normal circumstances. The exception is when the machine is first booted. If nfsd.ko is not yet plugged in, then we'll use nfsctl() from nfsd. All subsequent accesses generally are done via /proc/fs/nfsd however. nfsctl() is restricted root as well, so an unprivileged user shouldn't be able to exploit this.

I simply don't see why this bug is marked urgent. Can you explain the possible attack vector or normal usecase that would cause someone to trip over this?
Comment 65 Jeff Layton 2011-03-10 13:32:26 EST
Ok, I went and commented out the timer ops that touch the nlm_grace_period_timer, rebuilt lockd.ko and reran the test in a tight loop. It survived for about an hour and a half, which means that the race is probably in that stuff somewhere.

I wonder if we're getting into a state where a new lockd is being started before the old lockd has finished going down. lockd_down does this:

        /*
         * Wait for the lockd process to exit, but since we're holding
         * the lockd semaphore, we can't wait around forever ...
         */
        clear_thread_flag(TIF_SIGPENDING);
        wait_event_timeout(lockd_exit, nlmsvc_pid == 0, HZ);

...so I think it's possible but it seems like we'd probably get a printk if that happens.
Comment 66 Jeff Layton 2011-03-10 14:03:09 EST
Yes, when I change that wait_event_timeout to a wait_event, it seems to survive. Also, when I looked back through the log before the panic, I see this:

    lockd_down: lockd failed to exit, clearing pid

...so yeah, it looks like the old lockd wasn't all the way down before the new one came up. I think the right thing to do is just to not time out while waiting for lockd to come down. I think it's better to risk having lockd_down() hang than to have two lockd's going at the same time.

An alternate idea would be to move the nlm_grace_period_timer to the stack. If each lockd had its own private copy, then this would be unlikely to happen. But, we could still end up with races. The old timer could pop just after the new one is added, and you'd end up with the grace period being far too short.
Comment 67 Jeff Layton 2011-03-10 15:26:47 EST
Created attachment 483556 [details]
patch -- make lockd_down wait indefinitely for lockd to come down

This patch seems to fix the problem for me. The idea is to get rid of the timeout when lockd_down is waiting for lockd to come down. Would you be able to test this against your reproducer and see if it fixes the problem for you too?
Comment 73 Jarod Wilson 2011-03-23 17:44:11 EDT
Patch(es) available in kernel-2.6.18-250.el5
Detailed testing feedback is always welcomed.
Comment 77 yanfu,wang 2011-04-14 04:35:11 EDT
hi,
If someone know any stable reproduce methond? I try to reproduce on RHEL5.6GA using below steps:

# gcc -g -o test test.c (get it from attachment)
# cat test.stp 
probe syscall.nfsservctl {
	log(sprint(cmd))
	log(sprint(argp_uaddr))
	log(sprint(resp_uaddr))
}

# cat test.sh
#!/bin/bash
stap -v test.stp -c ./test

# while true; do date; stap -t -v -e ' probe kernel.trace("timer*") { log($$name . " " . $$parms$$ . " " . symdata($timer->function))} ' -d kernel -o LOGFILE -c ./test.sh; done

It's still running and not crash kernel yet, so is it correct reproducer?

# uname -a
Linux athlon4.rhts.eng.bos.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:24:47 EST 2010 i686 athlon i386 GNU/Linux
You have new mail in /var/spool/mail/root
# rpm -qa|grep systemtap
systemtap-runtime-1.3-4.el5
systemtap-1.3-4.el5
Comment 78 yanfu,wang 2011-04-14 04:58:53 EDT
(In reply to comment #77)
> hi,
> If someone know any stable reproduce methond? I try to reproduce on RHEL5.6GA
> using below steps:
> 
> # gcc -g -o test test.c (get it from attachment)
> # cat test.stp 
> probe syscall.nfsservctl {
>  log(sprint(cmd))
>  log(sprint(argp_uaddr))
>  log(sprint(resp_uaddr))
> }
> 
> # cat test.sh
> #!/bin/bash
> stap -v test.stp -c ./test
> 
> # while true; do date; stap -t -v -e ' probe kernel.trace("timer*") {
> log($$name . " " . $$parms$$ . " " . symdata($timer->function))} ' -d kernel -o
> LOGFILE -c ./test.sh; done
> 
> It's still running and not crash kernel yet, so is it correct reproducer?
> 
> # uname -a
> Linux athlon4.rhts.eng.bos.redhat.com 2.6.18-238.el5 #1 SMP Sun Dec 19 14:24:47
> EST 2010 i686 athlon i386 GNU/Linux
> You have new mail in /var/spool/mail/root
> # rpm -qa|grep systemtap
> systemtap-runtime-1.3-4.el5
> systemtap-1.3-4.el5

I can answer my question, got kernel crash just now when running it period of time(less than one hour):
list_add corruption. prev->next should be c081fa54, but was 00000000
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:31!
invalid opcode: 0000 [#1]
SMP 
last sysfs file: /module/xfrm_nalgo/sections/__versions
Modules linked in: stap_6f0ab4882faa5855489d842eabf8c101_14436(U) stap_79061a3e611ba6498b7c01ffbe675c4d_6937(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh scsi_mod video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac lp snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_device floppy snd_timer snd_page_alloc snd_util_mem snd_hwdep snd pcspkr emu10k1_gp soundcore gameport 3c59x i2c_amd756 mii i2c_core tpm_tis tpm parport_pc parport tpm_bios dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c04f3e19>]    Tainted: G S    VLI
EFLAGS: 00010046   (2.6.18-238.el5 #1) 
EIP is at __list_add+0x39/0x52
eax: 00000048   ebx: c081fa54   ecx: 00000096   edx: 00000000
esi: f0d55550   edi: f0d55550   ebp: 00d66178   esp: dc2bdfa0
ds: 007b   es: 007b   ss: 0068
Process lockd (pid: 13363, ti=dc2bd000 task=dbe61550 task.ti=dc2bd000)
Stack: c0650890 c081fa54 00000000 c081ef80 f0d55550 c081ef80 c042e064 00000000 
       00000282 00d66178 d7fe1200 dc3808e0 00000000 f0d47f56 f0d47e69 00000000 
       00000000 c0405c87 d7fe1200 00000000 00000000 00000000 00000000 00000000 
Call Trace:
 [<c042e064>] __mod_timer+0xcf/0xe4
 [<f0d47f56>] lockd+0xed/0x26f [lockd]
 [<f0d47e69>] lockd+0x0/0x26f [lockd]
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================
Code: 74 17 50 52 68 53 08 65 c0 e8 bf 20 f3 ff 0f 0b 1a 00 05 08 65 c0 83 c4 0c 8b 06 39 d8 74 17 50 53 68 90 08 65 c0 e8 a2 20 f3 ff <0f> 0b 1f 00 05 08 65 c0 83 c4 0c 89 7b 04 89 1f 89 77 04 89 3e 
EIP: [<c04f3e19>] __list_add+0x39/0x52 SS:ESP 0068:dc2bdfa0
 <0>Kernel panic - not syncing: Fatal exception


So I will test on new kernel and paste results later.
Comment 80 yanfu,wang 2011-04-14 07:03:38 EDT
verified result:
I run the reproducer of comment #77 with almost two hours on i386 and x86_64
arch, servers work fine and no other issue found till now. 
If need any extra testing pls let me know.
[root@athlon4 ~]# uname -a
Linux athlon4.rhts.eng.bos.redhat.com 2.6.18-256.el5 #1 SMP Thu Apr 7 19:59:40
EDT 2011 i686 athlon i386 GNU/Linux
Comment 84 yanfu,wang 2011-05-30 06:18:44 EDT
hi,
I re-test the newest kernel 2.6.18-264.el5 on RHEL5.7-Server-20110525.0 x86_64, and got kernel crash again, the console shown as below, pls help to check:

BUG: soft lockup - CPU#2 stuck for 60s! [init:1]
CPU 2:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 1, comm: init Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bca>]  [<ffffffff80064bca>] .text.lock.spinlock+0x0/0x30
RSP: 0018:ffff810107b73f18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: ffff810107b3a000
RDX: 0000000000000000 RSI: ffff81000100ea80 RDI: ffffffff80321a80
RBP: ffff810107b73e90 R08: ffff810219fe3cf8 R09: ffff8103cdb1e000
R10: 0000000000000001 R11: ffff810107b3bb20 R12: ffffffff8005dc8e
R13: ffff81000100ea80 R14: ffffffff80079390 R15: ffff810107b73e90
FS:  00002b16274b25b0(0000) GS:ffff810107b193c0(0000) knlGS:00000000f7f166c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000042ae4000 CR3: 0000000217805000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff887d432b>] :stap_b1a252a1d53a35fa01915881e7637775_6997:lookup_bad_addr+0x35/0xbf
 [<ffffffff887d432b>] :stap_b1a252a1d53a35fa01915881e7637775_6997:lookup_bad_addr+0x35/0xbf
 [<ffffffff887d68f4>] :stap_b1a252a1d53a35fa01915881e7637775_6997:function__dwarf_pretty_print_0+0x167/0x5f6
 [<ffffffff887d8e56>] :stap_b1a252a1d53a35fa01915881e7637775_6997:probe_1996+0xb0/0x2e8
 [<ffffffff887d7809>] :stap_b1a252a1d53a35fa01915881e7637775_6997:enter_tracepoint_probe_2+0x17c/0x307
 [<ffffffff80048482>] init_timer+0x23/0x51
 [<ffffffff8006387c>] schedule_timeout+0x77/0xad
 [<ffffffff80049603>] add_wait_queue+0x24/0x34
 [<ffffffff80099dc9>] process_timeout+0x0/0x5
 [<ffffffff8002dbe4>] pipe_poll+0x2d/0x90
 [<ffffffff80011ba9>] do_select+0x3dc/0x43c
 [<ffffffff8001ed44>] __pollwait+0x0/0xe2
 [<ffffffff8008e85a>] default_wake_function+0x0/0xe
 [<ffffffff8012d57a>] avc_has_perm_noaudit+0x35f/0x38a
 [<ffffffff8012e039>] avc_has_perm+0x46/0x58
 [<ffffffff8012e039>] avc_has_perm+0x46/0x58
 [<ffffffff8012e90d>] inode_has_perm+0x56/0x63
 [<ffffffff8000a851>] __link_path_walk+0xfa8/0xfd1
 [<ffffffff800ec0e9>] core_sys_select+0x1bc/0x265
 [<ffffffff8002cc54>] mntput_no_expire+0x19/0x89
 [<ffffffff8001b035>] cp_new_stat+0xe5/0xfd
 [<ffffffff80016a61>] sys_select+0x153/0x17c
 [<ffffffff8005d116>] system_call+0x7e/0x83

BUG: soft lockup - CPU#3 stuck for 60s! [swapper:0]
CPU 3:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bcc>]  [<ffffffff80064bcc>] .text.lock.spinlock+0x2/0x30
RSP: 0000:ffff810107b9ff18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: 00000000000000ff
RDX: 0000000000000000 RSI: ffff81023706ba80 RDI: ffffffff80321a80
RBP: ffff810107b9fe90 R08: 000000000301fa9c R09: ffff810107b97e48
R10: 0000000000000001 R11: 0000000000000010 R12: ffffffff8005dc8e
R13: ffff81023706ba80 R14: ffffffff80079390 R15: ffff810107b9fe90
FS:  00002b7694a366e0(0000) GS:ffff8102371cc6c0(0000) knlGS:00000000f7f698d0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000399027b110 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8006bdf3>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006be1c>] default_idle+0x29/0x50
 [<ffffffff80048fc8>] cpu_idle+0x95/0xb8
 [<ffffffff80078a9a>] start_secondary+0x479/0x488

BUG: soft lockup - CPU#6 stuck for 60s! [swapper:0]
CPU 6:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bcf>]  [<ffffffff80064bcf>] .text.lock.spinlock+0x5/0x30
RSP: 0018:ffff81022fc43f18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: ffff81022fc38fd8
RDX: 0000000000000000 RSI: ffff81000101f780 RDI: ffffffff80321a80
RBP: ffff81022fc43e90 R08: 000000000301fa9e R09: ffff81022fc3fe48
R10: 0000000000000001 R11: 0000000000000010 R12: ffffffff8005dc8e
R13: ffff81000101f780 R14: ffffffff80079390 R15: ffff81022fc43e90
FS:  00002abf63f4e4a0(0000) GS:ffff810107bf5dc0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002abf63249980 CR3: 0000000000201000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8006bdf3>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006be1c>] default_idle+0x29/0x50
 [<ffffffff80048fc8>] cpu_idle+0x95/0xb8
 [<ffffffff80078a9a>] start_secondary+0x479/0x488

BUG: soft lockup - CPU#0 stuck for 60s! [swapper:0]
CPU 0:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bcf>]  [<ffffffff80064bcf>] .text.lock.spinlock+0x5/0x30
RSP: 0000:ffffffff804a9f18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: 00000000000000ff
RDX: 0000000000000000 RSI: ffff810001006400 RDI: ffffffff80321a80
RBP: ffffffff804a9e90 R08: 000000000301fa9c R09: ffffffff8045bee8
R10: 0000000000000001 R11: 0000000000000010 R12: ffffffff8005dc8e
R13: ffff810001006400 R14: ffffffff80079390 R15: ffffffff804a9e90
FS:  000000004171e940(0000) GS:ffffffff8042a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000042ae2000 CR3: 000000042addb000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8006bdf3>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006be1c>] default_idle+0x29/0x50
 [<ffffffff80048fc8>] cpu_idle+0x95/0xb8
 [<ffffffff80465809>] start_kernel+0x220/0x225
 [<ffffffff8046522f>] _sinittext+0x22f/0x236

BUG: soft lockup - CPU#5 stuck for 60s! [swapper:0]
CPU 5:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bcc>]  [<ffffffff80064bcc>] .text.lock.spinlock+0x2/0x30
RSP: 0018:ffff81022fc17f18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: ffff810237234fd8
RDX: 0000000000000000 RSI: ffff810237074100 RDI: ffffffff80321a80
RBP: ffff81022fc17e90 R08: 000000000301fa9e R09: ffff81022fc13e48
R10: 0000000000000001 R11: 0000000000000246 R12: ffffffff8005dc8e
R13: ffff810237074100 R14: ffffffff80079390 R15: ffff81022fc17e90
FS:  00002aab92c1caf0(0000) GS:ffff810237255e40(0000) knlGS:00000000f7fbe6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b013a08da08 CR3: 000000022e4ac000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8006bdf3>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006be1c>] default_idle+0x29/0x50
 [<ffffffff80048fc8>] cpu_idle+0x95/0xb8
 [<ffffffff80078a9a>] start_secondary+0x479/0x488

BUG: soft lockup - CPU#4 stuck for 60s! [kstopmachine:7930]
CPU 4:
Modules linked in: stap_ce81c73dec031bff99dc861d2e5c0fd8_14132(U) stap_b1a252a1d53a35fa01915881e7637775_6997(U) nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table mperf be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi loop dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_piix4 i2c_core tpm_tis tpm amd64_edac_mod bnx2 k10temp tpm_bios serio_raw edac_mc sg hwmon pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 7930, comm: kstopmachine Tainted: G     ---- 2.6.18-264.el5 #1
RIP: 0010:[<ffffffff80064bcc>]  [<ffffffff80064bcc>] .text.lock.spinlock+0x2/0x30
RSP: 0000:ffff810107bcff18  EFLAGS: 00000286
RAX: 000000000000bbe3 RBX: ffffffff80321a00 RCX: 00000000000000ff
RDX: 0000000000000000 RSI: ffff810001017100 RDI: ffffffff80321a80
RBP: ffff810107bcfe90 R08: 000000000301fa9c R09: ffffffff8005e2fc
R10: 0000000000000001 R11: ffff810418706860 R12: ffffffff8005dc8e
R13: ffff810001017100 R14: ffffffff80079390 R15: ffff810107bcfe90
FS:  00002b536f20daf0(0000) GS:ffff810107b19140(0000) knlGS:00000000f7cd46c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000424cf958 CR3: 000000021631e000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff800a0e3a>] __rcu_process_callbacks+0xe4/0x1a1
 [<ffffffff80096d3d>] tasklet_action+0x8f/0x12b
 [<ffffffff80012562>] __do_softirq+0x89/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006d636>] do_softirq+0x2c/0x7d
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8000ad2a>] kmem_cache_alloc+0x6e/0x76
 [<ffffffff8001f9a6>] copy_process+0x7d0/0x15eb
 [<ffffffff800a0b1a>] alloc_pid+0x228/0x2c4
 [<ffffffff80031074>] do_fork+0x69/0x1c1
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80063177>] wait_for_completion+0x8f/0xa2
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8005df3d>] kernel_thread+0x81/0xeb
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800b3713>] stopmachine+0x0/0x62
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
 [<ffffffff800b35f7>] do_stop+0x75/0x191
 [<ffffffff800b3582>] do_stop+0x0/0x191
 [<ffffffff80032732>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2be7>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032634>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
Comment 85 yanfu,wang 2011-05-30 06:22:07 EDT
additional info: I run the reproducer in a tight loop about three hours.
Comment 86 Frank Ch. Eigler 2011-05-30 06:26:22 EDT
I suspect this is a separate problem, because timers don't appear
in any of the backtraces.  I'd appreciate it if you opened a new
bug to track this, along with the two systemtap scripts apparently
being run together.

This part hints at a locking problem somewhere else in the system:

 :stap_b1a252a1d53a35fa01915881e7637775_6997:lookup_bad_addr+0x35/0xbf
 [<ffffffff887d432b>]
Comment 87 Jeff Layton 2011-05-30 07:14:29 EDT
Agreed. This doesn't look related to the original issue.

Note that it's not actually necessary to use systemtap to reproduce the original problem. I reproduced it without systemtap by running the script in a tight loop. systemtap might make the problem more reproducible by changing the timing, but it really has nothing to do with this particular bug.
Comment 88 yanfu,wang 2011-05-30 21:58:59 EDT
change the status to Verified and file a new bug 709190 to track the locking problem according to above comments.
Comment 89 Martin Prpič 2011-06-02 09:30:27 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Under certain circumstances, a crash in the kernel could occur due to a race condition in the lockd_down function, which did not wait for the lockd process to come down. With this update, the lockd_down function has been fixed, and the kernel no longer crashes.
Comment 95 errata-xmlrpc 2011-07-21 05:52:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Note You need to log in before you can comment on or make changes to this bug.