Hide Forgot
Description of problem: System appears to just stop/pause Version-Release number of selected component (if applicable): 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux How reproducible: Not yet able to trigger Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: Multiple lockups in a short period of time: Linux 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux Mar 23 12:13:57 web3 kernel: BUG: soft lockup - CPU#1 stuck for 11s! [pmcd:2172] Mar 23 12:14:09 web3 kernel: CPU 1: Mar 23 12:14:09 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:14:09 web3 kernel: Pid: 2172, comm: pmcd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:14:30 web3 kernel: RIP: 0010:[<ffffffff80011ec7>] [<ffffffff80011ec7>] __do_softirq+0x53/0xd6 Mar 23 12:14:30 web3 kernel: RSP: 0018:ffff81000276bf60 EFLAGS: 00000206 Mar 23 12:14:30 web3 kernel: RAX: 0000000000000022 RBX: ffffffff803b6f80 RCX: 0000000000000500 Mar 23 12:14:30 web3 kernel: RDX: ffff81006be95fd8 RSI: 0000000000000080 RDI: ffff81006c0887e0 Mar 23 12:14:30 web3 kernel: RBP: ffff81000276bee0 R08: 0000000000000001 R09: 000000000000003f Mar 23 12:14:30 web3 kernel: R10: ffff81007ff0c008 R11: 0000000000000000 R12: ffffffff8005dc8e Mar 23 12:14:30 web3 kernel: R13: 0000000000000022 R14: ffffffff80077090 R15: ffff81000276bee0 Mar 23 12:14:30 web3 kernel: FS: 00002ac4a3c64da0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:14:30 web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 23 12:15:31 web3 kernel: CR2: 00002aaaaabca1e0 CR3: 000000006bdab000 CR4: 00000000000006e0 Mar 23 12:16:47 web3 kernel: Mar 23 12:16:52 web3 kernel: Call Trace: Mar 23 12:17:07 web3 kernel: <IRQ> [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 Mar 23 12:17:23 web3 kernel: [<ffffffff8006c6e4>] do_softirq+0x2c/0x85 Mar 23 12:17:26 web3 kernel: [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c Mar 23 12:17:27 web3 kernel: <EOI> [<ffffffff800a5184>] s_show+0x167/0x293 Mar 23 12:17:27 web3 kernel: [<ffffffff800a509e>] s_show+0x81/0x293 Mar 23 12:17:29 web3 kernel: [<ffffffff8003f209>] seq_read+0x1b8/0x28c Mar 23 12:17:31 web3 kernel: [<ffffffff8000b337>] vfs_read+0xcb/0x171 Mar 23 12:17:31 web3 kernel: [<ffffffff80011715>] sys_read+0x45/0x6e Mar 23 12:17:31 web3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Mar 23 12:17:38 web3 kernel: Mar 23 12:17:38 web3 kernel: BUG: soft lockup - CPU#1 stuck for 50s! [pmcd:2172] Mar 23 12:17:39 web3 kernel: CPU 1: Mar 23 12:17:39 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:17:42 web3 kernel: Pid: 2172, comm: pmcd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:17:43 web3 kernel: RIP: 0010:[<ffffffff800a5184>] [<ffffffff800a5184>] s_show+0x167/0x293 Mar 23 12:17:45 web3 kernel: RSP: 0018:ffff81006be95e58 EFLAGS: 00000206 Mar 23 12:17:45 web3 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81007ffeeee0 Mar 23 12:17:47 web3 kernel: RDX: 0000000000000001 RSI: 0000000000000028 RDI: ffff81007ffeef00 Mar 23 12:17:47 web3 kernel: RBP: 000000008005d67c R08: 0000000000000006 R09: 0000000000000020 Mar 23 12:17:48 web3 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 Mar 23 12:17:48 web3 kernel: R13: 00002ac4a42c3000 R14: 0000000000000020 R15: 0000000000000000 Mar 23 12:17:48 web3 kernel: FS: 00002ac4a3c64da0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:17:49 web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 23 12:17:50 web3 kernel: CR2: 00002aaaaabca1e0 CR3: 000000006bdab000 CR4: 00000000000006e0 Mar 23 12:17:50 web3 kernel: Mar 23 12:17:51 web3 kernel: Call Trace: Mar 23 12:17:51 web3 kernel: [<ffffffff800a509e>] s_show+0x81/0x293 Mar 23 12:17:52 web3 kernel: [<ffffffff8003f209>] seq_read+0x1b8/0x28c Mar 23 12:17:55 web3 kernel: [<ffffffff8000b337>] vfs_read+0xcb/0x171 Mar 23 12:17:57 web3 kernel: [<ffffffff80011715>] sys_read+0x45/0x6e Mar 23 12:18:00 web3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Mar 23 12:18:01 web3 kernel: Mar 23 12:18:03 web3 kernel: BUG: soft lockup - CPU#1 stuck for 17s! [pmcd:2172] Mar 23 12:18:08 web3 kernel: CPU 1: Mar 23 12:18:12 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:18:14 web3 kernel: Pid: 2172, comm: pmcd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:18:18 web3 kernel: RIP: 0010:[<ffffffff800a5184>] [<ffffffff800a5184>] s_show+0x167/0x293 Mar 23 12:18:20 web3 kernel: RSP: 0018:ffff81006be95e58 EFLAGS: 00000206 Mar 23 12:18:21 web3 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81007fccb6e0 Mar 23 12:18:24 web3 kernel: RDX: 0000000000000001 RSI: 0000000000000005 RDI: ffff81007fccb700 Mar 23 12:18:24 web3 kernel: RBP: 0000000000000000 R08: 0000000000000006 R09: 0000000000000020 Mar 23 12:18:26 web3 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff81007dc33740 Mar 23 12:18:26 web3 kernel: R13: 00002ac4a42c3000 R14: 0000000000000020 R15: 0000000000000000 Mar 23 12:18:27 web3 kernel: FS: 00002ac4a3c64da0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:18:28 web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 23 12:18:28 web3 kernel: CR2: 00002ac4a42c5000 CR3: 000000006bdab000 CR4: 00000000000006e0 Mar 23 12:18:28 web3 kernel: Mar 23 12:18:28 web3 kernel: Call Trace: Mar 23 12:18:29 web3 kernel: [<ffffffff800a509e>] s_show+0x81/0x293 Mar 23 12:18:29 web3 kernel: [<ffffffff8003f209>] seq_read+0x1b8/0x28c Mar 23 12:18:30 web3 kernel: [<ffffffff8000b337>] vfs_read+0xcb/0x171 Mar 23 12:18:30 web3 kernel: [<ffffffff80011715>] sys_read+0x45/0x6e Mar 23 12:18:30 web3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Mar 23 12:18:31 web3 kernel: Mar 23 12:18:31 web3 kernel: BUG: soft lockup - CPU#1 stuck for 10s! [pmcd:2172] Mar 23 12:18:32 web3 kernel: CPU 1: Mar 23 12:18:34 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:18:35 web3 kernel: Pid: 2172, comm: pmcd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:18:37 web3 kernel: RIP: 0010:[<ffffffff800a5184>] [<ffffffff800a5184>] s_show+0x167/0x293 Mar 23 12:18:41 web3 kernel: RSP: 0018:ffff81006be95e58 EFLAGS: 00000202 Mar 23 12:18:42 web3 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81007ffd07e0 Mar 23 12:18:42 web3 kernel: RDX: 0000000000000001 RSI: 0000000000000007 RDI: ffff81007ffd0800 Mar 23 12:18:43 web3 kernel: RBP: 000000008005d67c R08: 0000000000000006 R09: 0000000000000020 Mar 23 12:18:44 web3 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 Mar 23 12:18:44 web3 kernel: R13: 00002ac4a42c3000 R14: 0000000000000020 R15: 0000000000000000 Mar 23 12:18:44 web3 kernel: FS: 00002ac4a3c64da0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:18:45 web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 23 12:18:47 web3 kernel: CR2: 00002ac4a42c5000 CR3: 000000006bdab000 CR4: 00000000000006e0 Mar 23 12:18:47 web3 kernel: Mar 23 12:18:48 web3 kernel: Call Trace: Mar 23 12:18:52 web3 kernel: [<ffffffff800a509e>] s_show+0x81/0x293 Mar 23 12:18:54 web3 kernel: [<ffffffff8003f209>] seq_read+0x1b8/0x28c Mar 23 12:18:54 web3 kernel: [<ffffffff8000b337>] vfs_read+0xcb/0x171 Mar 23 12:18:55 web3 kernel: [<ffffffff80011715>] sys_read+0x45/0x6e Mar 23 12:18:56 web3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Mar 23 12:18:57 web3 kernel: Mar 23 12:18:59 web3 kernel: BUG: soft lockup - CPU#1 stuck for 41s! [pmcd:2172] Mar 23 12:18:59 web3 kernel: CPU 1: Mar 23 12:19:00 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:19:00 web3 kernel: Pid: 2172, comm: pmcd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:19:01 web3 kernel: RIP: 0010:[<ffffffff80011ec7>] [<ffffffff80011ec7>] __do_softirq+0x53/0xd6 Mar 23 12:19:03 web3 kernel: RSP: 0018:ffff81000276bf60 EFLAGS: 00000206 Mar 23 12:19:04 web3 kernel: RAX: 0000000000000022 RBX: ffffffff803b6f80 RCX: 0000000000000300 Mar 23 12:19:06 web3 kernel: RDX: ffff81006be95fd8 RSI: 0000000000000080 RDI: ffff81006c0887e0 Mar 23 12:19:10 web3 kernel: RBP: ffff81000276bee0 R08: 0000000000000001 R09: 000000000000003f Mar 23 12:19:12 web3 kernel: R10: ffff81007ff0c008 R11: 0000000000000000 R12: ffffffff8005dc8e Mar 23 12:19:14 web3 kernel: R13: 0000000000000022 R14: ffffffff80077090 R15: ffff81000276bee0 Mar 23 12:19:15 web3 kernel: FS: 00002ac4a3c64da0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:19:17 web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 23 12:19:19 web3 kernel: CR2: 00002ac4a42c5000 CR3: 000000006bdab000 CR4: 00000000000006e0 Mar 23 12:19:19 web3 kernel: Mar 23 12:19:20 web3 kernel: Call Trace: Mar 23 12:19:20 web3 kernel: <IRQ> [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 Mar 23 12:19:20 web3 kernel: [<ffffffff8006c6e4>] do_softirq+0x2c/0x85 Mar 23 12:19:20 web3 kernel: [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c Mar 23 12:19:20 web3 kernel: <EOI> [<ffffffff800a5184>] s_show+0x167/0x293 Mar 23 12:19:21 web3 kernel: [<ffffffff800a509e>] s_show+0x81/0x293 Mar 23 12:19:21 web3 kernel: [<ffffffff8003f209>] seq_read+0x1b8/0x28c Mar 23 12:19:22 web3 kernel: [<ffffffff8000b337>] vfs_read+0xcb/0x171 Mar 23 12:19:22 web3 kernel: [<ffffffff80011715>] sys_read+0x45/0x6e Mar 23 12:19:23 web3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Mar 23 12:19:24 web3 kernel: Mar 23 12:19:27 web3 kernel: BUG: soft lockup - CPU#0 stuck for 37s! [bond0:1393] Mar 23 12:19:28 web3 kernel: CPU 0: Mar 23 12:19:29 web3 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy 8139too 8139cp ide_cd i2c_piix4 mii pcspkr serio_raw cdrom i2c_core ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:19:29 web3 kernel: Pid: 1393, comm: bond0 Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:19:29 web3 kernel: RIP: 0010:[<ffffffff80064af8>] [<ffffffff80064af8>] _spin_unlock_irqrestore+0x8/0x9 Mar 23 12:19:30 web3 kernel: RSP: 0018:ffff81007b52fd18 EFLAGS: 00000282 Mar 23 12:19:30 web3 kernel: RAX: 0000000000000000 RBX: ffff81007fbd8000 RCX: 000005e1ffffff10 Mar 23 12:19:32 web3 kernel: RDX: 0000000000000100 RSI: 0000000000000282 RDI: ffff81007fbd8510 Mar 23 12:19:33 web3 kernel: RBP: 0000000000000000 R08: ffff81007b52e000 R09: ffff81007ffdb000 Mar 23 12:19:34 web3 kernel: R10: 0000000000000000 R11: ffffffff8011ce11 R12: 0000000000000000 Mar 23 12:19:35 web3 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Mar 23 12:19:35 web3 kernel: FS: 0000000000000000(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000 Mar 23 12:19:36 web3 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Mar 23 12:19:37 web3 kernel: CR2: 00002ade80f45fe0 CR3: 0000000020857000 CR4: 00000000000006e0 Mar 23 12:19:39 web3 kernel: Mar 23 12:19:39 web3 kernel: Call Trace: Mar 23 12:19:39 web3 kernel: [<ffffffff88153b51>] :8139cp:cp_get_settings+0x42/0x4e Mar 23 12:19:40 web3 kernel: [<ffffffff8826504a>] :bonding:bond_update_speed_duplex+0x3a/0x109 Mar 23 12:19:41 web3 kernel: [<ffffffff80062f4b>] thread_return+0x0/0xdf Mar 23 12:19:43 web3 kernel: [<ffffffff882668f1>] :bonding:__bond_mii_monitor+0x399/0x444 Mar 23 12:19:44 web3 kernel: [<ffffffff882679c2>] :bonding:bond_mii_monitor+0x0/0x8c Mar 23 12:19:45 web3 kernel: [<ffffffff882679ef>] :bonding:bond_mii_monitor+0x2d/0x8c Mar 23 12:19:46 web3 kernel: [<ffffffff8004cea9>] run_workqueue+0x94/0xe4 Mar 23 12:19:47 web3 kernel: [<ffffffff800497be>] worker_thread+0x0/0x122 Mar 23 12:19:48 web3 kernel: [<ffffffff8009dd45>] keventd_create_kthread+0x0/0xc4 Mar 23 12:19:50 web3 kernel: [<ffffffff800498ae>] worker_thread+0xf0/0x122 Mar 23 12:19:50 web3 kernel: [<ffffffff8008ad76>] default_wake_function+0x0/0xe Mar 23 12:19:51 web3 kernel: [<ffffffff8009dd45>] keventd_create_kthread+0x0/0xc4 Mar 23 12:19:52 web3 kernel: [<ffffffff8009dd45>] keventd_create_kthread+0x0/0xc4 Mar 23 12:19:53 web3 kernel: [<ffffffff8003253d>] kthread+0xfe/0x132 Mar 23 12:19:55 web3 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11 Mar 23 12:19:56 web3 kernel: [<ffffffff8009dd45>] keventd_create_kthread+0x0/0xc4 Mar 23 12:19:57 web3 kernel: [<ffffffff8003243f>] kthread+0x0/0x132 Mar 23 12:19:59 web3 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 Mar 23 12:20:00 web3 kernel:
Note this is a KVM guest system running on host with: Linux 2.6.18-194.17.4.el5 #1 SMP Wed Oct 20 13:03:08 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux a second server sunning on the host also experienced similar soft lockups in the same period: Mar 23 12:14:38 app23 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [puppetd:2118] Mar 23 12:14:39 app23 kernel: CPU 0: Mar 23 12:14:39 app23 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ide_cd 8139too i2c_piix4 8139cp i2c_core mii serio_raw pcspkr cdrom ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:14:39 app23 kernel: Pid: 2118, comm: puppetd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:14:39 app23 kernel: RIP: 0010:[<ffffffff8003d278>] [<ffffffff8003d278>] rt_run_flush+0x61/0xb8 Mar 23 12:14:39 app23 kernel: RSP: 0000:ffffffff80418ef0 EFLAGS: 00010282 Mar 23 12:14:39 app23 kernel: RAX: ffff81007f5a1ff8 RBX: 0000000000000000 RCX: ffff8100023783e0 Mar 23 12:14:39 app23 kernel: RDX: 000000000000000b RSI: ffffffff8022d030 RDI: ffff81007ff14ffc Mar 23 12:14:39 app23 kernel: RBP: ffffffff80418e70 R08: 0000000000001000 R09: 0000000000000000 Mar 23 12:14:39 app23 kernel: R10: ffffffff80418cd0 R11: 0000000000000154 R12: ffffffff8005dc8e Mar 23 12:14:39 app23 kernel: R13: 0000000000000ffc R14: ffffffff80077090 R15: ffffffff80418e70 Mar 23 12:14:51 app23 kernel: FS: 00002b120e8b0f50(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000 Mar 23 12:14:57 app23 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 23 12:15:05 app23 kernel: CR2: 00002b8e83900000 CR3: 0000000072422000 CR4: 00000000000006e0 Mar 23 12:15:09 app23 kernel: Mar 23 12:15:15 app23 kernel: Call Trace: Mar 23 12:15:21 app23 kernel: <IRQ> [<ffffffff8022ea08>] rt_secret_rebuild+0x0/0x26 Mar 23 12:15:21 app23 kernel: [<ffffffff8022ea17>] rt_secret_rebuild+0xf/0x26 Mar 23 12:15:21 app23 kernel: [<ffffffff800953c2>] run_timer_softirq+0x133/0x1af Mar 23 12:15:22 app23 kernel: [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6 Mar 23 12:15:24 app23 kernel: [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 Mar 23 12:15:25 app23 kernel: [<ffffffff8006c6e4>] do_softirq+0x2c/0x85 Mar 23 12:15:26 app23 kernel: [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c Mar 23 12:15:28 app23 kernel: <EOI> Mar 23 12:15:29 app23 kernel: BUG: soft lockup - CPU#0 stuck for 28s! [puppetd:2118] Mar 23 12:15:29 app23 kernel: CPU 0: Mar 23 12:15:29 app23 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ide_cd 8139too i2c_piix4 8139cp i2c_core mii serio_raw pcspkr cdrom ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:15:32 app23 kernel: Pid: 2118, comm: puppetd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:15:35 app23 kernel: RIP: 0010:[<ffffffff80010970>] [<ffffffff80010970>] handle_IRQ_event+0x1b/0x58 Mar 23 12:15:41 app23 kernel: RSP: 0000:ffffffff80418dd8 EFLAGS: 00000246 Mar 23 12:15:41 app23 kernel: RAX: 0000000000010000 RBX: 0000000000000e00 RCX: 000000000000000e Mar 23 12:15:42 app23 kernel: RDX: ffff81007f94b840 RSI: ffffffff80418e48 RDI: 000000000000000e Mar 23 12:15:44 app23 kernel: RBP: ffffffff80418d50 R08: 0000000000001000 R09: 0000000000000000 Mar 23 12:15:45 app23 kernel: R10: ffffffff80418cd0 R11: 0000000000000154 R12: ffffffff8005dc8e Mar 23 12:15:46 app23 kernel: R13: ffff81007f94b840 R14: ffffffff80077090 R15: ffffffff80418d50 Mar 23 12:15:46 app23 kernel: FS: 00002b120e8b0f50(0000) GS:ffffffff8039f000(0000) knlGS:0000000000000000 Mar 23 12:15:47 app23 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Mar 23 12:15:49 app23 kernel: CR2: 00002b8e83900000 CR3: 0000000072422000 CR4: 00000000000006e0 Mar 23 12:15:52 app23 kernel: Mar 23 12:15:52 app23 kernel: Call Trace: Mar 23 12:15:53 app23 kernel: <IRQ> [<ffffffff800b7359>] __do_IRQ+0xa4/0x103 Mar 23 12:15:53 app23 kernel: [<ffffffff8006c567>] do_IRQ+0xe7/0xf5 Mar 23 12:15:54 app23 kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa Mar 23 12:15:55 app23 kernel: [<ffffffff8022d030>] dst_rcu_free+0x0/0x3a Mar 23 12:15:56 app23 kernel: [<ffffffff8003d278>] rt_run_flush+0x61/0xb8 Mar 23 12:15:57 app23 kernel: [<ffffffff8022ea08>] rt_secret_rebuild+0x0/0x26 Mar 23 12:15:57 app23 kernel: [<ffffffff8022ea17>] rt_secret_rebuild+0xf/0x26 Mar 23 12:15:57 app23 kernel: [<ffffffff800953c2>] run_timer_softirq+0x133/0x1af Mar 23 12:15:58 app23 kernel: [<ffffffff80011ed2>] __do_softirq+0x5e/0xd6 Mar 23 12:15:59 app23 kernel: [<ffffffff8005e2fc>] call_softirq+0x1c/0x28 Mar 23 12:15:59 app23 kernel: [<ffffffff8006c6e4>] do_softirq+0x2c/0x85 Mar 23 12:16:00 app23 kernel: [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c Mar 23 12:16:02 app23 kernel: <EOI> Mar 23 12:16:11 app23 kernel: BUG: soft lockup - CPU#1 stuck for 28s! [klogd:23683] Mar 23 12:16:15 app23 kernel: CPU 1: Mar 23 12:16:18 app23 kernel: Modules linked in: nfs lockd fscache nfs_acl sunrpc ipv6 xfrm_nalgo crypto_api bonding dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy ide_cd 8139too i2c_piix4 8139cp i2c_core mii serio_raw pcspkr cdrom ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Mar 23 12:16:19 app23 kernel: Pid: 23683, comm: klogd Not tainted 2.6.18-92.1.6.el5 #1 Mar 23 12:16:20 app23 kernel: RIP: 0010:[<ffffffff80062460>] [<ffffffff80062460>] __sched_text_start+0x0/0xaeb Mar 23 12:16:21 app23 kernel: RSP: 0000:ffff81004f25bf50 EFLAGS: 00000202 Mar 23 12:16:22 app23 kernel: RAX: ffff81004f25bf58 RBX: 00002aea430a5520 RCX: 0000000000000000 Mar 23 12:16:25 app23 kernel: RDX: 00000000000000a8 RSI: ffff81000276e000 RDI: ffff81004f25c000 Mar 23 12:16:26 app23 kernel: RBP: 0000000000000004 R08: 000000000000002b R09: 00002aea430c1000 Mar 23 12:16:26 app23 kernel: R10: ffffffffffffffff R11: 00002aea430a5920 R12: ffff81004f25a000 Mar 23 12:16:27 app23 kernel: R13: ffff81004d90e860 R14: 0000000000000000 R15: ffffffff800625df Mar 23 12:16:28 app23 kernel: FS: 00002aea436166e0(0000) GS:ffff810002741840(0000) knlGS:0000000000000000 Mar 23 12:16:30 app23 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Mar 23 12:16:32 app23 kernel: CR2: 000000357f4c8fb0 CR3: 0000000039b0b000 CR4: 00000000000006e0 Mar 23 12:16:33 app23 kernel: Mar 23 12:16:33 app23 kernel: Call Trace: Mar 23 12:16:34 app23 kernel: [<ffffffff80064e15>] paranoid_schedule1+0x6/0x9 Mar 23 12:16:35 app23 kernel:
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Avi, Is this related to https://bugzilla.redhat.com/show_bug.cgi?id=782631?
Could be. Was pinning used in this scenario?
Hi avi, Can not reproduce this issue with kvm-83-251.el5 1. bootup two guests, guestA and guestB (both with 2G mem and 2 vcpus, virtio) 2. pin guestA to physical cpu 2, pin guestB to physical cpu 1,3 3. run attached script in guestA 4. keep rebooting in guestB don't reproduce till now (around 4 hours), keep on testing guest: 2.6.18-308.el5 x86_64 host: 2.6.18-308.el5 kvm-83-251.el5 cmd: /usr/libexec/qemu-kvm -name 308_249 -monitor stdio \ -serial unix:/tmp/serial-shuang,server,nowait \ -drive file=/home/bak.qcow2,index=0,if=virtio,media=disk,cache=none,boot=on,format=qcow2 \ -net nic,vlan=0,model=virtio,macaddr=90:01:8a:76:45:a0 -net tap,vlan=0,script=/home/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch -net nic,vlan=1,model=virtio,macaddr=90:01:8a:76:45:71 -net tap,vlan=1,script=/home/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch -m 2048 -smp 2,cores=2,threads=1,sockets=1 -soundhw ac97 -vnc :1 -rtc-td-hack -M rhel5.4.0 -boot c -no-kvm-pit-reinjection -usbdevice tablet -balloon virtio
Created attachment 577980 [details] io-idle-race script
hi avi, get the following call trace when pin two guest vcpu to one physical cpu, and run dd, guest not hang INFO: task syslogd:2264 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syslogd D ffff810001004420 0 2264 1 2267 2250 (NOTLB) ffff81012efb9d98 0000000000000082 ffff81012efb9d18 0000000000000246 0000000000000000 0000000000000009 ffff81013a2be7e0 ffffffff80319b60 0000013804c8a315 0000000000005098 ffff81013a2be9c8 00000000880317be Call Trace: [<ffffffff88036e55>] :jbd:log_wait_commit+0xa3/0xf5 [<ffffffff800a3444>] autoremove_wake_function+0x0/0x2e [<ffffffff8803179a>] :jbd:journal_stop+0x1d3/0x203 [<ffffffff8002f997>] __writeback_single_inode+0x1dd/0x31c [<ffffffff800e3fac>] do_readv_writev+0x26e/0x291 [<ffffffff800f73d2>] sync_inode+0x24/0x33 [<ffffffff8804c37e>] :ext3:ext3_sync_file+0xce/0xf8 [<ffffffff8004fe8d>] do_fsync+0x52/0xa4 [<ffffffff800e4839>] __do_fsync+0x23/0x36 [<ffffffff8005d28d>] tracesys+0xd5/0xe0 INFO: task irqbalance:2347 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. irqbalance D ffff810001004420 0 2347 1 2380 2267 (NOTLB) ffff81012e88fc58 0000000000000082 ffff810000019c10 ffff81012e88fc38 0000000000000000 0000000000000007 ffff81013ad3c0c0 ffffffff80319b60 00000141452226fa 0000000000019fc1 ffff81013ad3c2a8 0000000000000000 Call Trace: [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14 [<ffffffff8000d055>] do_lookup+0xd1/0x227 [<ffffffff8000a29d>] __link_path_walk+0x9f4/0xf39 [<ffffffff8000eb31>] link_path_walk+0x45/0xb8 [<ffffffff8000ce04>] do_path_lookup+0x294/0x310 [<ffffffff80130a4b>] selinux_file_alloc_security+0x2a/0x54 [<ffffffff8002384b>] __path_lookup_intent_open+0x56/0x97 [<ffffffff8001b120>] open_namei+0x73/0x6c0 [<ffffffff8004e22e>] lock_hrtimer_base+0x26/0x4c [<ffffffff80027607>] do_filp_open+0x1c/0x38 [<ffffffff80063cf9>] do_nanosleep+0x47/0x70 [<ffffffff80019fd3>] do_sys_open+0x44/0xbe [<ffffffff8005d28d>] tracesys+0xd5/0xe0 INFO: task pcscd:2598 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. pcscd D ffff810001004420 0 2598 1 2602 2586 (NOTLB) ffff81012e653dd8 0000000000000086 ffffffff00000257 ffffffff00000000 0000000300000001 0000000000000007 ffff81013aa5e820 ffffffff80319b60 0000014326023349 0000000000014d02 ffff81013aa5ea08 000000003aa5e820 Call Trace: [<ffffffff8013057f>] inode_has_perm+0x56/0x63 [<ffffffff80064a1b>] __down+0xc3/0xd8 [<ffffffff8008ee72>] default_wake_function+0x0/0xe [<ffffffff800646d9>] __down_failed+0x35/0x3a [<ffffffff80064cdd>] .text.lock.kernel_lock+0xf/0x1f [<ffffffff80041ebc>] do_ioctl+0x3d/0x6b [<ffffffff8002ff2d>] vfs_ioctl+0x457/0x4b9 [<ffffffff8004c26c>] sys_ioctl+0x59/0x78 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
(In reply to comment #12) > hi avi, > get the following call trace when pin two guest vcpu to one physical cpu, > and run dd, guest not hang > This is with kvm-83-251.el5?
(In reply to comment #13) > (In reply to comment #12) > > hi avi, > > get the following call trace when pin two guest vcpu to one physical cpu, > > and run dd, guest not hang > > > > This is with kvm-83-251.el5? yes
Hi avi, any suggestion for this bug, didn't get call trace as comment0, but get call as comment12
If you pin vcpus, this is somewhat expected. If the host is overloaded, this could happen as well. Maybe we should use steal time to detect that the guest is not running all the time. Copying Rik for an opinion.
120 seconds should not be CPU scheduler induced latency.
(In reply to comment #17) > 120 seconds should not be CPU scheduler induced latency. Suppose we're overcommitted, and smp_calling_function_single() within a spinlock. Without PLE, the IPI latency is magnified to the time slice. I don't think this is the case here (and steal time won't help if it were) though.
The time slice is on the order of 30 milliseconds max. That means a 120 second timeout is at least 3600 full time slices. That is why I doubt that the CPU scheduler is causing this 120 second latency.