
Bug 607443

Summary: soft lockup inside rhel5 guest
Product: Red Hat Enterprise Linux 5 Reporter: Qian Cai <qcai>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED ERRATA QA Contact: Qian Cai <qcai>
Severity: high Docs Contact:
Priority: medium    
Version: 5.5 CC: amit.shah, gcosta, mkenneth, quintela, tburke, virt-maint, ypu
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-13 21:39:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 580954    
Attachments:
Description Flags
guest xml none
dmesg none

Description Qian Cai 2010-06-24 06:09:21 UTC
Created attachment 426457 [details]
guest xml

Description of problem:
I have noticed that a rhel5 guest soft-locks up quite easily while querying the rpm database or installing big rpm files such as kernel-debuginfo. Other reports mention a similar trace:
https://bugzilla.redhat.com/show_bug.cgi?id=580865#c8

# rpm -ql kudzu
BUG: soft lockup - CPU#0 stuck for 19s! [kjournald:398]
CPU 0:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_piix4 serio_raw i2c_core pcspkr e1000 snd_timer snd virtio_balloon soundcore snd_page_alloc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 398, comm: kjournald Not tainted 2.6.18-194.el5 #1
RIP: 0010:[<ffffffff800123d1>]  [<ffffffff800123d1>] __do_softirq+0x51/0x133
RSP: 0000:ffffffff80448f60  EFLAGS: 00000206
RAX: 0000000000000042 RBX: 0000000000000042 RCX: 0000000000000206
RDX: ffff81003dd9bfd8 RSI: 0000000000000080 RDI: ffff81003f8b97a0
RBP: ffffffff80448ee0 R08: 0000000000000002 R09: ffffffff8005f2fc
R10: 0000000000000001 R11: ffff81003dd9bbc8 R12: ffffffff8005ec8e
R13: 0000000000000046 R14: ffffffff8007922b R15: ffffffff80448ee0
FS:  0000000000000000(0000) GS:ffffffff803cb000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000354ceb6c90 CR3: 0000000024400000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006dba8>] do_softirq+0x2c/0x85
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff800133f8>] get_request+0x16e/0x34c
 [<ffffffff80028bbd>] get_request_wait+0x21/0x11f
 [<ffffffff8804df25>] :ext3:ext3_get_branch+0x7a/0xd2
 [<ffffffff8000c044>] __make_request+0x33d/0x401
 [<ffffffff8005c6c9>] cache_alloc_refill+0x106/0x186
 [<ffffffff8001c211>] generic_make_request+0x211/0x228
 [<ffffffff881283f2>] :dm_mod:__map_bio+0x4e/0x125
 [<ffffffff88128f39>] :dm_mod:__split_bio+0x176/0x3b0
 [<ffffffff8812994d>] :dm_mod:dm_request+0x115/0x124
 [<ffffffff8001c211>] generic_make_request+0x211/0x228
 [<ffffffff800231ce>] mempool_alloc+0x31/0xe7
 [<ffffffff880312e8>] :jbd:__journal_file_buffer+0x13e/0x243
 [<ffffffff800336ac>] submit_bio+0xe4/0xeb
 [<ffffffff8001a95b>] submit_bh+0xf1/0x111
 [<ffffffff88033d53>] :jbd:journal_commit_transaction+0x8f1/0x1066
 [<ffffffff8003ddd5>] lock_timer_base+0x1b/0x3c
 [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff800a1ba4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff88037512>] :jbd:kjournald+0x0/0x213
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032bdc>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032ade>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11

BUG: soft lockup - CPU#0 stuck for 19s! [swapper:0]
CPU 0:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_piix4 serio_raw i2c_core pcspkr e1000 snd_timer snd virtio_balloon soundcore snd_page_alloc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.el5 #1
RIP: 0010:[<ffffffff8006c389>]  [<ffffffff8006c389>] default_idle+0x29/0x50
RSP: 0018:ffffffff803fdf90  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000090000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030b718
RBP: ffffffff80309d50 R08: ffffffff803fc000 R09: 000000000000003e
R10: ffff810009770038 R11: 0000000000000280 R12: 00000000007d1dbf
R13: 0000008f1dc44638 R14: ffff81003f8b97a0 R15: ffffffff80309b60
FS:  0000000000000000(0000) GS:ffffffff803cb000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006b54b4 CR3: 000000002ec83000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff800497be>] cpu_idle+0x95/0xb8
 [<ffffffff80407807>] start_kernel+0x220/0x225
 [<ffffffff8040722f>] _sinittext+0x22f/0x236

/etc/rc.d/init.dBUG: soft lockup - CPU#1 stuck for 19s! [swapper:0]
CPU 1:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_piix4 serio_raw i2c_core pcspkr e1000 snd_timer snd virtio_balloon soundcore snd_page_alloc dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.el5 #1
RIP: 0010:[<ffffffff8006c389>]  [<ffffffff8006c389>] default_idle+0x29/0x50
RSP: 0018:ffff810009749ef0  EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff8030b718
RBP: ffff81003ffa62f0 R08: ffff810009748000 R09: 000000000000003f
R10: ffff810009770008 R11: ffff81003f19c800 R12: 000000000274df33
R13: 0000008f1dc4d117 R14: ffff810037fee080 R15: ffff81003ffa6100
FS:  0000000000000000(0000) GS:ffff81003ff8e7c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000354a2c4260 CR3: 0000000032536000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff800497be>] cpu_idle+0x95/0xb8
 [<ffffffff80078997>] start_secondary+0x498/0x4a7

Version-Release number of selected component (if applicable):
Host:
kernel-2.6.32-37.el6.x86_64
qemu-kvm-0.12.1.2-2.78.el6.x86_64
libvirt-0.8.1-9.el6.x86_64

Guest:
rhel5.5 x86_64 GA

How reproducible:
usually

Steps to Reproduce:
1. install kernel-debuginfo under the rhel5.5 guest

kernel-debuginfo-2.6.18-202.el5.x86_64.rpm
kernel-debuginfo-2.6.18-203.el5.x86_64.rpm
kernel-debuginfo-common-2.6.18-202.el5.x86_64.rpm
kernel-debuginfo-common-2.6.18-203.el5.x86_64.rpm
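
The console output above contains several soft lockup reports, each introduced by a "BUG: soft lockup" line. As a rough aid for triaging such captures, the following is a minimal sketch that extracts the CPU, the stuck duration, and the task from those header lines. The function name find_soft_lockups and the regular expression are illustrative assumptions based only on the message format seen in this report, not part of any kernel or Red Hat tool.

```python
import re

# Matches header lines such as:
#   BUG: soft lockup - CPU#0 stuck for 19s! [kjournald:398]
# search() is used so that other console output printed in front of
# "BUG:" on the same line does not prevent a match.
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup header line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


if __name__ == "__main__":
    # Header lines copied from the report above.
    sample = (
        "BUG: soft lockup - CPU#0 stuck for 19s! [kjournald:398]\n"
        "BUG: soft lockup - CPU#0 stuck for 19s! [swapper:0]\n"
        "/etc/rc.d/init.dBUG: soft lockup - CPU#1 stuck for 19s! [swapper:0]\n"
    )
    for event in find_soft_lockups(sample):
        print(event)
```

Only the header line is parsed; the register dump and call trace that follow each report are ignored in this sketch.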

Comment 1 Qian Cai 2010-06-24 06:10:03 UTC
The system is an x200 laptop.

Comment 2 Juan Quintela 2010-06-24 09:22:16 UTC
Can you try disabling the watchdog in the guest and/or using cache=none?

Thanks, Juan.

Comment 3 Qian Cai 2010-06-24 10:18:14 UTC
I can no longer reproduce the problem with nmi_watchdog=0 set for the guest.
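
When repeating this test, it may help to confirm that the guest really booted with the parameter. The following is a minimal sketch that reads the kernel command line from /proc/cmdline and checks for nmi_watchdog=0. The helper names parse_cmdline and nmi_watchdog_disabled are hypothetical, and quoted values containing spaces are not handled.

```python
import os


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without "=" (for example "ro" or "quiet") map to None.
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def nmi_watchdog_disabled(cmdline):
    """Return True only if nmi_watchdog=0 appears on the command line."""
    return parse_cmdline(cmdline).get("nmi_watchdog") == "0"


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as handle:
            print(nmi_watchdog_disabled(handle.read()))
```

The check only reports what was passed at boot; it says nothing about whether the underlying lockup condition still occurs.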

Comment 4 RHEL Program Management 2010-06-27 13:02:57 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 5 Dor Laor 2010-06-29 13:22:58 UTC
Glauber, this doesn't look like a bug to me. Can you confirm and close it?

Comment 6 Glauber Costa 2010-06-29 13:35:59 UTC
I am not sure.

With nmi_watchdog=0, it is pretty obvious that the watchdog won't fire.
I am looking at another timekeeping problem at the moment that may be related to this one.
Once I have clarified that, I'll take some action here.

Meanwhile, can you please post your guest dmesg? I am particularly interested in which time source you are using.
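
For anyone collecting the same information, the following is a minimal sketch that filters dmesg output down to lines likely to mention the time source. The keyword list is an illustrative guess, not an authoritative list of what a RHEL 5 guest kernel prints, and the function name time_source_lines is hypothetical.

```python
import sys

# Illustrative guesses only; adjust to whatever the guest kernel actually logs.
DEFAULT_KEYWORDS = ("clocksource", "kvm-clock", "time.c")


def time_source_lines(dmesg_text, keywords=DEFAULT_KEYWORDS):
    """Return the lines of dmesg_text that mention any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line for line in dmesg_text.splitlines()
        if any(keyword in line.lower() for keyword in lowered)
    ]


if __name__ == "__main__":
    # Intended use: dmesg | python this_script.py
    # Nothing is read when stdin is an interactive terminal.
    if not sys.stdin.isatty():
        for line in time_source_lines(sys.stdin.read()):
            print(line)
```

The full dmesg is still the more useful attachment; this only narrows it down for a quick look.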

Comment 7 Qian Cai 2010-06-29 13:49:42 UTC
Created attachment 427685 [details]
dmesg

Comment 10 Glauber Costa 2010-07-01 18:32:07 UTC
Guest issue, moving component.

Comment 13 RHEL Program Management 2010-07-05 09:48:36 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 16 Jarod Wilson 2010-07-12 15:47:04 UTC
in kernel-2.6.18-206.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 18 Qian Cai 2010-07-20 16:41:16 UTC
*** Bug 616368 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2011-01-13 21:39:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html