Description of problem:
"BUG: soft lockup - CPU#1 stuck for 13s" appears after saving an internal snapshot.

Version-Release number of selected component (if applicable):
kvm-83-207.el5

How reproducible:
100%

Steps to Reproduce:
1. Run a RHEL5.5.z guest:
/usr/libexec/qemu-kvm -M rhel5.6.0 -m 4G -smp 4 -name RHEL5.5-64 -uuid 123465d2-2032-848d-bda0-de7adb141234 -boot cdn -drive file=/dev/vgtest/lvtest1,if=virtio,boot=on,bus=0,unit=0,format=qcow2,cache=off,werror=stop -net nic,macaddr=54:52:00:27:12:15,vlan=0,model=virtio -net tap,vlan=0,script=/etc/qemu-ifup -serial pty -parallel none -usb -usbdevice tablet -monitor stdio -spice host=0,ic=on,port=5937,disable-ticketing -qxl 1
2. Once the guest has booted, save an internal snapshot from the monitor:
(qemu) savevm s1
3. After savevm finishes, check dmesg; the message "BUG: soft lockup - CPU#1 stuck for 13s" shows up. Then load the snapshot:
(qemu) loadvm s1
4. Capture the dmesg output.

Actual results:
"BUG: soft lockup - CPU#1 stuck for 13s" messages show up; see the attached file for the full dmesg output.

BUG: soft lockup - CPU#2 stuck for 25s! [swapper:0]
CPU 2:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport floppy joydev serio_raw ide_cd virtio_net virtio_balloon i2c_piix4 cdrom i2c_core pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod virtio_blk virtio_pci virtio_ring virtio ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-194.26.1.el5 #1
RIP: 0010:[<ffffffff80064b50>]  [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9
RSP: 0018:ffff81010476be00  EFLAGS: 00000292
RAX: 0000000000000236 RBX: ffff81013f2bb5c0 RCX: 000000000000000c
RDX: 0000000000000060 RSI: 0000000000000292 RDI: ffffffff80348e58
RBP: ffff81010476bd80 R08: 0000000000000003 R09: ffff810104767e48
R10: 0000000000000001 R11: 0000000000000080 R12: ffffffff8005dc8e
R13: 000000000000001d R14: ffffffff80078225 R15: ffff81010476bd80
FS:  00002b4790fcd1f0(0000) GS:ffff81010471cec0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b4a0cf43090 CR3: 0000000128e37000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff8020a34f>] i8042_interrupt+0x92/0x1e9
 [<ffffffff80010c3a>] handle_IRQ_event+0x51/0xa6
 [<ffffffff800bafae>] __do_IRQ+0xa4/0x103
 [<ffffffff8006ca0d>] do_IRQ+0xe7/0xf5
 [<ffffffff8005d615>] ret_from_intr+0x0/0xa
 [<ffffffff8001240b>] __do_softirq+0x51/0x133
 [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cb8a>] do_softirq+0x2c/0x85
 [<ffffffff8006b342>] default_idle+0x0/0x50
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff8006b36b>] default_idle+0x29/0x50
 [<ffffffff8004923a>] cpu_idle+0x95/0xb8
 [<ffffffff80077991>] start_secondary+0x498/0x4a7

Expected results:
No such call trace appears.

Additional info:
Created attachment 459713 [details] dmesg info
This issue also exists when the guest becomes paused due to running out of disk space or an input/output error. Attaching dmesg info for reference.
Created attachment 460470 [details] dmesg info when guest become paused because of no space/input/output error
I don't think this is a bug. Yes, the CPU stops when you pause the guest, and it doesn't receive interrupts during that time. It looks like we just need Glauber's patches to avoid softlockup warnings in the 5.5.z guest. I don't think I have bug privs to do this, but the patches should already be in 5.7.
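To illustrate why pausing a guest triggers this warning: the soft-lockup detector in 2.6.18-era kernels compares the current time against a per-CPU timestamp that the watchdog thread refreshes whenever it gets scheduled. While the VM is stopped (during savevm or a pause), the watchdog thread cannot run, but the guest's clock still advances across the pause, so the first timer tick after resume sees a gap larger than the threshold and warns. Below is a minimal, hypothetical sketch of that check; the function name, the 10-second threshold, and the timestamps are illustrative, not the actual kernel code.

```python
# Hypothetical sketch of a soft-lockup check, modeled loosely on
# kernel/softlockup.c from 2.6.18-era kernels. All names and values
# here are illustrative assumptions, not the real implementation.

SOFTLOCKUP_THRESHOLD = 10  # seconds the watchdog thread may go unscheduled


def check_cpu(cpu_id, touch_timestamp, now):
    """Return the warning string a timer tick would print, or None.

    touch_timestamp: last time the per-CPU watchdog thread ran (seconds)
    now:             current kernel time on this CPU (seconds)
    """
    stuck_for = now - touch_timestamp
    if stuck_for > SOFTLOCKUP_THRESHOLD:
        return "BUG: soft lockup - CPU#%d stuck for %ds!" % (cpu_id, stuck_for)
    return None


# Normal operation: watchdog ran 5 seconds ago, no warning.
print(check_cpu(1, touch_timestamp=100, now=105))  # -> None

# After a 13-second savevm pause: the clock jumped, the watchdog did not run.
print(check_cpu(1, touch_timestamp=100, now=113))
# -> BUG: soft lockup - CPU#1 stuck for 13s!
```

Nothing in the guest was actually stuck; the gap is an artifact of the vCPUs being stopped while guest time kept moving, which is why patches that touch the watchdog timestamp around such pauses make the warning go away.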
Zach,

Unless I am misunderstanding something, the softlockup happens after the savevm but before the loadvm. I'd agree it is not a bug if we were stopped for a while and then resumed, but just issuing a savevm does not sound like a reason for a softlockup, so I am assuming it is a bug.

Could the reporter clarify?
(In reply to comment #6)
> Zach,
>
> Unless I am understanding something wrong, the softlockup happens after the
> savevm, but before loadvm. I'd agree it is not a bug if we were stopped for a
> while, then resumed.
>
> Just issuing a savevm does not sound like a reason for a softlockup, so I am
> assuming it is a bug.
>
> Could the reporter clarify ?

The softlockup happens after savevm (before loadvm), and more softlockups appear after loadvm.
This has been reported on the same kernel version previously and is now verified. *** This bug has been marked as a duplicate of bug 583059 ***