Bug 568678 - Guest could not resume from s4
Summary: Guest could not resume from s4
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kvm
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Gleb Natapov
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: Rhel5KvmTier2
Reported: 2010-02-26 10:52 UTC by Amos Kong
Modified: 2015-05-25 00:05 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-17 05:50:00 UTC
Target Upstream Version:
Embargoed:


Attachments
snapshot (30.38 KB, application/x-gzip)
2010-02-26 10:52 UTC, Amos Kong

Description Amos Kong 2010-02-26 10:52:30 UTC
Created attachment 396509
snapshot

Description of problem:
Boot up a VM and suspend it to disk. When resuming from s4, the guest becomes unresponsive.
The same problem exists on rhel3.9, win7, and rhel5.4 guests.

Version-Release number of selected component (if applicable):
(host)# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-160.el5
kvm-qemu-img-83-160.el5
kmod-kvm-83-160.el5
kvm-tools-83-160.el5
kvm-debuginfo-83-160.el5

How reproducible:
Can reproduce 100%

Steps to Reproduce:
1. Boot up a VM
2. Suspend guest to disk
3. Resume guest
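
A minimal sketch of step 2 for a Linux guest, assuming suspend to disk is triggered by writing `disk` to /sys/power/state as root (the call trace in comment 2 passes through sysfs_write_file and state_store, which is consistent with this). The names supports_s4 and suspend_to_disk are hypothetical helpers, not part of any tool used in this report.

```shell
# Succeed if the given /sys/power/state contents list "disk" (S4).
supports_s4() {
    case " $1 " in
        *" disk "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Request suspend to disk only when the kernel advertises it.
# $1 is the state file; it defaults to the real sysfs node (assumption:
# the guest exposes /sys/power/state and the caller is root).
suspend_to_disk() {
    state_file=${1:-/sys/power/state}
    if supports_s4 "$(cat "$state_file")"; then
        echo disk > "$state_file"
    else
        echo "S4 (disk) is not listed in $state_file" >&2
        return 1
    fi
}
```

Passing a path argument lets the function be tried against an ordinary file instead of the real sysfs node.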
  
Actual results:
Guest could not resume from s4.

Expected results:
Guest resumes from s4 successfully.


Additional info:

Commandline of VM:
# qemu-kvm -drive file=/tmp/kvm_autotest_root/images/RHEL-Server-5.4-64.qcow2,if=ide,boot=on  -net nic,vlan=0,model=rtl8139,macaddr=00:19:B7:5E:9A:00 -net tap,vlan=0,ifname=rtl8139_0_6001,script=/etc/qemu-ifup-switch -m 2048 -vnc :0

(guest) # uname -a
Linux localhost.localdomain 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

(host)# uname -a
Linux localhost.localdomain.englab.nay.redhat.com 2.6.18-189.el5 #1 SMP Tue Feb 16 11:10:22 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Gleb Natapov 2010-05-11 14:39:04 UTC
Is this from an autotest report? Has this been verified manually? win7 and rhel5.4
both work for me. I am not even sure rhel3.9 supports hibernate to disk. Retest _manually_.

Comment 2 Amos Kong 2010-05-14 05:10:21 UTC
(In reply to comment #1)
> Is this from an autotest report? Has this been verified manually? win7 and
> rhel5.4
> both work for me. I am not even sure rhel3.9 supports hibernate to disk. Retest
> _manually_.

Hello Gleb, it was found by autotest, but I verified it manually before reporting this bug.

I've re-tested manually with a rhel54 guest, and the bug can be reproduced.

host kernel: 2.6.18-189.el5
guest kernel: 2.6.18-164.2.1.el5

Command line:
qemu-kvm -name 'vm1' -monitor tcp:0:6001,server,nowait -drive file=./RHEL-Server-5.4-64.qcow2,if=ide,cache=none,boot=on -net nic,vlan=0,model=e1000,macaddr=00:FF:B9:FE:01:77 -net tap,vlan=0,ifname=e1000_0_6001,script=/etc/qemu-ifup-switch,downscript=no -m 512 -smp 1 -soundhw ac97 -usbdevice tablet -rtc-td-hack -no-hpet -cpu qemu64,+sse2 -no-kvm-pit-reinjection -redir tcp:5000::22 -vnc :0 -serial unix:/tmp/serial-20100514-124115-Tay4,server,nowait

Output of serial:
# nc -U /tmp/serial-20100514-124115-Tay4
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
type=1404 audit(1273813267.281:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1273813267.555:3): policy loaded auid=4294967295 ses=4294967295
hdc: drive_cmd: status=0x41 { DriveReady Error }
hdc: drive_cmd: error=0x04 { AbortedCommand }
ide: failed opcode was: 0xec
mtrr: type mismatch for c2000000,100000 old: uncachable new: write-combining
mtrr: type mismatch for c2000000,400000 old: uncachable new: write-combining
Disabling non-boot CPUs ...
Stopping tasks: ===========================================================================================================|
Shrinking memory... done (54236 pages freed)
pci_set_power_state(): 0000:00:05.0: state=3, current state=5
swsusp: Need to copy 60889 pages
swsusp: critical section/: done (60889 pages copied)
PCI: Enabling device 0000:00:01.2 (0000 -> 0001)
PCI: Enabling device 0000:00:04.0 (0000 -> 0001)
pnp: Failed to activate device 00:02.
pnp: Failed to activate device 00:03.
pnp: Failed to activate device 00:05.
pnp: Failed to activate device 00:06.
Saving image data pages (61008 pages) ... done
Wrote 244032 kbytes in 7.25 seconds (33.65 MB/s)
S|
Shutdown: hda
Power down.
acpi_power_off called

---------------------------------------------------------------------------

[root@intel-i7-12-3 ~]# nc -U /tmp/serial-20100514-124115-Tay4
Linux version 2.6.18-164.2.1.el5 (mockbuild.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Mon Sep 21 04:37:42 EDT 2009
Command line: ro root=/dev/VolGroup00/LogVol00 rhgb console=ttyS0,115200 console=tty0 
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000010000 - 000000000009f000 (usable)
 BIOS-e820: 000000000009f000 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001fff0000 (usable)
 BIOS-e820: 000000001fff0000 - 0000000020000000 (ACPI data)
 BIOS-e820: 00000000c0000000 - 00000000c1000000 (reserved)
 BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
kvm-clock: cpu 0, msr 7eff:80433401, boot clock
No NUMA configuration found
Faking a node at 0000000000000000-000000001fff0000
Bootmem setup node 0 0000000000000000-000000001fff0000
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
disabling kdump
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:6 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] disabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] disabled)
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] disabled)
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] disabled)
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
Using ACPI (MADT) for SMP configuration information
Nosave address range: 000000000009f000 - 00000000000a0000
Nosave address range: 00000000000a0000 - 00000000000e8000
Nosave address range: 00000000000e8000 - 0000000000100000
Allocating PCI resources starting at 30000000 (gap: 20000000:a0000000)
SMP: Allowing 16 CPUs, 15 hotplug CPUs
kvm-clock: cpu 0, msr 0:1035401, primary cpu clock
Built 1 zonelists.  Total pages: 127865
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb console=ttyS0,115200 console=tty0 
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 16384 bytes)
kvm_get_tsc_khz: cpu 0, msr 0:10be001
time.c: Using tsc for timekeeping HZ 1000
Console: colour VGA+ 80x25
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Checking aperture...
ACPI: DMAR not present
Memory: 506560k/524224k available (2548k kernel code, 17212k reserved, 1292k data, 208k init)
Calibrating delay loop (skipped), value calculated using timer frequency.. 5320.13 BogoMIPS (lpj=2660068)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
, L1 D cache: 32K
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
Using local APIC timer interrupts.
WARNING calibrate_APIC_clock: the APIC timer calibration may be wrong.
Detected 62.503 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (0->0)!
time.c: Using 1.193182 MHz WALL KVM GTOD KVM timer.
time.c: Detected 2660.068 MHz processor.
checking if image is initramfs... it is
Freeing initrd memory: 3279k freed
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 7 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
ACPI: DMAR not present
PCI-GART: No AMD northbridge found.
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
TCP established hash table entries: 16384 (order: 6, 262144 bytes)
TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
TCP: Hash tables configured (established 16384 bind 8192)
TCP reno registered
audit: initializing netlink socket (disabled)
type=2000 audit(1273813521.619:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Initializing Cryptographic API
alg: No test for crc32c (crc32c-generic)
ksign: Installing public key data
Loading keyring
- Added public key 44A9ABA9643110BD
- User ID: Red Hat, Inc. (Kernel Module GPG key)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Limiting direct PCI/PCI transfers.
PCI: PIIX3: Enabling Passive Release on 0000:00:01.0
Activating ISA DMA hang workarounds.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Real Time Clock Driver v1.12ac
Non-volatile memory driver v1.2
Linux agpgart interface v0.101 (c) Dave Jones
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
brd: module loaded
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX3: IDE controller at PCI slot 0000:00:01.1
PIIX3: chipset revision 0
PIIX3: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xc000-0xc007, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xc008-0xc00f, BIOS settings: hdc:pio, hdd:pio
hda: QEMU HARDDISK, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: QEMU DVD-ROM, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 512KiB
hda: 41943040 sectors (21474 MB) w/256KiB Cache, CHS=16383/255/63, (U)DMA
hda: cache flushes supported
 hda: hda1 hda2
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
ACPI: (supports S3 S4 S5)
Initalizing network drop monitor service
Freeing unused kernel memory: 208k freed
Write protecting the kernel read-only data: 497k
input: AT Translated Set 2 keyboard as /class/input/input0
input: ImExPS/2 Generic Explorer Mouse as /class/input/input1
USB Universal Host Controller Interface driver v3.0
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:01.2[D] -> Link [LNKD] -> GSI 11 (level, high) -> IRQ 11
uhci_hcd 0000:00:01.2: UHCI Host Controller
uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
SCSI subsystem initialized
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel
device-mapper: dm-raid45: initialized v0.2594l
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
ACPI: PCI Interrupt 0000:00:05.0[A] -> Link [LNKA] -> GSI 10 (level, high) -> IRQ 10
usb 1-2: new full speed USB device using uhci_hcd and address 2
usb 1-2: configuration #1 chosen from 1 choice
input: QEMU 0.9.1 QEMU USB Tablet as /class/input/input2
input: USB HID v0.01 Pointer [QEMU 0.9.1 QEMU USB Tablet] on usb-0000:00:01.2-2
Attempting manual resume
Disabling non-boot CPUs ...
Stopping tasks: ======|
Shrinking memory... done (0 pages freed)
Loading image data pages (61008 pages) ... done
Read 244032 kbytes in 5.39 seconds (45.27 MB/s)
pci_set_power_state(): 0000:00:05.0: state=3, current state=5
ACPI: PCI interrupt for device 0000:00:01.2 disabled
BUG: soft lockup - CPU#0 stuck for 600s! [bash:2678]
CPU 0:
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api lp floppy joydev snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 e1000 i2c_core ide_cd parport_pc cdrom parport pcspkr serio_raw virtio_net virtio_blk virtio_pci virtio_ring virtio dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2678, comm: bash Not tainted 2.6.18-164.2.1.el5 #1
RIP: 0010:[<ffffffff80012322>]  [<ffffffff80012322>] __do_softirq+0x51/0x133
RSP: 0018:ffffffff8043df60  EFLAGS: 00000206
RAX: 0000000000000002 RBX: 0000000000000002 RCX: ffffffff8005e2fc
RDX: ffff810016bd9fd8 RSI: 0000000000000080 RDI: ffff8100045427e0
RBP: ffffffff8043dee0 R08: 0000000000000001 R09: 000000000000003f
R10: ffff81001fc10008 R11: 0000000000000050 R12: ffffffff8005dc8e
R13: 0000000000000046 R14: ffffffff80077874 R15: ffffffff8043dee0
FS:  00002ae1796c7dd0(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000002c70036 CR3: 0000000016481000 CR4: 00000000000006e0

Call Trace:
 <IRQ>  [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006cb20>] do_softirq+0x2c/0x85
 [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff800a86ba>] swsusp_suspend+0x4f/0x51
 [<ffffffff800a86b7>] swsusp_suspend+0x4c/0x51
 [<ffffffff800a8afd>] pm_suspend_disk+0x42/0xce
 [<ffffffff800a79e7>] enter_state+0x5e/0x19b
 [<ffffffff800a7b93>] state_store+0x5e/0x79
 [<ffffffff8010ac88>] sysfs_write_file+0xb9/0xe8
 [<ffffffff80016927>] vfs_write+0xce/0x174
 [<ffffffff800171df>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
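
The two serial captures above can be cross-checked mechanically. The sketch below uses hypothetical helper names (image_pages, same_image_size, has_soft_lockup): it extracts the page count from the "Saving/Loading image data pages" lines and looks for the soft lockup message. In this log both sides report 61008 pages (244032 kbytes, which matches 4 KiB pages), which suggests the image was read back completely before the guest locked up.

```shell
# Print the page count N from any "... image data pages (N pages)" line on stdin.
image_pages() {
    sed -n 's/^.* image data pages (\([0-9][0-9]*\) pages).*$/\1/p'
}

# Succeed if the suspend log text ($1) and the resume log text ($2)
# report the same, non-empty page count.
same_image_size() {
    saved=$(printf '%s\n' "$1" | image_pages)
    loaded=$(printf '%s\n' "$2" | image_pages)
    [ -n "$saved" ] && [ "$saved" = "$loaded" ]
}

# Succeed if the log text on stdin contains a soft lockup report.
has_soft_lockup() {
    grep -q 'BUG: soft lockup'
}
```

For example, if the serial output were saved to a file (say resume.log, a made-up name), `has_soft_lockup < resume.log` would flag the failure shown above.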

Comment 3 Gleb Natapov 2010-05-14 07:07:58 UTC
Update your guest kernel. Old 2.6.18 kernels were known to have bugs in the PM area. If suspend/resume does not work for you with some kernel version, check exactly that kernel version on real HW first.

Comment 4 Amos Kong 2010-05-17 05:42:00 UTC
Hello Gleb,

I installed rhel54 (kernel: 2.6.18-164.2.1.el5) on my real HW, and s4 works on it.

----

I also tested with the latest guest kernels (both rhel55 and rhel54) 10 times.
host kernel: 2.6.18-189.el5
# rpm -qa |grep kvm
etherboot-zroms-kvm-5.4.4-12.el5
etherboot-zroms-kvm-5.4.4-10.el5
kvm-qemu-img-83-160.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-debuginfo-83-160.el5
kmod-kvm-83-160.el5
kvm-tools-83-160.el5
kvm-83-160.el5


PASS <- rhel55 guest kernel: 2.6.18-196.el5
PASS <- rhel54 guest kernel: 2.6.18-164.18.1.el5
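
Together with comment 2, these results put the failing rhel54 guest kernel (2.6.18-164.2.1.el5) below the passing one (2.6.18-164.18.1.el5). A plain string comparison would order these two the wrong way round, because "2" sorts after "18" character by character. A rough sketch of a version-aware check follows. kernel_at_least is a hypothetical helper; it assumes a `sort` that supports -V (GNU coreutils 7.0 or later, which is likely newer than what RHEL 5 itself ships), and it is not a full reimplementation of RPM's version comparison.

```shell
# Succeed when kernel release $1 is the same as or newer than $2.
# Assumption: sort -V (version sort) is available on the machine running this.
kernel_at_least() {
    newest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$newest" = "$1" ]
}
```

In a guest, `kernel_at_least "$(uname -r)" 2.6.18-164.18.1.el5` would then indicate whether the running kernel is at least the rhel54 version that passed here.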

Comment 5 Gleb Natapov 2010-05-17 05:50:00 UTC
(In reply to comment #4)
> Hello Gleb,
> 
> I installed rhel54 (kernel: 2.6.18-164.2.1.el5) on my real HW, and s4 works on it.
The outcome of s4 may depend on the specific HW. The HW of a real host is very different from
virtual HW.

> 
> ----
> 
> I also tested with the latest guest kernels (both rhel55 and rhel54) 10 times.
> host kernel: 2.6.18-189.el5
> # rpm -qa |grep kvm
> etherboot-zroms-kvm-5.4.4-12.el5
> etherboot-zroms-kvm-5.4.4-10.el5
> kvm-qemu-img-83-160.el5
> etherboot-zroms-kvm-5.4.4-13.el5
> kvm-debuginfo-83-160.el5
> kmod-kvm-83-160.el5
> kvm-tools-83-160.el5
> kvm-83-160.el5
> 
> 
> PASS <- rhel55 guest kernel: 2.6.18-196.el5
> PASS <- rhel54 guest kernel: 2.6.18-164.18.1.el5    

I am closing the bug then.

