Bug 1129593

Summary: Guest can't poweroff after finishing installation
Product: Red Hat Enterprise Linux 7 Reporter: Shanzhi Yu <shyu>
Component: qemu-kvm-rhev Assignee: Luiz Capitulino <lcapitulino>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: dyuan, hhuang, jen, juzhang, lcapitulino, mazhang, mrezanin, mst, mzhan, ovasik, shyu, virt-maint, weizhan, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.1.2-7.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 09:54:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Shanzhi Yu 2014-08-13 09:54:12 UTC
Description of problem:

Guest can't poweroff after finishing installation

Version-Release number of selected component (if applicable):

3.10.0-142.el7.x86_64
libvirt-1.2.7-1.el7.x86_64
qemu-kvm-rhev-2.1.0-1.el7.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Download initrd.img and vmlinuz from the latest-RHEL-7 tree.

2. Create a raw-format file as the guest disk image.

3. Install the guest with a ks file.

4. Wait for the guest to finish installation. The guest should shut down, but instead it hangs with output like:
"
..
Not all DM device detached, 1 left.
Cannot finalize remaining file systems and devices, giving up.
Storage is finalized.
Successfully changed into root pivot.
Returning to initrd...
dracut Warning: Killing all remaining processes
Powering off"



This is easy to reproduce with the script below:

# cat test.sh

#!/bin/bash

rm -fr /var/lib/libvirt/boot/*
wget \
  http://download.englab.nay.redhat.com/pub/rhel/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/images/pxeboot/initrd.img \
  http://download.englab.nay.redhat.com/pub/rhel/rel-eng/latest-RHEL-7/compose/Server/x86_64/os/images/pxeboot/vmlinuz \
  -P /var/lib/libvirt/boot/

qemu-img create /var/lib/libvirt/images/rhel7.img
chown qemu:qemu /var/lib/libvirt/images/rhel7.img

virsh create /dev/stdin <<EOF
<domain type='kvm'>
   <name>rhel7</name>
   <memory unit='KiB'>2097152</memory>
   <currentMemory unit='KiB'>2097152</currentMemory>
   <vcpu placement='static'>1</vcpu>
   <resource>
     <partition>/machine</partition>
   </resource>
   <os>
     <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
     <kernel>/var/lib/libvirt/boot/vmlinuz</kernel>
     <initrd>/var/lib/libvirt/boot/initrd.img</initrd>
<cmdline>method=http://download.englab.nay.redhat.com/pub/rhel/rel-eng/latest-RHEL-7/compose/Server/x86_64/os
ks=http://fileshare.englab.nay.redhat.com/pub/section3/run/http-ks/ks-rhel7-x86_64.cfg
</cmdline>
     <boot dev='hd'/>
   </os>
   <features>
     <acpi/>
     <apic/>
     <pae/>
   </features>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/libexec/qemu-kvm</emulator>
     <disk type='file' device='disk'>
       <driver name='qemu' type='raw'/>
       <source file='/var/lib/libvirt/images/rhel7.img'/>
       <target dev='vda' bus='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
function='0x0'/>
     </disk>
       <interface type='network'>
       <mac address='54:52:00:45:c3:8a'/>
       <source network='default'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
function='0x0'/>
     </interface>
     <input type='mouse' bus='ps2'/>
     <input type='keyboard' bus='ps2'/>
     <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
   </devices>
</domain>
EOF

Actual results:

Guest hangs after finishing installation

Expected results:

Guest powers off after finishing installation

Additional info:

I can't reproduce this with qemu-kvm-rhev-1.5.3-60.el7ev_0.5.x86_64 in the same environment.

Comment 2 Luiz Capitulino 2014-08-19 20:39:12 UTC
I'm trying to reproduce this but it just works for me.

A small difference with my procedure is that I've dropped the ks= option from the kernel command line because your kickstart script is not working for me: I get into the main installation screen and have to perform the installation manually. Because of that, I also have to power off the guest manually when the installation finishes by doing:

1. Switch the guest to shell by entering (from the host):

# virsh send-key rhel7 KEY_LEFTCTRL KEY_LEFTALT KEY_F2

2. In the guest I enter:

# poweroff

And it works.

Now, our qemu-kvm-rhev differs. I'm using: qemu-kvm-rhev-2.1.0-2.el7.x86_64. Can you try this one too and avoid using the kickstart script too just in case?

Comment 5 FuXiangChun 2014-08-20 10:49:53 UTC
According to comment 1, I tested qemu-kvm-rhev-2.1.0-2.el7.x86_64.

Result:
Didn't hit this issue. qemu-kvm reported the correct guest status.

(qemu) sendkey alt-ctrl-f2
(qemu) info status
VM status: paused (shutdown)
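
The status check above can be scripted. This is only a minimal sketch: check_poweroff_status is a hypothetical helper (not part of QEMU or libvirt) that reads one line of HMP "info status" output on stdin. It assumes the two status strings seen in this report, "paused (shutdown)" for a clean poweroff and "running" for the hang.

```shell
#!/bin/bash
# Hypothetical helper: classify one line of HMP "info status" output.
# Assumes the status strings quoted in this bug report.
check_poweroff_status() {
    local line=""
    # Read a single line; tolerate input without a trailing newline.
    IFS= read -r line || true
    case "$line" in
        "VM status: paused (shutdown)"*)
            echo "ok: guest powered off" ;;
        "VM status: running"*)
            echo "bug: guest still running after poweroff" ;;
        *)
            echo "unknown: $line" ;;
    esac
}
```

For example, pasting the monitor output into `echo 'VM status: running' | check_poweroff_status` would flag the hang described in this bug.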

Comment 6 Shanzhi Yu 2014-08-20 13:08:12 UTC
(In reply to Luiz Capitulino from comment #2)
> I'm trying to reproduce this but it just works for me.
> 
> A small difference with my procedure is that I've dropped the ks= option
> from the kernel command line because your kickstart script is not working
> for me: I get into the main installation screen and have to perform the
> installation manually. Because of that, I also have to power off the guest
> manually when the installation finishes by doing:
> 
> 1. Switch the guest to shell by entering (from the host):
> 
> # virsh send-key rhel7 KEY_LEFTCTRL KEY_LEFTALT KEY_F2
> 
> 2. In the guest I enter:
> 
> # poweroff
> 
> And it works.
> 
> Now, our qemu-kvm-rhev differs. I'm using: qemu-kvm-rhev-2.1.0-2.el7.x86_64.
> Can you try this one too and avoid using the kickstart script too just in
> case?

Hi 

I tested with qemu-kvm-rhev-2.1.0-2.el7.x86_64 and it really does work properly.
If I don't use ks= and install the guest manually, I can also power off the guest.
I'd really like to know where the problem is.
I'm attaching the ks file so that you can try it (if you're interested) with "ks=/path/to/ks.file" in the domain XML.

Comment 7 Luiz Capitulino 2014-08-20 17:34:07 UTC
I was able to reproduce the problem with qemu-kvm-rhev-2.1.0-1.el7 and started investigating it. Here are my initial findings:

1. FuXiangChun is right that the problem is that the guest status is "running" after poweroff (it should be "paused"). It's more than that, actually: QEMU sends neither the SHUTDOWN nor the STOP QMP event when the bug triggers. That's why libvirt is unable to kill the guest after poweroff

2. qemu-kvm-rhev-2.1.0-2.el7 doesn't trigger the problem. I bisected it to find the fix and it's the patch for bug 1118665 that makes the bug go away. Problem is, I asked Markus (the author of the fix) and he wouldn't expect his patch to fix this bug. Maybe his patch is only working around the issue, in which case the bug still exists

3. Looks like I can reproduce with upstream (HEAD: 2656eb7c)

4. You don't need to do the installation to reproduce the issue. Just switch to the shell after the kernel boots and type poweroff

My current thinking is that this is a regression which is worked around by the fix for bug 1118665, but I have a number of things to check. One of them is whether or not I can reproduce without the -kernel option; I'm not sure it is supported.
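
To check for the missing QMP events from finding 1, one option is to capture the QMP stream and scan it. This is a rough sketch, not a definitive tool: saw_poweroff_events is a hypothetical helper that reads QMP messages (one JSON object per line) on stdin and reports whether SHUTDOWN and STOP events appeared.

```shell
#!/bin/bash
# Hypothetical helper: scan QMP messages on stdin for the SHUTDOWN and STOP
# events that QEMU is expected to emit when the guest powers off.
saw_poweroff_events() {
    local shutdown=no stop=no line
    # Match "event": "NAME" with optional whitespace around the colon.
    local re_shutdown='"event"[[:space:]]*:[[:space:]]*"SHUTDOWN"'
    local re_stop='"event"[[:space:]]*:[[:space:]]*"STOP"'
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $re_shutdown ]]; then shutdown=yes; fi
        if [[ $line =~ $re_stop ]]; then stop=yes; fi
    done
    echo "SHUTDOWN=$shutdown STOP=$stop"
}
```

Assuming QEMU exposes QMP on a TCP socket and nc is available, something like `{ echo '{"execute": "qmp_capabilities"}'; sleep 120; } | nc localhost 4444 | saw_poweroff_events` could be left running while the guest powers off; the port and the timeout are assumptions. When the bug triggers, the expected result is SHUTDOWN=no STOP=no.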

Comment 8 Luiz Capitulino 2014-08-20 20:21:59 UTC
OK, I found the problem. Here's my upstream reproducer:

1. Download initrd.img and vmlinuz from a Fedora mirror (e.g. http://fedora.mirror.lstn.net/releases/20/Fedora/x86_64/os/images/pxeboot/)

2. Run QEMU with:

# qemu -enable-kvm -m 1024 -no-shutdown \
    -boot strict=on -kernel /home/lcapitulino/files/vmlinuz \
    -initrd /home/lcapitulino/files/initrd.img \
    -append method=http://fedora.mirror.lstn.net/releases/20/Fedora/x86_64/os/  \
    -drive file=/var/lib/libvirt/images/rhel7.img,if=virtio \
    -monitor stdio -vnc :0 -qmp tcp:0:4444,nowait,server

3. After the kernel boots, switch to the installer shell:

(qemu) sendkey ctrl-alt-f2

4. Check whether the kernel's ACPI interpreter has been enabled:

# dmesg | grep -i interpreter
[    0.148843] ACPI: Interpreter disabled

As the ACPI support in the guest kernel is not functional, we'll never get QMP events on poweroff and the "info status" won't change either (which breaks libvirt).

The ACPI Interpreter is disabled in the guest because something goes bad during ACPI initialization in the guest kernel:

[    0.000000] ACPI: uC\xffffffcdT 000000003ffe1854 2009587B (v49 \xffffffb2?a\xffffffdf?? \xffffffca\xffffffe2???\xffffffa6K\xfffffffc 4CFA21C8 \xffffffa1\xfffffff5\xffffffa5[ 2AA2CAF0)
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:536 __early_ioremap+0x12b/0x1ce()
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.11.10-301.fc20.x86_64 #1
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    0.000000]  0000000000000009 ffffffff81c01d58 ffffffff816441db 0000000000000000
[    0.000000]  ffffffff81c01d90 ffffffff8106715d 0000000000000000 000000003ffe1000
[    0.000000]  0000000000000000 0000000000000854 0000000000000000 ffffffff81c01da0
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff816441db>] dump_stack+0x45/0x56
[    0.000000]  [<ffffffff8106715d>] warn_slowpath_common+0x7d/0xa0
[    0.000000]  [<ffffffff8106723a>] warn_slowpath_null+0x1a/0x20
[    0.000000]  [<ffffffff81d264a1>] __early_ioremap+0x12b/0x1ce
[    0.000000]  [<ffffffff81d26371>] ? __early_set_fixmap+0x97/0x9c
[    0.000000]  [<ffffffff81d2672a>] early_ioremap+0x13/0x15
[    0.000000]  [<ffffffff81d1e173>] __acpi_map_table+0x13/0x18
[    0.000000]  [<ffffffff8163dbaa>] acpi_os_map_memory+0x26/0x14e
[    0.000000]  [<ffffffff81d51037>] acpi_tb_parse_root_table+0x187/0x2c3
[    0.000000]  [<ffffffff813863c9>] ? acpi_find_root_pointer+0x11b/0x15e
[    0.000000]  [<ffffffff81d511ca>] acpi_initialize_tables+0x57/0x59
[    0.000000]  [<ffffffff81d4ef97>] acpi_table_init+0x1b/0x99
[    0.000000]  [<ffffffff81d1e523>] acpi_boot_table_init+0x1e/0x85
[    0.000000]  [<ffffffff81d1609f>] setup_arch+0xbc3/0xcec
[    0.000000]  [<ffffffff81d0ebbc>] start_kernel+0xcf/0x416
[    0.000000]  [<ffffffff81d0e120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81d0e5de>] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [<ffffffff81d0e6e8>] x86_64_start_kernel+0x108/0x117
[    0.000000] ---[ end trace 0d4a133504d48174 ]---
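
The two symptoms above (the disabled interpreter and the warning raised while parsing the ACPI root table) can be checked in one pass over the guest's dmesg. This is only a heuristic sketch: acpi_boot_check is a hypothetical helper, and it matches just the strings that appear in this report.

```shell
#!/bin/bash
# Hypothetical helper: read guest dmesg text on stdin and report
#   interpreter=disabled|not-disabled  (looks for "ACPI: Interpreter disabled")
#   trace=yes|no                       (looks for acpi_tb_parse_root_table,
#                                       which shows up in the call trace above)
acpi_boot_check() {
    local log interpreter=not-disabled trace=no
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'ACPI: Interpreter disabled'; then
        interpreter=disabled
    fi
    if printf '%s\n' "$log" | grep -q 'acpi_tb_parse_root_table'; then
        trace=yes
    fi
    echo "interpreter=$interpreter trace=$trace"
}
```

Inside the guest this could be run as `dmesg | acpi_boot_check`. On an affected guest the expected output is interpreter=disabled trace=yes.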

I bisected this and it was introduced by the following commit:

commit 868270f23d8db2cce83e4f082fe75e8625a5fbf9
Author: Michael S. Tsirkin <mst>
Date:   Mon Jul 28 23:07:11 2014 +0200

    acpi-build: tweak acpi migration limits

Indeed, if I revert that commit everything works. Now, two important points:

1. As mentioned in comment 7, this issue can't be reproduced with qemu-kvm-rhev-2.1.0-2.el7.x86_64 because of the fix for bug 1118665, which prevents this issue. I can't tell if it's a proper fix or if it's just a workaround

2. I wasn't able to reproduce this issue without passing -kernel/-initrd to QEMU. The question is whether this issue only affects -kernel/-initrd or if -kernel/-initrd are only triggering a general problem. If this issue only affects -kernel/-initrd then we have to check whether they are supported options, if they are not then this issue is not important for RHEL

I will report this upstream and discuss a downstream solution with Michael (maybe I'll re-assign this BZ to him too).

Comment 9 Luiz Capitulino 2014-08-21 13:12:52 UTC
I've reported this upstream and Michael has already posted a patch that fixes the issue for me:

[PATCH] pc: reserve more memory for ACPI for new machine type
http://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg03534.html

I'm now going to discuss with him what's the best way to handle this in RHEL7.1 given the two points raised in comment 8.

Comment 10 Luiz Capitulino 2014-08-26 14:05:05 UTC
Michael's fix has been merged upstream:

commit 927766c7d34275ecf586020cc5305e377cc4af10
Author: Michael S. Tsirkin <mst>
Date:   Wed Aug 20 21:58:12 2014 +0200

    pc: reserve more memory for ACPI for new machine types

Comment 12 Luiz Capitulino 2014-09-03 18:23:28 UTC
Two quick updates on this one:

1. It's possible to reproduce this issue with qemu-kvm-rhev-2.1.0-3.el7 by using the q35 machine type

2. For some weird and unknown reason, the fix from comment 10 doesn't fix this issue in qemu-kvm-rhev. Either it depends on a patch that doesn't exist in qemu-kvm-rhev (but exists upstream) or there's another bug lurking in qemu-kvm-rhev but not upstream

I'll wait until qemu-kvm-rhev is rebased to qemu 2.1.1 to work on this again.

Comment 13 Luiz Capitulino 2014-11-10 18:39:40 UTC
I've re-tested this bug against qemu-kvm-rhev-2.1.2-7.el7 and things seem to just work as expected. qemu-kvm-rhev-2.1.2-7.el7 has been rebased on top of the latest QEMU stable, which means that qemu-kvm-rhev contains the fix mentioned in comment 10.

Could you please confirm this now works for you too?

PS: Sorry for the long delay. I had to wait for qemu-kvm-rhev to be rebased and also I've got other things on my plate.

Comment 14 Shanzhi Yu 2014-11-11 03:39:26 UTC
Hi Luiz,

It works well for me. 

Thanks for the fix.

Comment 20 mazhang 2014-12-18 07:08:46 UTC
Verified this bug on qemu-kvm-rhev-2.1.2-16.el7.x86_64.

Result:
The kernel's ACPI interpreter has been enabled, and no call trace was found in dmesg.
After powering off the guest, got the correct status in HMP:

(qemu) sendkey ctrl-alt-f2
(qemu) info status
VM status: paused (shutdown)


So this bug has been fixed.

Comment 24 juzhang 2014-12-23 06:52:22 UTC
Thanks Jeff.

Best Regards,
Junyi

Comment 27 errata-xmlrpc 2015-03-05 09:54:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html