Red Hat Bugzilla – Bug 1377087
shutdown rhel 5.11 guest failed and stop at "system halted"
Last modified: 2017-08-01 13:46:48 EDT
Created attachment 1202146 [details]
screendump of the guest when system halted.

Description of problem:
Boot one rhel5.11 guest, then execute "shutdown -h now" in the guest, guest will fail to shutdown, and stop at "system halted"

Version-Release number of selected component (if applicable):
qemu: qemu-kvm-1.5.3-125.el7.x86_64
host kernel: kernel-3.10.0-506.el7.x86_64
guest kernel: kernel-2.6.18-398.el5PAE

How reproducible:
100%

Steps to Reproduce:
1. Boot one rhel 5.11 guest.
2. Login the guest, execute "shutdown -h now".
3. Guest failed to shutdown, and stop at "system halted". At this time, from qemu side, guest status is "running".
(qemu) info status
VM status: running

Actual results:
guest failed to shutdown, and stop at "system halted".

Expected results:
guest should shutdown successfully.

Additional info:
This is a regression bug since "qemu-kvm-1.5.3-125.el7.x86_64".
With qemu-kvm-1.5.3-124.el7.x86_64, failed to hit this issue.

CLI:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -vga qxl \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/RHEL-Server-5.11-32-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=06 \
    -device virtio-net-pci,mac=9a:0e:0f:10:11:12,id=idqsknof,vectors=4,netdev=idwergXL,bus=pci.0,addr=07 \
    -netdev tap,id=idwergXL,vhost=on \
    -m 4096 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'Opteron_G3' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -boot menu=on \
    -enable-kvm \
    -monitor stdio \
Created attachment 1202147 [details] serial log for guest
It seems only rhel5.11 guests hit this issue, both 32-bit and 64-bit. rhel6.8 and rhel7.3 guests are OK in my test.

> Additional info:
> This is a regression bug since "qemu-kvm-1.5.3-125.el7.x86_64".
> With qemu-kvm-1.5.3-124.el7.x86_64, failed to hit this issue.
>

It seems the qemu-kvm-1.5.3-125.el7 build fixed just one bz:
Bug 1285453 - An NBD client can cause QEMU main loop to block when connecting to built-in NBD server

Hi Fam,
Could you have a look?

Best Regards,
Junyi
Looking at the dmesg, the ACPI errors while booting are what is new in qemu-kvm-1.5.3-125.el7. But like Junyi said, that build only included a highly unrelated change compared to the previous one, qemu-kvm-1.5.3-124.el7. The ACPI errors are not seen on the previous build.

For completeness, here is the full diff between good and bad boots:

# diff qemu-kvm-1.5.3-124.dmesg.log qemu-kvm-1.5.3-125.dmesg.log
7,8c7,8
< BIOS-e820: 0000000000100000 - 00000000bfffd000 (usable)
< BIOS-e820: 00000000bfffd000 - 00000000c0000000 (reserved)
---
> BIOS-e820: 0000000000100000 - 00000000bfffb000 (usable)
> BIOS-e820: 00000000bfffb000 - 00000000c0000000 (reserved)
33c33
< Nosave address range: 00000000bfffd000 - 00000000c0000000
---
> Nosave address range: 00000000bfffb000 - 00000000c0000000
41c41
< Built 1 zonelists. Total pages: 1029031
---
> Built 1 zonelists. Total pages: 1029029
53c53
< Memory: 3908972k/5242880k available (2630k kernel code, 284868k reserved, 1679k data, 224k init)
---
> Memory: 3908964k/5242880k available (2630k kernel code, 284868k reserved, 1679k data, 224k init)
64a65,88
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004068 offset 4, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000406f offset B, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004076 offset 12, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000407d offset 19, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004082 offset 1E, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004084 offset 20, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004089 offset 25, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000408b offset 27, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004090 offset 2C, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004092 offset 2E, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004068 offset 4, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000406f offset B, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004076 offset 12, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000407d offset 19, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004082 offset 1E, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004084 offset 20, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004089 offset 25, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc2000000408b offset 27, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 3 at AML address ffffc20000004090 offset 2C, ignoring [20060707]
> ACPI Error (psloop-0196): Found unknown opcode 15 at AML address ffffc20000004092 offset 2E, ignoring [20060707]
> ACPI Error (dsobject-0134): [ON] Namespace lookup failure, AE_NOT_FOUND
> ACPI Exception (tbxface-0113): AE_NOT_FOUND, Could not load namespace [20060707]
> ACPI Exception (tbxface-0120): AE_NOT_FOUND, Could not load tables [20060707]
> ACPI: Unable to load the System Description Tables
73d96
< ACPI: bus type pci registered
75,85c98
< ACPI: Interpreter enabled
< ACPI: Using IOAPIC for interrupt routing
< ACPI: No dock devices found.
< ACPI: PCI Root Bridge [PCI0] (0000:00)
< PCI quirk: region 0600-063f claimed by PIIX4 ACPI
< PCI quirk: region 0700-070f claimed by PIIX4 SMB
< ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
< ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
< ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
< ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
< ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
---
> ACPI: Interpreter disabled.
87,88c100
< pnp: PnP ACPI init
< pnp: PnP ACPI: found 6 devices
---
> pnp: PnP ACPI: disabled
91,92c103,106
< PCI: Using ACPI for IRQ routing
< PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
---
> PCI: Probing PCI hardware
> PCI quirk: region 0600-063f claimed by PIIX4 ACPI
> PCI quirk: region 0700-070f claimed by PIIX4 SMB
> pci 0000:00:01.0: PIIX/ICH IRQ router [8086/7000]
106c120
< type=2000 audit(1474539822.549:1): initialized
---
> type=2000 audit(1474540036.509:1): initialized
124d137
< ACPI: Invalid PBLK length [0]
130d142
< 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
143c155
< PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
---
> PNP: No PS/2 controller found. Probing ports directly.
153,154d164
< input: AT Translated Set 2 keyboard as /class/input/input0
< ACPI: (supports S5)
163a174
> input: AT Translated Set 2 keyboard as /class/input/input0
176,177d186
< ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
< ACPI: PCI Interrupt 0000:00:06.0[A] -> Link [LNKB] -> GSI 10 (level, high) -> IRQ 10
BTW, the failure to initialize ACPI is the reason the guest refuses to shut down properly and instead falls back to the "halt" state seen in the report.
(Adding Marcel.)

This is a very interesting bug, and I think the regression is actually caused by a SeaBIOS change, not a qemu-kvm change.

Like everyone else, I investigated the difference between "qemu-kvm-1.5.3-124.el7." and "qemu-kvm-1.5.3-125.el7", and that

    qemu_set_nonblock(client->sock);

call in nbd_co_client_start() is really completely irrelevant.

However, look at the timestamps! This bug was reported on 2016-Sep-18, and for our downstream SeaBIOS package, the only change in a very long time, since 2016-May-11 specifically, has been this one:

* Thu Sep 15 2016 Miroslav Rezanina <mrezanin@redhat.com> - 1.9.1-5.el7
- seabios-pci-don-t-map-virtio-1.0-storage-devices-above-4G.patch [bz#1373154]
- Resolves: bz#1373154
  (Guest fails boot up with ivshmem-plain and virtio-pci device)

That is, the first new SeaBIOS build became available, since May, just three days before this regression was reported.

Thus I'm inclined to think that the qemu-kvm update to 1.5.3-125.el7 on QE's side *coincided* with the SeaBIOS update to 1.9.1-5.el7, and then the regression got mis-attributed to qemu-kvm-1.5.3-125.el7.

Now, the only difference between seabios-1.9.1-4.el7 and seabios-1.9.1-5.el7 is:

commit 01549028733315a513b1b5fcc1951fd271e8a531
Author: Marcel Apfelbaum <marcel@redhat.com>
Date:   Tue Sep 13 13:20:45 2016 +0200

    pci: don't map virtio 1.0 storage devices above 4G

    RH-Author: Marcel Apfelbaum <marcel@redhat.com>
    Message-id: <1473772845-913-1-git-send-email-marcel@redhat.com>
    Patchwork-id: 72292
    O-Subject: [RHEL-7.3 seabios PATCH V2] pci: don't map virtio 1.0 storage devices above 4G
    Bugzilla: 1373154
    RH-Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
    RH-Acked-by: Gerd Hoffmann <kraxel@redhat.com>
    RH-Acked-by: Laszlo Ersek <lersek@redhat.com>
    RH-Acked-by: Michael S. Tsirkin <mst@redhat.com>

    v1->v2:
     - add the note to the commit message (Gerd)

    BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1373154
    Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11741451
    Upstream: Fixed upstream by commit:
              0e21548b15 (virtio: pci cfg access)
    Tests: Checked the virtio BARs are placed in the 32-bit range
           and the guest boots successfully.

    Otherwise SeaBIOS can't access virtio's modern BAR.

    Note: It works in the master branch but can't be merged
          easily into 1.9 branch, so use this as an interim solution
          until we'll rebase to 1.10.

    Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
    Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>

What likely happens here is that SeaBIOS now allocates the MMIO BAR of the sole virtio-blk device -- see comment #0 -- under 4GB. And then you end up with two more reserved pages -- and consequently two fewer free pages -- under 4GB:

(In reply to Fam Zheng from comment #7)
> # diff qemu-kvm-1.5.3-124.dmesg.log qemu-kvm-1.5.3-125.dmesg.log
> 7,8c7,8
> < BIOS-e820: 0000000000100000 - 00000000bfffd000 (usable)
> < BIOS-e820: 00000000bfffd000 - 00000000c0000000 (reserved)
> ---
> > BIOS-e820: 0000000000100000 - 00000000bfffb000 (usable)
> > BIOS-e820: 00000000bfffb000 - 00000000c0000000 (reserved)
> [...]
> 41c41
> < Built 1 zonelists. Total pages: 1029031
> ---
> > Built 1 zonelists. Total pages: 1029029

In turn, this change in the 32-bit memory map probably displaces the ACPI payload generated by QEMU and installed by SeaBIOS from its original location to a different address. That should be no problem, of course, as long as the memory is not corrupted in some way.

Now, given that the rhel6.8 and rhel7.3 guests seem fine with the change (according to comment 3), I can't promise that the SeaBIOS patch actually regressed SeaBIOS -- it might just tickle something brittle in the RHEL-5.11 guest kernel's ACPI interpreter.
So:

(1) Please try to reproduce the issue with:
- the same RHEL-5.11 guest,
- qemu-kvm-1.5.3-125.el7,
- seabios-1.9.1-4.el7 (i.e., SeaBIOS should be downgraded).

This should confirm whether the qemu-kvm or the seabios update triggers the issue.

(2) Please re-verify:
- the same RHEL-6.8 guest and the same RHEL-7.3 guest (using an otherwise identical QEMU command line to the RHEL-5.11 case),
- with qemu-kvm-1.5.3-125.el7,
- and seabios-1.9.1-5.el7.

If these more modern guests are okay with the SeaBIOS change, then we might have to patch the RHEL-5 guest kernel. (Theoretically it's possible that the SeaBIOS patch causes genuine corruption in the ACPI tables, but that would be visible to the RHEL-6.8 and RHEL-7.3 guests as well.)

Thanks!
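The "two more reserved pages" reading of the e820 diff can be double-checked with a few lines of arithmetic. This is only a sketch: the addresses and counters are copied from the dmesg diff in comment #7, and the 4 KiB page size is an assumption about the x86 guest, not something stated in the logs.

```python
# Sanity check of the memory-map change seen in the dmesg diff.
PAGE_SIZE = 4096  # assumption: 4 KiB pages in the x86 guest

# End of the usable e820 region below 4 GiB, from the -124 and -125 dmesg logs.
usable_end_124 = 0x00000000BFFFD000
usable_end_125 = 0x00000000BFFFB000

lost_bytes = usable_end_124 - usable_end_125
lost_pages = lost_bytes // PAGE_SIZE

# The other changed counters from the same diff.
total_pages_delta = 1029031 - 1029029   # "Built 1 zonelists. Total pages"
memory_kib_delta = 3908972 - 3908964    # "Memory: ...k/5242880k available"

print(f"usable RAM shrank by {lost_bytes} bytes = {lost_pages} pages")
print(f"zonelist total pages delta: {total_pages_delta}")
print(f"available memory delta: {memory_kib_delta} KiB")
```

All three counters describe the same change, so they should agree with each other if the interpretation is right.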
When I fiddled around with "qemu-kvm-1.5.3-124.el7." and "qemu-kvm-1.5.3-125.el7" on my machine yesterday, the seabios package was kept intact all the time:

# rpm -q seabios-bin
seabios-bin-1.9.1-5.el7.noarch

Maybe I'm making a stupid mistake, but my testing just now shows that downgrading seabios to "seabios-1.9.1-4.el7" doesn't help at all. The dmesg is exactly the same before and after the downgrade, and the guest system still "halts".

Laszlo, let me know if I can do any other quick tests.
Thanks Fam -- can you please confirm that you downgraded the "seabios-bin" package as well? (That's the one that supplies the firmware binary actually.)
The versions I tested when I hit this bug were:
qemu: qemu-kvm-1.5.3-125.el7.x86_64
seavgabios-bin: seavgabios-bin-1.9.1-4.el7.noarch
seabios-bin: seabios-bin-1.9.1-4.el7.noarch

Then, after only downgrading qemu to "qemu-kvm-1.5.3-124.el7.x86_64", it is OK.

Now, testing with the latest seabios-bin version:
qemu: qemu-kvm-1.5.3-126.el7.x86_64
seavgabios-bin: seavgabios-bin-1.9.1-5.el7.noarch
seabios-bin: seabios-bin-1.9.1-5.el7.noarch

I can still hit this bug. After downgrading qemu to "qemu-kvm-1.5.3-124.el7.x86_64", it is OK.

qemu-kvm            | seabios-bin/seavgabios-bin | result |
--------------------|----------------------------|--------|
qemu-kvm-1.5.3-124  | 1.9.1-5                    | OK     |
                    | 1.9.1-4                    | OK     |
--------------------|----------------------------|--------|
qemu-kvm-1.5.3-125  | 1.9.1-5                    | NG     |
                    | 1.9.1-4                    | NG     |
(In reply to Laszlo Ersek from comment #13)
> Thanks Fam -- can you please confirm that you downgraded the "seabios-bin"
> package as well? (That's the one that supplies the firmware binary actually.)

Yes, that is what I did.
Thanks guys, your data proves that my hypothesis about SeaBIOS was incorrect. I'll try to reproduce the issue locally and see if I can get more insight.
(Click "Unwrap comments" to the right of comment 0 for reading this comment.)

My test environment consists of SeaBIOS 1.9.1-4 (invariably), RHEL-5.11 GA (invariably), and qemu-kvm-1.5.3-124 vs. qemu-kvm-1.5.3-125.

I launched the RHEL-5.11 guest under -124, and dumped the guest memory with

  virsh dump seabios.rhel5 seabios.rhel5.124.core \
    --memory-only --format kdump-snappy

Then I shut off the guest, upgraded qemu-kvm to -125, booted it, and repeated the same:

  virsh dump seabios.rhel5 seabios.rhel5.125.core \
    --memory-only --format kdump-snappy

I shut off the guest (well, forced it off ultimately).

I also captured the dmesg in the guest (after booting with the "ignore_loglevel" kernel param), for both qemu-kvm versions.

As it's already known from Fam's investigations, we have inexplicable differences in the ACPI table addresses like:

> --- with-124/dmesg 2016-09-23 13:43:13.610686990 +0200
> +++ with-125/dmesg 2016-09-23 13:43:25.578544705 +0200
> @@ -13,10 +13,10 @@
>  DMI: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  kvm-clock: cpu 0, msr 7eff:804ab401, boot clock
>  ACPI: RSDP (v000 BOCHS ) @ 0x00000000000f7350
> -ACPI: RSDT (v001 BOCHS BXPCRSDT 0x00000001 BXPC 0x00000001) @ 0x00000000bffffaba
> -ACPI: FADT (v001 BOCHS BXPCFACP 0x00000001 BXPC 0x00000001) @ 0x00000000bfffeeb7
> -ACPI: SSDT (v001 BOCHS BXPCSSDT 0x00000001 BXPC 0x00000001) @ 0x00000000bfffef2b
> -ACPI: MADT (v001 BOCHS BXPCAPIC 0x00000001 BXPC 0x00000001) @ 0x00000000bffffa0a
> +ACPI: RSDT (v001 BOCHS BXPCRSDT 0x00000001 BXPC 0x00000001) @ 0x00000000bffffb30
> +ACPI: FADT (v001 BOCHS BXPCFACP 0x00000001 BXPC 0x00000001) @ 0x00000000bfffef0b
> +ACPI: SSDT (v001 BOCHS BXPCSSDT 0x00000001 BXPC 0x00000001) @ 0x00000000bfffef7f
> +ACPI: MADT (v001 BOCHS BXPCAPIC 0x00000001 BXPC 0x00000001) @ 0x00000000bffffa80
>  ACPI: DSDT (v001 BOCHS BXPCDSDT 0x00000001 BXPC 0x00000001) @ 0x(null)
>  No NUMA configuration found
>  Faking a node at 0000000000000000-0000000140000000

This difference is already unfathomable, but I wanted to see the contents of those tables; most importantly, the SSDT, because that's what contains the _S5 package description for powering off the machine.

So, I installed the "crash" utility on my RHEL-7 laptop, plus the following two debuginfo RPMs, matching the RHEL-5.11 GA kernel that ran in the guest:

  kernel-debuginfo-2.6.18-398.el5.x86_64
  kernel-debuginfo-common-2.6.18-398.el5.x86_64

(Let me repeat -- you can install any kernel debuginfo package on your laptop or workstation, it doesn't have to match your running kernel -- instead it has to match the dumped vmcore that you want to analyze with "crash".)

So, here's what "crash" has to say about the contents of the SSDT, when the guest is booted with -124, using the physical start address from the dmesg:

> crash> rd -p -8 0x00000000bfffef2b 100
> bfffef2b:  53 53 44 54 df 0a 00 00 01 0d 42 4f 43 48 53 20   SSDT......BOCHS
> bfffef3b:  42 58 50 43 53 53 44 54 01 00 00 00 42 58 50 43   BXPCSSDT....BXPC
> bfffef4b:  01 00 00 00 10 42 05 5c 00 08 50 30 53 5f 0c 00   .....B.\..P0S_..
> bfffef5b:  00 00 c0 08 50 30 45 5f 0c ff ff bf fe 08 50 31   ....P0E_......P1
> bfffef6b:  56 5f 0a 00 08 50 31 53 5f 11 0b 0a 08 00 00 00   V_...P1S_.......
> bfffef7b:  00 00 00 00 00 08 50 31 45 5f 11 0b 0a 08 00 00   ......P1E_......
> bfffef8b:  00 00 00 00                                       ....

Okay. Let's see the same for -125 (using the right address again from the -125 dmesg):

> crash> rd -p -8 0x00000000bfffef7f 100
> bfffef7f:  53 53 44 54 01 0b 00 00 01 e1 42 4f 43 48 53 20   SSDT......BOCHS
> bfffef8f:  42 58 50 43 53 53 44 54 01 00 00 00 42 58 50 43   BXPCSSDT....BXPC
> bfffef9f:  01 00 00 00 a0 21 00 15 5c 2e 5f 53 42 5f 50 43   .....!..\._SB_PC
> bfffefaf:  49 30 06 00 15 5c 2f 03 5f 53 42 5f 50 43 49 30   I0...\/._SB_PCI0
> bfffefbf:  49 53 41 5f 06 00 10 42 05 5c 00 08 50 30 53 5f   ISA_...B.\..P0S_
> bfffefcf:  0c 00 00 00 c0 08 50 30 45 5f 0c ff ff bf fe 08   ......P0E_......
> bfffefdf:  50 31 56 5f                                       P1V_

(Side note: I had to use the "crash" utility and memory dumps for this because RHEL-5 doesn't ship "acpidump". No RHEL-5 package provides it, and when I built it from source, in the guest, it failed to dump anything at all.)

What the heck??? The addresses of the tables differ because their sizes and their contents differ too!

These differences are obviously impossible to correlate with the fix for bug 1285453, however.

So I opened the build pages in Brew, for both -124 [1] and -125 [2] -- see the URLs in the next, private, comment --, downloaded the build log for each [3] [4], and compared the "iasl" build messages.

(In the qemu-kvm version that we ship in base RHEL (forked from upstream 1.5.3), we still build the ACPI payload from DSL template files. The _S5 package, which controls ACPI power-off, is in "hw/i386/ssdt-misc.dsl".)

In the -124 build, iasl emitted the following messages:

> iasl -Pn -vs -l -tc -p ssdt-misc ssdt-misc.dsl.i 2>&1
> ASL Input: ssdt-misc.dsl.i - 102 lines, 2567 bytes, 35 keywords
> AML Output: ssdt-misc.aml - 354 bytes, 24 named objects, 11 executable opcodes
> Listing File: ssdt-misc.lst - 7590 bytes
> Hex Dump: ssdt-misc.hex - 3686 bytes
> Compilation complete. 0 Errors, 0 Warnings, 0 Remarks, 2 Optimizations

Whereas in the -125 build, iasl emitted:

> iasl -Pn -vs -l -tc -p ssdt-misc ssdt-misc.dsl.i 2>&1
> ASL Input: ssdt-misc.dsl.i - 102 lines, 2567 bytes, 35 keywords
> AML Output: ssdt-misc.aml - 388 bytes, 24 named objects, 11 executable opcodes
> Listing File: ssdt-misc.lst - 7874 bytes
> Hex Dump: ssdt-misc.hex - 3986 bytes
> Compilation complete. 0 Errors, 0 Warnings, 0 Remarks, 2 Optimizations

Note that the "AML Output" lines differ. Given that the patch for bug 1285453 doesn't touch "ssdt-misc.dsl", this difference can only be explained by a change in *iasl itself*.
So, after the build logs, I also downloaded the "root logs" (= the buildroot setup logs) for both the -124 and -125 builds [5] [6], and compared them. Here we go: for -124, we got

> DEBUG util.py:257: --> acpica-tools-20150619-3.el7.x86_64

while for -125, we got

> DEBUG util.py:257: --> acpica-tools-20160527-1.el7.x86_64

That is, the "rhel-7.3-candidate" Brew build root saw an upgrade for "acpica-tools", from 20150619-3.el7 to 20160527-1.el7, unbeknownst to us. This caused "iasl" (which is part of acpica-tools) to compile "ssdt-misc.dsl" into a different AML byte-stream. The new AML can be digested by the AML interpreters in the RHEL-6 and RHEL-7 guest kernels; the RHEL-5 guest kernel chokes on the new AML however.

This BZ is definitely a blocker.

There are three approaches to fix the bug.

First, we could try to convince the new iasl, with various command line options, to emit AML that the RHEL-5 guest kernel can digest.

Second, we could modify the spec file for qemu-kvm so that it BuildRequires the known-good, exact version of iasl. That is, "acpica-tools-20150619-3.el7.x86_64".

Third, the qemu-kvm build system supports the inclusion of pre-generated AML (which is actually checked into the git tree), should the "iasl" utility be unavailable on the build host. In RHEL-7 downstream we don't use this fallback, but we could -- we could remove the "BuildRequires: iasl" RPM macro completely, and make sure that the pre-generated AML is the right one.

My preference is option #2. Option #1 is a moving target; every new iasl version could mess up stuff for us in a different way. And option #3 is not too safe either; even if we don't require "iasl", the build root could include it at some point independently, and then the safe fallback wouldn't be used at build.

There's option #3/b as well: we could modify the qemu-kvm build system to *only* consider the pre-generated AML, and never use "iasl", even when it's available. I think #3/b would also be viable, but it's more intrusive than option #2, so I prefer to try option #2 first.
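For anyone who wants to re-check the numbers, the size difference can be read straight out of the two "crash" dumps. The following is a minimal sketch, not part of the original investigation: it unpacks the standard 36-byte ACPI table header from the bytes shown in the hex dumps and compares the growth of the SSDT with the growth of "ssdt-misc.aml" reported by iasl. The helper name parse_header is made up for illustration, and the last cross-check assumes that the MADT is laid out after the SSDT in memory, as the dmesg addresses suggest.

```python
import struct

# First 36 bytes of each SSDT, copied from the "crash" hex dumps.
SSDT_124 = bytes.fromhex(
    "53534454df0a0000010d424f43485320"
    "42585043535344540100000042585043"
    "01000000"
)
SSDT_125 = bytes.fromhex(
    "53534454010b000001e1424f43485320"
    "42585043535344540100000042585043"
    "01000000"
)

def parse_header(raw):
    """Unpack the common ACPI table header (36 bytes, little-endian)."""
    (sig, length, revision, checksum, oem_id, oem_table_id,
     oem_rev, creator_id, creator_rev) = struct.unpack("<4sIBB6s8sI4sI", raw[:36])
    return {
        "signature": sig.decode("ascii"),
        "length": length,
        "revision": revision,
        "checksum": checksum,
        "oem_id": oem_id.decode("ascii"),
        "oem_table_id": oem_table_id.decode("ascii"),
    }

hdr_124 = parse_header(SSDT_124)
hdr_125 = parse_header(SSDT_125)
ssdt_growth = hdr_125["length"] - hdr_124["length"]

# Growth of ssdt-misc.aml according to the two iasl build logs.
iasl_growth = 388 - 354

# Address shifts taken from the dmesg diff (assumption: MADT follows SSDT).
ssdt_shift = 0xBFFFEF7F - 0xBFFFEF2B
madt_shift = 0xBFFFFA80 - 0xBFFFFA0A

print(hdr_124["signature"], hdr_124["length"], "->", hdr_125["length"])
print("SSDT growth:", ssdt_growth, "iasl AML growth:", iasl_growth)
print("extra shift of MADT relative to SSDT:", madt_shift - ssdt_shift)
```

If the three printed deltas agree, the table growth seen in guest memory is fully accounted for by the change in iasl's output for "ssdt-misc.dsl".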
Note that the same bug shouldn't affect qemu-kvm-rhev: in qemu-kvm-rhev, we have no template DSL files, and "iasl" does not partake in the build process. The complete ACPI payload is generated by qemu-kvm-rhev at runtime, implemented in C.
Option #2 is a no-go. I tried to build qemu-kvm with the following patch in place:

> diff --git a/redhat/qemu-kvm.spec.template b/redhat/qemu-kvm.spec.template
> index c82642de3614..b478eaa54544 100644
> --- a/redhat/qemu-kvm.spec.template
> +++ b/redhat/qemu-kvm.spec.template
> @@ -228,7 +228,7 @@ BuildRequires: librdmacm-devel
>  # iasl and cpp for acpi generation (not a hard requirement as we can use
>  # pre-compiled files, but it's better to use this)
>  %ifarch %{ix86} x86_64
> -BuildRequires: iasl
> +BuildRequires: acpica-tools = 20150619-3.el7
>  BuildRequires: cpp
>  %endif
>  %if 0%{!?build_only_sub:1}

But then Brew said,

> Error: No Package found for acpica-tools = 20150619-3.el7

So, the next choice is option #3/b.
Do we think it's a bug in iasl? That means upstream qemu builds on rhel will produce a broken binary (not latest qemu, that does not use iasl anymore). why do we want to work around and not fix iasl?
(In reply to Michael S. Tsirkin from comment #22)
> Do we think it's a bug in iasl? That means upstream qemu builds on rhel will
> produce a broken binary (not latest qemu, that does not use iasl anymore).
> why do we want to work around and not fix iasl?

It's not a bug in iasl; the AML emitted by the new iasl is consumed by RHEL-6.8 and RHEL-7.3 guests just fine (see comment 3). Instead, it's an ACPI compat bug in the AML interpreter of RHEL-5.11. And, even if we fixed that bug in RHEL-5.11.z, the 5.11 GA installer ISO would no longer work; a new installer ISO (= 5.12) would be necessary, which I don't think will happen (certainly not just for qemu's / iasl's sake).

It's practically the same thing as with old Windows guests: in upstream QEMU we've been careful lately not to generate otherwise valid AML that is known to break old Windows guests. The RHEL-5.11 guest is now in the same category.
Do you know what the change is? That some OSPMs can consume it does not prove a lot; iasl should generate code that is compatible with the claimed version of the spec, which is ACPI 1 in our case.
More info: the specific opcode that trips up the RHEL-5 guest is 0x15 ("ExternalOp"). This opcode was added in ACPI 6.0, and its sole purpose is to support the disassembler in determining the prototype of external methods. At execution time, the opcode should be ignored; it is embedded in an if(0){} block.

The following is an excerpt from the upstream ACPI CA git tree, file documents/changes.txt, at current master (git commit 0c1666287140, 2016-Sep-23):

> ----------------------------------------
> 12 February 2016. Summary of changes for version 20160212:
>
> [...]
>
> 2) iASL Compiler/Disassembler and Tools:
>
> Completed full support for the ACPI 6.0 External() AML opcode. The
> compiler emits an external AML opcode for each ASL External statement.
> This opcode is used by the disassembler to assist with the disassembly of
> external control methods by specifying the required number of arguments
> for the method. AML interpreters do not use this opcode. To ensure that
> interpreters do not even see the opcode, a block of one or more external
> opcodes is surrounded by an "If(0)" construct. As this feature becomes
> commonly deployed in BIOS code, the ability of disassemblers to correctly
> disassemble AML code will be greatly improved. David Box.

The If(0) trick works with the RHEL-6.8 and RHEL-7.3 guests, but it does not prevent the RHEL-5.11 guest from seeing the 0x15 (ExternalOp) opcode, and unfortunately RHEL-5.11 chokes on it.

The iasl utility doesn't seem to support a command line option that turns off this feature.
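The If(0) wrapper is directly visible in the -125 SSDT dump from comment 17, right after the 36-byte table header. Below is a rough sketch of a decoder for just that block; it is not a general AML parser. It only handles the encodings that actually occur in those bytes (single-byte PkgLength, root-prefixed names with DualNamePrefix or MultiNamePrefix), and the function names are made up for illustration.

```python
# AML bytes that follow the table header in the -125 SSDT dump.
AML_125 = bytes.fromhex(
    "a02100155c2e5f53425f5043"
    "49300600155c2f035f53425f50434930"
    "4953415f0600"
    "104205"
)

IF_OP, ZERO_OP, EXTERNAL_OP, SCOPE_OP = 0xA0, 0x00, 0x15, 0x10

def parse_name(buf, pos):
    """Decode an AML NameString at pos; return (name, next_pos)."""
    prefix = ""
    while buf[pos] in (0x5C, 0x5E):      # root '\' or parent '^' prefix
        prefix += chr(buf[pos])
        pos += 1
    if buf[pos] == 0x2E:                 # DualNamePrefix: two segments
        count, pos = 2, pos + 1
    elif buf[pos] == 0x2F:               # MultiNamePrefix: count byte follows
        count, pos = buf[pos + 1], pos + 2
    else:                                # a single 4-character segment
        count = 1
    segs = [buf[pos + 4 * i: pos + 4 * i + 4].decode("ascii") for i in range(count)]
    return prefix + ".".join(segs), pos + 4 * count

def decode_if0_externals(buf):
    """Decode a leading If(0) { External... } block; return (entries, block_size)."""
    assert buf[0] == IF_OP
    pkg_len = buf[1]
    assert pkg_len < 0x40                # only single-byte PkgLength handled here
    end = 1 + pkg_len                    # PkgLength counts itself, not the opcode
    assert buf[2] == ZERO_OP             # the If predicate is the constant 0
    pos, entries = 3, []
    while pos < end:
        assert buf[pos] == EXTERNAL_OP
        name, pos = parse_name(buf, pos + 1)
        obj_type, arg_count = buf[pos], buf[pos + 1]
        pos += 2
        entries.append((name, obj_type, arg_count))
    return entries, end

externals, block_size = decode_if0_externals(AML_125)
for name, obj_type, arg_count in externals:
    print(f"External({name}, type={obj_type}, args={arg_count})")
print("If(0) block size:", block_size, "bytes; next opcode:", hex(AML_125[block_size]))
```

The block size is worth comparing with the difference between the two "AML Output" sizes in the iasl logs (388 vs. 354 bytes); object type 6 is "Device" in the ACPI object type numbering.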
I'm also in the process of checking whether current upstream iasl behaves any differently. Namely, I built the iasl binary at upstream commit 0c1666287140, and embedded it in the SRPM by adding it to EXTRA_SOURCES in "redhat/Makefile.common", adding it to the spec file as Source21, and passing it to configure with --iasl=%{SOURCE21}. It's currently brewing. Once done, I'll check the build log (to make sure it was indeed used to build the tables) and then I'll repeat the RHEL-5.11 test.
From the build log (note the pathname of the iasl binary):

> /builddir/build/SOURCES/iasl-0c1666287140 -Pn -vs -l -tc -p ssdt-misc ssdt-misc.dsl.i 2>&1
> ASL Input: ssdt-misc.dsl.i - 102 lines, 2567 bytes, 35 keywords
> AML Output: ssdt-misc.aml - 388 bytes, 24 named objects, 11 executable opcodes
> Listing File: ssdt-misc.lst - 7874 bytes
> Hex Dump: ssdt-misc.hex - 3986 bytes
> Compilation complete. 0 Errors, 0 Warnings, 0 Remarks, 2 Optimizations

The "AML Output" line matches that under "-125 build" in comment 17, that is, the flawed build.

After launching the RHEL-5.11 guest with the qemu-kvm binary built like this, I get the same "Found unknown opcode 15 at AML address ..." error messages as originally reported.

Thus, I confirm that current upstream iasl presents the same reported problem for the RHEL-5.11 guest.
*** Bug 1394095 has been marked as a duplicate of this bug. ***
Fix included in qemu-kvm-1.5.3-127.el7
Verified this bz with the latest qemu-kvm build as of now.

Test version:
kernel: kernel-3.10.0-591.el7.x86_64
qemu: qemu-kvm-1.5.3-133.el7.x86_64
seabios: seavgabios-bin-1.10.1-2.el7.noarch
         seabios-bin-1.10.1-2.el7.noarch

This test is covered by the acceptance test. Tested with both 32-bit and 64-bit rhel 5.11 guests; all pass.

020-smp_8.8192m.repeat1.Host_RHEL.m7.u4.spice.qcow2.virtio_blk.up.virtio_net.RHEL.5.11.x86_64.io-github-autotest-qemu.shutdown PASS
022-smp_8.8192m.repeat1.Host_RHEL.m7.u4.spice.qcow2.virtio_blk.up.virtio_net.Win2012.x86_64.r2.io-github-autotest-qemu.shutdown PASS

It is OK too when tested manually.

According to the test results above, move to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:1856