Bug 1173167

Summary: Corrupted ACPI tables in some configurations using pc-i440fx-rhel7.0.0
Product: Red Hat Enterprise Linux 7
Reporter: Eduardo Habkost <ehabkost>
Component: qemu-kvm-rhev
Assignee: Eduardo Habkost <ehabkost>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.1
CC: armbru, atheurer, dgilbert, hhuang, huding, juzhang, lersek, pbonzini, qiguo, tlavigne, virt-maint, xfu, ypu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.1.2-17.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 09:57:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: verified by the logs of dmesg in guest
Flags: none

Description Eduardo Habkost 2014-12-11 15:06:43 UTC
Description of problem:
When starting a guest with some NUMA nodes and specific cores/sockets CPU topology, the guest complains about invalid ACPI tables.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.1.2-16.el7.x86_64
libvirt-1.2.8-9.el7.x86_64

How reproducible:
Always (with the configuration below). I could not reproduce it after simplifying the qemu-kvm command line by removing a few options that seemed unrelated.

Steps to Reproduce:
1. Start VM using the attached libvirt XML configuration
2. Check guest dmesg

Actual results:
[root@localhost ~]# dmesg | egrep -i 'numa|srat|acpi'
[    0.000000] ACPI: RSDP 00000000000f1fe0 00014 (v00 BOCHS )
[    0.000000] ACPI: \xffffffff\xfffffffb?? 000000007ffff6d8 00000 (v208 \xffffffd0?A?\xffffffff\xffffffff \xffffffff\xffffffff\xffffffff\xffffffffCPU  6F420033  chs 04200400)
[    0.000000] ACPI BIOS Error (bug): Invalid table length 0x0 in RSDT/XSDT (20130517/tbutils-513)
[    0.000000] No NUMA configuration found
[    0.177064] ACPI: Interpreter disabled.
[    0.204876] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.259760] pnp: PnP ACPI: disabled
[    4.628306] ACPI Exception: AE_BAD_PARAMETER, Thread 2141639648 could not acquire Mutex [0x1] (20130517/utmutex-285)
[root@localhost ~]# 


Expected results:
The guest should accept the ACPI tables and detect the NUMA topology.
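
The check in step 2 can be automated. The following is a minimal sketch, not part of the original report: it scans guest dmesg text for the three failure messages shown under "Actual results". The function name and the marker list are illustrative choices; other kernel versions may word these messages differently.

```python
from typing import List

# Substrings copied from the guest dmesg output in this report.
# Assumption: these strings are stable enough to match on.
FAILURE_MARKERS = (
    "Invalid table length",
    "No NUMA configuration found",
    "ACPI: Interpreter disabled",
)


def acpi_failures(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that contain any known failure marker."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


# Sample input: three lines from the report plus one unrelated line.
sample = """\
[    0.000000] ACPI BIOS Error (bug): Invalid table length 0x0 in RSDT/XSDT (20130517/tbutils-513)
[    0.000000] No NUMA configuration found
[    0.177064] ACPI: Interpreter disabled.
[    0.259760] pnp: PnP ACPI: disabled
"""

for line in acpi_failures(sample):
    print(line)
```

An empty result on a guest booted with -numa suggests that the tables were parsed. It does not prove that the NUMA layout is the one requested.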

Additional info:

libvirt XML:

<domain type='kvm'>
  <name>vm1</name>
  <uuid>e1c3ee44-f425-4a64-9720-18d1d0177588</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>40</vcpu>
  <numatune>
    <memnode cellid='0' mode='strict' nodeset='0'/>
    <memnode cellid='1' mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/> 
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='2' cores='10' threads='2'/>
    <numa> 
      <cell id='0' cpus='0-19' memory='1048576'/>
      <cell id='1' cpus='20-39' memory='1048576'/>
    </numa>
  </cpu>   
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/Fedora-x86_64-20-20131211.1-sda.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <interface type='bridge'>
      <mac address='52:54:00:61:61:0d'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <serial type='file'>
      <source path='/tmp/vm1.console'/>
      <target port='1'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'/>
    <video>
      <model type='cirrus' vram='16384' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </video>
    <memballoon model='none'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'/>
</domain>  

QEMU command-line:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name vm1 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pcid,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 2048 -realtime mlock=off -smp 40,sockets=2,cores=10,threads=2 -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=1024M,id=ram-node0,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-19,memdev=ram-node0 -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,size=1024M,id=ram-node1,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=20-39,memdev=ram-node1 -uuid e1c3ee44-f425-4a64-9720-18d1d0177588 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/libvirt/images/Fedora-x86_64-20-20131211.1-sda.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:61:61:0d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev file,id=charserial1,path=/tmp/vm1.console -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x5 -msg timestamp=on

Comment 1 Eduardo Habkost 2014-12-11 17:07:09 UTC
Bug is not present if using machine-type pc-i440fx-rhel7.1.0.

Comment 2 Eduardo Habkost 2014-12-11 17:13:47 UTC
Simplified command-line to reproduce the bug:
/usr/libexec/qemu-kvm -nodefaults -serial stdio -machine pc-i440fx-rhel7.0.0,accel=kvm -m 2048 -smp 40,sockets=2,cores=10,threads=2 -numa node,nodeid=0,cpus=0-19 -numa node,nodeid=1,cpus=20-39  -drive if=virtio,file=/var/lib/libvirt/images/Fedora-x86_64-20-20131211.1-sda.qcow2,format=qcow2

The bug cannot be reproduced without -numa.

The bug cannot be reproduced with 8 cores per socket and 32 VCPUs:
/usr/libexec/qemu-kvm -nodefaults -serial stdio -machine pc-i440fx-rhel7.0.0,accel=kvm -m 2048 -smp 32,sockets=2,cores=8,threads=2 -numa node,nodeid=0,cpus=0-15 -numa node,nodeid=1,cpus=16-31  -drive if=virtio,file=/var/lib/libvirt/images/Fedora-x86_64-20-20131211.1-sda.qcow2,format=qcow2
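
To try other topologies, the -smp and -numa arguments can be generated rather than typed by hand. This is a rough sketch under one assumption taken from the two command lines above: one NUMA node per socket, with CPU indexes assigned contiguously. The helper name smp_numa_args is hypothetical. It only uses the option syntax already shown in this comment.

```python
from typing import List


def smp_numa_args(sockets: int, cores: int, threads: int) -> List[str]:
    """Build -smp/-numa arguments, assuming one NUMA node per socket."""
    per_socket = cores * threads
    total = sockets * per_socket
    args = ["-smp", f"{total},sockets={sockets},cores={cores},threads={threads}"]
    for node in range(sockets):
        first = node * per_socket
        last = first + per_socket - 1
        args += ["-numa", f"node,nodeid={node},cpus={first}-{last}"]
    return args


# The failing and the working topology from this comment.
print(" ".join(smp_numa_args(2, 10, 2)))
print(" ".join(smp_numa_args(2, 8, 2)))
```

The two printed argument strings match the -smp and -numa parts of the command lines above.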

Comment 3 Eduardo Habkost 2014-12-11 17:46:40 UTC
The bug is triggered by the legacy_acpi_table_size=6418 line in pc_compat_rhel700().

Comment 4 Eduardo Habkost 2014-12-12 13:56:30 UTC
When reproducing the bug I see the ACPI table size warning:
  qemu-system-x86_64: Warning: migration may not work.
  qemu-system-x86_64: Warning: migration may not work.

Comment 5 Paolo Bonzini 2014-12-12 16:08:56 UTC
<ehabkost> I think I found it
<ehabkost> we are using max_cpus instead of apic_id_limit
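
The difference between the two values explains why only some topologies trigger the bug. The sketch below is illustrative and is not QEMU code. It assumes the usual x86 layout in which each topology level occupies a power-of-two-wide bit field of the APIC ID, so a core count that is not a power of two leaves gaps in the APIC ID space. The function names are made up for this example.

```python
def bits_for_count(count: int) -> int:
    """Bits needed to encode `count` distinct IDs, i.e. ceil(log2(count))."""
    return (count - 1).bit_length()


def apic_id(cpu_index: int, cores: int, threads: int) -> int:
    """Map a sequential CPU index to a topology-encoded APIC ID."""
    thread_bits = bits_for_count(threads)
    core_bits = bits_for_count(cores)
    thread = cpu_index % threads
    core = (cpu_index // threads) % cores
    socket = cpu_index // (threads * cores)
    return (socket << (core_bits + thread_bits)) | (core << thread_bits) | thread


def apic_id_limit(max_cpus: int, cores: int, threads: int) -> int:
    """One past the highest APIC ID used by CPU indexes 0..max_cpus-1."""
    return apic_id(max_cpus - 1, cores, threads) + 1


# The failing and the working topology from Comment 2.
for max_cpus, cores, threads in [(40, 10, 2), (32, 8, 2)]:
    limit = apic_id_limit(max_cpus, cores, threads)
    print(f"max_cpus={max_cpus} cores={cores} threads={threads} apic_id_limit={limit}")
# prints:
# max_cpus=40 cores=10 threads=2 apic_id_limit=52
# max_cpus=32 cores=8 threads=2 apic_id_limit=32
```

With 10 cores per socket, the second socket starts at APIC ID 32, so the limit (52) exceeds max_cpus (40), and a size computed from max_cpus is too small. With 8 cores per socket, the two values are both 32, which is consistent with that configuration not reproducing the bug.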

Comment 6 Eduardo Habkost 2014-12-12 16:44:02 UTC
Fix submitted upstream:

From: Eduardo Habkost <ehabkost>
To: qemu-devel, qemu-stable
Cc: Paolo Bonzini <pbonzini>, Laszlo Ersek <lersek>, "Michael S. Tsirkin" <mst>
Subject: [PATCH] acpi: Use apic_id_limit when calculating legacy ACPI table size
Date: Fri, 12 Dec 2014 14:38:36 -0200
Message-Id: <1418402316-31738-1-git-send-email-ehabkost>

Comment 8 Jeff Nelson 2014-12-17 04:05:20 UTC
Fix included in qemu-kvm-rhev-2.1.2-17.el7

Comment 10 Qian Guo 2014-12-22 08:14:34 UTC
Reproduced this bug with qemu-kvm-rhev-2.1.2-16.el7.x86_64

Steps:
1. Boot the guest with NUMA:
# /usr/libexec/qemu-kvm -nodefaults -serial stdio -machine pc-i440fx-rhel7.0.0,accel=kvm -m 2048 -smp 40,sockets=2,cores=10,threads=2 -numa node,nodeid=0,cpus=0-19 -numa node,nodeid=1,cpus=20-39 -cpu SandyBridge -k en-us -boot menu=on -monitor stdio -vnc :1 -drive file=rhel7.1.qcow2,if=none,id=drive-virtio-disk,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk,id=disk0,bootindex=1 -vga std  -boot menu=on -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=00:01:02:B6:40:27,bus=pci.0


Results:
1. QEMU reports 'qemu-kvm: Warning: migration may not work.'

2. Guest dmesg shows:
# dmesg | egrep -i 'numa|srat|acpi'
[    0.000000] ACPI: RSDP 00000000000f1ff0 00014 (v00 BOCHS )
[    0.000000] ACPI: \xfffffffd\xfffffffb?? 000000007ffff6d8 00000 (v208 \xffffffd0?A?\xffffffff\xffffffff \xffffffff\xffffffff\xffffffff\xffffffffCPU  6F420033  chs 04200400)
[    0.000000] ACPI BIOS Error (bug): Invalid table length 0x0 in RSDT/XSDT (20130517/tbutils-513)
[    0.000000] No NUMA configuration found
[    0.143141] ACPI: Interpreter disabled.
[    0.150480] pci 0000:00:01.3: quirk: [io  0x0600-0x063f] claimed by PIIX4 ACPI
[    0.189977] pnp: PnP ACPI: disabled
[    7.913138] ACPI Exception: AE_BAD_PARAMETER, Thread 918146976 could not acquire Mutex [0x1] (20130517/utmutex-285)


So the bug is reproduced.

Verified the fix with qemu-kvm-rhev-2.1.2-17.el7.x86_64

Steps: same as above.

Results:
QEMU prints no warnings, and the guest dmesg shows no ACPI errors. I will attach the guest dmesg lines matching numa/srat/acpi.

Comment 11 Qian Guo 2014-12-22 08:15:22 UTC
Created attachment 971907 [details]
verified by the logs of dmesg in guest

Comment 13 errata-xmlrpc 2015-03-05 09:57:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html