Bug 1267533 - qemu quit when rebooting guest which hotplug memory >=13 times
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64 Linux
Priority: high  Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Igor Mammedov
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1264093
Reported: 2015-09-30 06:48 EDT by Igor Mammedov
Modified: 2015-12-04 11:58 EST

See Also:
Fixed In Version: qemu-kvm-rhev-2.3.0-30.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1245864
Environment:
Last Closed: 2015-12-04 11:58:50 EST
Type: Bug


Comment 2 Igor Mammedov 2015-09-30 09:51:30 EDT
Reproducer:

qemu-kvm -m 128M,slots=20,maxmem=40G -numa node -drive if=virtio,file=rhel72.img -snapshot  `for i in $(seq 1 15); do echo "-object memory-backend-ram,id=d$i,size=10M -device pc-dimm,id=dm$i,memdev=d$i"; done` -nodefaults -snapshot -M rhel71-machine-types

If the guest boots, run this in the guest:
 dd if=/dev/vda of=/dev/null bs=128M

During boot or during the 'dd', QEMU should crash with:
  "virtio: error trying to map MMIO memory"

To verify the workaround, run the same command with a 7.2 machine type; no crash should happen.
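
The backtick loop in the reproducer can also be written as a small function, which makes it easier to vary the number of DIMMs between runs. This is only a sketch: `dimm_args` is a hypothetical helper name that does not appear in the original report, and the 10M default simply mirrors the size used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): print the
# -object/-device argument pairs for COUNT pc-dimm devices of SIZE each.
dimm_args() {
    count=$1
    size=${2:-10M}   # default mirrors the 10M DIMMs used in the reproducer
    i=1
    while [ "$i" -le "$count" ]; do
        # One memory backend plus one pc-dimm device per iteration.
        printf '%s ' "-object memory-backend-ram,id=d$i,size=$size" \
                     "-device pc-dimm,id=dm$i,memdev=d$i"
        i=$((i + 1))
    done
}
```

It would be used unquoted so the shell splits the output into separate arguments, for example `qemu-kvm -m 128M,slots=20,maxmem=40G -numa node $(dimm_args 15) ...` with the remaining options taken from the reproducer.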
Comment 3 Jeff Nelson 2015-10-12 13:57:42 EDT
Fix included in qemu-kvm-rhev-2.3.0-30.el7
Comment 4 Pei Zhang 2015-10-13 01:48:01 EDT
Summary: When booting the guest with '-m 2G' and hotplugging 1 to 256 memory devices, the guest works well. But with '-m 128M' and 161 or more hotplugged memory devices, the guest cannot start up. Whether or not the guest works, errors like '[    0.237592] acpi PNP0C80:01: acpi_memory_enable_device() error' appear in the console while the guest boots. So this bug may not be fixed.

Versions:
Host:
kernel:3.10.0-323.el7.x86_64
qemu-kvm-rhev:qemu-kvm-rhev-2.3.0-30.el7.x86_64

Guest:
kernel:3.10.0-323.el7.x86_64

Scenario 1: boot guest with '-m 2G'
Steps:
1. Boot the guest with $num hotplug memory devices (tested with 15, 100, 255, 256):
# /usr/libexec/qemu-kvm -name rhel7.2 -machine pc-i440fx-rhel7.2.0,accel=kvm \
-cpu SandyBridge -m 2G,slots=256,maxmem=40G -numa node \
-smp 4,sockets=2,cores=2,threads=1 \
-uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b6b \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-monitor stdio \
-serial unix:/tmp/monitor,server,nowait \
-qmp tcp:0:5555,server,nowait \
-drive file=/home/rhel7.2.virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk-0,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-blk-0,id=virtio-blk-0 \
-netdev tap,id=hostnet0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=b6:ab:30:3a:bd:66 \
-snapshot \
-nodefaults \
`for i in $(seq 1 $num); do echo "-object memory-backend-ram,id=d$i,size=10M -device pc-dimm,id=dm$i,memdev=d$i"; done`

2. After the guest starts up, run dd:
# dd if=/dev/vda of=/dev/null bs=128M

3. Reboot the guest several times; the guest always works well.


Scenario 2: boot guest with '-m 128M'
Steps: Same steps (1~3) as Scenario 1.
Results:
$num result
100  work
150  work
155  work
158  work
159  work
160  work
161  fail
162  fail
175  fail
200  fail
255  fail
256  fail

For Scenario 1 and Scenario 2, whether or not the guest boots, errors like the ones below always show in the console.
# nc -U /tmp/monitor
[    0.237592] acpi PNP0C80:01: acpi_memory_enable_device() error
[    0.238898] acpi PNP0C80:03: acpi_memory_enable_device() error
[    0.240032] acpi PNP0C80:05: acpi_memory_enable_device() error
[    0.241174] acpi PNP0C80:07: acpi_memory_enable_device() error
[    0.242264] acpi PNP0C80:09: acpi_memory_enable_device() error
[    0.244168] acpi PNP0C80:0c: acpi_memory_enable_device() error
[    0.245223] acpi PNP0C80:0e: acpi_memory_enable_device() error
[    0.246269] acpi PNP0C80:10: acpi_memory_enable_device() error
[    0.247266] acpi PNP0C80:12: acpi_memory_enable_device() error
[    0.248300] acpi PNP0C80:14: acpi_memory_enable_device() error
[    0.250202] acpi PNP0C80:17: acpi_memory_enable_device() error
[    0.251231] acpi PNP0C80:19: acpi_memory_enable_device() error
[    0.252323] acpi PNP0C80:1b: acpi_memory_enable_device() error
[    0.253438] acpi PNP0C80:1d: acpi_memory_enable_device() error
[    0.254490] acpi PNP0C80:1f: acpi_memory_enable_device() error
......
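
To compare runs without reading the whole console dump, the number of these errors can be counted from a saved log. A minimal sketch, assuming the console output was captured to a file first; `count_acpi_mem_errors` is a hypothetical helper name, not something from the report.

```shell
#!/bin/sh
# Hypothetical helper: count acpi_memory_enable_device() failures in a
# guest console log read from stdin.
count_acpi_mem_errors() {
    # -F matches the string literally; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so mask that to keep the
    # function usable under `set -e`.
    grep -cF 'acpi_memory_enable_device() error' || true
}
```

For example, after saving the console with `nc -U /tmp/monitor | tee console.log`, run `count_acpi_mem_errors < console.log`.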
Comment 5 Pei Zhang 2015-10-13 01:51:52 EDT
More info when the guest fails:
1. qemu still works well
(qemu) info status
VM status: running

2. full console message
# nc -U /tmp/monitor
[    0.240049] acpi PNP0C80:01: acpi_memory_enable_device() error
[    0.241267] acpi PNP0C80:03: acpi_memory_enable_device() error
[    0.242278] acpi PNP0C80:05: acpi_memory_enable_device() error
[    0.243315] acpi PNP0C80:07: acpi_memory_enable_device() error
[    0.244298] acpi PNP0C80:09: acpi_memory_enable_device() error
[    0.246136] acpi PNP0C80:0c: acpi_memory_enable_device() error
[    0.247177] acpi PNP0C80:0e: acpi_memory_enable_device() error
[    0.248163] acpi PNP0C80:10: acpi_memory_enable_device() error
[    0.249144] acpi PNP0C80:12: acpi_memory_enable_device() error
[    0.250476] acpi PNP0C80:14: acpi_memory_enable_device() error
[    0.252447] acpi PNP0C80:17: acpi_memory_enable_device() error
[    0.253476] acpi PNP0C80:19: acpi_memory_enable_device() error
[    0.254523] acpi PNP0C80:1b: acpi_memory_enable_device() error
[    0.255593] acpi PNP0C80:1d: acpi_memory_enable_device() error
[    0.256715] acpi PNP0C80:1f: acpi_memory_enable_device() error
[    0.258514] acpi PNP0C80:21: acpi_memory_enable_device() error
[    0.259569] acpi PNP0C80:23: acpi_memory_enable_device() error
[    0.260654] acpi PNP0C80:25: acpi_memory_enable_device() error
[    0.261674] acpi PNP0C80:27: acpi_memory_enable_device() error
[    0.262692] acpi PNP0C80:29: acpi_memory_enable_device() error
[    0.264634] acpi PNP0C80:2c: acpi_memory_enable_device() error
[    0.265717] acpi PNP0C80:2e: acpi_memory_enable_device() error
[    0.266774] acpi PNP0C80:30: acpi_memory_enable_device() error
[    0.267857] acpi PNP0C80:32: acpi_memory_enable_device() error
[    0.268957] acpi PNP0C80:34: acpi_memory_enable_device() error
[    0.271312] acpi PNP0C80:37: acpi_memory_enable_device() error
[    0.272795] acpi PNP0C80:39: acpi_memory_enable_device() error
[    0.274219] acpi PNP0C80:3b: acpi_memory_enable_device() error
[    0.275601] acpi PNP0C80:3d: acpi_memory_enable_device() error
[    0.277047] acpi PNP0C80:3f: acpi_memory_enable_device() error
[    0.279181] acpi PNP0C80:41: acpi_memory_enable_device() error
[    0.281348] acpi PNP0C80:43: acpi_memory_enable_device() error
[    0.282750] acpi PNP0C80:45: acpi_memory_enable_device() error
[    0.284246] acpi PNP0C80:47: acpi_memory_enable_device() error
[    0.285642] acpi PNP0C80:49: acpi_memory_enable_device() error
[    0.288152] acpi PNP0C80:4c: acpi_memory_enable_device() error
[    0.289593] acpi PNP0C80:4e: acpi_memory_enable_device() error
[    0.291091] acpi PNP0C80:50: acpi_memory_enable_device() error
[    0.292578] acpi PNP0C80:52: acpi_memory_enable_device() error
[    0.294012] acpi PNP0C80:54: acpi_memory_enable_device() error
[    0.296405] acpi PNP0C80:57: acpi_memory_enable_device() error
[    0.297832] acpi PNP0C80:59: acpi_memory_enable_device() error
[    0.299480] acpi PNP0C80:5b: acpi_memory_enable_device() error
[    0.300946] acpi PNP0C80:5d: acpi_memory_enable_device() error
[    0.302393] acpi PNP0C80:5f: acpi_memory_enable_device() error
[    0.304597] acpi PNP0C80:61: acpi_memory_enable_device() error
[    0.306038] acpi PNP0C80:63: acpi_memory_enable_device() error
[    0.307522] acpi PNP0C80:65: acpi_memory_enable_device() error
[    0.309010] acpi PNP0C80:67: acpi_memory_enable_device() error
[    0.310789] acpi PNP0C80:69: acpi_memory_enable_device() error
[    0.313248] acpi PNP0C80:6c: acpi_memory_enable_device() error
[    0.314688] acpi PNP0C80:6e: acpi_memory_enable_device() error
[    0.316122] acpi PNP0C80:70: acpi_memory_enable_device() error
[    0.317580] acpi PNP0C80:72: acpi_memory_enable_device() error
[    0.319068] acpi PNP0C80:74: acpi_memory_enable_device() error
[    0.321452] acpi PNP0C80:77: acpi_memory_enable_device() error
[    0.322946] acpi PNP0C80:79: acpi_memory_enable_device() error
[    0.324340] acpi PNP0C80:7b: acpi_memory_enable_device() error
[    0.325809] acpi PNP0C80:7d: acpi_memory_enable_device() error
[    0.327200] acpi PNP0C80:7f: acpi_memory_enable_device() error
[    0.329264] acpi PNP0C80:81: acpi_memory_enable_device() error
[    0.330646] acpi PNP0C80:83: acpi_memory_enable_device() error
[    0.332104] acpi PNP0C80:85: acpi_memory_enable_device() error
[    0.333505] acpi PNP0C80:87: acpi_memory_enable_device() error
[    0.334930] acpi PNP0C80:89: acpi_memory_enable_device() error
[    0.337178] acpi PNP0C80:8c: acpi_memory_enable_device() error
[    0.338597] acpi PNP0C80:8e: acpi_memory_enable_device() error
[    0.340628] acpi PNP0C80:90: acpi_memory_enable_device() error
[    0.342184] acpi PNP0C80:92: acpi_memory_enable_device() error
[    0.343686] acpi PNP0C80:94: acpi_memory_enable_device() error
[    0.346124] acpi PNP0C80:97: acpi_memory_enable_device() error
[    0.347575] acpi PNP0C80:99: acpi_memory_enable_device() error
[    0.349043] acpi PNP0C80:9b: acpi_memory_enable_device() error
[    0.350514] acpi PNP0C80:9d: acpi_memory_enable_device() error
[    0.351989] acpi PNP0C80:9f: acpi_memory_enable_device() error
[  240.602730] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  240.603865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.605408] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  240.606447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.609048] INFO: task systemd-udevd:263 blocked for more than 120 seconds.
[  240.610126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.611659] INFO: task systemd-udevd:264 blocked for more than 120 seconds.
[  240.612730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.613730] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  360.614847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.616399] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  360.617435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.618993] INFO: task systemd-udevd:263 blocked for more than 120 seconds.
[  360.620059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.621573] INFO: task systemd-udevd:264 blocked for more than 120 seconds.
[  360.622674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.623731] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  480.624873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.626439] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  480.627490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
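
The hung-task messages repeat every 120 seconds, so it can help to reduce them to the distinct set of blocked tasks. The following is a sketch only; `hung_tasks` is a hypothetical helper name, and it assumes the message format shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: list each distinct task reported as blocked by the
# kernel hung-task detector, reading a console log from stdin.
hung_tasks() {
    # Keep only the "name:pid" part of each "INFO: task ... blocked" line,
    # then drop the repeats.
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds\..*/\1/p' \
        | sort -u
}
```

Usage would be `hung_tasks < console.log`, with `console.log` being a saved copy of the console output.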
Comment 7 juzhang 2015-10-13 04:07:20 EDT
Hi Igor,

According to comment4 and comment5, seems this issue failed qa. Could you add comment?

Best Regards,
Junyi
Comment 8 Igor Mammedov 2015-10-13 05:13:59 EDT
(In reply to juzhang from comment #7)
> Hi Igor,
> 
> According to comment4 and comment5, seems this issue failed qa. Could you
> add comment?
You're seeing the error messages because the guest kernel supports a minimum memory block size of 128M, while this BZ hotplugs 10M blocks. So these error messages are expected.
The goal of this BZ is to fix the QEMU crash, and comment 4 appears to confirm that it is fixed.
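
The size mismatch described above can be checked with a few lines of shell. This is a sketch under stated assumptions: `dimm_matches_block` is a hypothetical helper name, and the rule it encodes (a DIMM must be at least one memory block and a whole multiple of the block size) is an assumption that goes slightly beyond the "minimum 128M" statement in this comment.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a DIMM of DIMM_MIB mebibytes is at
# least one memory block of BLOCK_MIB mebibytes and a whole multiple of it.
# The "whole multiple" part is an assumption, not taken from the report.
dimm_matches_block() {
    dimm_mib=$1
    block_mib=$2
    [ "$dimm_mib" -ge "$block_mib" ] && [ $((dimm_mib % block_mib)) -eq 0 ]
}

# Inside a Linux guest, the block size is exposed in hex without a 0x prefix:
#   block_mib=$(( 0x$(cat /sys/devices/system/memory/block_size_bytes) / 1048576 ))
```

With the values from this BZ, `dimm_matches_block 10 128` fails, which matches the expected acpi_memory_enable_device() errors, while `dimm_matches_block 128 128` succeeds.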

As for Scenario 2 from comment 4, I don't think that 128M RAM is supported by RHEL7 (I think the supported minimum is 512M). And since the guest consumes a little memory per DIMM device, 128M might not be enough to support more than 160 DIMM modules.

> 
> Best Regards,
> Junyi
Comment 9 juzhang 2015-10-13 05:34:50 EDT
Thanks Igor for the detailed explanation. We agreed.
Comment 10 juzhang 2015-10-13 05:35:54 EDT
According to comment4, 5, 8 and comment9, set this issue as verified.
Comment 12 errata-xmlrpc 2015-12-04 11:58:50 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html
