Bug 1267533 - qemu quit when rebooting guest which hotplug memory >=13 times
Summary: qemu quit when rebooting guest which hotplug memory >=13 times
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Igor Mammedov
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1264093
Reported: 2015-09-30 10:48 UTC by Igor Mammedov
Modified: 2015-12-04 16:58 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.3.0-30.el7
Doc Type: Bug Fix
Doc Text:
Clone Of: 1245864
Environment:
Last Closed: 2015-12-04 16:58:50 UTC
Target Upstream Version:


Links:
System ID: Red Hat Product Errata RHBA-2015:2546
Priority: normal
Status: SHIPPED_LIVE
Summary: qemu-kvm-rhev bug fix and enhancement update
Last Updated: 2015-12-04 21:11:56 UTC

Comment 2 Igor Mammedov 2015-09-30 13:51:30 UTC
Reproducer:

qemu-kvm -m 128M,slots=20,maxmem=40G -numa node -drive if=virtio,file=rhel72.img -snapshot  `for i in $(seq 1 15); do echo "-object memory-backend-ram,id=d$i,size=10M -device pc-dimm,id=dm$i,memdev=d$i"; done` -nodefaults -snapshot -M rhel71-machine-types

If the guest boots, run in the guest:
 dd if=/dev/vda of=/dev/null bs=128M

During boot or during the 'dd', QEMU should crash with:
  "virtio: error trying to map MMIO memory"

To verify the workaround, run the same command with a 7.2 machine type; no crash should occur.
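
The backtick loop in the reproducer can also be kept as a small shell helper, which makes it easier to vary the DIMM count. This is only a sketch: the function name `dimm_args` and its optional size argument are hypothetical; the `-object`/`-device` strings are copied from the command above.

```shell
# Print one "-object ... -device ..." pair per DIMM, one pair per line.
# $1 = number of DIMMs, $2 = size of each DIMM (defaults to 10M as in the reproducer).
dimm_args() {
    count=$1
    size=${2:-10M}
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s\n' "-object memory-backend-ram,id=d$i,size=$size -device pc-dimm,id=dm$i,memdev=d$i"
        i=$((i + 1))
    done
}
```

Substituting an unquoted `$(dimm_args 15)` for the backtick loop should produce the same arguments, since the shell splits the output on whitespace.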

Comment 3 Jeff Nelson 2015-10-12 17:57:42 UTC
Fix included in qemu-kvm-rhev-2.3.0-30.el7

Comment 4 Pei Zhang 2015-10-13 05:48:01 UTC
Summary: When booting the guest with '-m 2G' and hotplugging several (1~256) memory devices, the guest works well. But with '-m 128M' and hotplugging >=161 memory devices, the guest cannot start up. Whether or not the guest works, errors like '[    0.237592] acpi PNP0C80:01: acpi_memory_enable_device() error' appear on the console while the guest boots. So this bug may not be fixed.

Versions:
Host:
kernel:3.10.0-323.el7.x86_64
qemu-kvm-rhev:qemu-kvm-rhev-2.3.0-30.el7.x86_64

Guest:
kernel:3.10.0-323.el7.x86_64

Scenario 1: boot guest with '-m 2G'
Steps:
1. Boot the guest with $num hotplug memory devices (tested with 15, 100, 255, 256)
# /usr/libexec/qemu-kvm -name rhel7.2 -machine pc-i440fx-rhel7.2.0,accel=kvm \
-cpu SandyBridge -m 2G,slots=256,maxmem=40G -numa node \
-smp 4,sockets=2,cores=2,threads=1 \
-uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b6b \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-monitor stdio \
-serial unix:/tmp/monitor,server,nowait \
-qmp tcp:0:5555,server,nowait \
-drive file=/home/rhel7.2.virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk-0,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=pci.0,addr=0x8,drive=drive-virtio-blk-0,id=virtio-blk-0 \
-netdev tap,id=hostnet0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=b6:ab:30:3a:bd:66 \
-snapshot \
-nodefaults \
`for i in $(seq 1 $num); do echo "-object memory-backend-ram,id=d$i,size=10M -device pc-dimm,id=dm$i,memdev=d$i"; done`

2. After the guest starts up, run dd
# dd if=/dev/vda of=/dev/null bs=128M

3. Reboot the guest several times; the guest always works well.


Scenario 2: boot guest with '-m 128M'
Steps: Same steps (1~3) as Scenario 1.
Results:
$num result
100  work
150  work
155  work
158  work
159  work
160  work
161  fail
162  fail
175  fail
200  fail
255  fail
256  fail

For Scenario 1 and Scenario 2, whether or not the guest boots, errors like those below always appear on the console.
# nc -U /tmp/monitor
[    0.237592] acpi PNP0C80:01: acpi_memory_enable_device() error
[    0.238898] acpi PNP0C80:03: acpi_memory_enable_device() error
[    0.240032] acpi PNP0C80:05: acpi_memory_enable_device() error
[    0.241174] acpi PNP0C80:07: acpi_memory_enable_device() error
[    0.242264] acpi PNP0C80:09: acpi_memory_enable_device() error
[    0.244168] acpi PNP0C80:0c: acpi_memory_enable_device() error
[    0.245223] acpi PNP0C80:0e: acpi_memory_enable_device() error
[    0.246269] acpi PNP0C80:10: acpi_memory_enable_device() error
[    0.247266] acpi PNP0C80:12: acpi_memory_enable_device() error
[    0.248300] acpi PNP0C80:14: acpi_memory_enable_device() error
[    0.250202] acpi PNP0C80:17: acpi_memory_enable_device() error
[    0.251231] acpi PNP0C80:19: acpi_memory_enable_device() error
[    0.252323] acpi PNP0C80:1b: acpi_memory_enable_device() error
[    0.253438] acpi PNP0C80:1d: acpi_memory_enable_device() error
[    0.254490] acpi PNP0C80:1f: acpi_memory_enable_device() error
......

Comment 5 Pei Zhang 2015-10-13 05:51:52 UTC
More info when the guest fails:
1. qemu still works well
(qemu) info status
VM status: running

2. full console message
# nc -U /tmp/monitor
[    0.240049] acpi PNP0C80:01: acpi_memory_enable_device() error
[    0.241267] acpi PNP0C80:03: acpi_memory_enable_device() error
[    0.242278] acpi PNP0C80:05: acpi_memory_enable_device() error
[    0.243315] acpi PNP0C80:07: acpi_memory_enable_device() error
[    0.244298] acpi PNP0C80:09: acpi_memory_enable_device() error
[    0.246136] acpi PNP0C80:0c: acpi_memory_enable_device() error
[    0.247177] acpi PNP0C80:0e: acpi_memory_enable_device() error
[    0.248163] acpi PNP0C80:10: acpi_memory_enable_device() error
[    0.249144] acpi PNP0C80:12: acpi_memory_enable_device() error
[    0.250476] acpi PNP0C80:14: acpi_memory_enable_device() error
[    0.252447] acpi PNP0C80:17: acpi_memory_enable_device() error
[    0.253476] acpi PNP0C80:19: acpi_memory_enable_device() error
[    0.254523] acpi PNP0C80:1b: acpi_memory_enable_device() error
[    0.255593] acpi PNP0C80:1d: acpi_memory_enable_device() error
[    0.256715] acpi PNP0C80:1f: acpi_memory_enable_device() error
[    0.258514] acpi PNP0C80:21: acpi_memory_enable_device() error
[    0.259569] acpi PNP0C80:23: acpi_memory_enable_device() error
[    0.260654] acpi PNP0C80:25: acpi_memory_enable_device() error
[    0.261674] acpi PNP0C80:27: acpi_memory_enable_device() error
[    0.262692] acpi PNP0C80:29: acpi_memory_enable_device() error
[    0.264634] acpi PNP0C80:2c: acpi_memory_enable_device() error
[    0.265717] acpi PNP0C80:2e: acpi_memory_enable_device() error
[    0.266774] acpi PNP0C80:30: acpi_memory_enable_device() error
[    0.267857] acpi PNP0C80:32: acpi_memory_enable_device() error
[    0.268957] acpi PNP0C80:34: acpi_memory_enable_device() error
[    0.271312] acpi PNP0C80:37: acpi_memory_enable_device() error
[    0.272795] acpi PNP0C80:39: acpi_memory_enable_device() error
[    0.274219] acpi PNP0C80:3b: acpi_memory_enable_device() error
[    0.275601] acpi PNP0C80:3d: acpi_memory_enable_device() error
[    0.277047] acpi PNP0C80:3f: acpi_memory_enable_device() error
[    0.279181] acpi PNP0C80:41: acpi_memory_enable_device() error
[    0.281348] acpi PNP0C80:43: acpi_memory_enable_device() error
[    0.282750] acpi PNP0C80:45: acpi_memory_enable_device() error
[    0.284246] acpi PNP0C80:47: acpi_memory_enable_device() error
[    0.285642] acpi PNP0C80:49: acpi_memory_enable_device() error
[    0.288152] acpi PNP0C80:4c: acpi_memory_enable_device() error
[    0.289593] acpi PNP0C80:4e: acpi_memory_enable_device() error
[    0.291091] acpi PNP0C80:50: acpi_memory_enable_device() error
[    0.292578] acpi PNP0C80:52: acpi_memory_enable_device() error
[    0.294012] acpi PNP0C80:54: acpi_memory_enable_device() error
[    0.296405] acpi PNP0C80:57: acpi_memory_enable_device() error
[    0.297832] acpi PNP0C80:59: acpi_memory_enable_device() error
[    0.299480] acpi PNP0C80:5b: acpi_memory_enable_device() error
[    0.300946] acpi PNP0C80:5d: acpi_memory_enable_device() error
[    0.302393] acpi PNP0C80:5f: acpi_memory_enable_device() error
[    0.304597] acpi PNP0C80:61: acpi_memory_enable_device() error
[    0.306038] acpi PNP0C80:63: acpi_memory_enable_device() error
[    0.307522] acpi PNP0C80:65: acpi_memory_enable_device() error
[    0.309010] acpi PNP0C80:67: acpi_memory_enable_device() error
[    0.310789] acpi PNP0C80:69: acpi_memory_enable_device() error
[    0.313248] acpi PNP0C80:6c: acpi_memory_enable_device() error
[    0.314688] acpi PNP0C80:6e: acpi_memory_enable_device() error
[    0.316122] acpi PNP0C80:70: acpi_memory_enable_device() error
[    0.317580] acpi PNP0C80:72: acpi_memory_enable_device() error
[    0.319068] acpi PNP0C80:74: acpi_memory_enable_device() error
[    0.321452] acpi PNP0C80:77: acpi_memory_enable_device() error
[    0.322946] acpi PNP0C80:79: acpi_memory_enable_device() error
[    0.324340] acpi PNP0C80:7b: acpi_memory_enable_device() error
[    0.325809] acpi PNP0C80:7d: acpi_memory_enable_device() error
[    0.327200] acpi PNP0C80:7f: acpi_memory_enable_device() error
[    0.329264] acpi PNP0C80:81: acpi_memory_enable_device() error
[    0.330646] acpi PNP0C80:83: acpi_memory_enable_device() error
[    0.332104] acpi PNP0C80:85: acpi_memory_enable_device() error
[    0.333505] acpi PNP0C80:87: acpi_memory_enable_device() error
[    0.334930] acpi PNP0C80:89: acpi_memory_enable_device() error
[    0.337178] acpi PNP0C80:8c: acpi_memory_enable_device() error
[    0.338597] acpi PNP0C80:8e: acpi_memory_enable_device() error
[    0.340628] acpi PNP0C80:90: acpi_memory_enable_device() error
[    0.342184] acpi PNP0C80:92: acpi_memory_enable_device() error
[    0.343686] acpi PNP0C80:94: acpi_memory_enable_device() error
[    0.346124] acpi PNP0C80:97: acpi_memory_enable_device() error
[    0.347575] acpi PNP0C80:99: acpi_memory_enable_device() error
[    0.349043] acpi PNP0C80:9b: acpi_memory_enable_device() error
[    0.350514] acpi PNP0C80:9d: acpi_memory_enable_device() error
[    0.351989] acpi PNP0C80:9f: acpi_memory_enable_device() error
[  240.602730] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  240.603865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.605408] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  240.606447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.609048] INFO: task systemd-udevd:263 blocked for more than 120 seconds.
[  240.610126] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.611659] INFO: task systemd-udevd:264 blocked for more than 120 seconds.
[  240.612730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.613730] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  360.614847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.616399] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  360.617435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.618993] INFO: task systemd-udevd:263 blocked for more than 120 seconds.
[  360.620059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.621573] INFO: task systemd-udevd:264 blocked for more than 120 seconds.
[  360.622674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.623731] INFO: task kworker/2:0:27 blocked for more than 120 seconds.
[  480.624873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.626439] INFO: task kworker/2:1:114 blocked for more than 120 seconds.
[  480.627490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
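
When comparing runs, it can help to count the two kinds of messages in a saved console log rather than reading it line by line. A minimal sketch, assuming the console output was saved to a file; the function names are hypothetical, and the matched strings are taken from the log above.

```shell
# Count ACPI memory-device enable failures in a saved console log ($1 = log file).
count_acpi_errors() {
    grep -cF 'acpi_memory_enable_device() error' "$1"
}

# Count hung-task warnings in the same log ($1 = log file).
count_hung_tasks() {
    grep -cF 'blocked for more than 120 seconds' "$1"
}
```

Both functions print a count; `grep -c` prints 0 but exits non-zero when nothing matches, which matters if the caller runs under `set -e`.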

Comment 7 juzhang 2015-10-13 08:07:20 UTC
Hi Igor,

According to comment 4 and comment 5, it seems this issue failed QA. Could you add a comment?

Best Regards,
Junyi

Comment 8 Igor Mammedov 2015-10-13 09:13:59 UTC
(In reply to juzhang from comment #7)
> Hi Igor,
> 
> According to comment4 and comment5, seems this issue failed qa. Could you
> add comment?
You're seeing the error messages because the guest kernel supports a minimum memory
block size of 128M, while this BZ hotplugs 10M blocks. So these error messages are expected.
The goal of this BZ is to fix the QEMU crash, and comment 4 appears to confirm that it's fixed.

As for Scenario 2 from comment 4, I don't think that 128M of RAM is supported by RHEL7 (I think the supported minimum is 512M). And since the guest consumes a little memory per DIMM device, 128M might not be enough to support more than 160 DIMM modules.

> 
> Best Regards,
> Junyi
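
The size mismatch described in this reply can be checked with simple arithmetic. The sketch below is illustrative only: the function name `dimm_is_onlinable` is hypothetical, sizes are given in MiB, and the rule it encodes (a DIMM must be at least one memory block in size and a whole multiple of it) is an assumption based on the 128M minimum block size mentioned above, not taken from the kernel source.

```shell
# Return success if a DIMM of $1 MiB could be covered by whole memory blocks
# of $2 MiB (defaults to 128, the minimum block size mentioned above).
dimm_is_onlinable() {
    dimm_mb=$1
    block_mb=${2:-128}
    [ "$dimm_mb" -ge "$block_mb" ] && [ $((dimm_mb % block_mb)) -eq 0 ]
}

# The 10M DIMMs used in the reproducer fail the check; 128M DIMMs pass it.
if dimm_is_onlinable 10; then echo "10M: ok"; else echo "10M: too small for a 128M block"; fi
if dimm_is_onlinable 128; then echo "128M: ok"; else echo "128M: too small for a 128M block"; fi
```

Inside a Linux guest, the actual block size should be readable from /sys/devices/system/memory/block_size_bytes, which reports the value in hexadecimal.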

Comment 9 juzhang 2015-10-13 09:34:50 UTC
Thanks Igor for the detailed explanation. We agreed.

Comment 10 juzhang 2015-10-13 09:35:54 UTC
According to comments 4, 5, 8, and 9, setting this issue as verified.

Comment 12 errata-xmlrpc 2015-12-04 16:58:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html

