Bug 1480112 - Unable to schedule i386 arch - qemu killed with SIGABRT
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Severity: unspecified
Target Milestone: rc
Assigned To: Eduardo Habkost
QA Contact: Chao Yang
Depends On:
Reported: 2017-08-10 04:03 EDT by Robin Hack
Modified: 2017-09-19 16:36 EDT
CC: 9 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2017-09-19 16:36:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
"info proc all" output from gdb (36.76 KB, text/plain), 2017-08-15 08:04 EDT, Eduardo Habkost
'maps' file from abrt report (48.87 KB, text/plain), 2017-08-18 18:21 EDT, Eduardo Habkost

Description Robin Hack 2017-08-10 04:03:11 EDT
Description of problem:
After upgrading to RHEL 7.4, I'm not able to schedule machines with the i386 arch. The qemu process is always killed by SIGTRAP.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set <cpu mode='host-passthrough'/> in the domain XML
2. Schedule an i386 machine with Beaker

Actual results:
qemu is killed by SIGTRAP

Expected results:
Running i386 machine

Additional info:
If needed, I can provide access to my hypervisors so you can debug or investigate on the fly.

More info from abrt:
id 1c028ed58dacd879828a24ecbe3730c00b8cc45d
reason:         qemu-kvm killed by SIGABRT
time:           Wed 09 Aug 2017 07:48:51 PM CEST
cmdline:        /usr/libexec/qemu-kvm -name guest=sheep-22,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-154-sheep-22/master-key.aes -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off,dump-guest-core=off,mem-merge=off -cpu host -m 2862 -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/154-sheep-22 -realtime mlock=on -smp 1,sockets=1,cores=1,threads=1 -uuid 15575182-6ad4-4ad3-8beb-315138d3086b -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-154-sheep-22/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/virtual_machines/sheep-22,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=da:36:d0:6e:50:16,bus=pci.0,addr=0x3,bootindex=1 -chardev socket,id=charserial0,host=,port=52447 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5907,addr=,disable-ticketing,seamless-migration=on -vnc -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
package:        qemu-kvm-rhev-2.9.0-16.el7_4.3
uid:            107 (qemu)
count:          1
Directory:      /var/spool/abrt/ccpp-2017-08-09-19:48:51-13414
Run 'abrt-cli report /var/spool/abrt/ccpp-2017-08-09-19:48:51-13414' for creating a case in Red Hat Customer Portal

The Autoreporting feature is disabled. Please consider enabling it by issuing
'abrt-auto-reporting enabled' as a user with root privileges
Comment 2 Robin Hack 2017-08-10 04:05:04 EDT
Update: it's SIGABRT, not SIGTRAP. Sorry, my fault.
Comment 8 Eduardo Habkost 2017-08-15 08:00:02 EDT
For reference, this is where QEMU crashed:

(gdb) bt
#0  0x00007f11004881f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f11004898e8 in __GI_abort () at abort.c:90
#2  0x0000558a3000f2c7 in qemu_alloc_stack (sz=sz@entry=0x558a33af47d8) at util/oslib-posix.c:644
#3  0x0000558a300204f4 in qemu_coroutine_new () at util/coroutine-ucontext.c:105
#4  0x0000558a3001f686 in qemu_coroutine_create (entry=entry@entry=0x558a2ff8f420 <blk_aio_read_entry>, opaque=opaque@entry=0x558a34028e40) at util/qemu-coroutine.c:76
#5  0x0000558a2ff8fc4c in blk_aio_prwv (blk=0x558a32a70000, offset=425984, bytes=4096, qiov=0x558a32a6cab0, co_entry=co_entry@entry=0x558a2ff8f420 <blk_aio_read_entry>, flags=
    0, cb=0x558a2fd67b10 <virtio_blk_rw_complete>, opaque=0x558a32a6ca50) at block/block-backend.c:1157
#6  0x0000558a2ff8fd25 in blk_aio_preadv (blk=<optimized out>, offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>, cb=<optimized out>, opaque=<optimized out>) at block/block-backend.c:1250
#7  0x0000558a2fd68bac in virtio_blk_submit_multireq (niov=<optimized out>, num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>, blk=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:372
#8  0x0000558a2fd68bac in virtio_blk_submit_multireq (blk=0x558a32a70000, mrb=mrb@entry=0x7ffeafed2b80) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:402
#9  0x0000558a2fd69644 in virtio_blk_handle_vq (s=0x558a3458e510, vq=0x558a34614000) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:620
#10 0x0000558a3000cc68 in aio_dispatch_handlers (ctx=ctx@entry=0x558a32a5f700) at util/aio-posix.c:399
#11 0x0000558a3000d4e8 in aio_dispatch (ctx=0x558a32a5f700) at util/aio-posix.c:430
#12 0x0000558a3000a6ae in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
#13 0x00007f1101f3d4c9 in g_main_context_dispatch (context=0x558a32abea50) at gmain.c:3201
#14 0x00007f1101f3d4c9 in g_main_context_dispatch (context=context@entry=0x558a32abea50) at gmain.c:3854
#15 0x0000558a3000c79c in main_loop_wait () at util/main-loop.c:213
#16 0x0000558a3000c79c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#17 0x0000558a3000c79c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517
#18 0x0000558a2fcfdf6c in main () at vl.c:1909
#19 0x0000558a2fcfdf6c in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4733
(gdb) l
639         *sz += pagesz;
641         ptr = mmap(NULL, *sz, PROT_READ | PROT_WRITE,
642                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
643         if (ptr == MAP_FAILED) {
644             abort();
645         }
647     #if defined(HOST_IA64)
648         /* separate register stack */
Comment 9 Eduardo Habkost 2017-08-15 08:04 EDT
Created attachment 1313609 [details]
"info proc all" output from gdb
Comment 10 Eduardo Habkost 2017-08-18 18:20:12 EDT
Additional data:

qemu_alloc_stack() is trying to allocate a little more than 1MB (1MB + 4k):

#2  0x0000558a3000f2c7 in qemu_alloc_stack (sz=sz@entry=0x558a33af47d8) at util/oslib-posix.c:644
644             abort();
(gdb) p *sz
$1 = 1052672

mmap() returned -ENOMEM:

(gdb) p errno
$2 = 12

By looking at the segments in the core dump, I don't see why mmap() would return -ENOMEM: there are only 3.6GB of mapped areas on the core dump, and lots of unmapped areas available.  Some help from people on the kernel side to figure out what could be causing mmap() to fail would be welcome.
Comment 11 Eduardo Habkost 2017-08-18 18:21 EDT
Created attachment 1315407 [details]
'maps' file from abrt report
Comment 12 Eduardo Habkost 2017-08-18 19:16:31 EDT
Additional information that would be helpful:

* Testing with the 7.3 kernel, so we can exclude the possibility of a 7.4 kernel regression.
* Access to the live system where the problem is happening.
Comment 13 Andrea Arcangeli 2017-08-20 06:39:13 EDT
Does it still fail if you run:

echo 1 >/proc/sys/vm/overcommit_memory
Comment 14 Robin Hack 2017-08-21 03:24:55 EDT

[root@pesng-02 ~]# cat /proc/sys/vm/overcommit_memory
Comment 17 Eduardo Habkost 2017-08-24 17:42:11 EDT
Setting needinfo for overcommit_memory=1 results.
