Bug 1820279

Summary: RHEL 6 KVM VMs hang and generate core dumps
Product: Red Hat Enterprise Linux 6 Reporter: dyuen
Component: qemu-kvm Assignee: Amnon Ilan <ailan>
Status: CLOSED WORKSFORME QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.10 CC: ehabkost, jen, jinzhao, juzhang, knoel, mkenneth, pbonzini, tvainio, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-25 11:50:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description dyuen 2020-04-02 16:26:54 UTC
Description of problem:
VMs on a host are seen to hang, and the host is generating segfaults for QEMU-KVM.


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 6.10 (Santiago) - 2.6.32-754.18.2.el6.x86_64

gpxe-roms-qemu-0.9.7-6.16.el6.noarch                        Tue Jun 18 19:20:53 2019
qemu-img-0.12.1.2-2.506.el6_10.4.x86_64                     Tue Oct 22 06:45:19 2019
qemu-kvm-0.12.1.2-2.506.el6_10.4.x86_64                     Tue Oct 22 06:45:32 2019

How reproducible:
Based on customer-provided output, the VMs are generating core dumps:
# ll -ha /ericsson/enm/dumps
total 24G
drwxrwxr-x. 17     308 jboss   8.0K Apr  1 15:35 .
drwxr-xr-x.  4 root    root    4.0K Jun 18  2019 ..
-rw-------.  1 root    root    240K Apr  1 12:35 core.dlgenmsvc3.date.pid23177.usr0.sig11.tim1585724702
-rw-------.  1 root    root    240K Apr  1 00:35 core.dlgenmsvc3.date.pid34048.usr0.sig11.tim1585681503
-rw-------.  1 root    root    362M Mar 31 02:14 core.dlgenmsvc3.java.pid28119.usr0.sig6.tim1585601042
-rw-------.  1 root    root    361M Mar 31 09:44 core.dlgenmsvc3.java.pid37069.usr0.sig6.tim1585628043
-rw-------.  1 root    root    135M Apr  1 10:33 core.svc-3-cmserv.java.pid26347.usr0.sig6.tim1585717382
-rw-------.  1 root    root    121M Mar 30 23:09 core.svc-3-cmserv.java.pid9985.usr0.sig6.tim1585589970
-rw-------.  1 root    root     59M Apr  1 10:09 core.svc-3-comecimpolicy.java.pid2717.usr0.sig6.tim1585715955
-rw-------.  1 root    root     76M Mar 30 23:10 core.svc-3-eventbasedclient.java.pid15862.usr0.sig6.tim1585590030
-rw-------.  1 root    root     43M Apr  1 01:05 core.svc-3-eventbasedclient.jps.pid24737.usr0.sig6.tim1585683303
-rw-------.  1 root    root     85M Apr  1 02:57 core.svc-3-flowautomation.java.pid10998.usr0.sig6.tim1585690042
-rw-------.  1 root    root     87M Mar 31 06:43 core.svc-3-flowautomation.java.pid23073.usr0.sig6.tim1585617233
-rw-------.  1 root    root    191M Mar 31 02:37 core.svc-3-fmalarmprocessing.java.pid30555.usr0.sig6.tim1585602420
-rw-------.  1 root    root    175M Mar 31 09:36 core.svc-3-fmhistory.java.pid19464.usr0.sig6.tim1585627562
-rw-------.  1 root    root    151M Mar 30 23:49 core.svc-3-fmhistory.java.pid23896.usr0.sig6.tim1585592342
-rw-------.  1 root    root     81M Mar 30 21:06 core.svc-3-fmhistory.java.pid28357.usr0.sig6.tim1585582561
-rw-------.  1 root    root    127M Apr  1 09:05 core.svc-3-fmhistory.jps.pid31902.usr0.sig6.tim1585712103
-rw-------.  1 root    root    131M Mar 31 07:53 core.svc-3-fmx.java.pid11702.usr0.sig6.tim1585621382
-rw-------.  1 root    root    371M Mar 31 17:33 core.svc-3-fmx.java.pid13922.usr0.sig6.tim1585656183
-rw-------.  1 root    root    236M Mar 31 11:35 core.svc-3-fmx.java.pid16759.usr0.sig6.tim1585634703
-rw-------.  1 root    root    209M Mar 31 06:12 core.svc-3-fmx.java.pid18052.usr0.sig6.tim1585615322
-rw-------.  1 root    root    139M Apr  1 05:43 core.svc-3-fmx.java.pid18303.usr0.sig6.tim1585699982
-rw-------.  1 root    root    225M Mar 31 23:01 core.svc-3-fmx.java.pid19208.usr0.sig6.tim1585675861
-rw-------.  1 root    root    411M Mar 31 05:59 core.svc-3-fmx.java.pid20005.usr0.sig6.tim1585614542
-rw-------.  1 root    root    210M Mar 31 17:43 core.svc-3-fmx.java.pid22596.usr0.sig6.tim1585656781
-rw-------.  1 root    root    125M Mar 31 19:56 core.svc-3-fmx.java.pid24257.usr0.sig6.tim1585664762
-rw-------.  1 root    root    115M Mar 31 07:37 core.svc-3-fmx.java.pid24384.usr0.sig6.tim1585620421
-rw-------.  1 root    root    320M Mar 31 09:19 core.svc-3-fmx.java.pid25951.usr0.sig6.tim1585626542
-rw-------.  1 root    root    235M Apr  1 07:35 core.svc-3-fmx.java.pid28354.usr0.sig6.tim1585706703
-rw-------.  1 root    root    226M Apr  1 02:08 core.svc-3-fmx.java.pid29728.usr0.sig6.tim1585687082
-rw-------.  1 root    root    208M Mar 30 21:36 core.svc-3-fmx.java.pid30704.usr0.sig6.tim1585584362
-rw-------.  1 root    root    211M Mar 31 13:01 core.svc-3-fmx.java.pid31247.usr0.sig6.tim1585639862
-rw-------.  1 root    root    119M Apr  1 12:10 core.svc-3-fmx.java.pid4138.usr0.sig6.tim1585723202
-rw-------.  1 root    root    221M Mar 30 22:30 core.svc-3-fmx.java.pid4239.usr0.sig6.tim1585587602
-rw-------.  1 root    root    408M Apr  1 09:31 core.svc-3-fmx.java.pid4440.usr0.sig6.tim1585713663
-rw-------.  1 root    root    212M Mar 31 04:55 core.svc-3-fmx.java.pid5238.usr0.sig6.tim1585610702
-rw-------.  1 root    root    226M Mar 31 04:11 core.svc-3-fmx.java.pid7340.usr0.sig6.tim1585608062
-rw-------.  1 root    root    238M Apr  1 00:50 core.svc-3-fmx.java.pid8815.usr0.sig6.tim1585682404
-rw-------.  1 root    root    173M Mar 30 23:46 core.svc-3-httpd.java.pid24623.usr0.sig6.tim1585592163
-rw-------.  1 root    root    184M Mar 31 15:46 core.svc-3-httpd.java.pid30252.usr0.sig6.tim1585649764
-rw-------.  1 root    root    151M Mar 31 16:51 core.svc-3-impexpserv.java.pid19998.usr0.sig6.tim1585653717
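
To get an overview of a listing like the one above, the core file names can be grouped by executable and signal. The following is a minimal sketch, assuming the names follow the pattern core.<host>.<exe>.pid<pid>.usr<uid>.sig<signal>.tim<epoch> (inferred from the listing, not confirmed from the host's core_pattern setting). The helper names `parse_core_name` and `tally` are hypothetical.

```python
import re
from collections import Counter

# Assumed naming scheme, inferred from the listing:
#   core.<host>.<exe>.pid<pid>.usr<uid>.sig<signal>.tim<epoch>
CORE_NAME = re.compile(
    r"^core\.(?P<host>.+)\.(?P<exe>[^.]+)\.pid(?P<pid>\d+)"
    r"\.usr(?P<uid>\d+)\.sig(?P<sig>\d+)\.tim(?P<epoch>\d+)$"
)

# Signal numbers on Linux x86_64 for the two values seen in the listing.
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}


def parse_core_name(name):
    """Split a core file name into its fields, or return None if it does not match."""
    m = CORE_NAME.match(name)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("pid", "uid", "sig", "epoch"):
        fields[key] = int(fields[key])
    return fields


def tally(names):
    """Count core files per (executable, signal name) pair."""
    counts = Counter()
    for name in names:
        fields = parse_core_name(name)
        if fields is None:
            continue  # skip names that do not follow the assumed pattern
        sig = SIGNAL_NAMES.get(fields["sig"], str(fields["sig"]))
        counts[(fields["exe"], sig)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "core.dlgenmsvc3.date.pid23177.usr0.sig11.tim1585724702",
        "core.svc-3-fmx.java.pid11702.usr0.sig6.tim1585621382",
        "core.svc-3-fmx.java.pid13922.usr0.sig6.tim1585656183",
    ]
    for (exe, sig), n in sorted(tally(sample).items()):
        print(exe, sig, n)
```

In the listing above, the `java` and `jps` cores all carry sig6 (SIGABRT), while the two `date` cores carry sig11 (SIGSEGV).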


Steps to Reproduce:
1.
2.
3.

Actual results:
The host logs segfaults for several processes, including qemu-kvm:

Mar 15 01:02:00 dlgenmsvc3 kernel: python[62788]: segfault at 46e49b6c ip 00007f238d700018 sp 00007fff73db3810 error 6 in libpython2.6.so.1.0[7f238d670000+15d000]
Mar 15 20:45:03 dlgenmsvc3 kernel: lvs[23917]: segfault at 7f17bf3afff8 ip 00007f17bf3afff8 sp 00007fff5a7317e0 error 14 in libdevmapper.so.1.02[7f17bf349000+200000]
Mar 16 03:35:31 dlgenmsvc3 kernel: lvs[17742]: segfault at 7fe3931afff8 ip 00007fe3931afff8 sp 00007ffd9f0818a0 error 14 in libdevmapper.so.1.02[7fe393149000+200000]
Mar 17 16:35:30 dlgenmsvc3 kernel: lvs[11925]: segfault at 7f447988fff8 ip 00007f447988fff8 sp 00007ffea255ccb0 error 14 in libdevmapper.so.1.02[7f4479829000+200000]
Mar 18 09:07:55 dlgenmsvc3 kernel: python[31356]: segfault at 46c87b6c ip 00007fdf6bb00018 sp 00007ffd6b7f5eb0 error 6 in libpython2.6.so.1.0[7fdf6ba70000+15d000]
Mar 19 03:31:37 dlgenmsvc3 kernel: python[51767]: segfault at 46f38efc ip 00007f2a61600018 sp 00007ffc23d38a60 error 6 in libpython2.6.so.1.0[7f2a61570000+15d000]
Mar 19 09:00:02 dlgenmsvc3 kernel: vgs[8819]: segfault at 7f1d3abcfff8 ip 00007f1d3abcfff8 sp 00007ffc066ef8f0 error 14 in libdevmapper.so.1.02[7f1d3ab69000+200000]
Mar 23 03:13:28 dlgenmsvc3 kernel: python[61689]: segfault at 470d5b6c ip 00007fe2a3400018 sp 00007ffdb173e740 error 6 in libpython2.6.so.1.0[7fe2a3370000+15d000]
Mar 23 04:50:03 dlgenmsvc3 kernel: sh[51015]: segfault at 7f417c100018 ip 00007f417c100018 sp 00007fffffc04bc0 error 14 in ld-2.12.so[7f417c17d000+20000]
Mar 25 18:42:14 dlgenmsvc3 kernel: qemu-kvm[16414]: segfault at 7f6dab000183 ip 00007f6dab000183 sp 00007fff25b160e0 error 14 in libaio.so.1.0.1[7f6daae3c000+1ff000]
Mar 30 19:40:13 dlgenmsvc3 kernel: python[60068]: segfault at 0 ip 00007ff078301018 sp 00007ffefebfa690 error 6 in libpython2.6.so.1.0[7ff0782e5000+15d000]
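
The trailing `error` field in these messages is the x86 page-fault error code, which the kernel prints in hexadecimal. The following is a minimal sketch for decoding it, assuming the standard x86 bit layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch). The function name `decode_pf_error` is hypothetical.

```python
def decode_pf_error(code_hex):
    """Decode the hex 'error' field of a kernel segfault message (x86)."""
    code = int(code_hex, 16)
    return {
        # False here means the page was not present at all.
        "protection_violation": bool(code & 0x01),
        "write": bool(code & 0x02),
        "user_mode": bool(code & 0x04),
        "reserved_bit": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    for value in ("6", "14"):
        flags = [name for name, is_set in decode_pf_error(value).items() if is_set]
        print("error", value, "->", ", ".join(flags))
```

Under that layout, `error 6` is a user-mode write to a page that is not present, and `error 14` is a user-mode instruction fetch from a page that is not present.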



Expected results:
No segfault messages



Additional info:
The customer provided a sosreport and a core file in case 02621916:
 - core.svc-3-eventbasedclient.jps.pid17442.usr0.sig6.tim1585050611
 - sosreport-dlgenmsvc3.3677085-20200331070513.tar.xz
No system outage, but throughput is degraded

Comment 2 FuXiangChun 2020-04-03 06:07:10 UTC
Hi David,

I used virt-manager to boot a RHEL 6 guest, and the guest works well, so I can't reproduce this bug. The full qemu-kvm command line I used is below. Could you provide the qemu command line, or the layered product tool, that the customer uses?

/usr/libexec/qemu-kvm -name rhel6 -S -M rhel6.6.0 -cpu Opteron_G5 -enable-kvm -m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 -uuid 9ad6e91e-b072-da76-4039-0f58559c59ad -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=c,menu=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive file=/home/rhel6.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/home/RHEL6.10-Server-x86_64.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:bd:07:7e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

 
# uname -r
2.6.32-754.el6.x86_64
# rpm -qa|grep qemu
qemu-guest-agent-0.12.1.2-2.506.el6_10.7.x86_64
qemu-kvm-0.12.1.2-2.506.el6_10.7.x86_64
qemu-img-0.12.1.2-2.506.el6_10.7.x86_64
qemu-kvm-tools-0.12.1.2-2.506.el6_10.7.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.506.el6_10.7.x86_64
gpxe-roms-qemu-0.9.7-6.16.el6.noarch

Comment 7 Eduardo Habkost 2020-06-11 20:45:40 UTC
In most of the segfaults shown in the logs, the faulting address is at the instruction pointer:

Mar 15 20:45:03 dlgenmsvc3 kernel: lvs[23917]: segfault at 7f17bf3afff8 ip 00007f17bf3afff8 sp 00007fff5a7317e0 error 14 in libdevmapper.so.1.02[7f17bf349000+200000]
Mar 16 03:35:31 dlgenmsvc3 kernel: lvs[17742]: segfault at 7fe3931afff8 ip 00007fe3931afff8 sp 00007ffd9f0818a0 error 14 in libdevmapper.so.1.02[7fe393149000+200000]
Mar 17 16:35:30 dlgenmsvc3 kernel: lvs[11925]: segfault at 7f447988fff8 ip 00007f447988fff8 sp 00007ffea255ccb0 error 14 in libdevmapper.so.1.02[7f4479829000+200000]
Mar 19 09:00:02 dlgenmsvc3 kernel: vgs[8819]: segfault at 7f1d3abcfff8 ip 00007f1d3abcfff8 sp 00007ffc066ef8f0 error 14 in libdevmapper.so.1.02[7f1d3ab69000+200000]
Mar 23 04:50:03 dlgenmsvc3 kernel: sh[51015]: segfault at 7f417c100018 ip 00007f417c100018 sp 00007fffffc04bc0 error 14 in ld-2.12.so[7f417c17d000+20000]
Mar 25 18:42:14 dlgenmsvc3 kernel: qemu-kvm[16414]: segfault at 7f6dab000183 ip 00007f6dab000183 sp 00007fff25b160e0 error 14 in libaio.so.1.0.1[7f6daae3c000+1ff000]

In all of them except the sh[51015] crash, the IP value is inside the shared library address range.  I don't know what could be causing those segfaults, but I suspect the cause is not related to KVM at all.
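
The two observations above (faulting address equal to the instruction pointer, and IP inside the reported library mapping) can be checked mechanically. The following is a minimal sketch, assuming the message format shown in the logs and treating the bracketed `[base+size]` values as hexadecimal. The regular expression and the name `analyze_segfault` are hypothetical.

```python
import re

# Matches lines such as:
#   lvs[23917]: segfault at 7f17bf3afff8 ip 00007f17bf3afff8 sp ... error 14
#   in libdevmapper.so.1.02[7f17bf349000+200000]
SEGFAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def analyze_segfault(line):
    """Return basic facts about one kernel segfault line, or None if it does not match."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    result = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "object": m.group("obj"),
        "fault_at_ip": addr == ip,  # the fault happened while fetching the instruction
        "ip_in_mapping": None,      # unknown when no mapping is reported
        "offset": None,
    }
    if m.group("obj") is not None:
        base = int(m.group("base"), 16)
        size = int(m.group("size"), 16)
        result["ip_in_mapping"] = base <= ip < base + size
        if result["ip_in_mapping"]:
            result["offset"] = ip - base  # offset of the IP within the mapping
    return result


if __name__ == "__main__":
    line = (
        "Mar 23 04:50:03 dlgenmsvc3 kernel: sh[51015]: segfault at 7f417c100018 "
        "ip 00007f417c100018 sp 00007fffffc04bc0 error 14 "
        "in ld-2.12.so[7f417c17d000+20000]"
    )
    print(analyze_segfault(line))
```

For the sh[51015] line, the IP is below the reported ld-2.12.so base address, which matches the exception noted above.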