Bug 1388797

Summary: Qemu kvm guest 'Segmentation fault (core dumped)' after hot unplug the 4 hot plugged virtio rng devices
Product: Red Hat Enterprise Linux 7
Reporter: Gu Nini <ngu>
Component: qemu-kvm
Assignee: pagupta
Status: CLOSED CURRENTRELEASE
QA Contact: Yumei Huang <yuhuang>
Severity: high
Docs Contact:
Priority: high
Version: 7.3
CC: amit.shah, chayang, jinzhao, juzhang, knoel, michen, ngu, pingl, rbalakri, shuang, virt-bugs, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-01 05:07:35 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 917953, 1401400, 1473046
Attachments:
Description          Flags
GDB debug info       none
gdb_info-02072017    none

Description Gu Nini 2016-10-26 08:20:55 UTC
Created attachment 1214204 [details]
GDB debug info

Description of problem:
Boot a guest with one virtio rng device. After the guest boots up, stop the rngd service and hot unplug the virtio rng device. Then hot plug 4 virtio rng devices in succession and hot unplug them again. 2-5 minutes after the hot unplug, the guest crashes with 'Segmentation fault (core dumped)'.

Version-Release number of selected component (if applicable):
Host kernel: 3.10.0-514.el7.x86_64
Guest kernel: 3.10.0-514.el7.x86_64
Qemu-kvm: qemu-kvm-1.5.3-126.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot up a guest with a virtio rng device:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -vga cirrus  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/avocado_1,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -device usb-ehci,id=usb1,bus=pci.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/RHEL-Server-7.3-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:57:58:59:5a:5b,id=idXXQhp1,vectors=4,netdev=idfZuynp,bus=pci.0  \
    -netdev tap,id=idfZuynp,vhost=on \
    -object rng-random,filename=/dev/random,id=rng_0 \
    -device virtio-rng-pci,id=rng0,rng=rng_0,bus=pci.0  \
    -m 1024  \
    -smp 2,maxcpus=2,cores=2,threads=1,sockets=1  \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio 

2. After the guest boots up, stop rngd service inside the guest:
# service rngd stop

3. Hot unplug the rng device in qmp:

nc -U /var/tmp/avocado_1
{"QMP": {"version": {"qemu": {"micro": 3, "minor": 5, "major": 1}, "package": " (qemu-kvm-1.5.3-126.el7)"}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"device_del","arguments":{"id":"rng0"}}
{"return": {}}

4. Then hot plug 4 virtio rng devices

{"execute":"device_add","arguments":{"driver":"virtio-rng-pci","id":"rng0"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-rng-pci","id":"rng1"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-rng-pci","id":"rng2"}}
{"return": {}}
{"execute":"device_add","arguments":{"driver":"virtio-rng-pci","id":"rng3"}}
{"return": {}}

5. Hot unplug the 4 plugged rng devices

{"execute":"device_del","arguments":{"id":"rng0"}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"rng1"}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"rng2"}}
{"return": {}}
{"execute":"device_del","arguments":{"id":"rng3"}}
{"return": {}}
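
Note for re-running: steps 3 to 5 can be scripted instead of typed into nc. The following is a minimal sketch, assuming Python 3 on the host and the QMP socket path from step 1. The helper names (build_commands, execute, run) and the 5-second pause between commands are illustrative assumptions and were not part of the original test. The pause is there because device_del only requests the unplug; the guest completes it asynchronously.

```python
import json
import socket
import time

QMP_SOCKET = "/var/tmp/avocado_1"  # path from the -chardev option in step 1
RNG_IDS = ["rng0", "rng1", "rng2", "rng3"]


def build_commands(ids=RNG_IDS):
    """Return the QMP commands of steps 3-5, in the order they are sent."""
    cmds = [
        {"execute": "qmp_capabilities"},
        # Step 3: unplug the rng device the guest was booted with.
        {"execute": "device_del", "arguments": {"id": "rng0"}},
    ]
    # Step 4: hot plug the virtio rng devices.
    cmds += [{"execute": "device_add",
              "arguments": {"driver": "virtio-rng-pci", "id": dev_id}}
             for dev_id in ids]
    # Step 5: hot unplug them again.
    cmds += [{"execute": "device_del", "arguments": {"id": dev_id}}
             for dev_id in ids]
    return cmds


def execute(stream, cmd):
    """Send one command and return its reply, skipping asynchronous events."""
    stream.write(json.dumps(cmd) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed (QEMU may have exited)")
        if not line.strip():
            continue
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def run(path=QMP_SOCKET, pause=5.0):
    """Replay steps 3-5 against a running guest. `pause` is an assumed delay."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        stream = sock.makefile("rw", encoding="utf-8")
        stream.readline()  # discard the QMP greeting banner
        for cmd in build_commands():
            print(cmd, "->", execute(stream, cmd))
            time.sleep(pause)
```

Calling run() while the guest from step 1 is up replays the whole sequence. Any "error" reply is printed rather than raised, so a failed plug or unplug stays visible in the log.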


Actual results:
After step 5, the guest crashed; the HMP monitor showed:
(qemu) Segmentation fault (core dumped)
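
Because the crash only shows up 2-5 minutes after the unplug, an automated run needs to wait and then check how the qemu-kvm process ended. A minimal sketch, assuming the guest was started from Python with subprocess.Popen; the function names and the 600-second timeout are assumptions for illustration. On POSIX, a negative return code -N from subprocess means the child was terminated by signal N.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" on POSIX.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def wait_for_crash(proc, timeout=600):
    """Wait up to `timeout` seconds for `proc` (a Popen object) to end.

    Returns a description of how it ended, or None if it is still running.
    """
    try:
        return describe_exit(proc.wait(timeout=timeout))
    except subprocess.TimeoutExpired:
        return None
```

With this sketch, a segfaulting qemu-kvm would be reported as "killed by SIGSEGV", while None after the timeout means the guest survived the observation window.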


Expected results:
The guest keeps working after the rng devices are hot unplugged in step 5.


Additional info:
1. The bug could not be reproduced on a host with the latest qemu-kvm-rhev:
Host kernel: 3.10.0-514.el7.x86_64
Guest kernel: 3.10.0-514.el7.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.6.0-27.el7.x86_64

2. The core file core.3306 is saved on the NFS server 10.73.194.27:/vol/s2coredump, and its GDB debug info is attached.
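
For reference, a backtrace of all threads can be pulled from such a core file non-interactively. A minimal sketch, assuming the core file has been copied from the NFS share into the current directory and that debug symbols matching the installed qemu-kvm build are available; the function names are illustrative. It uses gdb's --batch and -ex options together with the "thread apply all bt" command.

```python
import subprocess

QEMU_BINARY = "/usr/libexec/qemu-kvm"  # binary path from step 1
CORE_FILE = "core.3306"                # assumed to be a local copy of the core


def gdb_backtrace_argv(binary=QEMU_BINARY, core=CORE_FILE):
    """Build the gdb command line that dumps every thread's backtrace."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", binary, core]


def collect_backtrace(binary=QEMU_BINARY, core=CORE_FILE):
    """Run gdb and return its combined stdout/stderr as text."""
    result = subprocess.run(
        gdb_backtrace_argv(binary, core),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling collect_backtrace() returns the text that could then be attached to the bug; without matching debug symbols the frames will mostly be unresolved addresses.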

Comment 3 Gu Nini 2017-02-06 11:12:35 UTC
Amit,

As noted in the bug description, I failed to reproduce the bug on the following qemu-kvm-rhev host:
Host kernel: 3.10.0-514.el7.x86_64
Guest kernel: 3.10.0-514.el7.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.6.0-27.el7.x86_64

I also tried the latest RHEL 7.3.z qemu-kvm-rhev version and failed to reproduce the bug there as well:
Host kernel: 3.10.0-514.10.1.el7.x86_64
Qemu-kvm-rhev: qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
Guest kernel: 3.10.0-514.el7.x86_64

I also tried the latest RHEL 7.4 qemu-kvm version and failed to reproduce the bug there too:
Host kernel: 3.10.0-514.10.1.el7.x86_64
Qemu-kvm: qemu-kvm-1.5.3-130.el7.x86_64
Guest kernel: 3.10.0-514.el7.x86_64

This is odd. I will run some further tests; if none of them reproduces the bug, we can close it.

Comment 4 Gu Nini 2017-02-07 11:05:05 UTC
Created attachment 1248367 [details]
gdb_info-02072017

Reproduced the bug on RHEL 7.4; however, the reproduction rate is 2/4 instead of the original 100%:

Host kernel: 3.10.0-556.el7.x86_64
Guest kernel: 3.10.0-514.el7.x86_64
Qemu-kvm: qemu-kvm-1.5.3-130.el7.x86_64

The core file core.15846 is available on the NFS server 10.73.194.27:/vol/s2coredump.

Comment 5 Gu Nini 2017-02-07 11:08:25 UTC
(In reply to Gu Nini from comment #4)
> Created attachment 1248367 [details]
> gdb_info-02072017
> 
> Reproduced the bug on RHEL7.4, however, the reproduce rate is 2/4 instead of
> the original 100%:
> 

Please note that I reproduced the bug on an **AMD** host with the latest RHEL 7.4 qemu-kvm version, whereas in comment #3 I failed to reproduce it on an Intel host with the same version. I also failed to reproduce it on both Intel and AMD hosts with the latest RHEL 7.3.z qemu-kvm-rhev version.