Bug 970549

Summary: segfault with hot-unplug after creation of device fails
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NEXTRELEASE
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Reporter: Paolo Bonzini <pbonzini>
Assignee: Markus Armbruster <armbru>
QA Contact: Virtualization Bugs <virt-bugs>
CC: armbru, chayang, flang, hhuang, juzhang, knoel, mdeng, pbonzini, rbalakri, virt-maint
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-10-23 15:17:24 UTC

Description Paolo Bonzini 2013-06-04 10:03:27 UTC
The following command sequence reproduces the issue consistently:

chardev-add file,path=foo2,id=foo2
chardev-add file,path=foo3,id=foo3
device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo2
device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
device_del gg
device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
device_del gg

Program received signal SIGSEGV, Segmentation fault.
0x00005555556b2622 in pci_unplug_device (qdev=<optimized out>)
    at /home/pbonzini/work/upstream/qemu/hw/pci/pci.c:1760
1760	    return dev->bus->hotplug(dev->bus->hotplug_qdev, dev,

dev->bus is NULL.
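
For readers following the crash: the faulting line dereferences dev->bus without checking it. Below is a minimal Python model of that failure mode, as a sketch only. The class and function names are hypothetical, an AttributeError stands in for the SIGSEGV, and the guard shown is not claimed to be the fix that went upstream. How a device with no bus became reachable by id is not modelled here.

```python
class Bus:
    """Hypothetical stand-in for a PCI bus with a hotplug handler."""

    def hotplug(self, dev, state):
        # Pretend the unplug request was accepted.
        return 0


class Device:
    """Hypothetical stand-in for a device whose creation failed midway."""

    def __init__(self, dev_id):
        self.id = dev_id
        self.bus = None  # never attached to a bus


def unplug_unchecked(dev):
    # Mirrors the shape of `return dev->bus->hotplug(...)`:
    # with dev.bus unset this raises AttributeError (the C code segfaults).
    return dev.bus.hotplug(dev, "unplug")


def unplug_checked(dev):
    # Illustrative guard: report an error instead of dereferencing.
    if dev.bus is None:
        return "Device '%s' is not attached to a bus" % dev.id
    return dev.bus.hotplug(dev, "unplug")
```

Calling unplug_unchecked on a Device whose bus is still None raises, whereas unplug_checked returns an error string that a monitor command could report.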

Comment 2 Min Deng 2013-06-06 03:40:34 UTC
Hi Paolo,
   Thanks for reporting the bug. QE re-tested it via both QMP and the monitor and got the following results. Build: qemu-kvm-1.5.0-2.el7.x86_64.

*a. From QMP: I cannot reproduce the bug via QMP commands.
1.{"execute": "qmp_capabilities"}
  {"return": {}}
2.{"execute":"chardev-add","arguments":{"id":"bar7","backend":  {"type":"file","data":{"out":"/tmp/log7"}}}}
  {"return": {}}
3.{"execute":"chardev-add","arguments":{"id":"bar8","backend": {"type":"file","data":{"out":"/tmp/log8"}}}}
{"return": {}}
4.{"execute":"device_add","arguments":{"driver":"pci-serial-2x","chardev1":"bar7","chardev2":"bar7","id":"gg"}}
{"error": {"class": "GenericError", "desc": "Property 'pci-serial-2x.chardev1' can't take value 'bar7', it's in use"}}
5.{"execute":"device_add","arguments":{"driver":"pci-serial-2x","chardev1":"bar7","chardev2":"bar8","id":"gg"}}
{"error": {"class": "GenericError", "desc": "Property 'pci-serial-2x.chardev1' can't take value 'bar7', it's in use"}}
6.{"execute":"chardev-remove","arguments":{"id":"gg"}}
{"error": {"class": "GenericError", "desc": "Chardev 'gg' not found"}}
7.{"execute":"device_add","arguments":{"driver":"pci-serial-2x","chardev1":"bar7","chardev2":"bar7","id":"gg"}}
{"error": {"class": "GenericError", "desc": "Property 'pci-serial-2x.chardev2' can't take value 'bar7', it's in use"}}
8.{"execute":"device_add","arguments":{"driver":"pci-serial-2x","chardev1":"bar7","chardev2":"bar8","id":"gg"}}
{"error": {"class": "GenericError", "desc": "Property 'pci-serial-2x.chardev2' can't take value 'bar8', it's in use"}}
9.{"execute":"chardev-remove","arguments":{"id":"gg"}}
{"error": {"class": "GenericError", "desc": "Chardev 'gg' not found"}}

*b. From the monitor: the same sequence caused a core dump.
(qemu) chardev-add file,path=foo2,id=foo2
(qemu) chardev-add file,path=foo3,id=foo3
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo2
Property 'pci-serial-2x.chardev1' can't take value 'foo2', it's in use
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
Property 'pci-serial-2x.chardev1' can't take value 'foo2', it's in use
(qemu) device_del gg
Device 'gg' not found
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
Property 'pci-serial-2x.chardev2' can't take value 'foo3', it's in use
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
Property 'pci-serial-2x.chardev2' can't take value 'foo3', it's in use
(qemu) device_del gg
Segmentation fault (core dumped)

   In summary, the bug could not be reproduced via QMP but was reproduced via the monitor. If there are any issues, please let me know.

Best regards,
Min
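
Note that steps 6 and 9 of the QMP run above issue chardev-remove, whereas the monitor run issues device_del at the same points, so the two runs are not the same sequence. A sketch of the monitor sequence expressed as QMP commands follows. The helper names are hypothetical, the chardev-add argument layout is copied from the QMP run above, and whether this sequence crashes over QMP is not established in this report.

```python
import json


def qmp(execute, **arguments):
    """Build one QMP command as a dict."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return cmd


def file_chardev(chardev_id, path):
    # Same shape as the chardev-add commands in the QMP run above.
    return qmp("chardev-add", id=chardev_id,
               backend={"type": "file", "data": {"out": path}})


def serial_2x(dev_id, chardev1, chardev2):
    return qmp("device_add", driver="pci-serial-2x", id=dev_id,
               chardev1=chardev1, chardev2=chardev2)


def reproducer_commands():
    """The monitor reproducer, with device_del where the monitor run has it."""
    return [
        qmp("qmp_capabilities"),
        file_chardev("foo2", "foo2"),
        file_chardev("foo3", "foo3"),
        serial_2x("gg", "foo2", "foo2"),
        serial_2x("gg", "foo2", "foo3"),
        qmp("device_del", id="gg"),
        serial_2x("gg", "foo2", "foo3"),
        serial_2x("gg", "foo2", "foo3"),
        qmp("device_del", id="gg"),
    ]


if __name__ == "__main__":
    # One JSON command per line, ready to paste into a QMP session.
    for command in reproducer_commands():
        print(json.dumps(command))
```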

Comment 3 Chao Yang 2014-03-13 02:42:14 UTC
Reproduced a similar backtrace on qemu-kvm-rhev-1.5.3-52.el7.x86_64.

Steps:
Repeatedly hot plug and unplug Emulex VFs in a loop. It always crashed at the 30th iteration.
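
The loop described above could be driven over QMP roughly as follows. This is a sketch under assumptions: the comment does not say which driver, host address or device id was used, so vfio-pci, 0000:03:10.0 and vf0 below are hypothetical placeholders. A real script would probably also have to wait for each unplug to complete in the guest before plugging again.

```python
import json


def hotplug_cycle(dev_id, driver, host, iterations):
    """Yield alternating device_add / device_del QMP commands."""
    for _ in range(iterations):
        yield {"execute": "device_add",
               "arguments": {"driver": driver, "host": host, "id": dev_id}}
        yield {"execute": "device_del",
               "arguments": {"id": dev_id}}


if __name__ == "__main__":
    # Hypothetical values; substitute the VF actually under test.
    for command in hotplug_cycle("vf0", "vfio-pci", "0000:03:10.0", 30):
        print(json.dumps(command))
```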

CLI:
/usr/libexec/qemu-kvm -name test -S -M pc -cpu Nehalem -m 2G -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -nodefaults -rtc base=utc -boot menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0 -drive file=rhel7.qcow2_v3,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:23:18:85:20,bus=pci.0 -device usb-tablet,id=input0 -spice port=5900,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0 -monitor stdio -qmp tcp:0:4455,server,nowait

Program terminated with signal 11, Segmentation fault.
#0  pci_unplug_device (qdev=<optimized out>) at hw/pci/pci.c:1759
1759	    return dev->bus->hotplug(dev->bus->hotplug_qdev, dev,
(gdb) bt
#0  pci_unplug_device (qdev=<optimized out>) at hw/pci/pci.c:1759
#1  0x00007f6c750e74ab in qdev_unplug (dev=0x7f6c78018080, errp=errp@entry=0x7fff65ec2b68) at hw/core/qdev.c:219
#2  0x00007f6c7517c5a2 in qmp_device_del (id=<optimized out>, errp=errp@entry=0x7fff65ec2b68) at qdev-monitor.c:689
#3  0x00007f6c751894b5 in qmp_marshal_input_device_del (mon=<optimized out>, qdict=<optimized out>, ret=<optimized out>)
    at qmp-marshal.c:2898
#4  0x00007f6c75212917 in qmp_call_cmd (cmd=<optimized out>, params=0x7f6c783443c0, mon=0x7f6c76a54040)
    at /usr/src/debug/qemu-1.5.3/monitor.c:4509
#5  handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4575
#6  0x00007f6c752bf012 in json_message_process_token (lexer=0x7f6c76a54100, token=0x7f6c7738bab0, type=JSON_OPERATOR, x=10216, y=0)
    at qobject/json-streamer.c:87
#7  0x00007f6c752ce59f in json_lexer_feed_char (lexer=lexer@entry=0x7f6c76a54100, ch=<optimized out>, flush=flush@entry=false)
    at qobject/json-lexer.c:303
#8  0x00007f6c752ce66e in json_lexer_feed (lexer=0x7f6c76a54100, buffer=<optimized out>, size=<optimized out>)
    at qobject/json-lexer.c:356
#9  0x00007f6c752bf1a9 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>)
    at qobject/json-streamer.c:110
#10 0x00007f6c75211663 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
    at /usr/src/debug/qemu-1.5.3/monitor.c:4596
#11 0x00007f6c7517f761 in qemu_chr_be_write (len=<optimized out>, buf=0x7fff65ec2d70 "}Z\244vl\177", s=0x7f6c76a46150)
    at qemu-char.c:167
#12 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7f6c76a46150) at qemu-char.c:2491
#13 0x00007f6c744b5ac6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#14 0x00007f6c75151d1a in glib_pollfds_poll () at main-loop.c:187
#15 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#16 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#17 0x00007f6c75075460 in main_loop () at vl.c:1988
#18 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4360



Paolo,

Can you please confirm?

Comment 4 Paolo Bonzini 2014-03-13 10:09:00 UTC
Chao,

that backtrace could be caused by a duplicate of bug 1046248.  You can try reproducing after that bug is fixed.  If you still have problems around the 30th-32nd iteration, open a new bug.

Comment 5 Paolo Bonzini 2014-03-13 10:15:25 UTC
Fixed upstream (1.7.0).

Comment 7 Markus Armbruster 2014-06-24 12:39:19 UTC
Since pci-serial-2x has been cut out (bug 1001180), we need another
reproducer.  Paolo, any ideas?

I put it back locally to try the reproducer using it, but it dies
differently, right on first device_del:

(qemu) chardev-add file,path=foo2,id=foo2
(qemu) chardev-add file,path=foo3,id=foo3
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo2
Property 'pci-serial-2x.chardev1' can't take value 'foo2', it's in use
(qemu) device_add id=gg,driver=pci-serial-2x,chardev1=foo2,chardev2=foo3
(qemu) device_del gg
(qemu) upstream-qemu: /work/armbru/qemu/memory.c:1118: memory_region_destroy: Assertion `((&mr->subregions)->tqh_first == ((void *)0))' failed.
Aborted (core dumped)

Upstream crashes the same, full backtrace below.  Probably a different
bug masking the one I'm looking for.

Re comment#5 "Fixed upstream (1.7.0)": got a commit for me, or do I
have to bisect 1.6.0 .. 1.7.0?



#0  0x00007f22f4c6cc39 in raise () from /lib64/libc.so.6
#1  0x00007f22f4c6e348 in abort () from /lib64/libc.so.6
#2  0x00007f22f4c65b96 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f22f4c65c42 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f22fee8d628 in memory_region_destroy (mr=0x7f2301751ab0)
    at /work/armbru/qemu/memory.c:1118
#5  0x00007f22fefd5bd8 in multi_serial_pci_exit (dev=0x7f2301751330)
    at /work/armbru/qemu/hw/char/serial-pci.c:154
#6  0x00007f22ff02aad0 in pci_unregister_device (dev=<optimized out>)
    at /work/armbru/qemu/hw/pci/pci.c:909
#7  0x00007f22fefd79f4 in device_unrealize (dev=0x7f2301751330, 
    errp=0x7f22e9c2b8c0) at /work/armbru/qemu/hw/core/qdev.c:196
#8  0x00007f22fefd8e08 in device_set_realized (obj=<optimized out>, 
    value=<optimized out>, errp=0x0) at /work/armbru/qemu/hw/core/qdev.c:863
#9  0x00007f22ff0841de in property_set_bool (obj=0x7f2301751330, 
    v=<optimized out>, opaque=0x7f2301757fa0, name=<optimized out>, errp=0x0)
    at /work/armbru/qemu/qom/object.c:1456
#10 0x00007f22ff0869f7 in object_property_set_qobject (obj=0x7f2301751330, 
    value=<optimized out>, name=0x7f22ff14da95 "realized", errp=0x0)
    at /work/armbru/qemu/qom/qom-qobject.c:24
#11 0x00007f22ff085710 in object_property_set_bool (
    obj=obj@entry=0x7f2301751330, value=value@entry=false, 
    name=name@entry=0x7f22ff14da95 "realized", errp=errp@entry=0x0)
    at /work/armbru/qemu/qom/object.c:884
#12 0x00007f22fefd7708 in device_unparent (obj=0x7f2301751330)
    at /work/armbru/qemu/hw/core/qdev.c:979
#13 0x00007f22ff0853c1 in object_unparent (obj=0x7f2301751330)
    at /work/armbru/qemu/qom/object.c:401
#14 0x00007f22fefb49b6 in acpi_pcihp_eject_slot (s=<optimized out>, 
    bsel=<optimized out>, slots=<optimized out>)
    at /work/armbru/qemu/hw/acpi/pcihp.c:139
#15 0x00007f22fee8a3d2 in access_with_adjusted_size (addr=addr@entry=8, 
    value=value@entry=0x7f22e9c2bab0, size=size@entry=4, 
    access_size_min=<optimized out>, access_size_max=<optimized out>, access=
    0x7f22fee8a820 <memory_region_write_accessor>, mr=0x7f23016a9d08)
    at /work/armbru/qemu/memory.c:480
#16 0x00007f22fee8eae6 in memory_region_dispatch_write (size=4, data=16, 
    addr=8, mr=0x7f23016a9d08) at /work/armbru/qemu/memory.c:1004
#17 io_mem_write (mr=mr@entry=0x7f23016a9d08, addr=8, val=<optimized out>, 
    size=4) at /work/armbru/qemu/memory.c:1812
#18 0x00007f22fee59333 in address_space_rw (
    as=0x7f22ff5791e0 <address_space_io>, addr=addr@entry=44552, 
    buf=0x7f22fed97000 <error: Cannot access memory at address 0x7f22fed97000>, len=len@entry=4, is_write=is_write@entry=true) at /work/armbru/qemu/exec.c:2047
#19 0x00007f22fee89738 in kvm_handle_io (count=1, size=4, 
    direction=<optimized out>, data=<optimized out>, port=44552)
    at /work/armbru/qemu/kvm-all.c:1597
#20 kvm_cpu_exec (cpu=cpu@entry=0x7f2301685870)
    at /work/armbru/qemu/kvm-all.c:1734
#21 0x00007f22fee77bb2 in qemu_kvm_cpu_thread_fn (arg=0x7f2301685870)
    at /work/armbru/qemu/cpus.c:872
#22 0x00007f22fd93ef33 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f22f4d2bded in clone () from /lib64/libc.so.6

Comment 8 Markus Armbruster 2014-07-16 13:12:13 UTC
The masking bug mentioned in comment#7 has been fixed upstream, in commit 7497bce.

The reported bug is not reproducible upstream, in accordance with comment#5.

We still need a downstream reproducer to make further progress on this bug.

Comment 9 Paolo Bonzini 2014-10-23 15:17:24 UTC
Since it was reported internally, I think we can just close it.