Red Hat Bugzilla – Bug 1088822
hot-plugging a virtio-scsi disk via 'blockdev-add' always causes QEMU to quit
Last modified: 2015-03-05 03:06:32 EST
Description of problem:
Launch a KVM guest, then repeatedly hot-plug/unplug virtio-scsi/virtio-blk devices with 'blockdev-add' and '__com.redhat_drive_add'. This causes QEMU to quit with a backtrace; judging by the bt log, the quit comes from a sanity check reached via the virtio-scsi code.

Version-Release number of selected component (if applicable):
host info:
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-121.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7ev.x86_64
guest info:
# uname -r
3.10.0-121.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Launch a KVM guest under gdb.
# gdb /usr/libexec/qemu-kvm
(gdb) b exit
Breakpoint 1 at 0xaaf30
(gdb) r -M pc -cpu host -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -no-kvm-pit-reinjection -usb -device usb-tablet,id=input0 -name sluo -uuid 990ea161-6b67-47b2-b803-19fb01d30d30 -rtc base=localtime,clock=host,driftfix=slew -device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0,bus=pci.0,addr=0x3 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=com.redhat.rhevm.vdsm,bus=virtio-serial0.0,id=port2 -drive file=/home/RHEL-7.0-20140409.0_Server_x86_64.qcow2,if=none,id=drive-system-disk,format=qcow2,cache=none,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-system-disk,id=system-disk,bootindex=1 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=00:01:02:B6:40:21,bus=pci.0,addr=0x5 -device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x6 -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -serial unix:/tmp/ttyS0,server,nowait -k en-us -boot menu=on -qmp tcp:0:4444,server,nowait -spice disable-ticketing,port=5931 -monitor stdio
2. Repeatedly hot-plug/unplug virtio-scsi/virtio-blk with 'blockdev-add' and '__com.redhat_drive_add' using a python script; I will attach the script.

Actual results:
After a while, QEMU quits with a backtrace.

(qemu) [Thread 0x7fffeabf7700 (LWP 31736) exited]
[New Thread 0x7fffeabf7700 (LWP 31741)]

Breakpoint 1, 0x00007ffff2c9a250 in exit () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff2c9a250 in exit () from /lib64/libc.so.6
#1 0x0000555555795fe2 in virtqueue_num_heads (vq=0x55555811db48, idx=0) at /usr/src/debug/qemu-1.5.3/hw/virtio/virtio.c:291
#2 0x0000555555796673 in virtqueue_pop (vq=vq@entry=0x55555811db48, elem=elem@entry=0x55555767ec50) at /usr/src/debug/qemu-1.5.3/hw/virtio/virtio.c:441
#3 0x0000555555790353 in virtio_scsi_pop_req (s=s@entry=0x555558d8c6f8, vq=0x55555811db48) at /usr/src/debug/qemu-1.5.3/hw/scsi/virtio-scsi.c:116
#4 0x0000555555790582 in virtio_scsi_push_event (s=0x555558d8c6f8, dev=0x55555677d580, event=1, reason=1) at /usr/src/debug/qemu-1.5.3/hw/scsi/virtio-scsi.c:484
#5 0x00005555556b52b0 in scsi_qdev_init (qdev=<optimized out>) at hw/scsi/scsi-bus.c:193
#6 0x00005555556769d4 in device_realize (dev=0x55555677d580, err=0x7fffffffc630) at hw/core/qdev.c:178
#7 0x000055555567793b in device_set_realized (obj=0x55555677d580, value=<optimized out>, err=0x7fffffffc758) at hw/core/qdev.c:693
#8 0x000055555571c9be in property_set_bool (obj=0x55555677d580, v=<optimized out>, opaque=0x555556f7e5b0, name=<optimized out>, errp=0x7fffffffc758) at qom/object.c:1302
#9 0x000055555571ea97 in object_property_set_qobject (obj=0x55555677d580, value=<optimized out>, name=0x55555587249a "realized", errp=0x7fffffffc758) at qom/qom-qobject.c:24
#10 0x000055555571dc80 in object_property_set_bool (obj=obj@entry=0x55555677d580, value=value@entry=true, name=name@entry=0x55555587249a "realized", errp=errp@entry=0x7fffffffc758) at qom/object.c:853
#11 0x000055555570bb2f in qdev_device_add (opts=opts@entry=0x5555568dda30) at qdev-monitor.c:556
#12 0x000055555570bf1d in do_device_add (mon=<optimized out>, qdict=<optimized out>, ret_data=<optimized out>) at qdev-monitor.c:673
#13 0x00005555557a2387 in qmp_call_cmd (cmd=<optimized out>, params=0x5555567cdd50, mon=0x555556514550) at /usr/src/debug/qemu-1.5.3/monitor.c:4509
#14 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4575
#15 0x000055555584eab2 in json_message_process_token (lexer=0x5555564f6160, token=0x5555566d0470, type=JSON_OPERATOR, x=10442, y=0) at qobject/json-streamer.c:87
#16 0x000055555585e03f in json_lexer_feed_char (lexer=lexer@entry=0x5555564f6160, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:303
#17 0x000055555585e10e in json_lexer_feed (lexer=0x5555564f6160, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:356
#18 0x000055555584ec49 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:110
#19 0x00005555557a10d3 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4596
#20 0x000055555570f181 in qemu_chr_be_write (len=<optimized out>, buf=0x7fffffffca30 "}", s=0x5555564e79c0) at qemu-char.c:167
#21 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x5555564e79c0) at qemu-char.c:2491
#22 0x00007ffff74edac6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#23 0x00005555556e173a in glib_pollfds_poll () at main-loop.c:187
#24 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
#25 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
#26 0x0000555555602e30 in main_loop () at vl.c:1988
#27 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4360
(gdb)

Expected results:
QEMU should not quit.

Additional info:
Discussed with famz, who pointed out the exit comes from the check at line 289 of virtio.c.
283 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
284 {
285     uint16_t num_heads = vring_avail_idx(vq) - idx;
286
287     /* Check it isn't doing very strange things with descriptor numbers. */
288     if (num_heads > vq->vring.num) {
289         error_report("Guest moved used index from %u to %u",
290                      idx, vring_avail_idx(vq));
291         exit(1);
292     }
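As a rough model of why this check fires on hot-plug (a Python sketch, not QEMU code; the queue size and index values are illustrative): num_heads is computed with 16-bit wraparound arithmetic, and if the avail index read from guest memory is garbage because the guest has not set up the ring for the just-realized device, num_heads can exceed vring.num and the fatal exit(1) path is taken.

```python
# Hypothetical model of the guard in virtqueue_num_heads(); C uint16_t
# subtraction is emulated with "& 0xFFFF". Values below are made up.
VRING_NUM = 128  # illustrative queue size

def num_heads(avail_idx, last_avail_idx):
    # uint16_t num_heads = vring_avail_idx(vq) - idx;
    return (avail_idx - last_avail_idx) & 0xFFFF

def would_exit(avail_idx, last_avail_idx, vring_num=VRING_NUM):
    # True when "num_heads > vq->vring.num" would call exit(1) in QEMU.
    return num_heads(avail_idx, last_avail_idx) > vring_num

# A sane guest: avail index a few entries ahead of what was consumed.
assert would_exit(avail_idx=5, last_avail_idx=0) is False
# A garbage avail index read from a not-yet-initialized ring:
assert would_exit(avail_idx=0xBEEF, last_avail_idx=0) is True
```

Note the deliberate use of unsigned wraparound: `num_heads(0, 1)` yields 0xFFFF, which also trips the guard, so both stale and garbage indices are caught.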
Created attachment 887119 [details]
hotplugging test python script
(gdb) bt full
#0 0x00007ffff2c9a250 in exit () from /lib64/libc.so.6
No symbol table info available.
#1 0x0000555555795fe2 in virtqueue_num_heads (vq=0x55555811db48, idx=0) at /usr/src/debug/qemu-1.5.3/hw/virtio/virtio.c:291
        idx = 0
        vq = 0x55555811db48
        num_heads = <optimized out>
#2 0x0000555555796673 in virtqueue_pop (vq=vq@entry=0x55555811db48, elem=elem@entry=0x55555767ec50) at /usr/src/debug/qemu-1.5.3/hw/virtio/virtio.c:441
        i = <optimized out>
        head = <optimized out>
        max = <optimized out>
        desc_pa = <optimized out>
#3 0x0000555555790353 in virtio_scsi_pop_req (s=s@entry=0x555558d8c6f8, vq=0x55555811db48) at /usr/src/debug/qemu-1.5.3/hw/scsi/virtio-scsi.c:116
        req = 0x55555767ec40
#4 0x0000555555790582 in virtio_scsi_push_event (s=0x555558d8c6f8, dev=0x55555677d580, event=1, reason=1) at /usr/src/debug/qemu-1.5.3/hw/scsi/virtio-scsi.c:484
        vs = <optimized out>
        __func__ = "virtio_scsi_push_event"
        req = <optimized out>
        evt = <optimized out>
        vdev = <optimized out>
        in_size = <optimized out>
        __PRETTY_FUNCTION__ = "virtio_scsi_push_event"
#5 0x00005555556b52b0 in scsi_qdev_init (qdev=<optimized out>) at hw/scsi/scsi-bus.c:193
        dev = 0x55555677d580
        __func__ = "scsi_qdev_init"
        bus = 0x555558d8c810
        d = <optimized out>
        rc = <optimized out>
        __PRETTY_FUNCTION__ = "scsi_qdev_init"
#6 0x00005555556769d4 in device_realize (dev=0x55555677d580, err=0x7fffffffc630) at hw/core/qdev.c:178
        rc = <optimized out>
        dc = <optimized out>
#7 0x000055555567793b in device_set_realized (obj=0x55555677d580, value=<optimized out>, err=0x7fffffffc758) at hw/core/qdev.c:693
        dev = 0x55555677d580
        __func__ = "device_set_realized"
        dc = 0x5555568e9810
        local_err = 0x0
#8 0x000055555571c9be in property_set_bool (obj=0x55555677d580, v=<optimized out>, opaque=0x555556f7e5b0, name=<optimized out>, errp=0x7fffffffc758) at qom/object.c:1302
        prop = 0x555556f7e5b0
        value = true
        local_err = 0x0
#9 0x000055555571ea97 in object_property_set_qobject (obj=0x55555677d580, value=<optimized out>, name=0x55555587249a "realized", errp=0x7fffffffc758) at qom/qom-qobject.c:24
        mi = 0x555558655730
#10 0x000055555571dc80 in object_property_set_bool (obj=obj@entry=0x55555677d580, value=value@entry=true, name=name@entry=0x55555587249a "realized", errp=errp@entry=0x7fffffffc758) at qom/object.c:853
        qbool = 0x555556f971d0
#11 0x000055555570bb2f in qdev_device_add (opts=opts@entry=0x5555568dda30) at qdev-monitor.c:556
        oc = <optimized out>
        dc = 0x5555568e9810
        driver = 0x5555568f6e20 "scsi-hd"
        path = 0x555556c23fa0 "scsi1.0"
        id = <optimized out>
        dev = 0x55555677d580
        bus = <optimized out>
        err = 0x0
        __func__ = "qdev_device_add"
#12 0x000055555570bf1d in do_device_add (mon=<optimized out>, qdict=<optimized out>, ret_data=<optimized out>) at qdev-monitor.c:673
        local_err = 0x0
        opts = 0x5555568dda30
        dev = <optimized out>
#13 0x00005555557a2387 in qmp_call_cmd (cmd=<optimized out>, params=0x5555567cdd50, mon=0x555556514550) at /usr/src/debug/qemu-1.5.3/monitor.c:4509
        ret = <optimized out>
        data = 0x0
#14 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4575
        err = <optimized out>
        obj = <optimized out>
        input = <optimized out>
        args = 0x5555567cdd50
        cmd_name = <optimized out>
        mon = 0x555556514550
#15 0x000055555584eab2 in json_message_process_token (lexer=0x5555564f6160, token=0x5555566d0470, type=JSON_OPERATOR, x=10442, y=0) at qobject/json-streamer.c:87
        parser = 0x5555564f6158
        dict = 0x5555567cbd10
#16 0x000055555585e03f in json_lexer_feed_char (lexer=lexer@entry=0x5555564f6160, ch=<optimized out>, flush=flush@entry=false) at qobject/json-lexer.c:303
        new_state = 100
#17 0x000055555585e10e in json_lexer_feed (lexer=0x5555564f6160, buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:356
        err = <optimized out>
        i = <optimized out>
#18 0x000055555584ec49 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:110
No locals.
#19 0x00005555557a10d3 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/debug/qemu-1.5.3/monitor.c:4596
        old_mon = 0x0
#20 0x000055555570f181 in qemu_chr_be_write (len=<optimized out>, buf=0x7fffffffca30 "}", s=0x5555564e79c0) at qemu-char.c:167
No locals.
#21 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x5555564e79c0) at qemu-char.c:2491
        chr = 0x5555564e79c0
        s = 0x5555564e7a90
        buf = "}\000\000\000\000\000\000\000x\203\217VUU\000\000\000\020", '\000' <repeats 22 times>, "\030\"\a\334\377\177\000\000\000\002", '\000' <repeats 22 times>, "X\260\216VUU\000\000\000\062", '\000' <repeats 22 times>, "\370\205\217VUU\000\000\000 ", '\000' <repeats 22 times>, "hH\217VUU\000\000\000\060", '\000' <repeats 22 times>, "\070\207\217VUU\000\000\000\020", '\000' <repeats 22 times>...
        len = <optimized out>
        size = <optimized out>
#22 0x00007ffff74edac6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#23 0x00005555556e173a in glib_pollfds_poll () at main-loop.c:187
        context = 0x5555564e6e00
        pfds = <optimized out>
#24 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:232
        ret = 2
        spin_counter = 0
#25 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:464
        ret = 2
        timeout = 4294967295
#26 0x0000555555602e30 in main_loop () at vl.c:1988
        nonblocking = <optimized out>
        last_io = 2
#27 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4360
        i = <optimized out>
        snapshot = 0
        linux_boot = 0
        icount_option = 0x0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x5555558b7bc0 ""
        boot_order = 0x55555586f4c6 "cad"
        ds = <optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = <optimized out>
        opts = 0x5555564e51d0
        machine_opts = <optimized out>
        olist = <optimized out>
        optind = 56
        optarg = 0x7fffffffe7e6 "stdio"
        loadvm = 0x0
        machine = 0x555555c446c0 <pc_machine_rhel700>
        cpu_model = 0x7fffffffe305 "host"
        vga_model = 0x55555589a21f "cirrus"
        pid_file = 0x0
        incoming = 0x0
        show_vnc_port = 0
        defconfig = <optimized out>
        userconfig = 5
        log_mask = <optimized out>
        log_file = 0x0
        mem_trace = {malloc = 0x555555749c50 <malloc_and_trace>, realloc = 0x555555749c30 <realloc_and_trace>, free = 0x555555749c20 <free_and_trace>, calloc = 0x0, try_malloc = 0x0, try_realloc = 0x0}
        trace_events = 0x0
        trace_file = 0x0
        __PRETTY_FUNCTION__ = "main"
        args = {machine = 0x555555c446c0 <pc_machine_rhel700>, ram_size = 2147483648, boot_device = 0x55555586f4c6 "cad", kernel_filename = 0x0, kernel_cmdline = 0x5555558b7bc0 "", initrd_filename = 0x0, cpu_model = 0x7fffffffe305 "host"}
(gdb) q
I ran the hotplugging python script:
# python hotplug-via-QMP.py -f 4444 -n 200

The script quit with the following log:
***** hot-plug virtio-blk disk via 'blockdev-add' *****
*cmdline = {'execute': 'blockdev-add', 'arguments': {'options': {'rerror': 'stop', 'cache': {'no-flush': False, 'writeback': False, 'direct': True}, 'driver': 'qcow2', 'werror': 'stop', 'aio': 'native', 'file': {'driver': 'file', 'filename': '/home/my-data-disk2.qcow2'}, 'id': 'drive-disk2'}}}
{'execute': 'blockdev-add', 'arguments': {'options': {'rerror': 'stop', 'cache': {'no-flush': False, 'writeback': False, 'direct': True}, 'driver': 'qcow2', 'werror': 'stop', 'aio': 'native', 'file': {'driver': 'file', 'filename': '/home/my-data-disk2.qcow2'}, 'id': 'drive-disk2'}}}
Traceback (most recent call last):
  File "hotplug-via-QMP.py", line 136, in <module>
    qmp_socket.send({ "execute": "blockdev-add", "arguments": {'options' : {'driver': 'qcow2', 'id':'drive-disk2', "aio":"native", 'rerror':'stop', 'werror':'stop', 'file': {'driver': 'file', 'filename': '/home/my-data-disk2.qcow2'}, 'cache': { 'writeback': False, 'direct': True, 'no-flush': False }}} }, convert=False)
  File "hotplug-via-QMP.py", line 44, in send
    resp = self.__json_read()
  File "hotplug-via-QMP.py", line 88, in __json_read
    return json.loads(self.sock.recv(1024))
socket.timeout: timed out
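For reference, a minimal QMP client exercising the same 'blockdev-add' shape could look like the sketch below. This is hypothetical code (the attached hotplug-via-QMP.py is not shown in this report); the host, port, and image path are assumptions matching the command line and log above.

```python
# Minimal QMP client sketch; not the attached script.
import json
import socket

def build_blockdev_add(node_id, filename):
    # Same command shape as in the script log above.
    return {
        "execute": "blockdev-add",
        "arguments": {"options": {
            "driver": "qcow2", "id": node_id,
            "aio": "native", "rerror": "stop", "werror": "stop",
            "file": {"driver": "file", "filename": filename},
            "cache": {"writeback": False, "direct": True, "no-flush": False},
        }},
    }

def qmp_send(host, port, cmd, timeout=5.0):
    # Connect, negotiate capabilities, send one command, return its reply.
    sock = socket.create_connection((host, port), timeout=timeout)
    f = sock.makefile("rw")
    f.readline()                                        # QMP greeting banner
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    f.readline()                                        # capabilities reply
    f.write(json.dumps(cmd) + "\n")
    f.flush()
    return json.loads(f.readline())

# Example (requires a QEMU started with -qmp tcp:0:4444,server,nowait):
# qmp_send("127.0.0.1", 4444,
#          build_blockdev_add("drive-disk2", "/home/my-data-disk2.qcow2"))
```

The socket.timeout in the traceback is then just the client side of the story: QEMU had already exited, so no reply line ever arrived.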
Fam, what was your impression about this bug?
(In reply to Paolo Bonzini from comment #4)
> Fam, what was your impression about this bug?

I didn't debug this one, just had a look at the backtrace. Does it make sense for the device hot-plug code here to call virtio_scsi_pop_req()? The device is new, so there is no vring buffer from the guest yet. But virtio_scsi_pop_req checks the descriptors and reads guest memory to get an index.
Yes, this is the event queue. The hotplugged device causing the crash is a scsi-disk device, not the parent virtio-scsi device. Perhaps this is a guest bug where the guest is setting VIRTIO_CONFIG_S_DRIVER_OK too early, before it has set up the virtqueues?
I think we need this one to ensure virtio_scsi_pop_req is only called when VIRTIO_CONFIG_S_DRIVER_OK is seen:

commit 91e7fcca4743cf694eb0c8e7a8d938cf359b5bd8
Author: Markus Armbruster <armbru@redhat.com>
Date:   Fri May 16 17:44:06 2014 +0200

    virtio-scsi: Plug memory leak on virtio_scsi_push_event() error path

    Spotted by Coverity.

    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index b0d7517..91a62ce 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -489,7 +489,7 @@ static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
                                    uint32_t event, uint32_t reason)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
-    VirtIOSCSIReq *req = virtio_scsi_pop_req(s, vs->event_vq);
+    VirtIOSCSIReq *req;
     VirtIOSCSIEvent *evt;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
     int in_size;
@@ -498,6 +498,7 @@ static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev,
         return;
     }
 
+    req = virtio_scsi_pop_req(s, vs->event_vq);
     if (!req) {
         s->events_dropped = true;
         return;
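The shape of the fix can be modeled outside QEMU (a Python sketch under assumed behavior, not the actual C code): the request pop must happen only after the early-return check has confirmed the guest driver is ready, so a just-realized device never reads an uninitialized ring.

```python
# Toy model of the ordering the patch enforces; illustrative only, the real
# code is C in hw/scsi/virtio-scsi.c. DRIVER_OK stands in for
# VIRTIO_CONFIG_S_DRIVER_OK, pop() for virtio_scsi_pop_req().
DRIVER_OK = 0x4

class EventQueueModel:
    def __init__(self, status):
        self.status = status        # virtio device status bits
        self.pops = 0               # how many times the risky pop ran
        self.events_dropped = False

    def pop(self):
        self.pops += 1
        return None                 # fresh device: no buffers available

    def push_event(self):
        # Fixed ordering: bail out BEFORE touching the virtqueue.
        if not (self.status & DRIVER_OK):
            return
        req = self.pop()
        if req is None:
            self.events_dropped = True

m = EventQueueModel(status=0)       # driver not ready: must not pop
m.push_event()
assert m.pops == 0

m = EventQueueModel(status=DRIVER_OK)
m.push_event()
assert m.pops == 1 and m.events_dropped
```

In the buggy ordering, pop() ran in the initializer position, unconditionally, which is exactly the path the backtrace shows reaching exit(1).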
Sibiao,

Could you test the scratch build below? It has the fix from comment 8:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8187815

Thanks,
Fam
(In reply to Fam Zheng from comment #9)
> Could you test the scratch build below? It has the fix from comment 8:
>
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8187815

Tried the same steps on the scratch build (qemu-kvm-1.5.3-77.el7.test.x86_64).

host info:
# uname -r && rpm -q qemu-kvm
3.10.0-191.el7.x86_64
qemu-kvm-1.5.3-77.el7.test.x86_64

Results:
After 300 hotplug iterations the guest hit a call trace and rebooted (bug 1159723), but QEMU worked well: it did not quit and produced no backtraces.
# python qmp-test.py -n 300 -f 4444

I also hit a "block I/O error in device '': No medium found (123)" message, but according to bug 970159#c12 I don't think that is an issue.
(qemu) block I/O error in device '': No medium found (123)
(qemu) info status
VM status: running

Best Regards,
sluo
Fix included in qemu-kvm-1.5.3-78.el7
Verified this issue on qemu-kvm-rhev-2.1.2-7.el7.x86_64 and on qemu-kvm-1.5.3-78.el7.x86_64 with the same testing; QEMU did not quit with a backtrace.

#########qemu-kvm-rhev-2.1.2-7.el7.x86_64
host info:
# uname -r && rpm -q qemu-kvm-rhev
3.10.0-183.el7.x86_64
qemu-kvm-rhev-2.1.2-7.el7.x86_64

#########qemu-kvm-1.5.3-78.el7.x86_64
host info:
# uname -r && rpm -q qemu-kvm
3.10.0-183.el7.x86_64
qemu-kvm-1.5.3-78.el7.x86_64

Based on the above, this issue has been fixed correctly, so I am moving it to VERIFIED status. Please correct me if anything is mistaken, thanks.

Best Regards,
sluo
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0349.html