Bug 1431224
Summary: | Attach lun type disk report error and crash guest | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jaroslav Suchanek <jsuchane> | |
Component: | qemu-kvm-rhev | Assignee: | Fam Zheng <famz> | |
Status: | CLOSED ERRATA | QA Contact: | aihua liang <aliang> | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 7.4 | CC: | abologna, aliang, chayang, coli, drjones, eric.auger, famz, hachen, juzhang, michen, mprivozn, mrezanin, pablo.iranzo, qzhang, rbalakri, virt-maint, wehuang, weizhan, xuma, xuwei, yhong | |
Target Milestone: | rc | |||
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1430634 | |||
: | 1432057 | Environment: | |
Last Closed: | 2017-08-02 03:39:56 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1430634 | |||
Bug Blocks: | 1432057 |
Description
Jaroslav Suchanek
2017-03-10 16:22:07 UTC
Wei,

In your original BZ, Bug 1430634 - Attach lun type disk report error and crash guest, the following was mentioned in the description:

0) the avocado VM is started with namespaces enabled
1) libvirt starts the hotplug routine
2) qemu dies right in the middle of it:
   ^^^^^^^^^

2017-03-10 09:23:15.853+0000: 22172: info : qemuMonitorIOWrite:534 : QEMU_MONITOR_IO_WRITE: mon=0xffff60005f80 buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"on","bus":"pci.2","addr":"0x0","drive":"drive-virtio-disk1","id":"virtio-disk1"},"id":"libvirt-12"} len=172 ret=172 errno=0
2017-03-10 09:23:16.011+0000: 22172: error : qemuAgentIO:652 : internal error: End of file from agent monitor

So there are two bugs here:
1) libvirt shouldn't try to use namespace routines once a domain dies,
2) qemu should not crash on device_add.
   ^^^^^^^^^^^^^^^^^^^^^

Apparently, qemu crashed on the device_add operation. Would it be possible for you to provide the qemu stack trace from when it crashed?

Thanks.

Hi Hai,

Please check if it is what you need.

# gdb -p `pidof qemu-kvm`
(gdb) c
Continuing.
[New Thread 0xfffe73a8ec30 (LWP 22058)]

Program received signal SIGABRT, Aborted.
0x0000ffff8cc541b8 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000ffff8cc541b8 in raise () from /lib64/libc.so.6
#1  0x0000ffff8cc55848 in abort () from /lib64/libc.so.6
#2  0x0000ffff8cc4d8d4 in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000ffff8cc4d98c in __assert_fail () from /lib64/libc.so.6
#4  0x0000aaaada0766c8 in error_setv ()
#5  0x0000aaaada076808 in error_setg_internal ()
#6  0x0000aaaad9f7d998 in virtio_pci_device_plugged ()
#7  0x0000aaaad9f7ef84 in virtio_bus_device_plugged ()
#8  0x0000aaaad9e783b8 in virtio_device_realize ()
#9  0x0000aaaad9f451ac in device_set_realized ()
#10 0x0000aaaad9fca460 in property_set_bool ()
#11 0x0000aaaad9fcbf28 in object_property_set ()
#12 0x0000aaaad9fce3d8 in object_property_set_qobject ()
#13 0x0000aaaad9fcc0e4 in object_property_set_bool ()
#14 0x0000aaaad9f7e770 in virtio_pci_realize ()
#15 0x0000aaaad9f6211c in pci_qdev_realize ()
#16 0x0000aaaad9f7c058 in virtio_pci_dc_realize ()
#17 0x0000aaaad9f451ac in device_set_realized ()
#18 0x0000aaaad9fca460 in property_set_bool ()
#19 0x0000aaaad9fcbf28 in object_property_set ()
#20 0x0000aaaad9fce3d8 in object_property_set_qobject ()
#21 0x0000aaaad9fcc0e4 in object_property_set_bool ()
#22 0x0000aaaad9eede0c in qdev_device_add ()
#23 0x0000aaaad9eee408 in qmp_device_add ()
#24 0x0000aaaada0699e0 in qmp_dispatch ()
#25 0x0000aaaad9e3244c in handle_qmp_command ()
#26 0x0000aaaada06ef48 in json_message_process_token ()
#27 0x0000aaaada086340 in json_lexer_feed_char ()
#28 0x0000aaaada086428 in json_lexer_feed ()
#29 0x0000aaaad9e30d30 in monitor_qmp_read ()
#30 0x0000aaaad9ef38d8 in qemu_chr_be_write ()
#31 0x0000aaaad9ef3cfc in tcp_chr_read ()
#32 0x0000aaaada03695c in qio_channel_fd_source_dispatch ()
#33 0x0000ffff8d07ebb4 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#34 0x0000aaaad9fd7ba4 in main_loop_wait ()
#35 0x0000aaaad9df7af8 in main ()

I think there's another libvirt bug here, as well as a qemu bug.
Based on the stack trace it appears we're calling error_setg from virtio_pci_device_plugged with a preexisting error (that's the qemu bug). The preexisting error comes from virtio_blk_get_features (that's the libvirt bug). Here's the sequence:

virtio_bus_device_plugged -- errp is NULL
  virtio_blk_get_features -- errp is set, vdev->host_features = 0
  virtio_pci_device_plugged
    if !ignore_backend_features && !(vdev->host_features & VIRTIO_F_VERSION_1) && !legacy
      error_setg(errp, ...)

As all the conditions are true (!(vdev->host_features & VIRTIO_F_VERSION_1) is true because vdev->host_features has been set to zero by the failure of virtio_blk_get_features), then we attempt to error [again].

To solve the QEMU bug, I don't think we should call virtio_pci_device_plugged when virtio_blk_get_features fails, but rather we should propagate the error.

To solve the libvirt bug, I don't think we should do what's making virtio_blk_get_features fail. So what's making it fail? Commit efb8206ca7f1, which states we can't have both scsi=on and disable-modern=off,disable-legacy=on at the same time, as SCSI passthrough is no longer supported in virtio 1.0.

Thank you Drew, following your analysis I posted this fix to upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg03533.html

Merged for upstream QEMU 2.9:

commit a77690c41da67d85bab1e784a9f24f18bc63dbd9
Author: Fam Zheng <famz>
Date:   Fri Mar 17 20:32:42 2017 +0800

    virtio: Fix error handling in virtio_bus_device_plugged

    For one thing we shouldn't continue if an error happened, for the other
    two steps failing can cause an abort() in error_setg because we reuse
    the same errp blindly. Add error handling checks to fix both issues.

    Signed-off-by: Fam Zheng <famz>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Reviewed-by: Cornelia Huck <cornelia.huck.com>
    Reviewed-by: Andrew Jones <drjones>
    Reviewed-by: Philippe Mathieu-Daudé <f4bug>

The bug can't be reproduced on x86.
kernel version: 3.10.0-588.el7.x86_64
qemu-kvm-rhev version: qemu-kvm-rhev-2.8.0-6.el7.x86_64
libvirt version: libvirt-2.0.0-10.el7.x86_64

Test steps:
1. Start domain "test" with the qemu command line below:
/usr/libexec/qemu-kvm \
    -name guest=test,debug-threads=on \
    -S \
    -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-test/master-key.aes \
    -machine pc-i440fx-rhel7.4.0,accel=kvm,usb=off \
    -cpu Penryn \
    -m 4096 \
    -realtime mlock=off \
    -smp 1,sockets=1,cores=1,threads=1 \
    -uuid 96cd2ce0-b126-40bf-9557-2f33aa559621 \
    -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-test/monitor.sock,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=utc,driftfix=slew \
    -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \
    -global PIIX4_PM.disable_s3=1 \
    -global PIIX4_PM.disable_s4=1 \
    -boot strict=on \
    -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
    -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
    -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
    -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
    -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
    -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:22:2c:4d,bus=pci.0,addr=0x3 \
    -chardev pty,id=charserial0 \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -vnc 0.0.0.0:0 \
    -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
    -msg timestamp=on

2. Pass through a blk device with scsi=on:
virsh attach-disk --domain test --source /dev/sdb --target vdb --driver qemu --type lun

Test Result:
Attaching the device failed with the error message:
error: Failed to attach disk
error: internal error: unable to execute QEMU command 'device_add': Please set scsi=off for virtio-blk devices in order to use virtio 1.0

Hi, Fam

I can't reproduce the bug on x86, can you help to check if the test steps in comment 7 are right?

Thanks,
aliang

Wei seems to have a reproducer in comment 2. Could you help, Wei?

Hi Fam and aihua,

It cannot be reproduced on x86; the previous bug I reported is on aarch64.

Hi, wei

Got it, thanks.

BR,
aliang

Not sure what the needinfo is for, but the plan in comment 12 makes sense to me.

Can reproduce this bug on aarch64.

kernel version: 4.10.0-11.el7.aarch64
libvirt version: libvirt-3.2.0-2.el7.aarch64
AAVMF version: AAVMF-20170228-3.gitc325e41585e3.el7.noarch
qemu-kvm-rhev: qemu-kvm-rhev-2.8.0-6.el7.aarch64

Test steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
    -name guest=te,debug-threads=on \
    -S \
    -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-te/master-key.aes \
    -machine virt-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off \
    -cpu host \
    -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
    -drive file=/var/lib/libvirt/qemu/nvram/te_VARS.fd,if=pflash,format=raw,unit=1 \
    -m 4096 \
    -realtime mlock=off \
    -smp 1,sockets=1,cores=1,threads=1 \
    -uuid 39f5ef2a-44c5-4a18-b0b2-2f28e063405e \
    -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-te/monitor.sock,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on \
    -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
    -device ioh3420,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
    -device ioh3420,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
    -device ioh3420,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
    -drive file=/home/te.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
    -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=28 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1f:67:bf,bus=pci.1,addr=0x0 \
    -serial pty \
    -vnc 0.0.0.0:0 \
    -device virtio-gpu-pci,id=video0,bus=pci.3,addr=0x0 \
    -msg timestamp=on

2. Pass through a disk:
virsh attach-disk --domain te --source /dev/sdb --target vdb --driver qemu --type lun

Test Result:
error: Failed to attach disk
error: internal error: child reported: Kernel does not provide mount namespace: No such file or directory

Verified, the issue has been resolved, so set its status to "Verified".

****************Details**************
Test Version:
kernel version: 4.10.0-13.el7.aarch64
qemu-kvm-rhev version: qemu-kvm-rhev-2.9.0-2.el7.aarch64
libvirt version: libvirt-3.2.0-3.el7.aarch64

Test steps:
1. Start guest with qemu cmds:
/usr/libexec/qemu-kvm \
    -name guest=te,debug-threads=on \
    -S \
    -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-te/master-key.aes \
    -machine virt-rhel7.4.0,accel=kvm,usb=off,dump-guest-core=off \
    -cpu host \
    -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
    -drive file=/var/lib/libvirt/qemu/nvram/te_VARS.fd,if=pflash,format=raw,unit=1 \
    -m 4096 \
    -realtime mlock=off \
    -smp 1,sockets=1,cores=1,threads=1 \
    -uuid 39f5ef2a-44c5-4a18-b0b2-2f28e063405e \
    -no-user-config -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-te/monitor.sock,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on \
    -device ioh3420,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
    -device ioh3420,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
    -device ioh3420,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
    -device ioh3420,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
    -drive file=/home/te.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
    -device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=28 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1f:67:bf,bus=pci.1,addr=0x0 \
    -serial pty \
    -vnc 0.0.0.0:0 \
    -device virtio-gpu-pci,id=video0,bus=pci.3,addr=0x0 \
    -msg timestamp=on

2. Pass through a disk:
virsh attach-disk --domain te --source /dev/sdb --target vdb --driver qemu --type lun

Test Result:
error: Failed to attach disk
error: internal error: unable to execute QEMU command 'device_add': Please set scsi=off for virtio-blk devices in order to use virtio 1.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
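
For readers following the analysis of the abort in the comments above, the failure mode can be modelled in a few lines of C. This is only a sketch under stated assumptions, not QEMU code: `Error`, `Dev`, `error_set`, `error_propagate`, `blk_get_features`, `pci_device_plugged` and the two `bus_device_plugged_*` functions are simplified, hypothetical stand-ins for the functions named in the stack trace, and the `ignore_backend_features` condition is left out. It shows why reusing one `errp` across two failing steps trips the assertion, and how checking a local error after each step avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VIRTIO_F_VERSION_1 32   /* feature bit number for virtio 1.0 */

/* Simplified stand-in for an error object: just a message. */
typedef struct Error { const char *msg; } Error;

/* Hypothetical, trimmed-down device state for this sketch. */
typedef struct Dev {
    bool scsi;               /* scsi=on was requested */
    bool legacy;             /* legacy (pre-1.0) interface is enabled */
    uint64_t host_features;
    bool plugged;            /* set once the transport step succeeded */
} Dev;

/* Setting an error on top of an existing one is treated as a bug. */
void error_set(Error **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    assert(*errp == NULL);   /* the kind of assertion seen in the backtrace */
    Error *err = malloc(sizeof(*err));
    assert(err != NULL);
    err->msg = msg;
    *errp = err;
}

/* Hand a local error to the caller, or drop it if it cannot be stored. */
void error_propagate(Error **dst, Error *err)
{
    if (dst != NULL && *dst == NULL) {
        *dst = err;
    } else {
        free(err);
    }
}

/* Fails when SCSI passthrough is combined with virtio 1.0. */
uint64_t blk_get_features(Dev *d, uint64_t features, Error **errp)
{
    if ((features & (1ULL << VIRTIO_F_VERSION_1)) && d->scsi) {
        error_set(errp, "Please set scsi=off for virtio-blk devices "
                        "in order to use virtio 1.0");
        return 0;
    }
    return features;
}

/* Fails when neither the modern nor the legacy interface is usable. */
void pci_device_plugged(Dev *d, Error **errp)
{
    if (!(d->host_features & (1ULL << VIRTIO_F_VERSION_1)) && !d->legacy) {
        error_set(errp, "no usable virtio interface (sketch message)");
        return;
    }
    d->plugged = true;
}

/* Pre-fix shape: both steps write to the same errp without checking. */
void bus_device_plugged_buggy(Dev *d, Error **errp)
{
    d->host_features = blk_get_features(d, d->host_features, errp);
    pci_device_plugged(d, errp);   /* second error_set() would assert */
}

/* Post-fix shape: stop and propagate as soon as one step fails. */
void bus_device_plugged_fixed(Dev *d, Error **errp)
{
    Error *local_err = NULL;

    d->host_features = blk_get_features(d, d->host_features, &local_err);
    if (local_err != NULL) {
        error_propagate(errp, local_err);
        return;
    }
    pci_device_plugged(d, &local_err);
    if (local_err != NULL) {
        error_propagate(errp, local_err);
    }
}
```

With a device configured as `scsi = true`, `legacy = false` and `VIRTIO_F_VERSION_1` offered, `bus_device_plugged_buggy` reaches the assertion in `error_set`, while `bus_device_plugged_fixed` returns the "Please set scsi=off" error to the caller and leaves the device unplugged, which corresponds to the behaviour reported in the verification comment.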