Bug 1432057

Summary: Attach lun type disk report error and crash guest
Product: Red Hat Enterprise Linux 7
Reporter: Andrew Jones <drjones>
Component: libvirt
Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: medium
Version: 7.4
CC: abologna, chayang, drjones, eric.auger, jsuchane, jtomko, juzhang, knoel, michen, mprivozn, rbalakri, virt-bugs, virt-maint, wehuang, weizhan
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1431224
Environment:
Last Closed: 2017-03-14 13:46:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1430634, 1431224
Bug Blocks: 1173757

Description Andrew Jones 2017-03-14 12:30:07 UTC
+++ This bug was initially created as a clone of Bug #1431224 +++

+++ This bug was initially created as a clone of Bug #1430634 +++

Description of problem:
Attaching a lun-type disk reports an error and crashes the guest.
I am not sure whether this is a libvirt or a qemu-kvm-rhev issue, so I am reporting it against libvirt first.

Version-Release number of selected component (if applicable):
kernel-4.9.0-10.el7.aarch64
qemu-kvm-rhev-2.8.0-6.el7.aarch64
libvirt-3.1.0-2.el7.aarch64


How reproducible:
100%

Steps to Reproduce:
1. virsh attach-disk --domain avocado-vt-vm1 --source /dev/sdb --target vdb --driver qemu --type lun
error: Failed to attach disk
error: internal error: child reported: Kernel does not provide mount namespace: No such file or directory
2. The guest is then found to be destroyed.

Actual results:
The attach fails and the guest crashes.

Expected results:
The attach succeeds and the guest does not crash.

Additional info:

--- Additional comment from Jaroslav Suchanek on 2017-03-09 10:50:28 CET ---

Libvirt debug logs would be helpful, as well as the qemu logs of the guest.

Also, I assume that the same command succeeds on x86. It is not clear to me what you mean by 'guest crash'. Did the qemu process crash, or was the guest stopped because of a guest kernel panic? All in all, there might be no issue in libvirt unless the mount namespace is responsible. Adding Michal and Peter to the CC list.

Thanks.

--- Additional comment from weizhang on 2017-03-10 10:28 CET ---

Hi Jaroslav,

Sorry for not describing it clearly; I mean that the qemu process crashed. I cannot see any crash information on the console. I will attach libvirtd.log.

--- Additional comment from Michal Privoznik on 2017-03-10 11:22:44 CET ---

Ah, so after careful examination of the logs, I think this is what is happening here:

0) the avocado VM is started with namespaces enabled
1) libvirt starts the hotplug routine
2) qemu dies right in the middle of it:

2017-03-10 09:23:15.853+0000: 22172: info : qemuMonitorIOWrite:534 : QEMU_MONITOR_IO_WRITE: mon=0xffff60005f80 buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"on","bus":"pci.2","addr":"0x0","drive":"drive-virtio-disk1","id":"virtio-disk1"},"id":"libvirt-12"}
 len=172 ret=172 errno=0
2017-03-10 09:23:16.011+0000: 22172: error : qemuAgentIO:652 : internal error: End of file from agent monitor

3) libvirt tries to roll back. Because it still thinks that the domain is using namespaces, it calls a function to enter the namespace of the qemu process and do all the work there. The namespace, however, no longer exists: the kernel cleaned it up (it always does when the last process in the namespace dies). Therefore our rollback attempts fail, because we are trying to enter a non-existent namespace.

So there are two bugs here:
1) libvirt shouldn't try to use namespace routines once a domain dies,
2) qemu should not crash on device_add.

I am working on a fix for the libvirt issue.
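The rollback failure described in step 3 can be illustrated with a minimal sketch, assuming a Linux-style /proc layout in which each process exposes its mount namespace as /proc/<pid>/ns/mnt. The helper names `mount_ns_path` and `can_enter_mount_ns` and the `proc_root` parameter are hypothetical and are not libvirt functions; the sketch only shows the kind of guard that would let a rollback path skip the namespace routines once the qemu process is gone.

```python
import os


def mount_ns_path(pid, proc_root="/proc"):
    """Return the path of the mount-namespace handle for a process."""
    return os.path.join(proc_root, str(pid), "ns", "mnt")


def can_enter_mount_ns(pid, proc_root="/proc"):
    """Report whether the namespace handle for `pid` is still present.

    Once the process has exited, its /proc/<pid> directory disappears,
    so there is no handle left to open and entering would fail.
    """
    # lexists checks the entry itself, without following the link.
    return os.path.lexists(mount_ns_path(pid, proc_root))
```

A check like this is inherently racy (the process can still die between the check and the attempt to enter), so it narrows the window rather than replacing error handling around the actual namespace entry.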

--- Additional comment from Hai Huang on 2017-03-13 15:20:37 CET ---

Wei,
In your original BZ description,
   Bug 1430634 - Attach lun type disk report error and crash guest

the following was mentioned in the description:


0) the avocado VM is started with namespaces enabled
1) libvirt starts the hotplug routine
2) qemu dies right in the middle of it:
   ^^^^^^^^^

2017-03-10 09:23:15.853+0000: 22172: info : qemuMonitorIOWrite:534 : QEMU_MONITOR_IO_WRITE: mon=0xffff60005f80 buf={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"on","bus":"pci.2","addr":"0x0","drive":"drive-virtio-disk1","id":"virtio-disk1"},"id":"libvirt-12"}
 len=172 ret=172 errno=0
2017-03-10 09:23:16.011+0000: 22172: error : qemuAgentIO:652 : internal error: End of file from agent monitor


So there are two bugs here:
1) libvirt shouldn't try to use namespace routines once a domain dies,
2) qemu should not crash on device_add.
   ^^^^^^^^^^^^^^^^^^^^^

Apparently, qemu crashed on the device_add operation.
Would it be possible for you to provide the
qemu stack trace from when it crashed?

Thanks.

--- Additional comment from weizhang on 2017-03-14 02:51:19 CET ---

Hi Hai,

Please check whether this is what you need.

# gdb -p `pidof qemu-kvm`
(gdb) c
Continuing.
[New Thread 0xfffe73a8ec30 (LWP 22058)]

Program received signal SIGABRT, Aborted.
0x0000ffff8cc541b8 in raise () from /lib64/libc.so.6

(gdb) bt
#0  0x0000ffff8cc541b8 in raise () from /lib64/libc.so.6
#1  0x0000ffff8cc55848 in abort () from /lib64/libc.so.6
#2  0x0000ffff8cc4d8d4 in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000ffff8cc4d98c in __assert_fail () from /lib64/libc.so.6
#4  0x0000aaaada0766c8 in error_setv ()
#5  0x0000aaaada076808 in error_setg_internal ()
#6  0x0000aaaad9f7d998 in virtio_pci_device_plugged ()
#7  0x0000aaaad9f7ef84 in virtio_bus_device_plugged ()
#8  0x0000aaaad9e783b8 in virtio_device_realize ()
#9  0x0000aaaad9f451ac in device_set_realized ()
#10 0x0000aaaad9fca460 in property_set_bool ()
#11 0x0000aaaad9fcbf28 in object_property_set ()
#12 0x0000aaaad9fce3d8 in object_property_set_qobject ()
#13 0x0000aaaad9fcc0e4 in object_property_set_bool ()
#14 0x0000aaaad9f7e770 in virtio_pci_realize ()
#15 0x0000aaaad9f6211c in pci_qdev_realize ()
#16 0x0000aaaad9f7c058 in virtio_pci_dc_realize ()
#17 0x0000aaaad9f451ac in device_set_realized ()
#18 0x0000aaaad9fca460 in property_set_bool ()
#19 0x0000aaaad9fcbf28 in object_property_set ()
#20 0x0000aaaad9fce3d8 in object_property_set_qobject ()
#21 0x0000aaaad9fcc0e4 in object_property_set_bool ()
#22 0x0000aaaad9eede0c in qdev_device_add ()
#23 0x0000aaaad9eee408 in qmp_device_add ()
#24 0x0000aaaada0699e0 in qmp_dispatch ()
#25 0x0000aaaad9e3244c in handle_qmp_command ()
#26 0x0000aaaada06ef48 in json_message_process_token ()
#27 0x0000aaaada086340 in json_lexer_feed_char ()
#28 0x0000aaaada086428 in json_lexer_feed ()
#29 0x0000aaaad9e30d30 in monitor_qmp_read ()
#30 0x0000aaaad9ef38d8 in qemu_chr_be_write ()
#31 0x0000aaaad9ef3cfc in tcp_chr_read ()
#32 0x0000aaaada03695c in qio_channel_fd_source_dispatch ()
#33 0x0000ffff8d07ebb4 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#34 0x0000aaaad9fd7ba4 in main_loop_wait ()
#35 0x0000aaaad9df7af8 in main ()

--- Additional comment from Andrew Jones on 2017-03-14 12:20:33 CET ---

I think there's another libvirt bug here, as well as a qemu bug. Based on the stack trace, it appears we're calling error_setg from virtio_pci_device_plugged with a preexisting error (that's the qemu bug). The preexisting error comes from virtio_blk_get_features (that's the libvirt bug). Here's the sequence:

virtio_bus_device_plugged       -- errp is NULL
  virtio_blk_get_features       -- errp is set, vdev->host_features = 0
    virtio_pci_device_plugged
      if !ignore_backend_features && !(vdev->host_features & VIRTIO_F_VERSION_1)
         && !legacy
        error_setg(errp, ...)

As all the conditions are true (!(vdev->host_features & VIRTIO_F_VERSION_1) is true because vdev->host_features has been set to zero by the failure of virtio_blk_get_features), then we attempt to error [again].

To solve the QEMU bug, I don't think we should call virtio_pci_device_plugged when virtio_blk_get_features fails, but rather we should propagate the error.
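The double-error abort and the proposed propagation can be modelled with a small sketch. This is a simplified Python stand-in, not QEMU code: `ErrorSlot` imitates an `Error **errp` that may hold at most one error, the `error_setg` model raises where the real function hits its assertion (frames #2 to #5 in the backtrace), and the function bodies, parameter names, and error strings are all hypothetical. The `ignore_backend_features` condition is left out for brevity.

```python
VIRTIO_F_VERSION_1 = 32  # feature bit number defined by the virtio spec


class ErrorSlot:
    """Stand-in for `Error **errp`: holds at most one error."""

    def __init__(self):
        self.err = None


def error_setg(errp, msg):
    # Setting an error into a slot that already holds one is treated as a
    # programming bug; the real code asserts, this model raises.
    if errp.err is not None:
        raise AssertionError("error already set: " + errp.err)
    errp.err = msg


def get_features(errp, scsi, modern_only):
    # Models virtio_blk_get_features: on failure it sets errp and the
    # caller ends up with host_features == 0.
    if scsi and modern_only:
        error_setg(errp, "SCSI passthrough is not supported by virtio 1.0")
        return 0
    return 1 << VIRTIO_F_VERSION_1


def device_plugged(errp, host_features, modern_only):
    # Models the check in virtio_pci_device_plugged.
    if modern_only and not host_features & (1 << VIRTIO_F_VERSION_1):
        error_setg(errp, "modern-only device lacks VIRTIO_F_VERSION_1")


def plug_buggy(scsi, modern_only):
    errp = ErrorSlot()
    features = get_features(errp, scsi, modern_only)
    device_plugged(errp, features, modern_only)  # runs even if errp is set
    return errp.err


def plug_propagating(scsi, modern_only):
    errp = ErrorSlot()
    features = get_features(errp, scsi, modern_only)
    if errp.err is not None:
        return errp.err  # propagate the first error, skip device_plugged
    device_plugged(errp, features, modern_only)
    return errp.err
```

With `scsi=True` and `modern_only=True`, `plug_buggy` raises because the second `error_setg` finds the slot occupied, while `plug_propagating` returns the first error to the caller.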

To solve the libvirt bug, I don't think we should do what's making virtio_blk_get_features fail. So what's making it fail? Commit efb8206ca7f1, which states we can't have both scsi=on and disable-modern=off,disable-legacy=on at the same time, as SCSI passthrough is no longer supported in virtio 1.0.
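The property conflict named above can be expressed as a small pre-flight check. This is a hypothetical sketch, not libvirt code: the function name and its boolean parameters are invented for illustration. The logged device_add command does not spell out disable-legacy or disable-modern, so the sketch takes the effective values as inputs and leaves resolving them to the caller.

```python
def virtio_blk_props_conflict(scsi, disable_legacy, disable_modern):
    """Return True when SCSI passthrough is requested on a modern-only device.

    All three arguments are the effective on/off values of the
    corresponding device properties, expressed as booleans.
    """
    modern_only = disable_legacy and not disable_modern
    return scsi and modern_only
```

A caller could run this check before issuing device_add and report a configuration error instead of sending a command that is known to fail.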

Comment 2 Andrew Jones 2017-03-14 12:33:50 UTC
This clone was made to address the last paragraph of bug 1431224 comment 3.

Comment 3 Ján Tomko 2017-03-14 13:46:30 UTC

*** This bug has been marked as a duplicate of bug 1365823 ***