Bug 1658504 - Guest crash after restart libvirtd when start guest with iscsi-direct volume
Summary: Guest crash after restart libvirtd when start guest with iscsi-direct volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: 8.0
Assignee: Michal Privoznik
QA Contact: Meina Li
URL:
Whiteboard:
Depends On:
Blocks: 1685151
 
Reported: 2018-12-12 10:15 UTC by Meina Li
Modified: 2020-11-14 07:25 UTC (History)

Fixed In Version: libvirt-5.3.0-1.el8
Doc Type: Bug Fix
Doc Text:
Cause: When libvirt started a domain, it filled in some runtime information for each of the disks. However, for a disk of type='volume' whose source pool is of type 'iscsi-direct', it created an invalid combination of data. Therefore, when libvirtd was restarted and sanity checks were performed, this combination was caught and the domain was killed as a result. Consequence: Fix: Result:
Clone Of:
Clones: 1685151
Environment:
Last Closed: 2019-11-06 07:12:13 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2019:3723  0        None      None    None     2019-11-06 07:12:37 UTC

Description Meina Li 2018-12-12 10:15:06 UTC
Description of problem:
The guest crashes after libvirtd is restarted when the guest was started with an iscsi-direct volume.

Version-Release number of selected component (if applicable):
libvirt-4.10.0-1.module+el8+2317+367e35b5.x86_64
qemu-kvm-3.0.0-2.module+el8+2246+78080371.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Prepare an iscsi-direct pool.
# virsh pool-dumpxml iscsi-direct 
<pool type='iscsi-direct'>
  <name>iscsi-direct</name>
  <uuid>0799697a-94dd-4115-9601-8714b1931248</uuid>
  <capacity unit='bytes'>524287488</capacity>
  <allocation unit='bytes'>524287488</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name='10.66.144.87'/>
    <device path='iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2'/>
    <initiator>
      <iqn name='iqn.2017-12.com.example:client'/>
    </initiator>
  </source>
</pool>

# virsh vol-list iscsi-direct 
 Name         Path                                                                                     
-------------------------------------------------------------------------------
 unit:0:0:0   ip-10.66.144.87:3260-iscsi-iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2-lun-0

2. Start guest with iscsi volume.
# virsh dumpxml q35 | grep disk -a8
...
<disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-direct' volume='unit:0:0:0' mode='direct'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

# virsh start q35
Domain q35 started

# virsh list --all
 Id   Name             State     
---------------------------------
 37   lmn              running   
 49   q35              running   

3. Restart libvirtd.
# systemctl restart libvirtd
# virsh list --all
 Id   Name             State     
---------------------------------
 37   lmn              running    
 -    q35              shut off  

Actual results:
As shown in step 3, the guest crashes after libvirtd is restarted.

Expected results:
The guest should still be running
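
The before/after comparison in steps 2 and 3 can be automated. The following is a minimal sketch, assuming the `virsh list --all` table layout shown above; the helper names `parse_virsh_list` and `lost_domains` are hypothetical and not part of libvirt.

```python
def parse_virsh_list(text):
    """Map domain name -> state from `virsh list --all` output."""
    states = {}
    for line in text.splitlines():
        # Columns are: Id, Name, State (the state may contain spaces).
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] == "Id":
            continue  # skips the header, the dashed separator and blank lines
        states[parts[1]] = parts[2].strip()
    return states


def lost_domains(before_text, after_text):
    """Names of domains that were running before but are not running after."""
    before = parse_virsh_list(before_text)
    after = parse_virsh_list(after_text)
    return sorted(
        name
        for name, state in before.items()
        if state == "running" and after.get(name) != "running"
    )
```

Feeding it the two listings from this report (captured before and after `systemctl restart libvirtd`) would flag q35 as the only domain that stopped running.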

Additional info:
1) qemu.log:
2018-12-12 02:51:39.178+0000: 22786: debug : virFileClose:109 : Closed fd 38
2018-12-12 02:51:39.178+0000: 22786: debug : virFileClose:109 : Closed fd 39
2018-12-12 02:51:39.178+0000: 22786: debug : virCommandHandshakeChild:460 : Handshake with parent is done
2018-12-12T02:51:39.218693Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/4 (label charserial0)
2018-12-12 02:51:49.103+0000: shutting down, reason=crashed
2018-12-12T02:51:49.106079Z qemu-kvm: terminating on signal 15 from pid 22843 (<unknown process>)

2) debug.log:
2018-12-12 02:51:49.103+0000: 22910: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x5580600d95b0
2018-12-12 02:51:49.103+0000: 22910: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x7f5c680f1f00
2018-12-12 02:51:49.103+0000: 22910: error : virDomainDiskTranslateSourcePool:30383 : XML error: disk source mode is only valid when storage pool is of iscsi type
2018-12-12 02:51:49.103+0000: 22910: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x7f5c540049e0
2018-12-12 02:51:49.103+0000: 22910: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x7f5c54006640
2018-12-12 02:51:49.103+0000: 22910: info : virObjectUnref:346 : OBJECT_DISPOSE: obj=0x7f5c54006640
2018-12-12 02:51:49.103+0000: 22910: debug : virStoragePoolDispose:517 : release pool 0x7f5c54006640 iscsi-direct 0799697a-94dd-4115-9601-8714b1931248
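
To pick the relevant entry out of a noisy debug log, the error-level lines can be filtered. This is a rough sketch that assumes the line layout seen in the excerpt above (timestamp, thread id, level, function:line, message); the regular expression and the function name `error_entries` are illustrative assumptions, not a libvirt-provided parser.

```python
import re

# Layout assumed from the excerpt:
# "<date> <time>: <tid>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def error_entries(lines):
    """Return (function, message) pairs for every error-level log line."""
    found = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "error":
            found.append((match.group("func"), match.group("msg")))
    return found
```

Lines that do not follow this layout, such as the "shutting down, reason=crashed" line in qemu.log, are simply skipped.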

Comment 2 Michal Privoznik 2019-02-28 14:47:02 UTC
(In reply to Meina Li from comment #0)
> Description of problem:
> Guest crash after restart libvirtd when start guest with iscsi-direct volume
> 
> Version-Release number of selected component (if applicable):
> libvirt-4.10.0-1.module+el8+2317+367e35b5.x86_64
> qemu-kvm-3.0.0-2.module+el8+2246+78080371.x86_64
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 
> 1. Prepare an iscsi-direct pool.
> # virsh pool-dumpxml iscsi-direct 
> <pool type='iscsi-direct'>
>   <name>iscsi-direct</name>
>   <uuid>0799697a-94dd-4115-9601-8714b1931248</uuid>
>   <capacity unit='bytes'>524287488</capacity>
>   <allocation unit='bytes'>524287488</allocation>
>   <available unit='bytes'>0</available>
>   <source>
>     <host name='10.66.144.87'/>
>     <device path='iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2'/>
>     <initiator>
>       <iqn name='iqn.2017-12.com.example:client'/>
>     </initiator>
>   </source>
> </pool>
> 
> # virsh vol-list iscsi-direct 
>  Name         Path
> -------------------------------------------------------------------------------
>  unit:0:0:0   ip-10.66.144.87:3260-iscsi-iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2-lun-0
> 
> 2. Start guest with iscsi volume.
> # virsh dumpxml q35 | grep disk -a8
> ...
> <disk type='volume' device='disk'>
>       <driver name='qemu' type='raw'/>
>       <source pool='iscsi-direct' volume='unit:0:0:0' mode='direct'/>
>       <target dev='vdb' bus='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>     </disk>
> 
> # virsh start q35
> Domain q35 started

This is where I run into trouble. When I try to start such a domain, I get the error immediately:
virsh # start fedora
error: Failed to start domain fedora
error: XML error: disk source mode is only valid when storage pool is of iscsi type

Is it possible that the domain was started with an older libvirt and what we see here only happens during a libvirt upgrade?

Comment 3 Meina Li 2019-03-01 03:17:50 UTC
(In reply to Michal Privoznik from comment #2)
...
> 
> This is where I run into trouble. When I try to start such a domain, I get the
> error immediately:
> virsh # start fedora
> error: Failed to start domain fedora
> error: XML error: disk source mode is only valid when storage pool is of iscsi type
> 
> Is it possible that the domain was started with an older libvirt and what we
> see here only happens during a libvirt upgrade?

I also hit this error with the latest version.
However, the domain starts successfully when there is no mode='direct' in the domain XML:
# virsh dumpxml q35 | grep disk -a8
...
<disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-direct' volume='unit:0:0:0'/>              <!-- no mode='direct' in disk source -->
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
# virsh start q35
Domain q35 started

I think this may be a separate new bug, unless the behavior is by design. Please review it again, thanks.

Test Version:
libvirt-5.0.0-4.module+el8+2835+faae67de.x86_64
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
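
Based on the observation above (the domain only starts when the disk source carries no mode attribute), a domain XML can be checked up front. This is a hedged sketch: the function name `disks_with_invalid_mode` and the `pool_types` mapping (pool name to pool type, which in practice would come from `virsh pool-dumpxml`) are assumptions for illustration, and the rule it applies is only the one stated in the error message.

```python
import xml.etree.ElementTree as ET


def disks_with_invalid_mode(domain_xml, pool_types):
    """List target devices of volume disks whose source sets 'mode'
    although the referenced pool is not of type 'iscsi'."""
    root = ET.fromstring(domain_xml)
    offending = []
    for disk in root.iter("disk"):
        if disk.get("type") != "volume":
            continue
        source = disk.find("source")
        if source is None or source.get("mode") is None:
            continue  # no mode attribute, nothing to validate
        if pool_types.get(source.get("pool")) != "iscsi":
            target = disk.find("target")
            offending.append(target.get("dev") if target is not None else None)
    return offending
```

For the q35 configuration from the description, calling it with `{"iscsi-direct": "iscsi-direct"}` would report vdb; with the mode attribute removed it would report nothing.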

Comment 4 Michal Privoznik 2019-03-04 14:04:01 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2019-March/msg00093.html

Comment 5 Michal Privoznik 2019-03-04 15:58:50 UTC
I've just merged the commit upstream:

commit e89694735011fad95bf9fc61221744e69d695137
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Mar 1 16:05:16 2019 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Mar 4 16:54:11 2019 +0100

    virDomainDiskTranslateSourcePool: Don't set @mode of iscsi-direct
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1658504
    
    This function is called when a domain is starting up (in qemu
    driver that is when qemu cmd line is generated). It is used to
    translate <disk type='volume'/> to something usable by filling in
    virStorageSource (e.g. fetching disk path, or some connection URI
    for a network FS). But some of these info are not stored in
    status XML and thus the function is called on
    qemuProcessReconnect too to reconstruct runtime data. But this
    poses a problem because after the first run the mode is set to
    'direct', but in the second run this triggers a failure because
    mode is valid only for 'iscsi' volumes and not 'iscsi-direct'
    ones.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Erik Skultety <eskultet>


v5.1.0-20-ge896947350
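
The failure mode described in the commit message can be modelled in a few lines. This is a simplified sketch and not libvirt's C code: `DiskSource`, `translate`, `start_then_reconnect` and the `set_mode_for_direct` switch are hypothetical names that only mirror the behaviour described above, where the first translation records mode='direct' and the second translation (on reconnect) rejects it.

```python
from dataclasses import dataclass
from typing import Optional


class XMLError(Exception):
    """Stands in for the 'XML error' reported by libvirt."""


@dataclass
class DiskSource:
    pool: str
    volume: str
    mode: Optional[str] = None  # models the mode attribute of <source/>


def translate(src: DiskSource, pool_type: str, set_mode_for_direct: bool) -> None:
    # Sanity check: a mode is accepted only for pools of type 'iscsi'.
    if src.mode is not None and pool_type != "iscsi":
        raise XMLError(
            "disk source mode is only valid when storage pool is of iscsi type"
        )
    if pool_type == "iscsi-direct" and set_mode_for_direct:
        # Behaviour before the fix: runtime data ends up with mode='direct'.
        src.mode = "direct"


def start_then_reconnect(set_mode_for_direct: bool) -> str:
    src = DiskSource(pool="iscsi-direct", volume="unit:0:0:0")
    translate(src, "iscsi-direct", set_mode_for_direct)      # domain start
    try:
        translate(src, "iscsi-direct", set_mode_for_direct)  # libvirtd restart
    except XMLError:
        return "shut off"
    return "running"
```

In this model, `start_then_reconnect(True)` ends in "shut off" and `start_then_reconnect(False)` stays "running", which matches the reported behaviour before and after the fix. The same check also explains why a domain that sets mode='direct' explicitly fails at start, as seen in comment 2.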

Comment 7 Meina Li 2019-06-26 09:49:27 UTC
Verified Version:
libvirt-5.4.0-1.module+el8.1.0+3304+7eb41d5f.x86_64
qemu-kvm-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64
kernel-4.18.0-107.el8.x86_64

Verified Steps:
Scenario 1: Start the guest with iscsi-direct volume and check list after libvirtd restart.
1. Prepare an iscsi-direct pool.
# virsh pool-dumpxml iscsi-direct 
<pool type='iscsi-direct'>
  <name>iscsi-direct</name>
  <uuid>2b621385-f734-4b98-8131-0fc17ed29e67</uuid>
  <capacity unit='bytes'>64424704512</capacity>
  <allocation unit='bytes'>64424704512</allocation>
  <available unit='bytes'>0</available>
  <source>
    <host name='**IP**'/>
    <device path='iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2'/>
    <initiator>
      <iqn name='iqn.2017-12.com.example:client'/>
    </initiator>
  </source>
</pool>
# virsh vol-list  iscsi-direct 
 Name         Path
------------------------------------------------------------------------------------------------------
 unit:0:0:0   ip-10.66.4.109:3260-iscsi-iqn.2017-12.com.virttest:emulated-iscsi-noauth.target2-lun-0

2. Start the guest with the following volume disk.
...
<disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-direct' volume='unit:0:0:0'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </disk>
...
# virsh start lmn
Domain lmn started

3. Check the guest state after restarting libvirtd and after saving and restoring the guest.
# virsh list --all
 Id   Name   State
-----------------------
 1    lmn    running
# systemctl restart libvirtd
# virsh list --all
 Id   Name   State
-----------------------
 5    lmn    running
# virsh save lmn test.save
Domain lmn saved to test.save
# virsh list --all
 Id   Name   State
-----------------------
 -    lmn    shut off
# virsh restore test.save 
Domain restored from test.save
# virsh list --all
 Id   Name   State
-----------------------
 6    lmn    running

Scenario 2: Hotplug/unplug an iscsi-direct pool volume disk to/from the guest.
1. Prepare the disk xml with iscsi-pool volume info.
# cat disk.xml 
<disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-direct' volume='unit:0:0:0'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </disk>

2. Hotplug the disk to guest.
# virsh attach-device lmn disk.xml 
Device attached successfully
# virsh dumpxml lmn | grep disk -a8
...
 <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='iscsi-direct' volume='unit:0:0:0'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </disk>
...

3. Detach the disk from the guest.
# virsh detach-device lmn disk.xml 
Device detached successfully
# virsh dumpxml lmn | grep disk -a8
...
The volume disk is no longer present.
...

Moving this bug to VERIFIED.

Comment 9 errata-xmlrpc 2019-11-06 07:12:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

