Bug 1877790 - lsm causes disk to change from RAW to QCOW2, but database is not updated
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2.6
Assignee: Benny Zlotnik
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks: 1878341
 
Reported: 2020-09-10 12:57 UTC by Jean-Louis Dupond
Modified: 2020-09-18 09:00 UTC
CC: 7 users

Fixed In Version: ovirt-engine-4.4.2.6
Clone Of:
Clones: 1878341 (view as bug list)
Environment:
Last Closed: 2020-09-18 07:13:09 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+
michal.skrivanek: blocker?
tnisan: devel_ack+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 111264 0 master MERGED core: update volume format after move 2021-02-07 14:43:04 UTC
oVirt gerrit 111270 0 ovirt-engine-4.4.2.z MERGED core: update volume format after move 2021-02-07 14:43:04 UTC

Description Jean-Louis Dupond 2020-09-10 12:57:23 UTC
Description of problem:
We did a live storage migration of a VM from NFS to iSCSI.
Everything was fine until we restarted the VM: one disk could not be mounted.

After some debugging, I found out that from inside the VM the disk was visible as QCOW!

I was able to reproduce this quite easily.

Create a raw disk on a random VM on NFS Storage:
2020-09-10 14:43:22,030+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-55) [a09ff0e6-ea0a-48c9-af5c-5223c42d89d9] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='d497efe5-2344-4d58-8985-7b053d3c35a3', ignoreFailoverLimit='false', storageDomainId='500c30e6-efe7-4dc8-b42d-7252dd812769', imageGroupId='12f8ecc3-f1b4-42ac-814c-af422aa49512', imageSizeInBytes='53687091200', volumeFormat='RAW', newImageId='06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1', imageType='Sparse', newImageDescription='{"DiskAlias":"bugtest","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 7546bfda

Now initiate a live storage migration of that disk to an iSCSI storage domain.
You'll get the following warning:
"Block storage domain does not support disk format raw with volume type sparse. The following disks format will become qcow2: bugtest"


The volume is created as QCOW on the destination:
2020-09-10 14:46:02,091+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35) [753fd06d-b91e-4e31-b737-c40a83440f33] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='d497efe5-2344-4d58-8985-7b053d3c35a3', ignoreFailoverLimit='false', storageDomainId='6e99da85-8414-4ec5-92c3-b6cf741fc125', imageGroupId='12f8ecc3-f1b4-42ac-814c-af422aa49512', imageSizeInBytes='53687091200', volumeFormat='COW', newImageId='06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1', imageType='Sparse', newImageDescription='null', imageInitialSizeInBytes='53695545344', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000'}), log id: 4c9f7790


When the migration is done, dumpxml shows the following:
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='9'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sdd' bus='scsi'/>
      <serial>12f8ecc3-f1b4-42ac-814c-af422aa49512</serial>
      <alias name='ua-12f8ecc3-f1b4-42ac-814c-af422aa49512'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>


Which is correct!

But now shut down the VM and start it again:
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='3'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>12f8ecc3-f1b4-42ac-814c-af422aa49512</serial>
      <alias name='ua-12f8ecc3-f1b4-42ac-814c-af422aa49512'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>


And in the VM:
# file -s /dev/sdc
/dev/sdc: QEMU QCOW Image (v3), 53687091200 bytes


So it seems like some entry was not updated in the database?
Also, how do I fix the current VM in this state?
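
As a quick check (a sketch only, assuming the volume path from the dumpxml above and direct access to the engine database), the actual on-disk format can be compared with what the engine has recorded:

# qemu-img info /rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1

engine=# select image_guid, volume_format from images where image_guid = '06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1';

The first should report qcow2; if the second still shows volume_format = 5 (RAW) instead of 4 (COW), the database is out of sync.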

Comment 1 RHEL Program Management 2020-09-11 09:09:39 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Germano Veit Michel 2020-09-12 05:28:18 UTC
I can easily reproduce the same on:
* vdsm-4.40.22-1.el8ev.x86_64
* ovirt-engine-4.4.1.10-0.1.el8ev.noarch

* Does not happen on copy or cold move, only on live move.
* Only the DB volume_format seems wrong; the storage metadata is fine.

1. Create Thin Disk on NFS:

# su vdsm -s /bin/sh -c "qemu-img info ea8bcf2e-d2c1-410b-8907-38e36e765b19"
image: ea8bcf2e-d2c1-410b-8907-38e36e765b19
file format: raw
virtual size: 1 GiB (1073741824 bytes)
disk size: 4 KiB

# cat ea8bcf2e-d2c1-410b-8907-38e36e765b19.meta 
CAP=1073741824
CTIME=1599886753
DESCRIPTION={"DiskAlias":"TestDisk","DiskDescription":""}
DISKTYPE=DATA
DOMAIN=d2def521-aa89-4738-aaca-5b618b97e925
FORMAT=RAW
GEN=0
IMAGE=973c50fc-7e21-4a48-a130-40acb6fa5744
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=LEAF
EOF


engine=# select image_guid,image_group_id,size,volume_format,volume_type from images where image_group_id = '973c50fc-7e21-4a48-a130-40acb6fa5744';
              image_guid              |            image_group_id            |    size    | volume_format | volume_type 
--------------------------------------+--------------------------------------+------------+---------------+-------------
 ea8bcf2e-d2c1-410b-8907-38e36e765b19 | 973c50fc-7e21-4a48-a130-40acb6fa5744 | 1073741824 |             5 |           2
(1 row)
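
(Side note: in the engine's enums, volume_format 4 = COW/qcow2 and 5 = RAW, while volume_type 1 = Preallocated and 2 = Sparse, so this row correctly records RAW/Sparse at this point. A decoded variant of the query above, for readability:)

engine=# select image_guid,
                case volume_format when 4 then 'COW' when 5 then 'RAW' end as format,
                case volume_type when 1 then 'Preallocated' when 2 then 'Sparse' end as type
           from images where image_group_id = '973c50fc-7e21-4a48-a130-40acb6fa5744';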

2. Move to block

3. After moving:

# qemu-img info /dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/ea8bcf2e-d2c1-410b-8907-38e36e765b19 
image: /dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/ea8bcf2e-d2c1-410b-8907-38e36e765b19
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# dd if=/dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/metadata bs=8k count=1 skip=129
CAP=1073741824
CTIME=1599887113
DESCRIPTION=None
DISKTYPE=DATA
DOMAIN=729c0555-4148-4fcf-b5c9-4f07ec9f0307
FORMAT=COW
GEN=1
IMAGE=973c50fc-7e21-4a48-a130-40acb6fa5744
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL
EOF
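
(The dd above reads this volume's metadata slot from the domain's special "metadata" LV; in this domain format each volume apparently has a fixed 8 KiB slot, hence bs=8k with skip= set to the volume's slot number.)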

engine=# select image_guid,image_group_id,size,volume_format,volume_type from images where image_group_id = '973c50fc-7e21-4a48-a130-40acb6fa5744';
              image_guid              |            image_group_id            |    size    | volume_format | volume_type 
--------------------------------------+--------------------------------------+------------+---------------+-------------
 ea8bcf2e-d2c1-410b-8907-38e36e765b19 | 973c50fc-7e21-4a48-a130-40acb6fa5744 | 1073741824 |             5 |           2
(1 row)

4. Shutdown the VM

5. Start again

6. The engine generates wrong XML for the disk, as volume_format is wrong in the database:

2020-09-12 15:14:30,134+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-880) [c1ab3844-5b6b-431b-9654-4ba771434ace] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
...
    <disk snapshot="no" type="file" device="disk">
      <target dev="sda" bus="scsi"/>
      <source file="/rhev/data-center/2ce9d738-dd1f-11ea-bb9a-5254000000ff/d2def521-aa89-4738-aaca-5b618b97e925/images/5359b6bc-93c6-42b1-a779-f6037d08ed47/f01f1989-70ff-4e1f-a2c2-9bc5ddeb800d">
        <seclabel model="dac" type="none" relabel="no"/>
      </source>
      <driver name="qemu" io="threads" type="raw" error_policy="stop" cache="none"/>   <------- RAW
      <alias name="ua-5359b6bc-93c6-42b1-a779-f6037d08ed47"/>
      <address bus="0" controller="0" unit="0" type="drive" target="0"/>
      <boot order="1"/>
      <serial>5359b6bc-93c6-42b1-a779-f6037d08ed47</serial>
    </disk>
    
7. Since just the DB is wrong, the fix is relatively simple if one knows the image_guid:
UPDATE images SET volume_format = '4' WHERE image_guid = 'ea8bcf2e-d2c1-410b-8907-38e36e765b19';
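
If more disks may be affected, a sweep for suspicious rows is also possible. The query below is only a sketch: the join through image_storage_domain_map and the storage_type values (2 = FCP, 3 = iSCSI) are my assumptions against the 4.4 schema, so verify them before relying on the output. It looks for rows recorded as RAW + Sparse on a block domain, a combination the engine itself refuses to create (see the warning quoted in the description):

engine=# select i.image_guid, i.image_group_id, sd.storage_name
           from images i
           join image_storage_domain_map m on m.image_id = i.image_guid
           join storage_domain_static sd on sd.id = m.storage_domain_id
          where i.volume_format = 5   -- recorded as RAW
            and i.volume_type = 2     -- Sparse
            and sd.storage_type in (2, 3);   -- FCP, iSCSI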

Comment 3 Germano Veit Michel 2020-09-12 05:32:10 UTC
(In reply to Jean-Louis Dupond from comment #0)
> So it seems like some entry didn't change in the database?
> Also, how to fix the currect VM in this state?

Thanks for reporting this Jean-Louis.

Indeed, in my reproducer only the database is incorrect; the storage metadata is fine.

Please try this in ovirt-engine, then start the VM again:
$ /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE images SET volume_format = '4' WHERE image_guid = '06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1'"

You should see type='qcow2' in the XML after the change above.

Does it work for you as well?

Comment 7 Jean-Louis Dupond 2020-09-14 09:54:29 UTC
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='3'>
        <seclabel model='dac' relabel='no'/>
      </source>

The db update fixed it indeed!

Comment 8 Germano Veit Michel 2020-09-14 22:17:45 UTC
(In reply to Jean-Louis Dupond from comment #7)
>     <disk type='block' device='disk' snapshot='no'>
>       <driver name='qemu' type='qcow2' cache='none' error_policy='stop'
> io='native'/>
>       <source
> dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/
> images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-
> 9b30fbb4f8e1' index='3'>
>         <seclabel model='dac' relabel='no'/>
>       </source>
> 
> The db update fixed it indeed!

Thank you for confirming!

Comment 9 Evelina Shames 2020-09-16 10:04:59 UTC
Verified with the following steps:
1. Create a VM with RAW Sparse disk on a file domain
2. Start the VM
3. Live migrate the disk to a block domain
4. Shutdown the VM
5. Start the VM

Before the fix: Operation fails on booting as the volume type sent to Libvirt is wrong
After the fix: Operation succeeded


Version: engine-4.4.2.6-0.2

Comment 10 Sandro Bonazzola 2020-09-18 07:13:09 UTC
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

