Description of problem:

We did a live storage migration of a VM disk from NFS to iSCSI. Everything was fine until we restarted the VM: one disk could not be mounted. After some debugging, I found out that from inside the VM the disk was visible as QCOW!

I was able to reproduce this quite easily.

Create a raw disk on a random VM on NFS storage:

2020-09-10 14:43:22,030+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-55) [a09ff0e6-ea0a-48c9-af5c-5223c42d89d9] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='d497efe5-2344-4d58-8985-7b053d3c35a3', ignoreFailoverLimit='false', storageDomainId='500c30e6-efe7-4dc8-b42d-7252dd812769', imageGroupId='12f8ecc3-f1b4-42ac-814c-af422aa49512', imageSizeInBytes='53687091200', volumeFormat='RAW', newImageId='06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1', imageType='Sparse', newImageDescription='{"DiskAlias":"bugtest","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 7546bfda

Now initiate a live storage migration of that disk to an iSCSI storage domain. You'll get the following warning:

"Block storage domain does not support disk format raw with volume type sparse. The following disks format will become qcow2: bugtest"

The volume is created as QCOW on the destination:

2020-09-10 14:46:02,091+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-35) [753fd06d-b91e-4e31-b737-c40a83440f33] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='d497efe5-2344-4d58-8985-7b053d3c35a3', ignoreFailoverLimit='false', storageDomainId='6e99da85-8414-4ec5-92c3-b6cf741fc125', imageGroupId='12f8ecc3-f1b4-42ac-814c-af422aa49512', imageSizeInBytes='53687091200', volumeFormat='COW', newImageId='06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1', imageType='Sparse', newImageDescription='null', imageInitialSizeInBytes='53695545344', imageId='00000000-0000-0000-0000-000000000000', sourceImageGroupId='00000000-0000-0000-0000-000000000000'}), log id: 4c9f7790

When the migration is done, dumpxml gives the following:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='9'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sdd' bus='scsi'/>
      <serial>12f8ecc3-f1b4-42ac-814c-af422aa49512</serial>
      <alias name='ua-12f8ecc3-f1b4-42ac-814c-af422aa49512'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>

Which is correct! But now shut down the VM and start it again:

    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='3'>
        <seclabel model='dac' relabel='no'/>
      </source>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>12f8ecc3-f1b4-42ac-814c-af422aa49512</serial>
      <alias name='ua-12f8ecc3-f1b4-42ac-814c-af422aa49512'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>

And in the VM:

# file -s /dev/sdc
/dev/sdc: QEMU QCOW Image (v3), 53687091200 bytes

So it seems like some entry didn't change in the database?
Also, how do we fix the current VM in this state?
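For reference, the mismatch can also be seen from the host without looking inside the guest. A minimal check (sketch, using the volume path from the dumpxml output above; on block storage that link only resolves while the LV is active, e.g. while the VM is running):

# su vdsm -s /bin/sh -c "qemu-img info /rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1"

If this reports "file format: qcow2" while the generated domain XML says type='raw', the guest sees raw qcow2 container data on its block device, which matches the file -s output above.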
I can easily reproduce the same on:
* vdsm-4.40.22-1.el8ev.x86_64
* ovirt-engine-4.4.1.10-0.1.el8ev.noarch

Notes:
* It does not happen on copy or cold move, only on live move.
* Only the DB volume_format seems wrong. The storage metadata is fine.

1. Create a thin disk on NFS:

# su vdsm -s /bin/sh -c "qemu-img info ea8bcf2e-d2c1-410b-8907-38e36e765b19"
image: ea8bcf2e-d2c1-410b-8907-38e36e765b19
file format: raw
virtual size: 1 GiB (1073741824 bytes)
disk size: 4 KiB

# cat ea8bcf2e-d2c1-410b-8907-38e36e765b19.meta
CAP=1073741824
CTIME=1599886753
DESCRIPTION={"DiskAlias":"TestDisk","DiskDescription":""}
DISKTYPE=DATA
DOMAIN=d2def521-aa89-4738-aaca-5b618b97e925
FORMAT=RAW
GEN=0
IMAGE=973c50fc-7e21-4a48-a130-40acb6fa5744
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=LEAF
EOF

engine=# select image_guid,image_group_id,size,volume_format,volume_type from images where image_group_id = '973c50fc-7e21-4a48-a130-40acb6fa5744';
              image_guid              |            image_group_id            |    size    | volume_format | volume_type
--------------------------------------+--------------------------------------+------------+---------------+-------------
 ea8bcf2e-d2c1-410b-8907-38e36e765b19 | 973c50fc-7e21-4a48-a130-40acb6fa5744 | 1073741824 |             5 |           2
(1 row)

2. Move the disk to a block domain (live).

3. After moving:

# qemu-img info /dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/ea8bcf2e-d2c1-410b-8907-38e36e765b19
image: /dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/ea8bcf2e-d2c1-410b-8907-38e36e765b19
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# dd if=/dev/729c0555-4148-4fcf-b5c9-4f07ec9f0307/metadata bs=8k count=1 skip=129
CAP=1073741824
CTIME=1599887113
DESCRIPTION=None
DISKTYPE=DATA
DOMAIN=729c0555-4148-4fcf-b5c9-4f07ec9f0307
FORMAT=COW
GEN=1
IMAGE=973c50fc-7e21-4a48-a130-40acb6fa5744
LEGALITY=LEGAL
PUUID=00000000-0000-0000-0000-000000000000
TYPE=SPARSE
VOLTYPE=INTERNAL
EOF

engine=# select image_guid,image_group_id,size,volume_format,volume_type from images where image_group_id = '973c50fc-7e21-4a48-a130-40acb6fa5744';
              image_guid              |            image_group_id            |    size    | volume_format | volume_type
--------------------------------------+--------------------------------------+------------+---------------+-------------
 ea8bcf2e-d2c1-410b-8907-38e36e765b19 | 973c50fc-7e21-4a48-a130-40acb6fa5744 | 1073741824 |             5 |           2
(1 row)

Note that volume_format is still 5 (RAW) in the DB, even though the storage metadata already says COW.

4. Shut down the VM.

5. Start it again.

6. The engine generates wrong XML for the disk, as volume_format is wrong in the database:

2020-09-12 15:14:30,134+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engine-Thread-880) [c1ab3844-5b6b-431b-9654-4ba771434ace] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0">
...
    <disk snapshot="no" type="file" device="disk">
      <target dev="sda" bus="scsi"/>
      <source file="/rhev/data-center/2ce9d738-dd1f-11ea-bb9a-5254000000ff/d2def521-aa89-4738-aaca-5b618b97e925/images/5359b6bc-93c6-42b1-a779-f6037d08ed47/f01f1989-70ff-4e1f-a2c2-9bc5ddeb800d">
        <seclabel model="dac" type="none" relabel="no"/>
      </source>
      <driver name="qemu" io="threads" type="raw" error_policy="stop" cache="none"/>   <------- RAW
      <alias name="ua-5359b6bc-93c6-42b1-a779-f6037d08ed47"/>
      <address bus="0" controller="0" unit="0" type="drive" target="0"/>
      <boot order="1"/>
      <serial>5359b6bc-93c6-42b1-a779-f6037d08ed47</serial>
    </disk>

7. Since just the DB is wrong, the fix is relatively simple if one knows the image_guid:

UPDATE images SET volume_format = '4' WHERE image_guid = 'ea8bcf2e-d2c1-410b-8907-38e36e765b19';
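For context, in the engine schema volume_format 4 means COW and 5 means RAW, and volume_type 1 means preallocated and 2 means sparse. A hedged sketch for locating other disks hit by the same bug: a RAW sparse volume should never live on a block domain, so any match is suspect. Table and column names are taken from the 4.4 engine schema; the storage_type codes 2 (FCP) and 3 (iSCSI) are my assumption and worth double-checking against your version:

engine=# SELECT i.image_guid, i.image_group_id, sd.storage_name
           FROM images i
           JOIN image_storage_domain_map m ON m.image_id = i.image_guid
           JOIN storage_domain_static sd ON sd.id = m.storage_domain_id
          WHERE i.volume_format = 5          -- RAW in the DB
            AND i.volume_type = 2            -- sparse
            AND sd.storage_type IN (2, 3);   -- assumed: FCP / iSCSI (block) domains

Any row returned is a candidate for the UPDATE above (after verifying the actual format on storage with qemu-img info).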
(In reply to Jean-Louis Dupond from comment #0)
> So it seems like some entry didn't change in the database?
> Also, how do we fix the current VM in this state?

Thanks for reporting this, Jean-Louis. In my reproducer only the database is incorrect indeed; the storage metadata is fine.

Please try this in ovirt-engine, then start the VM again:

$ /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "UPDATE images SET volume_format = '4' WHERE image_guid = '06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1'"

You should see type='qcow2' in the XML after the change above. Does it work for you as well?
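One way to confirm the change took effect from the host once the VM is up again (a sketch; virsh -r opens a read-only connection, so no credentials are needed, and <vm-name> is a placeholder):

# virsh -r dumpxml <vm-name> | grep "<driver name='qemu'"

The driver line for this disk should now read type='qcow2' instead of type='raw'.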
    <disk type='block' device='disk' snapshot='no'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='3'>
        <seclabel model='dac' relabel='no'/>
      </source>

The db update fixed it indeed!
(In reply to Jean-Louis Dupond from comment #7)
> <disk type='block' device='disk' snapshot='no'>
>   <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
>   <source dev='/rhev/data-center/mnt/blockSD/6e99da85-8414-4ec5-92c3-b6cf741fc125/images/12f8ecc3-f1b4-42ac-814c-af422aa49512/06b0a1ce-1e89-4c12-ab37-9b30fbb4f8e1' index='3'>
>     <seclabel model='dac' relabel='no'/>
>   </source>
>
> The db update fixed it indeed!

Thank you for confirming!
Verified with the following steps:

1. Create a VM with a RAW sparse disk on a file domain
2. Start the VM
3. Live-migrate the disk to a block domain
4. Shut down the VM
5. Start the VM

Before the fix: booting fails, as the disk format sent to libvirt is wrong (raw instead of qcow2).
After the fix: the operation succeeds.

Version: engine-4.4.2.6-0.2
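A DB-side check that goes with these steps (sketch; <disk-group-id> is a placeholder for the image_group_id of the migrated disk, and 4/5 are the engine's COW/RAW codes as noted earlier):

engine=# SELECT image_guid, volume_format, volume_type FROM images WHERE image_group_id = '<disk-group-id>';

After step 3, with the fix, volume_format should read 4 (COW); before the fix it incorrectly stayed at 5 (RAW).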
This bugzilla is included in the oVirt 4.4.2 release, published on September 17th, 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.