Description of problem:
This is a tracker bug for a failure of RHV SHE (self-hosted engine) deployment on Ceph iSCSI storage.

During HE deployment, the Ansible playbook copies the local HE VM image to the destination storage and then powers up the HE VM. The problem is in the copy step:

# qemu-img convert -f qcow2 -O raw -t none -T none /var/tmp/localvm_ee80jvz/images/edf95fe8-29cc-4f85-85ff-de4ad2dfa4f6/c4143adf-1131-4318-8c31-71c299ef085b /rhev/data-center/mnt/blockSD/5588aa11-c93b-4071-bdf4-4aff26956933/images/7bb6ac1d-df41-475d-9c23-b30f9c7713af/bea84fe3-e376-46e5-91d5-f04082435a64

Comparing the two images afterwards shows a mismatch:

[root@oncilla04 ~]# qemu-img compare /var/tmp/localvm_ee80jvz/images/edf95fe8-29cc-4f85-85ff-de4ad2dfa4f6/c4143adf-1131-4318-8c31-71c299ef085b /rhev/data-center/mnt/blockSD/5588aa11-c93b-4071-bdf4-4aff26956933/images//7bb6ac1d-df41-475d-9c23-b30f9c7713af/bea84fe3-e376-46e5-91d5-f04082435a64
Content mismatch at offset 541908992!

It looks like the HE VM image is corrupted while it is being copied to the Ceph iSCSI destination. We need to find out why this happens and open a related bug against the responsible component.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.4.9-4.el8ev.noarch
Ceph 4.2.8-115.el8cp
qemu-img-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Deploy Hosted Engine with an iSCSI Ceph backend. The playbook copies the image to the iSCSI Ceph volume.
2. The deployment fails on the liveliness check of the VM: the VM is up but has crashed to maintenance.

or

0. Set the cluster to global maintenance after the failure and power off the HE VM.
1. qemu-img convert -f qcow2 -O raw -t none -T none <HE local temporary disk> <Ceph iSCSI destination>
2. qemu-img compare <HE local temporary disk> <Ceph iSCSI destination>

Actual results:
The HE disk is probably corrupted. Further investigation is needed to determine why.

Expected results:
The image is copied intact and the deployment works.

Additional info:
The same procedure works with a NetApp iSCSI backend.
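The manual reproduction steps above can be wrapped in a small shell helper. This is only a sketch under stated assumptions: the function name copy_and_verify and the QEMU_IMG override variable are hypothetical and are not part of the deployment playbook, and the explicit -f/-F format flags on the compare call are added here so that qemu-img does not have to probe the image formats.

```shell
#!/bin/sh
# Hypothetical helper (not part of ovirt-hosted-engine-setup):
# convert a local qcow2 image to a raw destination, then verify the copy.

# Assumption: allow the qemu-img binary to be overridden; defaults to qemu-img.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

copy_and_verify() {
    src="$1"   # local qcow2 image (the HE local temporary disk)
    dst="$2"   # raw destination (for example the Ceph iSCSI volume)

    # Same flags as in the report: -t none / -T none bypass the host
    # page cache for the destination and the source respectively.
    "$QEMU_IMG" convert -f qcow2 -O raw -t none -T none "$src" "$dst" || return 2

    # qemu-img compare exits 0 if the images are identical, 1 if they
    # differ, and with a value greater than 1 on error.
    "$QEMU_IMG" compare -f qcow2 -F raw "$src" "$dst"
}
```

Calling copy_and_verify <HE local temporary disk> <Ceph iSCSI destination> and checking the exit status distinguishes a clean copy (0) from a content mismatch (1) or an I/O or open error (2 or higher).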
A fix in Ceph exists and has been verified [1]; moving this bug to ON_QA. We can now verify this in RHV 4.4.7 as well.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1934092
Verified with HE 4.4.7 deployed.

What was tested:
- Compared the qemu-img source and destination with "qemu-img compare": the images were identical.
- Punching holes with fallocate and writing data worked as expected (details of how this was tested: https://bugzilla.redhat.com/show_bug.cgi?id=1934092).
- Used dd and fallocate to look for any LUN corruption: works as expected.

# ceph -v
ceph version 14.2.11-179.el8cp (29de9ae52bcc20e38eb86cb8e4163bff2d1719c8) nautilus (stable)

engine: ovirt-engine-4.4.7.4-0.9.el8ev.noarch
This bugzilla is included in oVirt 4.4.7 release, published on July 6th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.7 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.