Bug 1819125 - Cold storage migration to iscsi domain fails
Summary: Cold storage migration to iscsi domain fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: 4.40.13
Assignee: Benny Zlotnik
QA Contact: Evelina Shames
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-31 09:18 UTC by Evelina Shames
Modified: 2020-05-20 19:59 UTC
CC List: 4 users

Fixed In Version: vdsm-4.40.13
Clone Of:
Environment:
Last Closed: 2020-05-20 19:59:48 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.4+
pm-rhel: blocker?


Attachments
Logs (15.07 MB, application/zip)
2020-03-31 09:18 UTC, Evelina Shames
no flags
logs (1.39 MB, application/zip)
2020-04-12 07:34 UTC, Evelina Shames
no flags


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 108174 0 master MERGED tests: add a failing test for qemuimg.convert 2020-11-15 09:01:11 UTC
oVirt gerrit 108227 0 master MERGED qemuimg: use -B to specify backing file without creation 2020-11-15 09:01:32 UTC
oVirt gerrit 108236 0 master MERGED qemuimg: add qemuimg.compare 2020-11-15 09:01:11 UTC
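
Of the three patches above, the middle one is the actual fix (pass the backing file to qemu-img convert with -B instead of -o; see comment 2 below), and the last one adds a qemuimg.compare helper so tests can verify that a copy matches its source. A minimal sketch of that kind of check, calling the standard 'qemu-img compare' subcommand directly rather than vdsm's helper (illustrative only):

import subprocess

def images_identical(src, dst, src_fmt='qcow2', dst_fmt='qcow2'):
    # 'qemu-img compare' exits 0 when the image contents match,
    # 1 when they differ, and >1 on error (e.g. cannot open a file).
    res = subprocess.run(
        ['/usr/bin/qemu-img', 'compare',
         '-f', src_fmt, '-F', dst_fmt, src, dst],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if res.returncode > 1:
        raise RuntimeError(res.stderr.decode())
    return res.returncode == 0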

Description Evelina Shames 2020-03-31 09:18:04 UTC
Created attachment 1675024 [details]
Logs

Description of problem:
Cold storage migration to iSCSI domain fails with the following errors:

Engine:
2020-03-28 00:36:06,519+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-69) [disks_syncAction_d8ab95ae-118b-4589] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyDataCommand' with failure.
2020-03-28 00:36:07,533+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-11) [disks_syncAction_d8ab95ae-118b-4589] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupVolumesDataCommand' with failure.
2020-03-28 00:36:09,541+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [disks_syncAction_d8ab95ae-118b-4589] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupWithDataCommand' with failure.
2020-03-28 00:36:09,733+03 WARN  [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-23) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2020-03-28 00:36:09,733+03 WARN  [org.ovirt.engine.api.restapi.util.LinkHelper] (default task-23) [] Can't find relative path for class "org.ovirt.engine.api.resource.StorageDomainVmDiskAttachmentsResource", will return null
2020-03-28 00:36:12,571+03 ERROR [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [disks_syncAction_d8ab95ae-118b-4589] Ending command 'org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand' with failure.
2020-03-28 00:36:12,575+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.MoveImageGroupCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [disks_syncAction_d8ab95ae-118b-4589] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.MoveImageGroupCommand' with failure.
2020-03-28 00:36:12,961+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-66) [] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz have failed to move disk vm_TestCase6004_2722314670_Disk_0 to domain iscsi_2.


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.0-0.29.master.el8ev.noarch
vdsm-4.40.9-1.el8ev.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Create VM with disk (doesn't matter on which domain type)
2. Try to move the disk to iSCSI domain
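
For automation, the same flow can be scripted with the oVirt Python SDK. A minimal sketch, assuming placeholder credentials and names (the disk and domain names must match your environment):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True)

# Locate the disk; the owning VM must be down for a cold move.
disks_service = connection.system_service().disks_service()
disk = disks_service.list(search='name=vm_TestCase6004_Disk_0')[0]

# Move the disk to the target iSCSI storage domain.
disks_service.disk_service(disk.id).move(
    storage_domain=types.StorageDomain(name='iscsi_2'))

connection.close()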


Actual results:
Operation fails

Expected results:
Operation should succeed

Additional info:
Relevant logs are attached

Comment 1 Evelina Shames 2020-03-31 10:08:18 UTC
Adding 'Regression' since it worked in ovirt-engine-4.4.0-0.25.master.el8ev.noarch

Comment 2 Evelina Shames 2020-03-31 15:54:44 UTC
In the Engine log we saw this error, which looks like a libvirt/qemu issue:

2020-03-27 23:05:40,730+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHostJobsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-53) [disks_syncAction_7cd80822-88d5-4efc] FINISH, GetHostJobsVDSCommand, return: {038fb9ac-9d3d-43ee-b3ed-2ef0dc081763=HostJobInfo:{id='038fb9ac-9d3d-43ee-b3ed-2ef0dc081763', type='storage', description='copy_data', status='failed', progress='51', error='VDSError:{code='GeneralException', message='General Exception: ("Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-n', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/b6f1e913-4e57-4c5a-ad6f-d8dc9ae21e8a/images/1a57c902-c789-4124-9443-50322e305e83/c7d0a3b0-b8ee-407c-95cc-c1b21b616936', '-O', 'qcow2', '-o', 'compat=1.1,backing_file=b0951a53-9e5c-4184-a883-7c04b4ccad1d,backing_fmt=qcow2', '/rhev/data-center/mnt/blockSD/e3ceb1c1-b883-4638-916a-82f153370644/images/1a57c902-c789-4124-9443-50322e305e83/c7d0a3b0-b8ee-407c-95cc-c1b21b616936'] failed with rc=1 out=b'' err=bytearray(b'qemu-img: warning: -o has no effect when skipping image creation\\nqemu-img: warning: This will become an error in future QEMU versions.\\nqemu-img: error while writing sector 2840576: No space left on device\\n')",)'}'}}, log id: 39cde416

In the VDSM log on the relevant host (host_mixed_1, hypervisor-lynx01_vdsm.log):

2020-03-27 23:05:35,288+0300 ERROR (tasks/0) [root] Job '038fb9ac-9d3d-43ee-b3ed-2ef0dc081763' failed (jobs:223)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/jobs.py", line 159, in run
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdm/api/copy_data.py", line 87, in _run
    self._operation.run()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/qemuimg.py", line 344, in run
    for data in self._operation.watch():
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 106, in watch
    self._finalize(b"", err)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/operation.py", line 179, in _finalize
    raise cmdutils.Error(self._cmd, rc, out, err)
vdsm.common.cmdutils.Error: Command ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none', '-T', 'none', '-n', '-f', 'qcow2', '/rhev/data-center/mnt/blockSD/b6f1e913-4e57-4c5a-ad6f-d8dc9ae21e8a/images/1a57c902-c789-4124-9443-50322e305e83/c7d0a3b0-b8ee-407c-95cc-c1b21b616936', '-O', 'qcow2', '-o', 'compat=1.1,backing_file=b0951a53-9e5c-4184-a883-7c04b4ccad1d,backing_fmt=qcow2', '/rhev/data-center/mnt/blockSD/e3ceb1c1-b883-4638-916a-82f153370644/images/1a57c902-c789-4124-9443-50322e305e83/c7d0a3b0-b8ee-407c-95cc-c1b21b616936'] failed with rc=1 out=b'' err=bytearray(b'qemu-img: warning: -o has no effect when skipping image creation\nqemu-img: warning: This will become an error in future QEMU versions.\nqemu-img: error while writing sector 2840576: No space left on device\n')
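
The warnings in the error output point at the root cause: with '-n' (skip target image creation), qemu-img silently ignores the '-o' creation options, so the backing file is never applied and qemu-img writes the fully flattened image instead of only the data above the backing file, overflowing the thin-provisioned target LV ("No space left on device"). The merged fix (gerrit 108227) switches to '-B', which specifies the backing file without creating the image, restoring the delta copy. Schematically, with SRC/BACKING/DST as placeholder values (not the exact vdsm code):

# Placeholders for illustration only.
SRC = '/rhev/data-center/mnt/blockSD/.../src-volume'
BACKING = 'b0951a53-9e5c-4184-a883-7c04b4ccad1d'
DST = '/rhev/data-center/mnt/blockSD/.../dst-volume'

# Before the fix: -o is silently ignored because -n skips creation.
cmd_broken = ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none',
              '-T', 'none', '-n', '-f', 'qcow2', SRC, '-O', 'qcow2',
              '-o', 'compat=1.1,backing_file=%s,backing_fmt=qcow2' % BACKING,
              DST]

# After the fix: -B passes the backing file directly, so qemu-img
# copies only the data sitting on top of the backing chain.
cmd_fixed = ['/usr/bin/qemu-img', 'convert', '-p', '-t', 'none',
             '-T', 'none', '-n', '-f', 'qcow2', SRC, '-O', 'qcow2',
             '-B', BACKING, DST]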

Comment 3 RHEL Program Management 2020-04-01 08:53:08 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Avihai 2020-04-02 08:33:36 UTC
Adding an automation blocker, as this bug blocks all TCs performing cold storage migration to an iSCSI SD.

Comment 5 Evelina Shames 2020-04-12 07:34:03 UTC
Flow:
1. Create a VM.
2. Add new disks on FC, iSCSI, and NFS domains.
3. Move each disk to a different iSCSI domain.

Operation fails with the following errors:

VDSM:
2020-04-12 10:03:09,065+0300 ERROR (jsonrpc/7) [storage.HSM] Empty or not found image 581f5105-f3a0-4f31-80d4-e22b1984bf85 in SD 445ec27a-bfba-444c-918a-b1712bd79c08. {'54803176-36d8-41ff-96ba-9a550bb0fc12': ImgsPar(imgs=['6f5a097c-ded8-463d-b6d0-de4c8d625412'], parent='00000000-0000-0000-0000-000000000000'), '7995b85f-b4a7-4cb4-9627-1e859aeb410a': ImgsPar(imgs=['66d4c16c-cc61-4f18-903c-b447232cefb8'], parent='00000000-0000-0000-0000-000000000000'), '864a9a51-e135-4502-883c-fd5c5769063f': ImgsPar(imgs=['6d865c4e-dbd6-4ec0-9272-3c47d29c1f1d'], parent='00000000-0000-0000-0000-000000000000'), '974e3a23-9529-4c8b-946d-cecebce979be': ImgsPar(imgs=['b70c09b6-81b5-4de8-a7c2-5747f878bf71'], parent='00000000-0000-0000-0000-000000000000')} (hsm:1533)
2020-04-12 10:03:09,065+0300 INFO  (jsonrpc/7) [vdsm.api] FINISH deleteImage error=Image does not exist in domain: 'image=581f5105-f3a0-4f31-80d4-e22b1984bf85, domain=445ec27a-bfba-444c-918a-b1712bd79c08' from=::ffff:10.46.16.252,37262, flow_id=c8202a2c-8610-43d5-894c-7db9e9fab4c7, task_id=a3aa4fe7-f680-46b6-a2e0-b331e269235e (api:52)
2020-04-12 10:03:09,065+0300 ERROR (jsonrpc/7) [storage.TaskManager.Task] (Task='a3aa4fe7-f680-46b6-a2e0-b331e269235e') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in _run
    return fn(*args, **kargs)
  File "<decorator-gen-63>", line 2, in deleteImage
  File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 1534, in deleteImage
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
vdsm.storage.exception.ImageDoesNotExistInSD: Image does not exist in domain: 'image=581f5105-f3a0-4f31-80d4-e22b1984bf85, domain=445ec27a-bfba-444c-918a-b1712bd79c08'
2020-04-12 10:03:09,065+0300 INFO  (jsonrpc/7) [storage.TaskManager.Task] (Task='a3aa4fe7-f680-46b6-a2e0-b331e269235e') aborting: Task is aborted: "value=Image does not exist in domain: 'image=581f5105-f3a0-4f31-80d4-e22b1984bf85, domain=445ec27a-bfba-444c-918a-b1712bd79c08' abortedcode=268" (task:1190)

Engine:
2020-04-12 10:03:03,441+03 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MeasureVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [c8202a2c-8610-43d5-894c-7db9e9fab4c7] Command 'MeasureVolumeVDSCommand(HostName = host_mixed_2, MeasureVolumeVDSCommandParameters:{hostId='1c73a4d6-147e-40f6-9ba8-5801f76102ab', storagePoolId='09c8b1bd-fb86-4ff4-9731-18c8e2feeb0d', storageDomainId='750e294a-fac4-4983-a91f-3c46c528a2d9', imageGroupId='581f5105-f3a0-4f31-80d4-e22b1984bf85', imageId='3f2540a2-9446-4227-a4eb-1498ac0de79a'})' execution failed: VDSGenericException: VDSErrorException: Failed to MeasureVolumeVDS, error = Command ['/usr/bin/qemu-img', 'measure', '--output', 'json', '-f', 'raw', '-O', 'raw', '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a'] failed with rc=1 out=b'' err=b"qemu-img: Could not open '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a': Could not open '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a': No such file or directory\n", code = 100
2020-04-12 10:03:03,441+03 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MeasureVolumeVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [c8202a2c-8610-43d5-894c-7db9e9fab4c7] FINISH, MeasureVolumeVDSCommand, return: , log id: 38f50f09
2020-04-12 10:03:03,442+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.MeasureVolumeCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [c8202a2c-8610-43d5-894c-7db9e9fab4c7] Command 'org.ovirt.engine.core.bll.storage.disk.image.MeasureVolumeCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to MeasureVolumeVDS, error = Command ['/usr/bin/qemu-img', 'measure', '--output', 'json', '-f', 'raw', '-O', 'raw', '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a'] failed with rc=1 out=b'' err=b"qemu-img: Could not open '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a': Could not open '/rhev/data-center/mnt/blockSD/750e294a-fac4-4983-a91f-3c46c528a2d9/images/581f5105-f3a0-4f31-80d4-e22b1984bf85/3f2540a2-9446-4227-a4eb-1498ac0de79a': No such file or directory\n", code = 100 (Failed with error GeneralException and code 100)
2020-04-12 10:03:03,449+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [c8202a2c-8610-43d5-894c-7db9e9fab4c7] Command 'CloneImageGroupVolumesStructure' id: 'a0501ccc-99f6-4f22-bf5d-a77ac5cdcf61' with children [] failed when attempting to perform the next operation, marking as 'ACTIVE'
2020-04-12 10:03:03,449+03 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-59) [c8202a2c-8610-43d5-894c-7db9e9fab4c7] Could not measure volume: java.lang.RuntimeException: Could not measure volume
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.determineImageInitialSize(CloneImageGroupVolumesStructureCommand.java:214)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.createImage(CloneImageGroupVolumesStructureCommand.java:163)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.performNextOperation(CloneImageGroupVolumesStructureCommand.java:133)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:80)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:175)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)
        at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
        at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
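
For reference, the failing step above is the engine asking vdsm to measure the source volume in order to size the destination. A minimal sketch of the equivalent call (illustrative only, not vdsm's code; it fails exactly as in the log when the path does not exist on the host):

import json
import subprocess

def required_size(path, src_fmt='raw', dst_fmt='raw'):
    # Ask qemu-img how much space converting `path` to dst_fmt
    # would need; the JSON output includes a 'required' key.
    out = subprocess.check_output(
        ['/usr/bin/qemu-img', 'measure', '--output', 'json',
         '-f', src_fmt, '-O', dst_fmt, path])
    return json.loads(out)['required']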


Versions:
vdsm-4.40.13-1.el8ev.x86_64
engine-4.4.0-0.32.master.el8ev

Comment 6 Evelina Shames 2020-04-12 07:34:31 UTC
Created attachment 1678192 [details]
logs

Comment 7 Evelina Shames 2020-04-12 12:38:23 UTC
Sorry, I had a problem with the environment I used (the update wasn't completed).

Verified again on another env:
ovirt-engine-4.4.0-0.32.master.el8ev.noarch
vdsm-4.40.13-1.el8ev.x86_64

Comment 8 Sandro Bonazzola 2020-05-20 19:59:48 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

