+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1750212 +++
======================================================================

Description of problem:

MERGE_STATUS fails with "Invalid UUID string: mapper" if the VM has a Direct LUN that has been hot-plugged in a certain way.

1. Start with a VM with 1 disk from a Storage Domain.
2. Run the VM.
3. Go to Storage->Disks and create a Direct LUN.
4. Go to Compute->Virtual Machines->VM->Disks and hotplug the Direct LUN without changing any options (see below).
5. Hotplug another disk from a Storage Domain.
6. Create a snapshot.
7. Delete the snapshot. This fails (see below).

Some steps in more detail:

[4] Hotplug the LUN by first creating the Direct LUN in the Disks tab and then going to VM->Disks and attaching it. In this case the Direct LUN is added as "device=disk" instead of "device=lun":

2019-09-09 10:46:19,693+10 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.HotPlugDiskVDSCommand] (EE-ManagedThreadFactory-engine-Thread-459996) [5ecae6ed-44f7-4ed0-aefd-71fc46352a52] Disk hot-plug: <?xml version="1.0" encoding="UTF-8"?><hotplug>
  <devices>
    <disk snapshot="no" type="block" device="disk">    <----- disk, not lun
      <target dev="sda" bus="scsi"/>
      <source dev="/dev/mapper/36001405156d88f1cc594c4a94ffe1418">
        <seclabel model="dac" type="none" relabel="no"/>
      </source>
      <driver name="qemu" io="native" type="raw" error_policy="stop" cache="none"/>
      <alias name="ua-0ba9f0fd-b3d3-4723-8959-1068b46550f8"/>
      <address bus="0" controller="0" unit="3" type="drive" target="0"/>
    </disk>
  </devices>

[7] Delete Snapshot fails on MERGE_STATUS.
It seems to be trying to get the Volume Chain of the DirectLUN, as its type is 'Disk' and not 'LUN' [A]

2019-09-09 10:23:56,219+10 ERROR [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-3) [4a758799-48e7-4389-9eba-54fe8c51034e] Exception: java.lang.IllegalArgumentException: Invalid UUID string: mapper
        at java.util.UUID.fromString(UUID.java:194) [rt.jar:1.8.0_222]
        at org.ovirt.engine.core.compat.Guid.<init>(Guid.java:67) [compat.jar:]
        at org.ovirt.engine.core.compat.Guid.createGuidFromStringWithDefault(Guid.java:87) [compat.jar:]
        at org.ovirt.engine.core.compat.Guid.createGuidFromString(Guid.java:76) [compat.jar:]
        at org.ovirt.engine.core.bll.MergeStatusCommand.getVolumeChain(MergeStatusCommand.java:152) [bll.jar:]
        at org.ovirt.engine.core.bll.MergeStatusCommand.attemptResolution(MergeStatusCommand.java:75) [bll.jar:]
        at org.ovirt.engine.core.bll.MergeStatusCommand.executeCommand(MergeStatusCommand.java:59) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1157) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1315) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1964) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1375) [bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:419) [bll.jar:]

Version-Release number of selected component (if applicable):
rhvm-4.3.5.4-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
As above.
Actual results:
- Snapshot delete fails

Expected results:
- Snapshot delete succeeds

Additional info:
[A] https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.3.5.z/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/MergeStatusCommand.java#L151

(Originally by Diego Huertas Alvarez)
Diego, could you please attach the engine logs from our labs? (Originally by Germano Veit Michel)
Benny, looks related to the change you did in live merge lately, we probably don't filter by disk type when we get the volume chain info, please have a look (Originally by Tal Nisan)
(In reply to Tal Nisan from comment #3)
> Benny, looks related to the change you did in live merge lately, we probably don't filter by disk type when we get the volume chain info, please have a look

My latest change was introduced only in 4.3.6.

This looks like https://bugzilla.redhat.com/show_bug.cgi?id=1598594

(Originally by Benny Zlotnik)
(In reply to Benny Zlotnik from comment #4)
> (In reply to Tal Nisan from comment #3)
> > Benny, looks related to the change you did in live merge lately, we probably don't filter by disk type when we get the volume chain info, please have a look
>
> My latest change was introduced only in 4.3.6
>
> this looks like https://bugzilla.redhat.com/show_bug.cgi?id=1598594

I agree, this looks like a virt issue (same as bug 1598594)

(Originally by Eyal Shenitzky)
(In reply to Eyal Shenitzky from comment #5)
> (In reply to Benny Zlotnik from comment #4)
> > (In reply to Tal Nisan from comment #3)
> > > Benny, looks related to the change you did in live merge lately, we probably don't filter by disk type when we get the volume chain info, please have a look
> >
> > My latest change was introduced only in 4.3.6
> >
> > this looks like https://bugzilla.redhat.com/show_bug.cgi?id=1598594
>
> I agree, this looks like a virt issue (same as bug 1598594)

On this bug the problem seems to be that the Direct LUN has VmDeviceType Disk instead of LUN, and then the engine tries to do volume lookup instead of ignoring it.

(Originally by Germano Veit Michel)
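To illustrate how the word "mapper" can end up in the exception, here is a minimal Java sketch. It assumes, without confirmation from the engine source, that the image ID is taken from the second-to-last component of the disk's source path. The class MapperUuidDemo and the helper imageIdComponent are hypothetical names used only for this illustration; they are not part of ovirt-engine.

```java
import java.util.UUID;

class MapperUuidDemo {
    // Hypothetical helper (not engine code): returns the second-to-last path
    // component. ASSUMPTION: for a managed image the path ends in
    // .../<image-id>/<volume-id>, so this component would be the image ID.
    static String imageIdComponent(String path) {
        String[] parts = path.split("/");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Path too short: " + path);
        }
        return parts[parts.length - 2];
    }

    public static void main(String[] args) {
        // Source path of the Direct LUN, taken from the hot-plug XML in this report.
        String lunPath = "/dev/mapper/36001405156d88f1cc594c4a94ffe1418";
        String candidate = imageIdComponent(lunPath);
        System.out.println("candidate image ID: " + candidate);
        try {
            // The stack trace in this report ends in UUID.fromString.
            UUID.fromString(candidate);
            System.out.println("parsed as a UUID");
        } catch (IllegalArgumentException e) {
            System.out.println("not a UUID: " + e.getMessage());
        }
    }
}
```

Under that assumption, a device-mapper path yields a directory name where a GUID is expected, so any lookup that trusts the device type alone fails before it reaches the volume chain.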
Indeed, this seems like a Virt issue, most likely in the Domain XML part of hotplugging the disk, which doesn't attach the device properly. Ryan, can someone have a look?

(Originally by Tal Nisan)
LUNs are always hotplugged as type=disk, and they have been for years. There haven't been any changes around that handling since 2017.

Is there a reason why snapshot merging is even trying to touch unmanaged storage instead of filtering it out? That seems like a saner solution.

(Originally by Ryan Barry)
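The filtering idea suggested above could look roughly like the following Java sketch. It is not the fix that was shipped; it only shows one way to skip devices whose source path does not carry a parseable image ID. The Device class, tryParseImageId, managedDisks, and the example image path layout are all hypothetical and assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

class ManagedDiskFilterDemo {
    // Hypothetical stand-in for a device entry reported for the VM.
    static final class Device {
        final String deviceType;  // e.g. "disk" or "lun"
        final String sourcePath;  // e.g. "/dev/mapper/<wwid>"

        Device(String deviceType, String sourcePath) {
            this.deviceType = deviceType;
            this.sourcePath = sourcePath;
        }
    }

    // ASSUMPTION: a managed image path ends in .../<image-id>/<volume-id>.
    // Returns an empty Optional instead of throwing when the component is not a UUID.
    static Optional<UUID> tryParseImageId(String path) {
        String[] parts = path.split("/");
        if (parts.length < 2) {
            return Optional.empty();
        }
        try {
            return Optional.of(UUID.fromString(parts[parts.length - 2]));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    // Keeps only devices that are typed as disks AND look like managed images,
    // so a Direct LUN hot-plugged as device="disk" is skipped rather than parsed.
    static List<Device> managedDisks(List<Device> devices) {
        List<Device> result = new ArrayList<>();
        for (Device d : devices) {
            if ("disk".equals(d.deviceType) && tryParseImageId(d.sourcePath).isPresent()) {
                result.add(d);
            }
        }
        return result;
    }
}
```

A filter like this treats a non-UUID path component as "not a managed image" instead of as an error, which matches the suggestion that merge handling should ignore unmanaged storage. A production fix would more likely rely on the engine's own record of which disks are Direct LUNs than on path parsing.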
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops
Verified - The delete snapshot succeeds

2019-12-15 11:19:02,212+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-9) [11345ff1-9bb2-4032-ba85-983cbdc07874] Successfully merged snapshot '5c9b489c-692a-43b1-b94a-8cff957863c1' images 'b9744599-23d6-4a95-837c-b9ea0db28ad0'..'ff90d51f-e098-47db-804a-293f49e5e999'
2019-12-15 11:19:02,232+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-9) [11345ff1-9bb2-4032-ba85-983cbdc07874] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand' successfully.
2019-12-15 11:19:02,234+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-9) [11345ff1-9bb2-4032-ba85-983cbdc07874] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[90c757ea-a9d7-4599-bc75-06dcc6a4fe60=TEMPLATE]'}'
2019-12-15 11:19:03,278+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-1) [11345ff1-9bb2-4032-ba85-983cbdc07874] Command 'RemoveSnapshot' id: '539618a2-0b13-479d-8c6c-90376ec8f808' child commands '[ef244aaa-3ac9-4850-b82c-5e4a98324906, 4e28b7fc-dd60-4e8b-92d9-dba950f6562d]' executions were completed, status 'SUCCEEDED'
2019-12-15 11:19:04,317+02 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-13) [11345ff1-9bb2-4032-ba85-983cbdc07874] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' successfully.

ovirt-engine-4.3.8.1-0.1.master.el7.noarch
vdsm-4.30.39-1.el7ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:0498