Previously, if a user tried to perform a live merge of a snapshot that included unattached disks, the live merge did not finish and the snapshot remained locked. In the current release, live merge is blocked if the snapshot includes unattached disks.
Description of problem:
If any of the disks that are part of the snapshot is unattached, the live merge runs indefinitely and the snapshot remains in locked status.
The merge command for the unattached disk fails with the error below.
===
2018-02-14 06:08:52,571-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-120) [] EVENT_ID: USER_REMOVE_SNAPSHOT(342), Correlation ID: 692a7af2-46a3-4a13-aa61-ad41ea3db4b5, Job ID: 39da3f19-7854-4350-9c24-0dc199daaccb, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: Snapshot 'snap' deletion for VM 'test_vm' was initiated by admin@internal-authz.
2018-02-14 06:08:54,753-05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-6) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] START, MergeVDSCommand(HostName = 10.74.130.111, MergeVDSCommandParameters:{runAsync='true', hostId='dba8d52f-1524-4f6c-8e30-0115de2990a4', vmId='33778b96-efd8-4eb8-abba-229f1eebdf28', storagePoolId='00000001-0001-0001-0001-000000000311', storageDomainId='92227e03-01c8-4f41-999e-032175f20116', imageGroupId='313aceb7-1871-477d-b1af-294dcc4cf9d4', imageId='73f41734-b3b9-4271-838c-a5f1a8d0c8d6', baseImageId='36a2032a-364d-403f-9b48-1c06ca9843c3', topImageId='73f41734-b3b9-4271-838c-a5f1a8d0c8d6', bandwidth='0'}), log id: 239eb571
2018-02-14 06:08:54,754-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-5) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] Failed in 'MergeVDS' method
2018-02-14 06:08:54,768-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (pool-5-thread-5) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), Correlation ID: null, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VDSM 10.74.130.111 command MergeVDS failed: Drive image file could not be found
===
The merge for the other disk completed successfully.
===
2018-02-14 06:08:54,753-05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MergeVDSCommand] (pool-5-thread-6) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] START, MergeVDSCommand(HostName = 10.74.130.111, MergeVDSCommandParameters:{runAsync='true', hostId='dba8d52f-1524-4f6c-8e30-0115de2990a4', vmId='33778b96-efd8-4eb8-abba-229f1eebdf28', storagePoolId='00000001-0001-0001-0001-000000000311', storageDomainId='92227e03-01c8-4f41-999e-032175f20116', imageGroupId='313aceb7-1871-477d-b1af-294dcc4cf9d4', imageId='73f41734-b3b9-4271-838c-a5f1a8d0c8d6', baseImageId='36a2032a-364d-403f-9b48-1c06ca9843c3', topImageId='73f41734-b3b9-4271-838c-a5f1a8d0c8d6', bandwidth='0'}), log id: 239eb571
2018-02-14 06:09:20,437-05 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (DefaultQuartzScheduler9) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] Merge command (jobId = 35505ae8-1999-4dc5-89c0-5b6fd006a13e) has completed for images '36a2032a-364d-403f-9b48-1c06ca9843c3'..'73f41734-b3b9-4271-838c-a5f1a8d0c8d6'
===
The job still exists in the database.
engine=# select job_id,status from job where job_id='39da3f19-7854-4350-9c24-0dc199daaccb' ;
job_id | status
--------------------------------------+---------
39da3f19-7854-4350-9c24-0dc199daaccb | STARTED
(1 row)
And the snapshot remains in locked status.
engine=# select snapshot_id,status from snapshots where vm_id = '33778b96-efd8-4eb8-abba-229f1eebdf28';
snapshot_id | status
--------------------------------------+--------
eac09eec-cc5f-4161-b2e9-85e3b19bbc53 | OK
a00edf4d-1f2d-490b-aafd-68e1dfc92dc7 | LOCKED
(2 rows)
The engine log keeps repeating the messages below indefinitely.
2018-02-14 06:13:32,330-05 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler1) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] Failed invoking callback end method 'onFailed' for command '8844ec31-1089-415f-9a3c-ca8d42c5728f' with exception 'null', the callback is marked for end method retries
2018-02-14 06:13:34,337-05 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler1) [692a7af2-46a3-4a13-aa61-ad41ea3db4b5] Command 'RemoveSnapshot' (id: '57b257bf-1a08-4770-8dd8-571c554e567d') waiting on child command id: '8844ec31-1089-415f-9a3c-ca8d42c5728f' type:'RemoveSnapshotSingleDiskLive' to complete
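To confirm that the callback failure is repeating rather than a one-off, the engine log can be scanned for this message. The following is only a rough sketch: the regular expression is derived from the single log line quoted above, and the function name count_callback_retries is made up for illustration and is not part of oVirt.

```python
import re
from collections import Counter

# Pattern derived from the CommandCallbacksPoller line quoted in this report.
# Assumption: other engine versions may word this message differently.
RETRY_RE = re.compile(
    r"\[(?P<corr>[0-9a-f-]{36})\] Failed invoking callback end method "
    r"'(?P<method>\w+)' for command '(?P<cmd>[0-9a-f-]{36})'"
)


def count_callback_retries(lines, correlation_id):
    """Count callback end-method failures per command id for one correlation id."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match and match.group("corr") == correlation_id:
            counts[match.group("cmd")] += 1
    return counts


# Usage (the log path is an assumption; adjust it to your installation):
# with open("/var/log/ovirt-engine/engine.log") as log:
#     print(count_callback_retries(log, "692a7af2-46a3-4a13-aa61-ad41ea3db4b5"))
```

A count that keeps growing for the same command id between two runs suggests the engine is stuck retrying the end method, which matches the behaviour described here.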
Version-Release number of selected component (if applicable):
rhevm-4.1.8.2-0.1.el7.noarch
How reproducible:
100%
Steps to Reproduce:
1. Create a snapshot of a VM that has two disks.
2. Detach one of the disks from the VM.
3. Perform a live merge.
Actual results:
The merge command is issued for the unattached disk as well, and the operation runs indefinitely.
Expected results:
Don't issue the merge command to vdsm for the unattached disk, since it will fail anyway. Don't keep the snapshot locked, so that the user can attempt the merge again, either online after attaching the disk again or offline.
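The expected behaviour could be sketched as a pre-check that runs before any merge command is sent to the host. This is a minimal illustration in Python, not the engine's actual (Java) implementation; the Disk class, its attached flag, and validate_live_merge are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Disk:
    # Hypothetical model: the real engine tracks attachment differently.
    disk_id: str
    attached: bool


def unattached_disks(snapshot_disks: List[Disk]) -> List[Disk]:
    """Return the disks in the snapshot that are not attached to the VM."""
    return [disk for disk in snapshot_disks if not disk.attached]


def validate_live_merge(snapshot_disks: List[Disk]) -> Tuple[bool, str]:
    """Refuse the live merge up front if any snapshot disk is unattached.

    Failing validation before the snapshot is locked means no merge command
    reaches the host and the snapshot never enters the locked state.
    """
    missing = unattached_disks(snapshot_disks)
    if missing:
        ids = ", ".join(disk.disk_id for disk in missing)
        return False, f"Cannot live merge: snapshot includes unattached disks: {ids}"
    return True, ""
```

With a check of this kind, the user gets an immediate error and can retry after attaching the disk again, or perform the merge offline.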
Additional info:
As per bug 1411572, a warning is shown if the snapshot contains unattached disks, and that warning is displayed correctly.
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:
[Found non-acked flags: '{'rhevm-4.2-ga': '?'}', ]
For more info please contact: rhv-devops
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:1488