Bug 1593564 - Snapshot creation failed due to a duplicate uuid for the new image
Summary: Snapshot creation failed due to a duplicate uuid for the new image
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.5
Assignee: Benny Zlotnik
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-21 05:45 UTC by Germano Veit Michel
Modified: 2021-12-10 16:25 UTC
CC: 9 users

Fixed In Version: ovirt-engine-4.2.5.2
Doc Type: Bug Fix
Doc Text:
Cause: engine 4.2+ allows passing image and disk IDs via the REST API (https://bugzilla.redhat.com/show_bug.cgi?id=1405805).
Consequence: a user might pass an already existing image ID, which violates the primary key and causes the operation to fail.
Fix: passing an already existing image ID is now blocked with a proper message.
Result: it is no longer possible to pass existing image IDs, and the operation no longer fails in the manner described in this bug.
Clone Of:
Environment:
Last Closed: 2018-09-03 13:03:48 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Attachments


Links
Red Hat Knowledge Base (Solution) 3495111: last updated 2018-06-22 01:13:31 UTC
oVirt gerrit 92903 (master, MERGED): core: validate image id's passed from REST, last updated 2020-12-09 13:06:28 UTC
oVirt gerrit 93031 (ovirt-engine-4.2, MERGED): core: validate image id's passed from REST, last updated 2020-12-09 13:06:26 UTC

Description Germano Veit Michel 2018-06-21 05:45:39 UTC
Description of problem:

After upgrading to 4.2, the first snapshot creation via the API (v3) failed, wiping the disk from the database and clearing the VM's snapshots table.

Please see the details:

The VM was running fine, freshly started after upgrading to 4.2. It has 3 disks and no snapshots.

Via sdk3, the user requests the creation of a new snapshot:

vm.snapshots.add(
    params.Snapshot(
        description="Test Snapshot",
        vm=vm,
        persist_memorystate=False,
    )
)

1. On the engine side we see CreateAllSnapshots as expected:

2018-06-18 10:43:04,491+02 INFO  [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] (default task-13) [10f48fdc-759d-48c1-a9e1-a70a02be25b6] Running command: CreateAllSnapshotsFromVmCommand internal: false. Entities affected :  ID: 291df27e-d031-491b-ad36-89d608eb8f71 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER

2. Then CreateSnapshotCommand for the first disk

2018-06-18 10:43:04,544+02 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (default task-13) [10f48fdc-759d-48c1-a9e1-a70a02be25b6] Running command: CreateSnapshotCommand internal: true. Entities affected :  ID: 00000000-0000-0000-0000-000000000000 Type: Storage

3. Then CreateVolumeVDSCommand

But here there is a problem: we have a duplicate uuid. The previous active layer has the same uuid as the new image.

imageId='4ca87da0-356f-4b45-9878-2df348524846'
newImageId='4ca87da0-356f-4b45-9878-2df348524846'

2018-06-18 10:43:04,645+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand] (default task-13) [10f48fdc-759d-48c1-a9e1-a70a02be25b6] START, CreateVolumeVDSCommand( CreateVolumeVDSCommandParameters:{storagePoolId='59b7af54-0155-01c2-0248-000000000195', ignoreFailoverLimit='false', storageDomainId='5e955c12-e5ce-46bb-b363-24356b6228d4', imageGroupId='df562aba-a0c6-4139-9102-1ca6ce4f3846', imageSizeInBytes='59055800320', volumeFormat='COW', newImageId='4ca87da0-356f-4b45-9878-2df348524846', imageType='Sparse', newImageDescription='', imageInitialSizeInBytes='0', imageId='4ca87da0-356f-4b45-9878-2df348524846', sourceImageGroupId='df562aba-a0c6-4139-9102-1ca6ce4f3846'}), log id: 7b429730

Looking at the code, I don't think this uuid can even be provided via the API. Can it? If not, it must have come from the engine.

4. This fails on the storage domain (the LV already exists), and on the engine there is an SQL exception from trying to insert a duplicate image into the DB.

2018-06-18 10:43:04,762+02 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (default task-13) [10f48fdc-759d-48c1-a9e1-a70a02be25b6] Command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand' failed: CallableStatementCallback; SQL [{call insertimage(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}]; ERROR: duplicate key value violates unique constraint "pk_images"
  Detail: Key (image_guid)=(4ca87da0-356f-4b45-9878-2df348524846) already exists.

5. Some more NPEs and roll-back attempts follow. The result is that this disk is now missing from the DB entirely, and the VM has not a single snapshot in the snapshots table, not even 'Active'; it's empty.

Some additional details:
- The same API script had been used successfully on 4.1 for several VMs for some time. You can see this VM being snapshotted on the 15th in exactly the same way, all good.
- Upgrade to 4.2 on the 16th. The first snapshot attempt, on the 18th, was the one above, with the duplicate uuid.

I tried reproducing this on 4.2.3 several times in our labs, with no luck.

Could it have been just some extreme bad luck to get a duplicate uuid?

Version-Release number of selected component (if applicable):
rhvm-4.2.3.8-0.1.el7.noarch

How reproducible:
Not reproducible after ~10 attempts, using the same number of disks with a similar configuration.

Comment 4 Benny Zlotnik 2018-06-21 10:01:15 UTC
I haven't reproduced it yet, but looking at the 4.2 code it does seem possible: in 4.2, support for passing the snapshot id via REST [1] was added, while in 4.1 a new UUID was always generated. So my guess is that a combination of v3 not supporting passing the image UUID and 4.2 expecting it is what led to this unfortunate outcome.

I have managed to reproduce it using this request:

POST http://localhost:8080/ovirt-engine/api/v3/vms/502c2717-45dc-4580-abed-40c7c4e613d1/snapshots


<snapshot>
    <description>snap</description>
    <vm id="66e3a14a-ded0-4166-b3cc-16fb5f271c71"/>
    <disks>
    	<disk id="b6287fc3-38a2-45ff-938c-04d0a470f7be">
			<image_id>0bf85090-2785-442c-971f-5e6ab09845f3</image_id>
    	</disk>
    </disks>
</snapshot>

The key here is that image_id is passed. Looking at their script in the case:
                    vm.snapshots.add(
                        params.Snapshot(
                            description=config.get_snapshot_description(),
                            vm=vm,
                            disks=discoSnap,
                            persist_memorystate=config.get_persist_memorystate(),
                        )
                    )

While I am not familiar with API v3, discoSnap probably contains the entire disk definition, including the image id, which is why the engine used it.
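
For illustration, a hypothetical sketch of how such a discoSnap could have been built with ovirt-engine-sdk-python v3 (these calls are assumptions for illustration, not taken from the user's actual script):

disks = vm.disks.list()               # each disk object carries its current image id
discoSnap = params.Disks(disk=disks)  # v3 serializes that image id into the request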

While in V4 we would do something like this:
types.Snapshot(
        description=SNAPSHOT_DESC_1,
        persist_memorystate=False,
        disk_attachments=[
            types.DiskAttachment(
                disk=types.Disk(
                    id=disk.id
                )
            )
        ]
    )

This sends only the disk's id; perhaps something similar can be done in V3.
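
A possible v3 workaround along those lines (an untested sketch; the assumption is that params.Disk can be constructed with only an id) would build the disks list from bare objects, so no image_id gets serialized:

discoSnap = params.Disks(
    disk=[params.Disk(id=d.get_id()) for d in vm.disks.list()]
)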


Regardless, we need to handle this in the engine as well to avoid such failures.
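
For illustration only, the engine-side check could be sketched like this (Python pseudocode rather than the actual ovirt-engine Java; validate_new_image_ids and image_dao are invented names, and the message mirrors the one produced by the fix):

def validate_new_image_ids(requested_image_ids, image_dao):
    # Reject user-supplied image ids that already exist, before any volume is created.
    used = [i for i in requested_image_ids if image_dao.get(i) is not None]
    if used:
        raise ValueError(
            "Cannot create Snapshot. Image IDs %s are already used by a snapshot."
            % ", ".join(used))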


[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1405805

Comment 5 Benny Zlotnik 2018-06-21 10:03:52 UTC
(In reply to Benny Zlotnik from comment #4)
> I haven't reproduced it yet, but looking at the 4.2 code, it does seem
Sorry for not proofreading, obviously I've managed to reproduce it by the time the comment was posted :)

Comment 6 Germano Veit Michel 2018-06-22 00:32:37 UTC
Ahhh, one needs to send the VM configuration to trigger it!

Could you please consider this for 4.2.z? When the roll-back kicks in, it removes the "new image", which is the same as the "old image", from the DB. Several entries in several tables are removed; it's not trivial to put everything back.
Thanks, Benny

Comment 7 Germano Veit Michel 2018-06-22 01:23:17 UTC
Just to document this publicly here as well: the following tables need to be repopulated with the rows removed by the roll-back mechanism of the failed snapshot (a diagnostic sketch follows the list):

base_disks
images
snapshots
image_storage_domain_map
disk_vm_element
vm_ovf_generations (OVF corrected, adding the disk back)
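
A minimal diagnostic sketch for checking whether the disk's rows survived, assuming the standard ovirt-engine PostgreSQL schema (table and column names, as well as the connection parameters, should be verified against your version; the disk id below is the imageGroupId from the log in the description):

import psycopg2

DISK_ID = "df562aba-a0c6-4139-9102-1ca6ce4f3846"

conn = psycopg2.connect(dbname="engine", user="engine")
cur = conn.cursor()
for table, column in [("base_disks", "disk_id"),
                      ("images", "image_group_id"),
                      ("disk_vm_element", "disk_id")]:
    cur.execute("SELECT count(*) FROM {} WHERE {} = %s".format(table, column),
                (DISK_ID,))
    print(table, cur.fetchone()[0])  # 0 means the roll-back removed the rows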

Comment 8 RHV bug bot 2018-07-24 12:26:57 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

Comment 9 Yosi Ben Shimon 2018-07-25 12:29:59 UTC
Hi Benny,
Can you provide the steps to reproduce this bug?
Please move this bug to ON_QA.

Thank you

Comment 10 Benny Zlotnik 2018-07-25 12:34:42 UTC
(In reply to Yosi Ben Shimon from comment #9)
> Hi Benny,
> Can you provide the steps to reproduce this bug?
> Please move this bug to ON_QA.
> 
> Thank you

You can:
1. Create a VM with a disk
2. Create a snapshot
3. Send a request (change the UUIDs to match yours)
POST http://localhost:8080/ovirt-engine/api/v3/vms/502c2717-45dc-4580-abed-40c7c4e613d1/snapshots


<snapshot>
    <description>snap</description>
    <vm id="66e3a14a-ded0-4166-b3cc-16fb5f271c71"/>
    <disks>
    	<disk id="b6287fc3-38a2-45ff-938c-04d0a470f7be">
			<image_id>0bf85090-2785-442c-971f-5e6ab09845f3</image_id>
    	</disk>
    </disks>
</snapshot>

vm id=<your VM id>
disk id=<your disk id>
image_id=<the image id of the created snapshot>
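
For convenience, step 3 can be scripted; a hedged sketch with python-requests (the URL, credentials, and placeholder IDs must be adapted to your environment):

import requests

body = """<snapshot>
    <description>snap</description>
    <vm id="YOUR-VM-ID"/>
    <disks>
        <disk id="YOUR-DISK-ID">
            <image_id>IMAGE-ID-OF-THE-CREATED-SNAPSHOT</image_id>
        </disk>
    </disks>
</snapshot>"""

resp = requests.post(
    "http://localhost:8080/ovirt-engine/api/v3/vms/YOUR-VM-ID/snapshots",
    data=body,
    headers={"Content-Type": "application/xml"},
    auth=("admin@internal", "password"),  # placeholder credentials
)
print(resp.status_code)
print(resp.text)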

Comment 11 Yosi Ben Shimon 2018-07-25 13:39:05 UTC
Tested using:
ovirt-engine-4.2.5.2-0.1.el7ev.noarch

I used the steps that Benny provided in comment #10 and the actual result is:

- The response on the REST API was:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <reason>Operation Failed</reason>
    <detail>[Cannot create Snapshot. Image IDs 	a964b123-57b0-4086-aef5-b6a1a052ae24 are already used by a snapshot.]</detail>
</fault>


- On the engine log:

2018-07-25 16:29:43,544+03 INFO  [org.ovirt.engine.core.bll.AddDeprecatedApiEventCommand] (default task-12) [219bc3e3] Running command: AddDeprecatedApiEventCommand internal: false.
2018-07-25 16:29:43,660+03 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-12) [77653f00-517a-4367-8f2e-629bf3d9216d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[e7e70c77-4aee-4907-86a2-d0e3e27156bd=VM]', sharedLocks=''}'
2018-07-25 16:29:43,692+03 WARN  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-12) [77653f00-517a-4367-8f2e-629bf3d9216d] Validation of action 'CreateSnapshotForVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__CREATE,VAR__TYPE__SNAPSHOT,ACTION_TYPE_FAILED_VM_SNAPSHOT_IMAGE_ALREADY_EXISTS,$ImageIds 	a964b123-57b0-4086-aef5-b6a1a052ae24,$ImageIds_COUNTER 1
2018-07-25 16:29:43,693+03 INFO  [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-12) [77653f00-517a-4367-8f2e-629bf3d9216d] Lock freed to object 'EngineLock:{exclusiveLocks='[e7e70c77-4aee-4907-86a2-d0e3e27156bd=VM]', sharedLocks=''}'
2018-07-25 16:29:43,715+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-12) [] Operation Failed: [Cannot create Snapshot. Image IDs 	a964b123-57b0-4086-aef5-b6a1a052ae24 are already used by a snapshot.]
2018-07-25 16:29:43,730+03 INFO  


Moving to VERIFIED

Comment 12 Marina Kalinin 2018-08-31 20:02:26 UTC
Benny / Yosi,

This bug is on VERIFIED and 4.2.5 is out.
Why is this bug still open?
Who can confirm its correct status?

Thank you!

Comment 13 Tal Nisan 2018-09-03 12:10:26 UTC
Anton, this bug as well: what shall we do with it? It was not moved to ON_QA, and I'm assuming it's not in the Errata either.

Comment 15 Marina Kalinin 2018-09-04 14:27:45 UTC
https://access.redhat.com/errata/RHBA-2018:2471

Comment 16 Daniel Gur 2019-08-28 13:11:32 UTC
sync2jira

Comment 17 Daniel Gur 2019-08-28 13:15:43 UTC
sync2jira

