Bug 1660742 - Successful snapshot status returned by API although the snapshot creation got failed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.7
Hardware: All
OS: Linux
unspecified
high
Target Milestone: ovirt-4.3.5
: 4.3.0
Assignee: Fred Rolland
QA Contact: meital avital
URL:
Whiteboard:
Duplicates: 1702188
Depends On:
Blocks: 1660997
Reported: 2018-12-19 06:25 UTC by nijin ashok
Modified: 2020-08-03 15:28 UTC
11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-16 09:12:19 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Links
Red Hat Knowledge Base (Solution) 4098241: Successful snapshot status returned by API although the snapshot creation got failed (last updated 2019-05-01 11:30:15 UTC)

Description nijin ashok 2018-12-19 06:25:17 UTC
Description of problem:

When a backup application creates a snapshot through the RHV API, the API returns the ID of the newly created snapshot. The application can then poll the snapshot by that ID; once its status changes from "LOCKED" to "OK", the application concludes that the snapshot operation is complete.

However, if the snapshot operation fails while the snapshot command is being sent to the KVM domain, the engine automatically deletes the newly created snapshot and its volumes. During this cleanup, the new snapshot ID is reassigned to the "Active VM" entry. An application that polls the status by that ID therefore gets "OK" and concludes that the operation succeeded even though it failed, which can lead it to perform further incorrect actions.

To reproduce the issue, I killed the vdsm process just before it sent the snapshot command to libvirt. Below is the output of the database query and of the curl request during the snapshot operation, and after the operation failed and the snapshot was deleted automatically.

During the snapshot operation:
====

 description |             snapshot_id              | status 
-------------+--------------------------------------+--------
 Active VM   | 86cc6641-9da5-4265-bcbe-8287783ea1c3 | OK
 backup      | 592fc15d-0b12-4156-826b-a15827cf79ab | LOCKED
(2 rows)

    <snapshot href="/ovirt-engine/api/vms/acfba9f2-5de8-4c50-a30e-04024013ab28/snapshots/86cc6641-9da5-4265-bcbe-8287783ea1c3" id="86cc6641-9da5-4265-bcbe-8287783ea1c3">
        <description>Active VM</description>
        <snapshot_status>ok</snapshot_status>
    <snapshot href="/ovirt-engine/api/vms/acfba9f2-5de8-4c50-a30e-04024013ab28/snapshots/592fc15d-0b12-4156-826b-a15827cf79ab" id="592fc15d-0b12-4156-826b-a15827cf79ab">
        <description>backup</description>
        <snapshot_status>locked</snapshot_status>
            <description></description>
            <status>up</status>

After the failure:
====

 description |             snapshot_id              | status 
-------------+--------------------------------------+--------
 Active VM   | 592fc15d-0b12-4156-826b-a15827cf79ab | OK
(1 row)


    <snapshot href="/ovirt-engine/api/vms/acfba9f2-5de8-4c50-a30e-04024013ab28/snapshots/592fc15d-0b12-4156-826b-a15827cf79ab" id="592fc15d-0b12-4156-826b-a15827cf79ab">
        <description>Active VM</description>
        <snapshot_status>ok</snapshot_status>
===

592fc15d is the UUID of the new snapshot. If the application checks its status, it gets "OK" and may wrongly conclude that the operation succeeded.
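To make the failure mode concrete, here is a minimal Python sketch (standard library only) of the status-only check described above, applied to the post-failure response. The XML is the excerpt from this report with a closing tag added so it parses; `naive_snapshot_done` is a hypothetical helper, not part of any oVirt SDK.

```python
import xml.etree.ElementTree as ET

# Excerpt of the snapshot returned after the failure (closing tag added).
AFTER_FAILURE = """
<snapshot href="/ovirt-engine/api/vms/acfba9f2-5de8-4c50-a30e-04024013ab28/snapshots/592fc15d-0b12-4156-826b-a15827cf79ab" id="592fc15d-0b12-4156-826b-a15827cf79ab">
    <description>Active VM</description>
    <snapshot_status>ok</snapshot_status>
</snapshot>
"""


def naive_snapshot_done(snapshot_xml: str) -> bool:
    """Status-only check (hypothetical): treats 'ok' as success."""
    snap = ET.fromstring(snapshot_xml.strip())
    return snap.findtext("snapshot_status") == "ok"


snap = ET.fromstring(AFTER_FAILURE.strip())
# The new ID now names the "Active VM" entry, yet the naive check reports success.
print(snap.get("id"), snap.findtext("description"), naive_snapshot_done(AFTER_FAILURE))
# prints: 592fc15d-0b12-4156-826b-a15827cf79ab Active VM True
```

Nothing in `snapshot_status` alone distinguishes this rolled-back entry from a completed snapshot.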


Version-Release number of selected component (if applicable):

RHV 4.2.7

How reproducible:

100%

Steps to Reproduce:

See above.

Actual results:

Querying the snapshot status by UUID leads to an incorrect interpretation of the operation's result.

Expected results:

The snapshot status should be reported correctly, or there should be another way for an external application to check the outcome of the snapshot operation.

Additional info:

Comment 2 nijin ashok 2018-12-19 06:31:26 UTC
The issue was observed during a backup with Commvault. Commvault support reported that their application receives a success status when it queries the snapshot ID, although the snapshot failed in RHV.

Comment 3 Elad 2018-12-21 00:52:14 UTC
Sounds similar to bug 1660997, which was reported for upstream

Comment 4 nijin ashok 2018-12-21 04:02:43 UTC
(In reply to Elad from comment #3)
> Sounds similar to bug 1660997, which was reported for upstream

I think this is different. In my case, the snapshot operation is marked as "failed" in the engine and is also cleaned up automatically.

Comment 5 Sandro Bonazzola 2019-01-28 09:39:58 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 7 Eyal Shenitzky 2019-03-17 11:16:33 UTC
Nijin, 
Can you please add engine and VDSM logs?

Comment 8 nijin ashok 2019-03-18 01:46:03 UTC
(In reply to Eyal Shenitzky from comment #7)
> Nijin, 
> Can you please add engine and VDSM logs?

I don't have the same environment now. However, it was easy to reproduce. Could you please try on your end?

Comment 9 Fred Rolland 2019-04-22 08:43:36 UTC
Nijin hi,

It seems that looking only at the status of the snapshot entry is not enough.

You could try one of the following:

1. As described, check the status of the snapshot, but once the status is OK, also check the 'snapshot_type'.
   - If it has reverted to 'ACTIVE', the operation failed.
   - If it is 'REGULAR' and the status is 'OK', the operation succeeded.

2. Add a correlation ID when creating the snapshot, and check that all jobs with this ID finished without failures.
This is how it is implemented in the oVirt system tests:
   - Add correlation ID:
         https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/004_basic_sanity.py#L363
   - Search jobs with correlation ID:
         https://github.com/oVirt/ovirt-system-tests/blob/10d0662f1a34d0f1ac5e27b80ad7a79a5fda3779/basic-suite-master/test_utils/__init__.py#L211


I don't think that we plan to change the current logic of the snapshot statuses in the near future.

Please tell me what you think about the above propositions.

Thanks,
Freddy
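Option 1 above can be sketched as a small Python poller. This assumes the REST API reports `snapshot_status` and `snapshot_type` as lowercase strings (`locked`/`ok`, `active`/`regular`); the comparison is case-insensitive so the uppercase names used in the comment also match. `fetch` stands in for whatever HTTP or SDK call the application uses to read the new snapshot by ID; it and every other name here are hypothetical.

```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


def snapshot_outcome(status: Optional[str], snapshot_type: Optional[str]) -> Outcome:
    """Interpret one poll result: check the status first, then snapshot_type."""
    status = (status or "").lower()
    snapshot_type = (snapshot_type or "").lower()
    if status == "locked":
        return Outcome.PENDING
    if status == "ok":
        if snapshot_type == "active":
            # Rolled back: the new ID now names the "Active VM" entry.
            return Outcome.FAILED
        if snapshot_type == "regular":
            return Outcome.SUCCEEDED
    # Any other combination (e.g. a missing entry) is left to the caller.
    return Outcome.UNKNOWN


def wait_for_snapshot(
    fetch: Callable[[], Tuple[Optional[str], Optional[str]]],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> Outcome:
    """Poll fetch() until the snapshot leaves the locked state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        outcome = snapshot_outcome(*fetch())
        if outcome is not Outcome.PENDING:
            return outcome
        if time.monotonic() >= deadline:
            return Outcome.PENDING  # still locked when the timeout expired
        time.sleep(interval)


# Simulated poll results (hypothetical) for the failure reproduced in this bug.
responses = iter([("locked", "regular"), ("ok", "active")])
print(wait_for_snapshot(lambda: next(responses), interval=0))  # prints Outcome.FAILED
```

Returning `UNKNOWN` rather than guessing keeps the poller from repeating the original mistake of treating an unexpected state as success.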

Comment 10 nijin ashok 2019-04-22 12:52:10 UTC
(In reply to Fred Rolland from comment #9)
> Nijin hi,
> 
> It seems that looking only at the status of the snapshot entry only is not
> enough.
> 
> You could try one of the following:
> 
> 1. Same as described, check the status of the snapshot but once the status
> is OK, check the 'snapshot_type'.
>    - If it got back to 'ACTIVE', then it means that the operation failed.
>    - If it is 'REGULAR' and the status is 'OK', then the operation is
> successful.
> 
> 2. Add a correlation ID when creating the snapshot, and check that all jobs
> with this ID are finished without failures.
> This is the way it is implemented in oVirt system tests:
>    - Add correlation ID:
>          https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/004_basic_sanity.py#L363
>    - Search jobs with correlation ID:
>          https://github.com/oVirt/ovirt-system-tests/blob/10d0662f1a34d0f1ac5e27b80ad7a79a5fda3779/basic-suite-master/test_utils/__init__.py#L211
> 
> 
Thank you Fred. I have asked the customer to forward this feedback to the Commvault team.

Comment 11 Marina Kalinin 2019-04-30 19:09:39 UTC
Nijin, 
Can you please put this in a KCS as well?

Comment 12 Marina Kalinin 2019-05-03 17:57:36 UTC
Thanks, Nijin!

Should we close the bug now?

Comment 13 dev-unix-virtualization 2019-05-07 18:32:04 UTC
Freddy, 

Can you please post the correct syntax for an XML request that creates a snapshot using a correlation ID?

We tried the forms below and always received a 400 response from the API server.


XML Req 1 : 
<snapshot>
  <description>My snapshot</description>
  <persist_memorystate>false</persist_memorystate>
  <query>correlation_id=test</query>
</snapshot>


XML Req 2 : 
<snapshot>
  <description>My snapshot</description>
  <persist_memorystate>false</persist_memorystate>
  <query><correlation_id>test</correlation_id></query>
</snapshot>


JSON Req ===> 

{
    "description": "My snap2",
    "query": {
        "correlation_id": "test"
    }
}

Resp: 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fault>
    <detail>For correct usage, see: https://172.24.25.3/ovirt-engine/apidoc#services/snapshots/methods/add</detail>
    <reason>Request syntactically incorrect.</reason>
</fault>

Comment 14 Benny Zlotnik 2019-05-13 12:14:14 UTC
Hi,

You can pass the correlation_id as follows:
POST /ovirt-engine/api/vms/{vm_id}/snapshots?correlation_id=097d3014-b5c4-4ab0-96d9-003f310a1b31

and to search for it you can use:
GET /ovirt-engine/api/jobs?search=correlation_id%3D097d3014-b5c4-4ab0-96d9-003f310a1b31
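The two requests above can be put together in a Python sketch using only the standard library. The paths follow this comment; the body reuses the element names from the requests in comment 13 with the `<query>` element dropped, since the correlation ID travels in the URL. The `<jobs>`/`<job>`/`<status>` layout of the search response and the status values (`started`, `finished`, `failed`, `aborted`) are assumptions about the REST API, and all function names are hypothetical. HTTP transport and authentication are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_PREFIX = "/ovirt-engine/api"


def snapshot_add_path(vm_id: str, correlation_id: str) -> str:
    """POST target: correlation_id is a URL query parameter, not part of the body."""
    return f"{API_PREFIX}/vms/{vm_id}/snapshots?" + urlencode({"correlation_id": correlation_id})


def snapshot_body(description: str) -> str:
    """Plain <snapshot> body without a <query> element."""
    snap = ET.Element("snapshot")
    ET.SubElement(snap, "description").text = description
    ET.SubElement(snap, "persist_memorystate").text = "false"
    return ET.tostring(snap, encoding="unicode")


def jobs_search_path(correlation_id: str) -> str:
    """GET target: the search expression is URL-encoded, so '=' becomes %3D."""
    return f"{API_PREFIX}/jobs?" + urlencode({"search": f"correlation_id={correlation_id}"})


def jobs_outcome(jobs_xml: str) -> str:
    """Summarize the <status> of every <job> returned by the search (assumed layout)."""
    root = ET.fromstring(jobs_xml)
    statuses = [job.findtext("status", default="").lower() for job in root.findall("job")]
    if not statuses:
        return "unknown"  # no job recorded (yet) for this correlation ID
    if any(s in ("failed", "aborted") for s in statuses):
        return "failed"
    if all(s == "finished" for s in statuses):
        return "succeeded"
    return "pending"


cid = "097d3014-b5c4-4ab0-96d9-003f310a1b31"
print(jobs_search_path(cid))
# prints /ovirt-engine/api/jobs?search=correlation_id%3D097d3014-b5c4-4ab0-96d9-003f310a1b31
print(jobs_outcome("<jobs><job id='j1'><status>failed</status></job></jobs>"))  # prints failed
```

Treating an empty job list as "unknown" rather than "succeeded" matters here: a search issued before the engine has registered the job would otherwise look like a clean run.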

Comment 15 Tal Nisan 2019-05-13 14:12:59 UTC
*** Bug 1702188 has been marked as a duplicate of this bug. ***

Comment 16 Fred Rolland 2019-05-16 09:07:15 UTC
Nijin,

Can we close the bug?

Thanks

Comment 17 nijin ashok 2019-05-16 09:10:38 UTC
Sure Fred. I think we can close it.

