Bug 1628909 - Engine marks the snapshot status as OK before the actual snapshot operation
Summary: Engine marks the snapshot status as OK before the actual snapshot operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.5
Hardware: All
OS: Linux
Priority: medium
Severity: urgent
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Benny Zlotnik
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Duplicates: 1620087
Depends On:
Blocks: 1635189
 
Reported: 2018-09-14 10:37 UTC by nijin ashok
Modified: 2021-12-10 17:37 UTC
CC List: 15 users

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1635189
Environment:
Last Closed: 2019-05-08 12:38:22 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-44279 0 None None None 2021-12-10 17:37:49 UTC
Red Hat Knowledge Base (Solution) 3623261 0 None None None 2018-09-26 04:34:57 UTC
Red Hat Knowledge Base (Solution) 3645732 0 None None None 2018-10-10 04:41:32 UTC
Red Hat Product Errata RHEA-2019:1085 0 None None None 2019-05-08 12:38:39 UTC
oVirt gerrit 94584 0 None MERGED core: unlock snapshot at the end of the command 2021-01-19 02:50:44 UTC
oVirt gerrit 94626 0 None MERGED core: unlock snapshot at the end of the command 2021-01-19 02:50:04 UTC

Description nijin ashok 2018-09-14 10:37:36 UTC
Description of problem:

The ovirt-engine marks the snapshot status as OK before it sends the "snapshot" command to the VM. If a backup automation tool such as Commvault checks the progress of the snapshot by looking at this status, it will assume the snapshot operation is complete as soon as the status returns "OK" and proceed to the next step of attaching the snapshot disk to the backup agent VM. Because the status is OK, that step also completes successfully, so the snapshot disk ends up attached to the backup agent VM before the actual snapshot operation has finished. In addition, the SnapshotVDSCommand can fail (see, for example, bug 1572801), which results in attaching an "invalid" snapshot disk to the backup agent VM.
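
To make the failure mode concrete, below is a minimal sketch (not taken from any backup tool; the engine host, credentials and IDs are hypothetical placeholders) of the kind of polling loop such a tool runs against the REST API. With this bug, the loop exits as soon as the engine unlocks the snapshot, which happens before the SnapshotVDSCommand has run:

    # Illustrative sketch only -- hypothetical host, credentials and IDs.
    import time
    import xml.etree.ElementTree as ET

    import requests

    ENGINE = "https://engine.example.com/ovirt-engine/api"
    AUTH = ("admin@internal", "password")        # placeholder credentials
    VM_ID, SNAP_ID = "vm-uuid", "snapshot-uuid"  # placeholder IDs

    def snapshot_status():
        # GET /vms/{vm}/snapshots/{snapshot} returns XML whose
        # <snapshot_status> element is "locked" while the snapshot is in
        # progress and "ok" once the engine considers it finished.
        resp = requests.get(f"{ENGINE}/vms/{VM_ID}/snapshots/{SNAP_ID}",
                            auth=AUTH, verify="/etc/pki/ovirt-engine/ca.pem")
        resp.raise_for_status()
        return ET.fromstring(resp.content).findtext("snapshot_status")

    # The tool treats "ok" as "snapshot finished" and proceeds to attach
    # the snapshot disk to its agent VM. With this bug, "ok" is reported
    # before the guest-side snapshot has actually been taken.
    while snapshot_status() != "ok":
        time.sleep(5)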


Version-Release number of selected component (if applicable):

rhvm-4.2.6.4-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. I added a delay in the vdsm code where it freezes the guest filesystem so that I could replicate a snapshot failure.
2. The snapshot status changes to OK before the engine sends the snapshot command to the VM.

Actual results:

The snapshot status is changed to OK immediately after the engine creates the volume, before it sends the "snapshot" command to the VM.

Expected results:

The snapshot status should be changed to "OK" only after the entire snapshot operation has completed.

Additional info:

Comment 1 Tal Nisan 2018-09-16 11:30:06 UTC
Eyal please have a look.
Arik, do you have any insights from Virt side?

Comment 3 Arik 2018-09-16 11:45:35 UTC
(In reply to Tal Nisan from comment #1)
> Eyal please have a look.
> Arik, do you have any insights from Virt side?

That looks like a regression caused by the relatively recent changes in the create-snapshot command. The snapshot should indeed remain locked until all tasks are finished.

Comment 4 Eyal Shenitzky 2018-09-16 12:04:12 UTC
Benny,
Can you please take a look?

Comment 5 Arik 2018-09-16 12:14:02 UTC
(In reply to Arik from comment #3)
> (In reply to Tal Nisan from comment #1)
> > Eyal please have a look.
> > Arik, do you have any insights from Virt side?
> 
> That looks like a regression caused by the relatively recent changes in the
> create-snapshot command. The snapshot should indeed remain locked until all
> tasks are finished.

Actually, I was wrong; it seems that we unlocked the snapshot before calling the live-snapshot verb in 4.1 as well [1], before those changes.

[1] https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.1/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/snapshots/CreateAllSnapshotsFromVmCommand.java#L401-L404
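
For context, the eventual fix (the gerrit patches titled "core: unlock snapshot at the end of the command", linked above) amounts to moving the unlock after the live-snapshot call. A schematic sketch of the two orderings, with hypothetical function names (the real engine code is Java):

    # Schematic only -- hypothetical names, not actual ovirt-engine code.
    def create_volumes():
        print("storage-side volumes created")

    def send_snapshot_vds_command():
        print("live-snapshot verb sent to the VM")  # can fail, e.g. bug 1572801

    def unlock_snapshot():
        print("snapshot status -> OK")  # what REST clients observe

    def create_snapshot_buggy():
        create_volumes()
        unlock_snapshot()               # BUG: OK before the guest snapshot
        send_snapshot_vds_command()

    def create_snapshot_fixed():
        create_volumes()
        send_snapshot_vds_command()
        unlock_snapshot()               # OK only after the whole operation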

Comment 6 Tal Nisan 2018-09-16 12:44:14 UTC
So, is it a Virt issue or Storage?

Comment 7 Arik 2018-09-16 12:52:51 UTC
(In reply to Tal Nisan from comment #6)
> So, is it a Virt issue or Storage?

It can go either way, but I would keep it as Storage since the storage team was the last to introduce a major change to the way this command operates.

Comment 8 Tal Nisan 2018-09-16 13:02:45 UTC
*** Bug 1620087 has been marked as a duplicate of this bug. ***

Comment 15 RHV bug bot 2018-12-10 15:12:59 UTC
WARN: Bug status (ON_QA) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 16 Yosi Ben Shimon 2018-12-25 16:05:38 UTC
Tested using:
ovirt-engine-4.3.0-0.6.alpha2.el7.noarch
vdsm-4.30.4-1.el7ev.x86_64

- Tested both with and without 30 seconds delay in VDSM.
- Tested with both VM states - up and down.
- The VM had 5 preallocated disks of 20 GB each.

The snapshot creation process took much more than 10 seconds (~60 seconds).

For the entire duration of the operation (until it completed), the snapshot status remained "locked" (checked via the REST API).
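
As a sketch of this kind of check (hypothetical engine URL, credentials and VM ID, not the exact commands used during verification):

    # Create a snapshot, then poll until it completes; with the fix, every
    # status observed before the final one should be "locked".
    import time
    import xml.etree.ElementTree as ET

    import requests

    ENGINE = "https://engine.example.com/ovirt-engine/api"
    AUTH = ("admin@internal", "password")  # placeholder credentials
    VM_ID = "vm-uuid"                      # placeholder VM ID
    CA = "/etc/pki/ovirt-engine/ca.pem"

    resp = requests.post(f"{ENGINE}/vms/{VM_ID}/snapshots", auth=AUTH,
                         headers={"Content-Type": "application/xml"},
                         data="<snapshot><description>bz1628909</description></snapshot>",
                         verify=CA)
    resp.raise_for_status()
    snap_id = ET.fromstring(resp.content).get("id")

    statuses = []
    while True:
        r = requests.get(f"{ENGINE}/vms/{VM_ID}/snapshots/{snap_id}",
                         auth=AUTH, verify=CA)
        statuses.append(ET.fromstring(r.content).findtext("snapshot_status"))
        if statuses[-1] == "ok":
            break
        time.sleep(2)

    assert set(statuses[:-1]) <= {"locked"}, statuses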

Moving to VERIFIED

Comment 17 RHV bug bot 2019-01-15 23:35:27 UTC
WARN: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Found non-acked flags: '{'rhevm-4.3-ga': '?'}', ]

For more info please contact: rhv-devops

Comment 19 errata-xmlrpc 2019-05-08 12:38:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 20 Daniel Gur 2019-08-28 13:12:09 UTC
sync2jira

Comment 21 Daniel Gur 2019-08-28 13:16:22 UTC
sync2jira

