1663626 – [RFE] block simultaneously running cluster upgrades

Summary: [RFE] block simultaneously running cluster upgrades

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	BLL.Infra
Sub Component:
Version:	4.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	ovirt-4.3.3
Target Release:	---
Assignee:	Martin Perina
QA Contact:	Petr Kubica
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1686808
Depends On:
Blocks:

Reported:	2019-01-05 17:51 UTC by Greg Sheremeta
Modified:	2019-04-16 13:58 UTC
CC List:	9 users
Fixed In Version:	ovirt-engine-4.3.3.1
Clone Of:
Environment:
Last Closed:	2019-04-16 13:58:18 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	pm-rhel: ovirt-4.3+ mtessun: planning_ack+ mperina: devel_ack+ lleistne: testing_ack+

Attachments
Demo of the confirmation dialog (2.34 MB, video/webm) 2019-02-09 04:52 UTC, Scott Dickerson	no flags
Demo of the confirmation dialog (gif) (2.02 MB, image/gif) 2019-02-09 04:53 UTC, Scott Dickerson	no flags

Links
System	ID	Priority	Status	Summary	Last Updated
oVirt gerrit	97661	master	MERGED	Check and warn if a Cluster is in cluster_maintenance	2020-04-22 12:12:15 UTC
oVirt gerrit	98164	master	MERGED	engine : Add upgrade_running to cluster	2020-04-22 12:12:15 UTC
oVirt gerrit	98527	master	MERGED	engine: Introduce StartClusterUpgrade and FinishClusterUpgrade commands	2020-04-22 12:12:15 UTC
oVirt gerrit	98528	master	MERGED	restapi: Add upgrade action to cluster	2020-04-22 12:12:15 UTC
oVirt gerrit	98529	master	MERGED	Add Upgrade action to cluster	2020-04-22 12:12:15 UTC
oVirt gerrit	98640	master	MERGED	engine : Add service to reset upgrade_running flag of cluster	2020-04-22 12:12:16 UTC
oVirt gerrit	98713	ovirt-engine-4.3	MERGED	engine : Add upgrade_running to cluster	2020-04-22 12:12:16 UTC
oVirt gerrit	98714	ovirt-engine-4.3	MERGED	engine: Introduce StartClusterUpgrade and FinishClusterUpgrade commands	2020-04-22 12:12:16 UTC
oVirt gerrit	98715	ovirt-engine-4.3	MERGED	engine : Add service to reset upgrade_running flag of cluster	2020-04-22 12:12:16 UTC
oVirt gerrit	98855	ovirt-engine-4.3	MERGED	restapi: Add upgrade action to cluster	2020-04-22 12:12:16 UTC
oVirt gerrit	98872	master	MERGED	restapi: Update to model 4.3.22	2020-04-22 12:12:16 UTC
oVirt gerrit	98879	ovirt-engine-4.3	MERGED	restapi: Update to model 4.3.22	2020-04-22 12:12:16 UTC

Description Greg Sheremeta 2019-01-05 17:51:25 UTC

Description of problem:
[RFE] block simultaneously running cluster upgrades

There needs to be a lock so that only one cluster upgrade can run at a time on any given cluster. Running multiple instances of the upgrade role simultaneously against the same cluster could damage the environment. Even if it somehow succeeded on the backend, the notification stream in the Admin Portal would be confusing.

Version-Release number of selected component (if applicable):
4.3, master

How reproducible:
always

Steps to Reproduce:
1. Run cluster upgrade
2. Quickly run cluster upgrade again on that same cluster before the previous upgrade finishes

Actual results:
You can run cluster upgrade again on that same cluster before the previous upgrade finishes

Expected results:
You should not be able to run cluster upgrade again on that same cluster before any existing upgrade on that cluster finishes.

Additional info:
Because the role can (and often should) be run manually and/or via Ansible Tower, we need a lock shared by all of these callers, perhaps at the role or playbook level. The engine would also need to detect a failure caused by that lock and handle it gracefully.

Comment 1 Greg Sheremeta 2019-01-05 17:54:37 UTC

@Ondra and Martin, can you share your thoughts?

Comment 2 Greg Sheremeta 2019-01-09 19:41:11 UTC

@Ondra and Martin, same comment as in https://bugzilla.redhat.com/show_bug.cgi?id=1664844.
This is mostly an infra team feature :) I have it on UX now and Scott can lead, but let's see how it progresses. If heavy role work is needed, perhaps Ondra can assist or lead.

Comment 3 Scott Dickerson 2019-02-09 04:52:35 UTC

Created attachment 1528289 [details]
Demo of the confirmation dialog

This attachment is a demo video (webm) of the new spinner and "cluster_maintenance" warning dialog.

Comment 4 Scott Dickerson 2019-02-09 04:53:38 UTC

Created attachment 1528290 [details]
Demo of the confirmation dialog (gif)

This attachment is a demo gif of the new spinner and "cluster_maintenance" warning dialog.

Comment 5 Scott Dickerson 2019-02-09 04:57:57 UTC

See the demo for how the confirmation dialog works. The cluster upgrade role is unchanged, but the dialog should deter people from running the operation twice (assuming they did not uncheck the box that changes the cluster to maintenance mode on the options step).

Comment 6 Greg Sheremeta 2019-02-20 16:19:42 UTC

Ondra agreed to take over the backend part. Assigning to him for now.

Comment 7 Martin Perina 2019-02-20 19:18:20 UTC

I don't see any reliable way to prevent two simultaneous cluster upgrade processes on the same cluster other than the following:

1. Create a field in the cluster entity indicating that a cluster upgrade is currently running for the cluster
2. Each time a cluster upgrade is about to be executed (either from an Ansible playbook or the UI), check with the backend whether another cluster upgrade process is already running on the same cluster (if it is, fail the new cluster upgrade process)
3. If not, set the field to indicate that a cluster upgrade is running and continue with the cluster upgrade flow
4. When the upgrade finishes, either successfully or with an error, clear the field to allow further cluster upgrades
5. We will need to provide a "force" option in the cluster upgrade role (which also means in the Ansible module and the RESTAPI) to forcefully execute a cluster upgrade, which may be required if, for example, the Ansible process is forcefully killed. Usage of this parameter will be logged into the audit log, and additional issues ...

The above requires changes in the database, the RESTAPI, the Ansible module and finally the cluster-upgrade role, which are unlikely to land before downstream GA, so I'm targeting this to 4.4; let's hope we will be able to backport it into some later 4.3.z.
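The five steps above can be sketched in Python as a minimal, in-memory illustration, not the engine implementation. Assumptions: `ClusterUpgradeRegistry`, `start_upgrade`, `finish_upgrade`, `run_upgrade` and `UpgradeAlreadyRunning` are hypothetical names; a `threading.Lock` stands in for whatever makes the check-and-set atomic in the real database; and Python's `logging` stands in for the engine audit log. Only the flag name `upgrade_running` and the start/finish split echo the linked gerrit changes.

```python
import logging
import threading
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for the engine audit log.
audit_log = logging.getLogger("audit")


class UpgradeAlreadyRunning(Exception):
    """Raised when an upgrade is requested for a cluster that is already upgrading."""


@dataclass
class Cluster:
    name: str
    upgrade_running: bool = False  # step 1: flag stored on the cluster entity


class ClusterUpgradeRegistry:
    """Hypothetical in-memory stand-in for the cluster table in the engine database."""

    def __init__(self) -> None:
        self._clusters: Dict[str, Cluster] = {}
        # Serializes check-and-set so two callers cannot both see the flag cleared.
        self._lock = threading.Lock()

    def add(self, cluster: Cluster) -> None:
        with self._lock:
            self._clusters[cluster.name] = cluster

    def is_upgrade_running(self, name: str) -> bool:
        with self._lock:
            return self._clusters[name].upgrade_running

    def start_upgrade(self, name: str, force: bool = False) -> None:
        with self._lock:
            cluster = self._clusters[name]
            if cluster.upgrade_running:
                if not force:
                    # Step 2: another upgrade holds the flag, so fail this one.
                    raise UpgradeAlreadyRunning(f"cluster {name!r} is already being upgraded")
                # Step 5: a forced start is allowed but recorded.
                audit_log.warning("forced upgrade of cluster %r while upgrade_running was set", name)
            # Step 3: mark the cluster as upgrading.
            cluster.upgrade_running = True

    def finish_upgrade(self, name: str) -> None:
        with self._lock:
            # Step 4: clear the flag so later upgrades can run.
            self._clusters[name].upgrade_running = False


def run_upgrade(registry: ClusterUpgradeRegistry, name: str,
                upgrade_hosts: Callable[[], None], force: bool = False) -> None:
    """Run one upgrade, clearing the flag whether upgrade_hosts succeeds or raises."""
    registry.start_upgrade(name, force=force)
    try:
        upgrade_hosts()
    finally:
        registry.finish_upgrade(name)


if __name__ == "__main__":
    registry = ClusterUpgradeRegistry()
    registry.add(Cluster("Default"))
    registry.start_upgrade("Default")
    try:
        registry.start_upgrade("Default")  # a second caller is rejected
    except UpgradeAlreadyRunning as exc:
        print(exc)
    registry.finish_upgrade("Default")
```

The `try`/`finally` in `run_upgrade` covers step 4 for errors raised inside the upgrade process, but not for a process that is killed outright. That gap is why step 5's forced start is still needed, and why the linked gerrit changes also add a service to reset the `upgrade_running` flag.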

Comment 8 Martin Perina 2019-03-08 14:21:22 UTC

*** Bug 1686808 has been marked as a duplicate of this bug. ***

Comment 9 Scott Dickerson 2019-03-12 03:34:31 UTC

Note: See BZ1687645

Comment 10 Petr Kubica 2019-04-03 11:53:23 UTC

Verified in 
ovirt-ansible-cluster-upgrade-1.1.13-1.el7ev.noarch
ovirt-engine-4.3.3.1-0.1.el7.noarch

Comment 11 Sandro Bonazzola 2019-04-16 13:58:18 UTC

This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
