Bug 1663626 - [RFE] block simultaneously running cluster upgrades
Summary: [RFE] block simultaneously running cluster upgrades
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.3.3
Target Release: ---
Assignee: Martin Perina
QA Contact: Petr Kubica
URL:
Whiteboard:
Duplicates: 1686808
Depends On:
Blocks:
Reported: 2019-01-05 17:51 UTC by Greg Sheremeta
Modified: 2019-04-16 13:58 UTC (History)

Fixed In Version: ovirt-engine-4.3.3.1
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:18 UTC
oVirt Team: Infra
Embargoed:
pm-rhel: ovirt-4.3+
mtessun: planning_ack+
mperina: devel_ack+
lleistne: testing_ack+


Attachments
Demo of the confirmation dialog (2.34 MB, video/webm)
2019-02-09 04:52 UTC, Scott Dickerson
Demo of the confirmation dialog (gif) (2.02 MB, image/gif)
2019-02-09 04:53 UTC, Scott Dickerson


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 97661 | 0 | master | MERGED | Check and warn if a Cluster is in cluster_maintenance | 2020-04-22 12:12:15 UTC
oVirt gerrit | 98164 | 0 | master | MERGED | engine : Add upgrade_running to cluster | 2020-04-22 12:12:15 UTC
oVirt gerrit | 98527 | 0 | master | MERGED | engine: Introduce StartClusterUpgrade and FinishClusterUpgrade commands | 2020-04-22 12:12:15 UTC
oVirt gerrit | 98528 | 0 | master | MERGED | restapi: Add upgrade action to cluster | 2020-04-22 12:12:15 UTC
oVirt gerrit | 98529 | 0 | master | MERGED | Add Upgrade action to cluster | 2020-04-22 12:12:15 UTC
oVirt gerrit | 98640 | 0 | master | MERGED | engine : Add service to reset upgrade_running flag of cluster | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98713 | 0 | ovirt-engine-4.3 | MERGED | engine : Add upgrade_running to cluster | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98714 | 0 | ovirt-engine-4.3 | MERGED | engine: Introduce StartClusterUpgrade and FinishClusterUpgrade commands | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98715 | 0 | ovirt-engine-4.3 | MERGED | engine : Add service to reset upgrade_running flag of cluster | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98855 | 0 | ovirt-engine-4.3 | MERGED | restapi: Add upgrade action to cluster | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98872 | 0 | master | MERGED | restapi: Update to model 4.3.22 | 2020-04-22 12:12:16 UTC
oVirt gerrit | 98879 | 0 | ovirt-engine-4.3 | MERGED | restapi: Update to model 4.3.22 | 2020-04-22 12:12:16 UTC

Description Greg Sheremeta 2019-01-05 17:51:25 UTC
Description of problem:
[RFE] block simultaneously running cluster upgrades

There needs to be a lock such that only one cluster upgrade can run at a time per cluster. Running multiple instances of the upgrade role simultaneously on the same cluster could leave the environment in a bad state. Even if it somehow succeeded on the backend, the notification stream in the Admin Portal would be confusing.

Version-Release number of selected component (if applicable):
4.3, master

How reproducible:
always

Steps to Reproduce:
1. Run cluster upgrade
2. Quickly run cluster upgrade again on that same cluster before the previous upgrade finishes

Actual results:
You can run cluster upgrade again on that same cluster before the previous upgrade finishes

Expected results:
You should not be able to run cluster upgrade again on that same cluster before any existing upgrade on that cluster finishes.

Additional info:
The role can (and often should) be run manually and/or via Tower, so we need some shared lock, maybe at role or playbook level. Engine would also need to gracefully detect that failure-due-to-lock condition.
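
A minimal sketch of the "shared lock plus graceful detection" idea, assuming a per-cluster lock file on a single host. The lock-file naming, the `UpgradeAlreadyRunning` exception, and the `run_upgrade` helper are hypothetical names made up for illustration; they are not part of the cluster upgrade role or the engine. A lock file only protects callers that share one filesystem, so it stands in here for whatever shared lock the role or engine would really use.

```python
import os
import tempfile
from contextlib import contextmanager


class UpgradeAlreadyRunning(Exception):
    """Raised when another upgrade holds the lock for the same cluster."""


@contextmanager
def cluster_upgrade_lock(cluster_name, lock_dir):
    # Hypothetical naming scheme: one lock file per cluster.
    path = os.path.join(lock_dir, f"cluster-upgrade-{cluster_name}.lock")
    try:
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise UpgradeAlreadyRunning(cluster_name) from None
    try:
        # Record the owner's PID to help diagnose a stale lock.
        os.write(fd, str(os.getpid()).encode())
        yield path
    finally:
        os.close(fd)
        os.remove(path)


def run_upgrade(cluster_name, lock_dir, upgrade):
    """Run `upgrade` under the lock and report a lock failure as a status."""
    try:
        with cluster_upgrade_lock(cluster_name, lock_dir):
            upgrade()
        return "finished"
    except UpgradeAlreadyRunning:
        return "rejected: upgrade already running"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        print(run_upgrade("Default", lock_dir, lambda: None))
```

If the process is killed before the `finally` clause runs, the lock file stays behind and has to be removed manually, which is the usual weakness of this kind of lock.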

Comment 1 Greg Sheremeta 2019-01-05 17:54:37 UTC
@Ondra and Martin, can you share your thoughts?

Comment 2 Greg Sheremeta 2019-01-09 19:41:11 UTC
@Ondra and Martin, same comment as in https://bugzilla.redhat.com/show_bug.cgi?id=1664844.
This is kind of a mostly infra team feature :) I have it on UX now and Scott can lead, but let's see how it progresses. If heavy role work is needed, perhaps Ondra can assist or lead.

Comment 3 Scott Dickerson 2019-02-09 04:52:35 UTC
Created attachment 1528289
Demo of the confirmation dialog

This attachment is a demo video of the new spinner and "cluster_maintenance" warning dialog.

Comment 4 Scott Dickerson 2019-02-09 04:53:38 UTC
Created attachment 1528290
Demo of the confirmation dialog (gif)

This attachment is a demo gif of the new spinner and "cluster_maintenance" warning dialog.

Comment 5 Scott Dickerson 2019-02-09 04:57:57 UTC
See the demo for how the confirm dialog works. The cluster upgrade role is unchanged, but the dialog should deter people from running the operation twice (assuming they did not uncheck the "change cluster to maintenance mode" box on the options step).

Comment 6 Greg Sheremeta 2019-02-20 16:19:42 UTC
Ondra agreed to take over the backend part. Assigning to him for now.

Comment 7 Martin Perina 2019-02-20 19:18:20 UTC
I don't see any reliable way to prevent two simultaneous cluster upgrade processes from running on the same cluster other than:

1. Create a field in the cluster entity indicating that a cluster upgrade is currently running for the cluster.
2. Each time a cluster upgrade is about to be executed (either from the Ansible playbook or the UI), check with the backend whether another cluster upgrade process is already running on the same cluster. If one is, fail the new cluster upgrade process.
3. If not, set the field to indicate a running cluster upgrade and continue with the cluster upgrade flow.
4. Upon successful finish or error, clear the field to allow additional cluster upgrades.
5. We will need to provide a "force" option in the cluster upgrade role (which means also in the Ansible module and REST API) to forcefully execute a cluster upgrade, which may be required if, for example, the Ansible process is forcefully killed. But usage of this parameter will be logged into the audit log and additional issues

The above change means changes in the database, REST API, Ansible module and finally the cluster-upgrade role, which is unlikely to happen before downstream GA, so targeting 4.4, and let's hope we will be able to backport it into some later 4.3.z.
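
The five steps above can be sketched as follows. This is an illustrative in-memory model, not the engine code: the `upgrade_running` field name and the start/finish split follow the titles of the linked gerrit patches, while the `Engine` and `Cluster` classes, the method signatures, and the audit log format are assumptions. In a real backend the check in step 2 and the update in step 3 would have to happen atomically (for example in one database transaction); the sketch is single-threaded and does not show that.

```python
from dataclasses import dataclass, field


class UpgradeAlreadyRunning(Exception):
    """Raised when the cluster is already marked as being upgraded."""


@dataclass
class Cluster:
    name: str
    upgrade_running: bool = False  # step 1: flag on the cluster entity


@dataclass
class Engine:
    audit_log: list = field(default_factory=list)

    def start_cluster_upgrade(self, cluster, force=False):
        if force:
            # Step 5: every use of the force option is recorded.
            self.audit_log.append(f"forced upgrade start on {cluster.name}")
        elif cluster.upgrade_running:
            # Step 2: another upgrade is running, so fail the new one.
            raise UpgradeAlreadyRunning(cluster.name)
        # Step 3: mark the cluster and let the upgrade continue.
        cluster.upgrade_running = True

    def finish_cluster_upgrade(self, cluster):
        # Step 4: clear the flag so later upgrades are allowed.
        cluster.upgrade_running = False


def upgrade_cluster(engine, cluster, steps, force=False):
    """Run `steps` between start and finish; clear the flag on error too."""
    engine.start_cluster_upgrade(cluster, force=force)
    try:
        steps()
    finally:
        engine.finish_cluster_upgrade(cluster)
```

The `finally` clause covers both the success and the error case of step 4. It does not help when the process is killed outright, which is the situation the "force" option is meant for.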

Comment 8 Martin Perina 2019-03-08 14:21:22 UTC
*** Bug 1686808 has been marked as a duplicate of this bug. ***

Comment 9 Scott Dickerson 2019-03-12 03:34:31 UTC
Note: See BZ1687645

Comment 10 Petr Kubica 2019-04-03 11:53:23 UTC
Verified in:
ovirt-ansible-cluster-upgrade-1.1.13-1.el7ev.noarch
ovirt-engine-4.3.3.1-0.1.el7.noarch

Comment 11 Sandro Bonazzola 2019-04-16 13:58:18 UTC
This bugzilla is included in oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

