Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1663626

Summary: [RFE] block simultaneously running cluster upgrades
Product: [oVirt] ovirt-engine
Reporter: Greg Sheremeta <gshereme>
Component: BLL.Infra
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Petr Kubica <pkubica>
Severity: high
Docs Contact:
Priority: high
Version: 4.3.0
CC: bugs, lleistne, michal.skrivanek, mperina, mtessun, nashok, omachace, rnori, sdickers
Target Milestone: ovirt-4.3.3
Keywords: FutureFeature
Target Release: ---
Flags: pm-rhel: ovirt-4.3+, mtessun: planning_ack+, mperina: devel_ack+, lleistne: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.3.3.1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-04-16 13:58:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                              Flags
Demo of the confirmation dialog          none
Demo of the confirmation dialog (gif)    none

Description Greg Sheremeta 2019-01-05 17:51:25 UTC
Description of problem:
[RFE] block simultaneously running cluster upgrades

There needs to be a lock such that only one cluster upgrade can ever run at a time per cluster. It could cause bad environmental consequences if multiple instances of the upgrade role were running simultaneously. Even if it somehow succeeded on the backend, the notification stream in the Admin Portal would be confusing.

Version-Release number of selected component (if applicable):
4.3, master

How reproducible:
always

Steps to Reproduce:
1. Run cluster upgrade
2. Quickly run cluster upgrade again on that same cluster before the previous upgrade finishes

Actual results:
You can run cluster upgrade again on that same cluster before the previous upgrade finishes

Expected results:
You should not be able to run cluster upgrade again on that same cluster before any existing upgrade on that cluster finishes.

Additional info:
The role can (and often should) be run manually and/or via Tower, so we need some shared lock, maybe at the role or playbook level. The engine would also need to detect and handle that failure-due-to-lock condition gracefully.
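
For illustration only (this is not part of the bug or the shipped role): a minimal sketch of what a role- or playbook-level shared lock could look like, assuming a host-local lock file per cluster. The lock path, function name, and cluster name below are all hypothetical.

```python
# Hypothetical sketch only -- not oVirt code. Shows one way a per-cluster lock
# could work at the playbook/role level: take a non-blocking flock on a
# per-cluster lock file so a second upgrade run fails fast instead of racing
# the first one.
import errno
import fcntl
import os


class UpgradeAlreadyRunning(Exception):
    """Raised when another upgrade process already holds the cluster lock."""


def acquire_cluster_upgrade_lock(cluster_name, lock_dir="/var/lock"):
    """Return an open, locked file handle; the caller keeps it open for the run."""
    path = os.path.join(lock_dir, "cluster-upgrade-%s.lock" % cluster_name)
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as err:
        handle.close()
        if err.errno in (errno.EACCES, errno.EAGAIN):
            raise UpgradeAlreadyRunning(
                "cluster upgrade already running for %r" % cluster_name)
        raise
    return handle


if __name__ == "__main__":
    lock = acquire_cluster_upgrade_lock("Default", lock_dir="/tmp")
    try:
        print("lock held, safe to run the upgrade flow")
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)
        lock.close()
```

A host-local lock like this would not cover runs started from different machines (Engine, Tower, a manual playbook run), which is one reason the discussion below moves toward tracking the state on the cluster entity in the engine itself.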

Comment 1 Greg Sheremeta 2019-01-05 17:54:37 UTC
@Ondra and Martin, can you share your thoughts?

Comment 2 Greg Sheremeta 2019-01-09 19:41:11 UTC
@Ondra and Martin, same comment as in https://bugzilla.redhat.com/show_bug.cgi?id=1664844.
This is mostly an infra-team feature :) I have it on UX for now and Scott can lead, but let's see how it progresses. If heavy role work is needed, perhaps Ondra can assist or lead.

Comment 3 Scott Dickerson 2019-02-09 04:52:35 UTC
Created attachment 1528289 [details]
Demo of the confirmation dialog

This attachment is a demo gif of the new spinner and "cluster_maintenance" warning dialog.

Comment 4 Scott Dickerson 2019-02-09 04:53:38 UTC
Created attachment 1528290 [details]
Demo of the confirmation dialog (gif)

This attachment is a demo gif of the new spinner and "cluster_maintenance" warning dialog.

Comment 5 Scott Dickerson 2019-02-09 04:57:57 UTC
See the demo for how the confirmation dialog works. The cluster upgrade role is unchanged, but the dialog should deter people from running the operation twice (assuming they did not uncheck the "change cluster to maintenance mode" box on the options step).

Comment 6 Greg Sheremeta 2019-02-20 16:19:42 UTC
Ondra agreed to take over the backend part. Assigning to him for now.

Comment 7 Martin Perina 2019-02-20 19:18:20 UTC
I don't see any reliable way to prevent two simultaneous cluster upgrade processes from running on the same cluster other than:

1. Create a field in the cluster entity indicating that a cluster upgrade is currently running for that cluster
2. Whenever a cluster upgrade is about to be executed (either from an Ansible playbook or the UI), check with the backend whether another cluster upgrade process is already running on the same cluster (if one is, fail this new cluster upgrade process)
3. If not, set the field to indicate a cluster upgrade is in progress and continue with the cluster upgrade flow
4. Upon successful finish or error, clear the field to allow additional cluster upgrades
5. Provide a "force" option in the cluster upgrade role (which means also in the Ansible module and REST API) to forcefully execute a cluster upgrade, which may be needed if, for example, the Ansible process is forcefully killed. Usage of this parameter will be recorded in the audit log.

The above means changes in the database, REST API, Ansible module, and finally the cluster-upgrade role, which is unlikely to happen before downstream GA, so I'm targeting 4.4 and hoping we will be able to backport it into some later 4.3.z.
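
Purely as an illustration of the flow described in this comment (the real change lives in the Java engine, its database schema, the REST API, and the Ansible module, none of which is shown here), a minimal sketch with entirely made-up class and method names:

```python
# Hypothetical sketch only -- not ovirt-engine code. It restates the five steps
# above: keep an "upgrade running" flag per cluster, refuse a second upgrade
# unless "force" is given, set the flag for the duration of the run, and always
# clear it at the end so later upgrades can proceed.
class ClusterUpgradeInProgress(Exception):
    pass


class InMemoryClusterDao:
    """Stands in for the database field on the cluster entity (step 1)."""

    def __init__(self):
        self._upgrade_running = {}

    def is_upgrade_running(self, cluster_id):
        return self._upgrade_running.get(cluster_id, False)

    def set_upgrade_running(self, cluster_id, value):
        self._upgrade_running[cluster_id] = value


def run_cluster_upgrade(dao, cluster_id, do_upgrade, force=False, audit_log=print):
    # Step 2: refuse to start if another upgrade is already marked as running.
    if dao.is_upgrade_running(cluster_id) and not force:
        raise ClusterUpgradeInProgress(
            "another upgrade is already running for cluster %s" % cluster_id)
    if force:
        # Step 5: forced runs are allowed (e.g. after a killed Ansible process)
        # but leave a trace in the audit log.
        audit_log("cluster %s: upgrade started with force=True" % cluster_id)
    # Step 3: mark the cluster and continue with the upgrade flow.
    dao.set_upgrade_running(cluster_id, True)
    try:
        do_upgrade(cluster_id)
    finally:
        # Step 4: clear the flag on success or error.
        dao.set_upgrade_running(cluster_id, False)


if __name__ == "__main__":
    dao = InMemoryClusterDao()
    run_cluster_upgrade(dao, "cluster-1", do_upgrade=lambda cid: None)
```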

Comment 8 Martin Perina 2019-03-08 14:21:22 UTC
*** Bug 1686808 has been marked as a duplicate of this bug. ***

Comment 9 Scott Dickerson 2019-03-12 03:34:31 UTC
Note: See BZ1687645

Comment 10 Petr Kubica 2019-04-03 11:53:23 UTC
Verified in 
ovirt-ansible-cluster-upgrade-1.1.13-1.el7ev.noarch
ovirt-engine-4.3.3.1-0.1.el7.noarch

Comment 11 Sandro Bonazzola 2019-04-16 13:58:18 UTC
This bug is included in the oVirt 4.3.3 release, published on April 16th 2019.

Since the problem described in this bug report should be
resolved in the oVirt 4.3.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.