Bug 1598822 - RFE - A 'openshift_node_upgrade_pre_drain_hook' hook available during install / upgrade
Summary: RFE - A 'openshift_node_upgrade_pre_drain_hook' hook available during install / upgrade
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michael Gugino
QA Contact: Weihua Meng
Depends On:
Reported: 2018-07-06 14:08 UTC by Brian Dooley
Modified: 2019-02-20 14:11 UTC
5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2019-02-20 14:11:01 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0326 0 None None None 2019-02-20 14:11:07 UTC

Description Brian Dooley 2018-07-06 14:08:42 UTC
1. Proposed title of this feature request
'openshift_node_upgrade_pre_drain_hook' hook available during install / upgrade

3. What is the nature and description of the request?
In OCP v3.7, it was possible to implement Ansible hooks in upgrade procedures, which was useful because it allowed custom tasks to run for every node before the drain and after the node upgrade.

In OCP v3.9, it is no longer possible to implement hooks pre-drain; see https://access.redhat.com/documentation/en-us/openshift_container_platform/3.9/html/release_notes/release-notes-ocp-3-9-release-notes regarding bugzilla 1572786.

The customer is requesting an 'openshift_node_upgrade_pre_drain_hook' feature so that they can implement hooks before the drain, as they were previously able to do.

4. Why does the customer need this? (List the business requirements here)
Customer has custom tasks that need to be performed before the drain while upgrading. 

5. How would the customer like to achieve this? (List the functional requirements here)
Add an 'openshift_node_upgrade_pre_drain_hook' hook or revert the change made in 3.9.
Otherwise, suggest a workaround for this.
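For reference, the existing upgrade hooks (openshift_node_upgrade_pre_hook and openshift_node_upgrade_post_hook) are enabled by pointing an inventory variable at an Ansible tasks file, and the requested pre-drain hook would presumably follow the same pattern. A minimal sketch, where the file path and the task itself are illustrative:

```yaml
# Inventory ([OSEv3:vars]) -- hypothetical variable name per this RFE:
#   openshift_node_upgrade_pre_drain_hook=/usr/share/custom/pre_drain.yml

# /usr/share/custom/pre_drain.yml -- a plain tasks file, in the same
# format as the existing pre/post hook files:
---
- name: Note which node is about to be drained
  debug:
    msg: "Pre-drain hook running for {{ inventory_hostname }}"
```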

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
Perform an upgrade, with a task automatically run pre-drain during the process.

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?

9. Is the sales team involved in this request and do they have any additional input?
Not to my knowledge

10. List any affected packages or components.
OCP 3.9

11. Would the customer be able to assist in testing this functionality if implemented?
I am sure they would be willing to test this out, or test a possible workaround, as it is quite important to them.

Comment 1 Scott Dodson 2018-07-06 14:35:14 UTC
I feel like we should just move openshift_node_upgrade_pre_hook ahead of the drain. However, I'm not sure this was possible in 3.7 as described.

Can the customer provide some info on what sort of tasks they'd like to achieve prior to draining the node? I'm curious about the use case.

Comment 2 Brian Dooley 2018-07-11 16:18:30 UTC
I have requested more information from the customer, and they have responded as follows:

Before I answer your question: I hate to say it, but I don't think changing the sequence of the openshift_node_upgrade_pre_hook before the drain will satisfy all customers. Sure, it will help us a lot. But it was already like this on v3.7 until someone submitted an enhancement request, and now you want to set it back because of an enhancement request from another customer? As I have said before, the best solution would be an additional hook like 'openshift_node_upgrade_pre_drain_hook'. Anyhow, changing the sequence again would be a solution I personally would be happy with...

In our cluster we have two dedicated router nodes. To keep the microservices available, only one of these nodes can be upgraded at a time. Before upgrading (before the drain), the host (node) is taken down in the external load balancer host group via a script, to avoid connection requests being sent to a non-functioning host; otherwise these connections result in "Connection reset by peer" errors. This step, and setting our monitoring for this particular host to 'maintenance mode', would be in the openshift_node_upgrade_pre_hook. After the upgrade, once the haproxy pod has started again, the host is activated in the external load balancer host group again via the openshift_node_upgrade_post_hook. After this the playbook continues with the next node. For availability reasons, only one host can be taken out of the load balancer host group at a time, so it isn't possible to just take both hosts down before the upgrade.
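The workflow described above could be expressed as a pre-drain hook tasks file along these lines; the load balancer and monitoring scripts are placeholders for whatever tooling the customer actually uses:

```yaml
---
# Runs for each node before it is drained; both commands are executed
# from the control host against the external infrastructure.
- name: Remove node from the external load balancer host group
  command: /usr/local/bin/lb-disable {{ inventory_hostname }}  # placeholder script
  delegate_to: localhost

- name: Set monitoring for this host to maintenance mode
  command: /usr/local/bin/monitoring-maint --host {{ inventory_hostname }}  # placeholder script
  delegate_to: localhost
```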

Comment 5 Michael Gugino 2018-11-29 17:39:40 UTC
This is simple enough to implement; I will implement it in 3.11 and backport to 3.9. I'm unsure if this feature will survive in 4.0, but it is likely we'll need to support something similar even if by a different name.

Comment 6 Michael Gugino 2018-12-03 17:18:48 UTC
PR Created in 3.11: https://github.com/openshift/openshift-ansible/pull/10811

Comment 7 Scott Dodson 2019-01-24 17:51:25 UTC
In openshift-ansible-3.11.61-1 and later

Comment 8 Weihua Meng 2019-01-28 08:11:13 UTC


TASK [debug] ******************************************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:35
Monday 28 January 2019  16:00:25 +0800 (0:00:01.091)       0:18:36.663 ********
ok: [wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com] => {
    "msg": "Running node pre-drain-upgrade hook /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml"
}

TASK [include_tasks] **********************************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:38
Monday 28 January 2019  16:00:25 +0800 (0:00:00.070)       0:18:36.733 ********
included: /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml for wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com

TASK [Note openshift_node_upgrade_pre_drain_hook starts] **********************************************************************************************************************************************************
task path: /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml:2
Monday 28 January 2019  16:00:25 +0800 (0:00:00.072)       0:18:36.805 ********
ok: [wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com] => {
    "msg": "openshift_node_upgrade_pre_drain_hook of wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com is about to start"
}

TASK [Drain Node for Kubelet upgrade] *****************************************************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:41
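From the task names and file paths in the log above, the hook appears to have been enabled by pointing the new inventory variable at a tasks file roughly like the following (reconstructed from the log, not copied from the actual test file):

```yaml
# Inventory ([OSEv3:vars]):
#   openshift_node_upgrade_pre_drain_hook=/root/wmeng/openshift_node_upgrade_pre_drain_hook.yml

# /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml
---
- name: Note openshift_node_upgrade_pre_drain_hook starts
  debug:
    msg: "openshift_node_upgrade_pre_drain_hook of {{ inventory_hostname }} is about to start"
```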

Comment 10 errata-xmlrpc 2019-02-20 14:11:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

