1. Proposed title of this feature request
'openshift_node_upgrade_pre_drain_hook' hook available during install / upgrade

3. What is the nature and description of the request?
In OCP v3.7, it was possible to add Ansible hooks to the upgrade procedure. This was useful because it allowed custom tasks to run for every node before the drain and after the node upgrade. In OCP v3.9, it is no longer possible to run hooks before the drain; see Bugzilla 1572786 in the release notes: https://access.redhat.com/documentation/en-us/openshift_container_platform/3.9/html/release_notes/release-notes-ocp-3-9-release-notes
The customer is requesting an 'openshift_node_upgrade_pre_drain_hook' feature so that they can run hooks before the drain, as they used to be able to do.

4. Why does the customer need this? (List the business requirements here)
The customer has custom tasks that need to be performed before the drain while upgrading.

5. How would the customer like to achieve this? (List the functional requirements here)
Provide an 'openshift_node_upgrade_pre_drain_hook' hook, or reverse the change made in 3.9. Otherwise, suggest a workaround.

6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
Perform an upgrade and confirm that a custom task runs automatically before the drain.

7. Is there already an existing RFE upstream or in Red Hat Bugzilla?
No.

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?
ASAP.

9. Is the sales team involved in this request and do they have any additional input?
Not to my knowledge.

10. List any affected packages or components.
OCP 3.9 Ansible

11. Would the customer be able to assist in testing this functionality if implemented?
I am sure they would be willing to test this, or to test a possible workaround, as it is quite important to them.
I feel like we should just move openshift_node_upgrade_pre_hook ahead of the drain. However, I'm not sure this was possible in 3.7 as described. Can the customer provide some information on what sort of tasks they'd like to run before draining the node? I'm curious about the use case.
I have requested more information from the customer and they have responded as follows:
______________________________________________________________________________________
Before I answer your question: I hate to say it, but I don't think moving openshift_node_upgrade_pre_hook ahead of the drain will satisfy all customers. Sure, it will help us a lot. But it already worked like this in v3.7 until someone submitted an enhancement request, and now you want to set it back because of an enhancement request from another customer? As I have said before, the best solution would be an additional hook like 'openshift_node_upgrade_pre_drain_hook'. Anyhow, changing the sequence again would be a solution I personally would be happy with...

In our cluster we have two dedicated router nodes. To keep the microservices available, only one of these nodes can be upgraded at a time. Before upgrading (before the drain), the host (node) is taken out of the external load balancer host group via a script, to avoid connection requests being sent to a non-functioning host; otherwise these connections result in "Connection reset by peer" errors. This step, and setting our monitoring for this particular host to 'maintenance mode', would be in the openshift_node_upgrade_pre_hook.

After the upgrade, once the haproxy pod has started again, the host is re-activated in the external load balancer host group via the openshift_node_upgrade_post_hook. After this, the playbook continues with the next node.

For availability reasons, only one host can be taken out of the load balancer host group at a time, so it isn't possible to just take both hosts down before the upgrade.
______________________________________________________________________________________
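For illustration only, the pre-drain step the customer describes could be sketched as an Ansible task file along these lines. This assumes a hook that runs before the drain is available. The script paths and their arguments are hypothetical placeholders (the customer did not name their scripts), and running them from the Ansible control host via delegate_to is also an assumption.

```yaml
---
# Hypothetical pre-drain hook task file. Script paths and arguments are
# placeholders, not taken from the customer's environment.
- name: Take node out of the external load balancer host group
  command: /usr/local/bin/lb-disable-host.sh {{ inventory_hostname }}
  delegate_to: localhost  # assumption: the script runs on the control host

- name: Set monitoring for this node to maintenance mode
  command: /usr/local/bin/monitoring-maintenance.sh {{ inventory_hostname }} on
  delegate_to: localhost  # assumption, as above
```

A matching task file for openshift_node_upgrade_post_hook would reverse both steps once the haproxy pod is running again.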
This is simple enough to implement. I will implement it in 3.11 and backport to 3.9. I'm unsure whether this feature will survive in 4.0, but it is likely we'll need to support something similar, even if under a different name.
PR Created in 3.11: https://github.com/openshift/openshift-ansible/pull/10811
In openshift-ansible-3.11.61-1 and later
Fixed.

openshift-ansible-3.11.75-1.git.0.95e8e2a.el7.noarch

TASK [debug] ********************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:35
Monday 28 January 2019 16:00:25 +0800 (0:00:01.091) 0:18:36.663 ********
ok: [wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com] => {
    "msg": "Running node pre-drain-upgrade hook /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml"
}

TASK [include_tasks] ************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:38
Monday 28 January 2019 16:00:25 +0800 (0:00:00.070) 0:18:36.733 ********
included: /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml for wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com

TASK [Note openshift_node_upgrade_pre_drain_hook starts] ************************
task path: /root/wmeng/openshift_node_upgrade_pre_drain_hook.yml:2
Monday 28 January 2019 16:00:25 +0800 (0:00:00.072) 0:18:36.805 ********
ok: [wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com] => {
    "msg": "openshift_node_upgrade_pre_drain_hook of wmengr310-node-registry-router-1.0128-zua.qe.rhcloud.com is about to start"
}

TASK [Drain Node for Kubelet upgrade] *******************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_nodes.yml:41
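For anyone wanting to reproduce this check, a hook file consistent with the output above might look like the sketch below. The task name and message are taken from the log; the use of inventory_hostname to produce the host name is an assumption, since the actual test file is not attached to this bug. The log indicates that the openshift_node_upgrade_pre_drain_hook variable was set to the path of this file; setting it in the inventory alongside the other upgrade hook variables is likewise an assumption.

```yaml
---
- name: Note openshift_node_upgrade_pre_drain_hook starts
  # Reconstructed from the log output; inventory_hostname is an assumption.
  debug:
    msg: "openshift_node_upgrade_pre_drain_hook of {{ inventory_hostname }} is about to start"
```

If the hook is wired in correctly, this task appears in the upgrade output before "Drain Node for Kubelet upgrade", as it does in the log above.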
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0326