Description of problem: Since the upgrade is done in place, we would really like to be able to upgrade the base OS (operating system, RHEL) while the node is drained and not running. As it currently stands, we need to drain a node twice during an upgrade to get OpenShift and then RHEL upgraded. If there is no such option, we'd like to have a hook that we can plug into and run while the OpenShift node is not running and the node is drained.
For master / 3.10 https://github.com/openshift/openshift-ansible/pull/7743 https://github.com/openshift/openshift-ansible/pull/7736
Verified this bug with openshift-ansible-3.10.0-0.32.0.git.0.bb50d68.el7.noarch.

With the node pre-upgrade hook, we can upgrade the OS after the node is made unschedulable and drained. With the upgrade hook, which runs after the node is upgraded and before it becomes schedulable again, we can also complete a server reboot.

Add the hook definitions to the Ansible inventory when doing the upgrade:

openshift_node_upgrade_pre_hook=/root/workspace/pre_node.yml
openshift_node_upgrade_hook=/root/workspace/node.yml

[root@gpei-preserved ~]# cat /root/workspace/pre_node.yml
---
- name: Note the start of node OS upgrade
  debug:
    msg: "Node OS upgrade of {{ inventory_hostname }} is about to start"
- name: Upgrade the OS
  yum: name=* state=latest
- name:
  debug:
    msg: "OS upgrade of {{ inventory_hostname }} finished"

[root@gpei-preserved ~]# cat /root/workspace/node.yml
- name: Note the reboot of node
  debug:
    msg: "Node {{ inventory_hostname }} is upgraded, going to be rebooted..."
- name: Restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true
- name: Waiting for the server to come back
  wait_for_connection:
    delay: 120
    timeout: 300
- name: Ensure that required services are running
  service:
    name: "{{ item }}"
    state: started
    enabled: yes
  with_items:
    - docker
    - atomic-openshift-node.service
    - dnsmasq

Ran the 3.9 -> 3.10 upgrade. The hooks were executed successfully and the whole upgrade finished.

TASK [Drain Node for Kubelet upgrade] ************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com -> qe-gpei-391-master-etcd-1.0503-rlo.qe.rhcloud.com] => {"attempts": 1, "changed": true, "cmd": ["oc", "adm", "drain", "qe-gpei-391-node-registry-router-1", "--config=/etc/origin/master/admin.kubeconfig", "--force", "--delete-local-data", "--ignore-daemonsets", "--timeout=0s"], ... "node \"qe-gpei-391-node-registry-router-1\" drained"]}

TASK [debug] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Running node pre-upgrade hook /root/workspace/pre_node.yml"
}

TASK [include_tasks] ************************************************************
included: /root/workspace/pre_node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the start of node OS upgrade] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Node OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is about to start"
}

TASK [Upgrade the OS] ************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": true, "failed": false, "msg": "", "rc": 0, "results": ["Loaded plugins: product-id, search-disabled-repos, ... Updated:\n python-setuptools.noarch 0:17.1.1-4.el7 \n\nReplaced:\n python-urllib3.noarch 0:1.10.2-5.el7 \n\nComplete!\n"]}

TASK [debug] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com finished"
}

...

TASK [debug] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Running node upgrade hook /root/workspace/node.yml"
}

TASK [include_tasks] ************************************************************
included: /root/workspace/node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the reboot of node] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Node qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is upgraded, going to be rebooted..."
}

TASK [Restart server] ************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"ansible_job_id": "840647723302.122367", "changed": true, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/840647723302.122367", "started": 1}

TASK [Waiting for the server to come back] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": false, "elapsed": 123, "failed": false}

TASK [Ensure that required services are running] ************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => (item=docker) => {"changed": false, "enabled": true, "failed": false, "item": "docker", "name": "docker", "state": "started", "status": ...
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816
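
For reference, the reboot step in the node.yml hook above (the "Restart server" and "Waiting for the server to come back" tasks) could probably be collapsed into a single task on newer Ansible releases. The following is only a sketch and was not run as part of this bug: it assumes Ansible 2.7 or later on the control host, where the reboot module is available, and the delay and timeout values are simply carried over from the wait_for_connection task used in the verification.

```yaml
# Hypothetical alternative to the "Restart server" and
# "Waiting for the server to come back" tasks in node.yml.
# Assumes Ansible >= 2.7 (reboot module); not tested in this bug.
- name: Reboot the node and wait for it to come back
  reboot:
    msg: "Ansible updates triggered"
    # Roughly corresponds to wait_for_connection delay: 120
    post_reboot_delay: 120
    # Roughly corresponds to wait_for_connection timeout: 300
    reboot_timeout: 300
  become: true
```

The "Ensure that required services are running" task would still follow this one unchanged, since the hook runs before the node is made schedulable again.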