Bug 1559143 - RFE - Need a way to upgrade OS during upgrade
Summary: RFE - Need a way to upgrade OS during upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: Scott Dodson
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1572786 1572798
 
Reported: 2018-03-21 19:15 UTC by Matt Woodson
Modified: 2019-02-28 18:36 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
You may now define a set of hooks to run arbitrary tasks during the node upgrade process. To implement these hooks, set openshift_node_upgrade_pre_hook, openshift_node_upgrade_hook, or openshift_node_upgrade_post_hook to the path of the task file you wish to execute. The openshift_node_upgrade_pre_hook hook is executed after draining the node and before it is upgraded. The openshift_node_upgrade_hook hook is executed after the node has been drained and its packages updated, but before it is marked schedulable again. The openshift_node_upgrade_post_hook hook is executed after the node has been marked schedulable, immediately before moving on to other nodes.
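For example, a minimal inventory snippet that wires up all three hooks (typically placed under [OSEv3:vars]; the task file paths below are placeholders):

openshift_node_upgrade_pre_hook=/usr/share/custom/pre_node.yml
openshift_node_upgrade_hook=/usr/share/custom/node.yml
openshift_node_upgrade_post_hook=/usr/share/custom/post_node.yml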
Clone Of:
Clones: 1572786
Environment:
Last Closed: 2018-07-30 19:10:48 UTC


Attachments: None


Links:
Red Hat Product Errata RHBA-2018:1816 (Last Updated: 2018-07-30 19:11:27 UTC)

Description Matt Woodson 2018-03-21 19:15:35 UTC
Description of problem:

Since the upgrade is done in place, we would really like to be able to upgrade the base OS (RHEL) while the node is drained and not running. As it currently stands, we need to drain a node twice during an upgrade to get first OpenShift and then RHEL upgraded.

If such an option doesn't exist, we'd like a hook that we can plug into and run while the OpenShift node service is not running and the node is drained.

Comment 2 Gaoyun Pei 2018-05-04 10:18:42 UTC
Verified this bug with openshift-ansible-3.10.0-0.32.0.git.0.bb50d68.el7.noarch.

With the node pre-upgrade hook we could upgrade the OS after the node was marked unschedulable and drained. With the upgrade hook, which runs after the node has been upgraded and before it is made schedulable again, we could also complete a server reboot.


Add the hook definitions to the Ansible inventory used for the upgrade:

openshift_node_upgrade_pre_hook=/root/workspace/pre_node.yml
openshift_node_upgrade_hook=/root/workspace/node.yml

[root@gpei-preserved ~]# cat /root/workspace/pre_node.yml
---
- name: Note the start of node OS upgrade
  debug:
      msg: "Node OS upgrade of {{ inventory_hostname }} is about to start"

- name: Upgrade the OS
  yum: name=* state=latest

- name: Note the completion of node OS upgrade
  debug:
      msg: "OS upgrade of {{ inventory_hostname }} finished"

[root@gpei-preserved ~]# cat /root/workspace/node.yml
- name: Note the reboot of node
  debug:
      msg: "Node {{ inventory_hostname }} is upgraded, going to be rebooted..."

- name: Restart server
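  # Fire-and-forget: async 1 with poll 0 starts the shutdown in the background and
  # does not wait for it, so the task is not marked failed when the SSH connection drops.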
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true

- name: Waiting for the server to come back
  wait_for_connection:
    delay: 120
    timeout: 300

- name: Ensure that required services are running
  service:
    name: "{{ item }}"
    state: started
    enabled: yes
  with_items:
    - docker
    - atomic-openshift-node.service
    - dnsmasq
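
The post-upgrade hook (openshift_node_upgrade_post_hook) was not exercised in this verification. As a minimal sketch, assuming the post hook only needs to log and confirm that the node service stayed up once the node is schedulable again, a task file along these lines could be set as openshift_node_upgrade_post_hook (the tasks below are illustrative only, not part of this test run):

---
- name: Note that the node is schedulable again
  debug:
      msg: "Node {{ inventory_hostname }} is schedulable again, running post-upgrade checks"

- name: Check that the node service is still active
  command: systemctl is-active atomic-openshift-node
  register: node_service_state
  changed_when: false

- name: Report the node service state
  debug:
      msg: "atomic-openshift-node is {{ node_service_state.stdout }} on {{ inventory_hostname }}"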


Ran a 3.9 -> 3.10 upgrade; the hooks were executed successfully and the whole upgrade finished.

TASK [Drain Node for Kubelet upgrade] ***************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com -> qe-gpei-391-master-etcd-1.0503-rlo.qe.rhcloud.com] => {"attempts": 1, "changed": true, "cmd": ["oc", "adm", "drain", "qe-gpei-391-node-registry-router-1", "--config=/etc/origin/master/admin.kubeconfig", "--force", "--delete-local-data", "--ignore-daemonsets", "--timeout=0s"], 
...
"node \"qe-gpei-391-node-registry-router-1\" drained"]}

TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Running node pre-upgrade hook /root/workspace/pre_node.yml"
}

TASK [include_tasks] ********************************************************************************************************************************************************
included: /root/workspace/pre_node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the start of node OS upgrade] ************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Node OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is about to start"
}

TASK [Upgrade the OS] *******************************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": true, "failed": false, "msg": "", "rc": 0, "results": ["Loaded plugins: product-id, search-disabled-repos, 
...
Updated:\n  python-setuptools.noarch 0:17.1.1-4.el7                                       \n\nReplaced:\n  python-urllib3.noarch 0:1.10.2-5.el7                                          \n\nComplete!\n"]}


TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com finished"
}

...

TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Running node upgrade hook /root/workspace/node.yml"
}

TASK [include_tasks] ********************************************************************************************************************************************************
included: /root/workspace/node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the reboot of node] **********************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Node qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is upgraded, going to be rebooted..."
}

TASK [Restart server] *******************************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"ansible_job_id": "840647723302.122367", "changed": true, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/840647723302.122367", "started": 1}

TASK [Waiting for the server to come back] **********************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": false, "elapsed": 123, "failed": false}

TASK [Ensure that required services are running] ****************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => (item=docker) => {"changed": false, "enabled": true, "failed": false, "item": "docker", "name": "docker", "state": "started", "status": 

...

Comment 4 errata-xmlrpc 2018-07-30 19:10:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

