Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1559143

Summary: RFE - Need a way to upgrade OS during upgrade
Product: OpenShift Container Platform
Reporter: Matt Woodson <mwoodson>
Component: Cluster Version Operator
Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA
QA Contact: Gaoyun Pei <gpei>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.7.0
CC: aos-bugs, dmoessne, jokerman, mmccomas, nraghava, wmeng
Target Milestone: ---
Keywords: OpsBlocker
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
You can now define hooks that run arbitrary tasks during the node upgrade process. To use them, set openshift_node_upgrade_pre_hook, openshift_node_upgrade_hook, or openshift_node_upgrade_post_hook to the path of the task file you want to execute. The openshift_node_upgrade_pre_hook hook runs after the node has been drained and before it is upgraded. The openshift_node_upgrade_hook hook runs after the node has been drained and its packages updated, but before it is marked schedulable again. The openshift_node_upgrade_post_hook hook runs after the node has been marked schedulable, immediately before the upgrade moves on to other nodes.
Story Points: ---
Clone Of:
: 1572786
Environment:
Last Closed: 2018-07-30 19:10:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1572786, 1572798

Description Matt Woodson 2018-03-21 19:15:35 UTC
Description of problem:

Since the upgrade is done in place, we would really like to be able to upgrade the base OS (operating system, RHEL) while the node is drained and not running. As it currently stands, we need to drain a node twice during an upgrade: once to upgrade OpenShift and once to upgrade RHEL.

If a built-in option is not available, we'd like a hook that we can plug into and run while the OpenShift node is not running and the node is drained.

Comment 2 Gaoyun Pei 2018-05-04 10:18:42 UTC
Verified this bug with openshift-ansible-3.10.0-0.32.0.git.0.bb50d68.el7.noarch.

With the node pre-upgrade hook, we can upgrade the OS after the node has been marked unschedulable and drained. With the upgrade hook, which runs after the node is upgraded and before it is made schedulable again, we can also reboot the server.


Add the hook definitions to the Ansible inventory when running the upgrade:

openshift_node_upgrade_pre_hook=/root/workspace/pre_node.yml
openshift_node_upgrade_hook=/root/workspace/node.yml

[root@gpei-preserved ~]# cat /root/workspace/pre_node.yml
---
- name: Note the start of node OS upgrade
  debug:
      msg: "Node OS upgrade of {{ inventory_hostname }} is about to start"

- name: Upgrade the OS
  yum: name=* state=latest

- name: 
  debug:
      msg: "OS upgrade of {{ inventory_hostname }} finished"

[root@gpei-preserved ~]# cat /root/workspace/node.yml
- name: Note the reboot of node
  debug:
      msg: "Node {{ inventory_hostname }} is upgraded, going to be rebooted..."

- name: Restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true

- name: Waiting for the server to come back
  wait_for_connection:
    delay: 120
    timeout: 300

- name: Ensure that required services are running
  service:
    name: "{{ item }}"
    state: started
    enabled: yes
  with_items:
    - docker
    - atomic-openshift-node.service
    - dnsmasq
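
The third hook named in the Doc Text, openshift_node_upgrade_post_hook, was not exercised in this verification. The following is only a minimal sketch of what such a task file could look like, assuming it is included the same way as the two hooks above (via include_tasks, as a plain list of tasks). The path /root/workspace/post_node.yml and the task itself are hypothetical and are not part of the original test run.

```yaml
---
# Hypothetical post-upgrade hook file; the path below is an assumption.
# Inventory entry (assumed):
#   openshift_node_upgrade_post_hook=/root/workspace/post_node.yml
- name: Note that the node is back in service
  debug:
    msg: "Node {{ inventory_hostname }} is upgraded and schedulable again"
```

According to the Doc Text, a file set this way would run after the node has been marked schedulable and before the upgrade moves on to other nodes, so it suits notifications or checks rather than disruptive work such as reboots.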


Ran the 3.9 -> 3.10 upgrade; the hooks were executed successfully and the whole upgrade finished.

TASK [Drain Node for Kubelet upgrade] ***************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com -> qe-gpei-391-master-etcd-1.0503-rlo.qe.rhcloud.com] => {"attempts": 1, "changed": true, "cmd": ["oc", "adm", "drain", "qe-gpei-391-node-registry-router-1", "--config=/etc/origin/master/admin.kubeconfig", "--force", "--delete-local-data", "--ignore-daemonsets", "--timeout=0s"], 
...
"node \"qe-gpei-391-node-registry-router-1\" drained"]}

TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Running node pre-upgrade hook /root/workspace/pre_node.yml"
}

TASK [include_tasks] ********************************************************************************************************************************************************
included: /root/workspace/pre_node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the start of node OS upgrade] ************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "Node OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is about to start"
}

TASK [Upgrade the OS] *******************************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": true, "failed": false, "msg": "", "rc": 0, "results": ["Loaded plugins: product-id, search-disabled-repos, 
...
Updated:\n  python-setuptools.noarch 0:17.1.1-4.el7                                       \n\nReplaced:\n  python-urllib3.noarch 0:1.10.2-5.el7                                          \n\nComplete!\n"]}


TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false,
    "msg": "OS upgrade of qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com finished"
}

...

TASK [debug] ****************************************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Running node upgrade hook /root/workspace/node.yml"
}

TASK [include_tasks] ********************************************************************************************************************************************************
included: /root/workspace/node.yml for qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com

TASK [Note the reboot of node] **********************************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {
    "failed": false, 
    "msg": "Node qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com is upgraded, going to be rebooted..."
}

TASK [Restart server] *******************************************************************************************************************************************************
changed: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"ansible_job_id": "840647723302.122367", "changed": true, "failed": false, "finished": 0, "results_file": "/root/.ansible_async/840647723302.122367", "started": 1}

TASK [Waiting for the server to come back] **********************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => {"changed": false, "elapsed": 123, "failed": false}

TASK [Ensure that required services are running] ****************************************************************************************************************************
ok: [qe-gpei-391-node-registry-router-1.0503-rlo.qe.rhcloud.com] => (item=docker) => {"changed": false, "enabled": true, "failed": false, "item": "docker", "name": "docker", "state": "started", "status": 

...

Comment 4 errata-xmlrpc 2018-07-30 19:10:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816