Bug 1572798

Summary:	[3.7] RFE - Need a way to upgrade OS during upgrade
Product:	OpenShift Container Platform	Reporter:	Scott Dodson <sdodson>
Component:	Cluster Version Operator	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED ERRATA	QA Contact:	Gaoyun Pei <gpei>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	3.7.1	CC:	aos-bugs, gpei, jiajliu, jokerman, mmccomas, mwoodson, nraghava, wmeng
Target Milestone:	---	Keywords:	OpsBlocker
Target Release:	3.7.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	You may now define a set of hooks to run arbitrary tasks during the node upgrade process. To implement these hooks, set openshift_node_upgrade_pre_hook, openshift_node_upgrade_hook, or openshift_node_upgrade_post_hook to the path of the task file you wish to execute. The openshift_node_upgrade_pre_hook hook is executed after the node has been drained and before it has been upgraded. The openshift_node_upgrade_hook hook is executed after the node has been drained and its packages updated, but before it is marked schedulable again. The openshift_node_upgrade_post_hook hook is executed after the node has been marked schedulable, immediately before moving on to other nodes.	Story Points:	---
Clone Of:	1572786	Environment:
Last Closed:	2018-05-18 03:54:46 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1559143, 1572786
Bug Blocks:

Description Scott Dodson 2018-04-27 21:42:35 UTC

+++ This bug was initially created as a clone of Bug #1572786 +++

+++ This bug was initially created as a clone of Bug #1559143 +++

Description of problem:

Since the upgrade is done in place, we would really like to be able to upgrade the base OS (operating system, RHEL) while the node is drained and not running. As it currently stands, we need to drain a node twice during an upgrade: once to upgrade OpenShift and once to upgrade RHEL.

If there is no built-in option for this, we'd like to have a hook that we can plug into and run while the OpenShift node service is not running and the node is drained.

--- Additional comment from Scott Dodson on 2018-04-26 14:55:21 EDT ---

For master / 3.10

https://github.com/openshift/openshift-ansible/pull/7743
https://github.com/openshift/openshift-ansible/pull/7736

--- Additional comment from Scott Dodson on 2018-04-27 16:57:31 EDT ---

Community-provided backport PR:
https://github.com/openshift/openshift-ansible/pull/8095

Should be in openshift-ansible-3.9.27-1 and later

Comment 1 Scott Dodson 2018-04-27 21:44:50 UTC

https://github.com/openshift/openshift-ansible/pull/8094

Comment 3 Gaoyun Pei 2018-05-08 06:50:00 UTC

Verified this bug with openshift-ansible-3.7.46-1.git.0.37f607e.el7.noarch

With the node pre-upgrade hook, we can upgrade the OS after the node is made unschedulable and drained. With the upgrade hook, which runs after the node is upgraded and before it is made schedulable again, we can also complete a server reboot.

Add the hook definitions to the Ansible inventory when running the upgrade:

openshift_node_upgrade_pre_hook=/root/workspace/pre_node.yml
openshift_node_upgrade_hook=/root/workspace/node.yml

[root@gpei-preserved ~]# cat /root/workspace/pre_node.yml
---
- name: Note the start of node OS upgrade
  debug:
      msg: "Node OS upgrade of {{ inventory_hostname }} is about to start"

- name: Upgrade the OS
  yum: name=* state=latest

- name: Note the end of node OS upgrade
  debug:
      msg: "OS upgrade of {{ inventory_hostname }} finished"

[root@gpei-preserved ~]# cat /root/workspace/node.yml
- name: Note the reboot of node
  debug:
      msg: "Node {{ inventory_hostname }} is upgraded, going to be rebooted..."

- name: Restart server
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  become: true
  ignore_errors: true

- name: Waiting for the server to come back
  wait_for_connection:
    delay: 120
    timeout: 300

- name: Ensure that required services are running
  service:
    name: "{{ item }}"
    state: started
    enabled: yes
  with_items:
    - docker
    - atomic-openshift-node.service
    - dnsmasq



Ran the 3.6 -> 3.7 upgrade. /root/workspace/pre_node.yml and /root/workspace/node.yml were executed successfully on each node, and the whole upgrade finished.
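
The verification above exercises only the pre-upgrade and upgrade hooks. For completeness, here is a minimal sketch of a task file for the third hook described in the Doc Text, openshift_node_upgrade_post_hook, which runs after the node has been marked schedulable. The file name /root/workspace/post_node.yml and the choice to report the running kernel are assumptions for illustration; they were not part of the tested setup.

```yaml
---
# post_node.yml (hypothetical file name), referenced from the inventory as:
#   openshift_node_upgrade_post_hook=/root/workspace/post_node.yml
# Like the other hook files, this is a plain list of tasks, not a full play.

- name: Read the running kernel release
  command: uname -r
  register: kernel_release
  changed_when: false   # read-only check, never reports a change

- name: Report the kernel the node came back with
  debug:
    msg: "Node {{ inventory_hostname }} is schedulable again, running kernel {{ kernel_release.stdout }}"
```

Since this hook runs once the node is schedulable again, it is probably best kept to read-only checks like the one above, with disruptive steps such as a reboot left in the earlier hooks.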

Comment 6 errata-xmlrpc 2018-05-18 03:54:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1576