Bug 1634004 - Upgrade to OCP 3.9 fails at task "Fail when OpenShift is not installed"
Summary: Upgrade to OCP 3.9 fails at task "Fail when OpenShift is not installed"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.9.0
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Patrick Dillon
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-28 12:00 UTC by Joel Rosental R.
Modified: 2022-03-13 15:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In a containerized install, openshift_facts uses the default ose images to determine the version. When a customer uses custom images and an internal registry, the default images cannot be pulled.
Consequence: openshift.common.version is left empty and the upgrade fails at the task "Fail when OpenShift is not installed".
Fix: If a custom_image name is set, it is passed to openshift_facts to use.
Result: openshift.common.version is set in containerized installs with a registry mirror, and the installation succeeds.
Clone Of:
Environment:
Last Closed: 2018-12-13 19:27:05 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:3748  0        None      None    None     2018-12-13 19:27:15 UTC

Description Joel Rosental R. 2018-09-28 12:00:57 UTC
Description of problem:

Customer reports that an upgrade of a containerized installation from OCP 3.7.57 to 3.9.41 fails at the task "Fail when OpenShift is not installed".

TASK [Fail when OpenShift is not installed] **********************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/verify_upgrade_targets.yml:2
skipping: [master1.example.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
skipping: [master2.example.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
skipping: [master3.example.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
fatal: [node1.example.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "msg": "Verify OpenShift is already installed"
}
fatal: [node2.example.com]: FAILED! => {
    "changed": false, 
    "failed": true, 
    "msg": "Verify OpenShift is already installed"
}

[...]

Failure summary:


  1. Hosts:    node1.example.com, node2.example.com
     Play:     Verify upgrade targets
     Task:     Fail when OpenShift is not installed
     Message:  Verify OpenShift is already installed
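
The failing task acts as a guard on the version fact. The following is a minimal Python sketch of that check, assuming (as the Doc Text states) that a host fails when its openshift.common.version fact is missing or empty. The function name and the dictionary layout are illustrative and are not taken from openshift-ansible.

```python
def openshift_not_installed(facts):
    """Return True when a host would trip the guard task.

    Assumption: the guard keys on openshift.common.version being
    undefined or empty; `facts` mimics the nested fact structure.
    """
    version = facts.get("openshift", {}).get("common", {}).get("version")
    return not version


# A node whose version fact could not be determined.
broken_node = {"openshift": {"common": {"version": ""}}}
# A host whose version fact was set correctly.
healthy_host = {"openshift": {"common": {"version": "3.7.57"}}}

print(openshift_not_installed(broken_node))   # True
print(openshift_not_installed(healthy_host))  # False
```

Under this assumption, any host with an empty version fact is reported as "not installed" even though OpenShift is running on it.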


Looking at https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_facts/library/openshift_facts.py#L903-L921, this appears to be where the fact is defined. However, /etc/sysconfig/atomic-openshift-node has IMAGE_VERSION set correctly:

# cat /etc/sysconfig/atomic-openshift-node
OPTIONS=--loglevel=0
CONFIG_FILE=/etc/origin/node/node-config.yaml
IMAGE_VERSION=v3.7.57
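
To repeat this check on several hosts, a small parser for sysconfig-style KEY=VALUE files can help. This is only a sketch: the helper name `read_sysconfig` is hypothetical, the sample text is the file content shown above, and it is not how openshift_facts itself reads the file.

```python
def read_sysconfig(text):
    """Parse KEY=VALUE lines from sysconfig-style text into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as --loglevel=0 survive.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


SAMPLE = """\
OPTIONS=--loglevel=0
CONFIG_FILE=/etc/origin/node/node-config.yaml
IMAGE_VERSION=v3.7.57
"""

config = read_sysconfig(SAMPLE)
print(config.get("IMAGE_VERSION"))  # v3.7.57
```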

In case a fact file was corrupted, the customer removed the facts from /etc/ansible/facts.d and retried the upgrade procedure, but got the same outcome.

Version-Release number of the following components:
# rpm -qa | grep atomic
atomic-openshift-clients-3.9.41-1.git.0.67432b0.el7.x86_64
bash-4.2$ ansible --version
ansible 2.4.6.0
  config file = /opt/myuser/home/openshift-leun/.ansible.cfg
  configured module search path = [u'/opt/myuser/home/openshift-leun/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]


How reproducible:
I could not reproduce it myself.

Steps to Reproduce:
1. I could not reproduce it myself.


Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:
The fact should be defined correctly and the upgrade should finish without errors.

Comment 13 Johnny Liu 2018-10-26 16:26:11 UTC
The root cause is that openshift_facts determines the version from the hard-coded "openshift/ose" image.

I could reproduce this bug with openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch when upgrading a containerized cluster from 3.7.64 to 3.9.41. The upgrade failed at this step on the node-only host.

TASK [Fail when OpenShift is not installed] ************************************
skipping: [host-8-251-31.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
fatal: [host-8-251-237.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"changed": false, "msg": "Verify OpenShift is already installed"}
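
The root cause and the fix described in the Doc Text can be sketched as an image-selection rule: prefer a configured custom image name and fall back to the hard-coded default only when none is set. This is a hedged illustration, not the actual patch. `version_probe_image` is a hypothetical helper, the default image name is the one quoted in this comment, and the mirror image name is a placeholder.

```python
# Image name as quoted in this comment; the real default may differ.
DEFAULT_PROBE_IMAGE = "openshift/ose"


def version_probe_image(custom_image=None, default_image=DEFAULT_PROBE_IMAGE):
    """Pick the image used to determine the installed version.

    Hypothetical helper: prefer a custom image name when one is
    configured, otherwise fall back to the hard-coded default.
    """
    return custom_image or default_image


# Placeholder for an image served from an internal registry mirror.
mirror_image = "registry.example.com:5000/testing/ocp3/node"

print(version_probe_image())              # openshift/ose
print(version_probe_image(mirror_image))  # registry.example.com:5000/testing/ocp3/node
```

Before the fix, the behaviour corresponds to always taking the first branch, which cannot work when the default image is not pullable from the hosts.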

Comment 14 Johnny Liu 2018-10-26 17:04:07 UTC
Verified this bug with openshift-ansible-3.9.48-1.git.0.09f6c01.el7.noarch; the result is PASS.

openshift_image_tag=v3.9.41
openshift_release=v3.9
openshift_cockpit_deployer_prefix=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/
oreg_url=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/ose-${component}:${version}
openshift_docker_additional_registries=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000
openshift_docker_insecure_registries=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000
osm_etcd_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/rhel7/etcd
osm_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/ose
osn_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/node
osn_ovs_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/openvswitch
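
For readers unfamiliar with the oreg_url format, the `${component}` and `${version}` placeholders are filled in per image. The sketch below mimics that expansion with Python's `string.Template`, which happens to use the same placeholder syntax. It is only an illustration of the resulting image names, not the code OpenShift uses, and the registry host and the `pod` component are placeholders chosen for the example.

```python
from string import Template

# Same shape as the oreg_url above, with a placeholder registry host.
oreg_url = "registry.example.com:5000/testing/ocp3/ose-${component}:${version}"


def expand_image(template, component, version):
    """Fill the ${component} and ${version} placeholders of an image template."""
    return Template(template).substitute(component=component, version=version)


print(expand_image(oreg_url, "pod", "v3.9.41"))
# registry.example.com:5000/testing/ocp3/ose-pod:v3.9.41
```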


PLAY [Verify upgrade targets] **************************************************

TASK [Gathering Facts] *********************************************************
ok: [host-8-251-237.host.centralci.eng.rdu2.redhat.com]
ok: [host-8-251-31.host.centralci.eng.rdu2.redhat.com]

TASK [include_tasks] ***********************************************************
included: /home/slave2/workspace/Run-Ansible-Playbooks-Nextge/private-openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/verify_upgrade_targets.yml for host-8-251-31.host.centralci.eng.rdu2.redhat.com, host-8-251-237.host.centralci.eng.rdu2.redhat.com

TASK [Fail when OpenShift is not installed] ************************************
skipping: [host-8-251-31.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [host-8-251-237.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}

After upgrade, check:
[root@host-172-16-122-61 ~]# oc get node
NAME            STATUS    ROLES     AGE       VERSION
172.16.122.35   Ready     compute   32m       v1.9.1+a0ce1bc657
172.16.122.61   Ready     master    34m       v1.9.1+a0ce1bc657
[root@host-172-16-122-61 ~]# oc version
oc v3.9.41
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.122.61:8443
openshift v3.9.41
kubernetes v1.9.1+a0ce1bc657

Comment 17 errata-xmlrpc 2018-12-13 19:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748

