Description of problem:
Customer reports that an attempt to upgrade a containerized installation of OCP from version 3.7.57 to 3.9.41 fails at the task "Fail when OpenShift is not installed":

TASK [Fail when OpenShift is not installed] **********************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/verify_upgrade_targets.yml:2
skipping: [master1.example.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}
skipping: [master2.example.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}
skipping: [master3.example.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}
fatal: [node1.example.com]: FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Verify OpenShift is already installed"
}
fatal: [node2.example.com]: FAILED! => {
    "changed": false,
    "failed": true,
    "msg": "Verify OpenShift is already installed"
}
[...]

Failure summary:

  1. Hosts:    node1.example.com, node2.example.com
     Play:     Verify upgrade targets
     Task:     Fail when OpenShift is not installed
     Message:  Verify OpenShift is already installed

Checking https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_facts/library/openshift_facts.py#L903-L921, it looks like this is where the fact gets defined; however, /etc/sysconfig/atomic-openshift-node has IMAGE_VERSION set correctly:

# cat /etc/sysconfig/atomic-openshift-node
OPTIONS=--loglevel=0
CONFIG_FILE=/etc/origin/node/node-config.yaml
IMAGE_VERSION=v3.7.57

Customer tried removing the facts from /etc/ansible/facts.d and then retrying the upgrade in case the fact file was corrupted, but got the same outcome.
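For context, the fact at the linked lines ultimately depends on reading IMAGE_VERSION out of the sysconfig file shown above. A minimal sketch of that parsing step (illustrative only, not the actual openshift_facts.py code; the function name and simplified logic are assumptions):

```python
import re

def get_image_version(path="/etc/sysconfig/atomic-openshift-node"):
    """Extract IMAGE_VERSION from a sysconfig-style file; return None if absent."""
    try:
        with open(path) as f:
            for line in f:
                m = re.match(r"^IMAGE_VERSION=(\S+)", line.strip())
                if m:
                    return m.group(1)
    except IOError:
        pass
    return None
```

Given the file contents above, this would return "v3.7.57", so the sysconfig side of the fact looks healthy on the failing nodes.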
Version-Release number of the following components:

# rpm -qa | grep atomic
atomic-openshift-clients-3.9.41-1.git.0.67432b0.el7.x86_64

bash-4.2$ ansible --version
ansible 2.4.6.0
  config file = /opt/myuser/home/openshift-leun/.ansible.cfg
  configured module search path = [u'/opt/myuser/home/openshift-leun/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May 31 2018, 09:41:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

How reproducible:
I could not reproduce it by myself.

Steps to Reproduce:
1. I could not reproduce it by myself.

Actual results:
See the task output in the description above.

Expected results:
The fact should be defined correctly and the upgrade should finish smoothly.
https://github.com/openshift/openshift-ansible/pull/10413
The key root cause is that openshift_facts gets the version from the hard-coded "openshift/ose" image. I could reproduce this bug with openshift-ansible-3.9.43-1.git.0.d0bc600.el7.noarch when upgrading a containerized cluster from 3.7.64 to 3.9.41. The pure node upgrade fails at this step:

TASK [Fail when OpenShift is not installed] ************************************
skipping: [host-8-251-31.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
fatal: [host-8-251-237.host.centralci.eng.rdu2.redhat.com]: FAILED! => {"changed": false, "msg": "Verify OpenShift is already installed"}
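To illustrate the failure mode: if the module builds the image name from a hard-coded "openshift/ose" prefix instead of the configured oreg_url, the version lookup finds no image on clusters that pull from a custom registry, so the version fact stays unset and the "Fail when OpenShift is not installed" check trips. A hypothetical sketch (the function name and the oreg_url handling are assumptions, not the actual diff in the linked PR):

```python
def image_to_inspect(component, oreg_url=None):
    """Pick the container image whose tag yields the installed version.

    Buggy behavior: ignore the configured registry and always inspect the
    hard-coded 'openshift/<component>' image, which does not exist on hosts
    pulling from a custom registry, so no version fact gets set.
    """
    if oreg_url:
        # Fixed behavior (hypothetical): derive the image from oreg_url, e.g.
        # 'reg:5000/ocp3/ose-${component}:${version}' -> 'reg:5000/ocp3/ose-node'
        return oreg_url.replace("${component}", component).rsplit(":", 1)[0]
    return "openshift/%s" % component  # hard-coded fallback that caused the bug
```

With the oreg_url from a disconnected inventory, the derived image matches what is actually present on the nodes, which is consistent with the fix landing in the linked PR.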
Verified this bug with openshift-ansible-3.9.48-1.git.0.09f6c01.el7.noarch, and PASS.

Inventory:
openshift_image_tag=v3.9.41
openshift_release=v3.9
openshift_cockpit_deployer_prefix=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/
oreg_url=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/ose-${component}:${version}
openshift_docker_additional_registries=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000
openshift_docker_insecure_registries=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000
osm_etcd_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/rhel7/etcd
osm_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/ose
osn_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/node
osn_ovs_image=host-8-241-45.host.centralci.eng.rdu2.redhat.com:5000/testing/ocp3/openvswitch

PLAY [Verify upgrade targets] **************************************************

TASK [Gathering Facts] *********************************************************
ok: [host-8-251-237.host.centralci.eng.rdu2.redhat.com]
ok: [host-8-251-31.host.centralci.eng.rdu2.redhat.com]

TASK [include_tasks] ***********************************************************
included: /home/slave2/workspace/Run-Ansible-Playbooks-Nextge/private-openshift-ansible/playbooks/common/openshift-cluster/upgrades/pre/verify_upgrade_targets.yml for host-8-251-31.host.centralci.eng.rdu2.redhat.com, host-8-251-237.host.centralci.eng.rdu2.redhat.com

TASK [Fail when OpenShift is not installed] ************************************
skipping: [host-8-251-31.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}
skipping: [host-8-251-237.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False"}

After upgrade, check:

[root@host-172-16-122-61 ~]# oc get node
NAME            STATUS    ROLES     AGE       VERSION
172.16.122.35   Ready     compute   32m       v1.9.1+a0ce1bc657
172.16.122.61   Ready     master    34m       v1.9.1+a0ce1bc657

[root@host-172-16-122-61 ~]# oc version
oc v3.9.41
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.122.61:8443
openshift v3.9.41
kubernetes v1.9.1+a0ce1bc657
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748