Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1579513

Summary: 3.9 Upgrade Fails When Masters Are On Older Version Than Available
Product: OpenShift Container Platform
Reporter: Matthew Robson <mrobson>
Component: Installer
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: urgent
Priority: urgent
Version: 3.9.0
CC: acomabon, aos-bugs, bbilgin, boris.ruppert, dmoessne, dzhukous, jack.ottofaro, jiajliu, jkaur, jokerman, mbarnes, mmccomas, mnozell, rhowe, rteague, szobair, wmeng
Target Milestone: ---
Target Release: 3.9.z
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The yum command would install the latest available version of any dependent package, which resulted in the latest node packages being installed.
Fix: The Ansible task was updated to include all related node packages with the version specified.
Result: The expected version was installed instead of the latest.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-07-03 12:23:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: Testing package upgrades with master package installed (flags: none)

Description Matthew Robson 2018-05-17 20:13:16 UTC
Description of problem:

Failure summary:


  1. Hosts:    xxx,xxx,xxx,xxx
     Play:     Ensure the requested version packages are available.
     Task:     Fail if rpm version and docker image version are different
     Message:  OCP rpm version 3.9.27 is different from OCP image version 3.9.25

real    36m9.508s
user    15m52.722s
sys     6m48.987s

grep 3.9 /etc/ansible/hosts
openshift_pkg_version=-3.9.25
openshift_image_tag=v3.9.25
openshift_metrics_image_version=v3.9.25
openshift_logging_image_version=v3.9.25
openshift_release=3.9

We upgraded the masters to .25 yesterday and were continuing with the nodes today; the .27 errata came out in between.

check_available_rpms.yml does a repoquery that returns only the latest available version:

- name: Get available {{ openshift_service_type}} version
  repoquery:
    name: "{{ openshift_service_type}}{{ '-' ~ openshift_release ~ '*' if openshift_release is defined else '' }}"
    ignore_excluders: true
  register: rpm_results

repoquery --plugins --quiet atomic-openshift-3.9*
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

masters_and_nodes.yml compares that result against openshift_version, which now differs:

- block:
  - name: Check openshift_version for rpm installation
    include_tasks: check_available_rpms.yml
  - name: Fail if rpm version and docker image version are different
    fail:
      msg: "OCP rpm version {{ rpm_results.results.versions.available_versions.0 }} is different from OCP image version {{ openshift_version }}"
    # Both versions have the same string representation
    when: rpm_results.results.versions.available_versions.0 != openshift_version
  # block when
  when: not openshift_is_atomic | bool
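
The failure mode of that gate can be sketched in plain shell (illustrative only, not the actual playbook logic): because repoquery reports only the newest build in the enabled repos, a cluster pinned to an older z-stream fails the equality check as soon as a newer errata is published.

```shell
# Sketch of the version gate: the variable values below are taken from this
# bug report; the comparison mirrors the `when:` condition in the fail task.
available="3.9.27"   # newest version repoquery finds in the enabled repos
desired="3.9.25"     # openshift_version pinned via openshift_pkg_version
msg=""
if [ "$available" != "$desired" ]; then
  # This is the message the playbook emits before aborting the upgrade.
  msg="OCP rpm version $available is different from OCP image version $desired"
  echo "$msg"
fi
```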


Version-Release number of selected component (if applicable):

3.9.25 / 3.9.27

How reproducible:

Always

Steps to Reproduce:
1. Masters on one version
2. New release
3. Try upgrading nodes 

Actual results:

Upgrade fails

Expected results:

The upgrade adheres to the version we have pinned in the inventory.

Additional info:



Comment 1 Scott Dodson 2018-05-17 20:58:24 UTC
Workaround: exclude undesirable versions by adding entries to /etc/yum.conf.

Example of excluding 3.9.27 and 3.9.29 to get 3.9.25:

[root@ose3-master ~]# grep exclude /etc/yum.conf                                                                                                                                                                                               
exclude= docker*1.20*  docker*1.19*  docker*1.18*  docker*1.17*  docker*1.16*  docker*1.15*  docker*1.14* *atomic-openshift*3.9.27* *atomic-openshift*3.9.29*
[root@ose3-master ~]# atomic-openshift-excluder unexclude                                                                                                                                                                                      
[root@ose3-master ~]# repoquery --plugins --quiet atomic-openshift-3.9*      
atomic-openshift-0:3.9.25-1.git.0.6bc473e.el7.x86_64

Comment 10 Russell Teague 2018-05-18 20:39:34 UTC
Proposed: https://github.com/openshift/openshift-ansible/pull/8446

Comment 11 Russell Teague 2018-05-25 12:22:10 UTC
Waiting for a 3.9 build with f8f497e2bcb088553447c36974779a7c43483384

Comment 12 Russell Teague 2018-05-29 12:27:44 UTC
openshift-ansible-3.9.30-1

Comment 13 liujia 2018-05-30 05:23:14 UTC
A known issue at https://bugzilla.redhat.com/show_bug.cgi?id=1556740

Comment 14 Russell Teague 2018-05-30 12:52:38 UTC
*** Bug 1556740 has been marked as a duplicate of this bug. ***

Comment 16 Scott Dodson 2018-06-04 12:07:29 UTC
*** Bug 1585603 has been marked as a duplicate of this bug. ***

Comment 25 Shah Zobair 2018-06-11 14:37:35 UTC
The suggested workaround does not solve the problem. Details below:

[ansiblesvd@ul-pca-dm-mas01 ~]$ ansible -i inventory/hosts all -m shell -a "repoquery --plugins --quiet atomic-openshift-3.9*"
ul-pca-dm-inf01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-mas01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-inf02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-mas02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-mas03.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-nod02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

ul-pca-dm-nod01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64


[Snipped from the deploy_cluster output]
...

TASK [openshift_version : set_fact] *******************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca]

TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
    "openshift_release": "3.9"
}

TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
    "openshift_image_tag": "v3.9.27"
}

TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
    "openshift_pkg_version": "-3.9.27"
}

TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
    "openshift_version": "3.9.27"
}

TASK [debug] ******************************************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
    "msg": "openshift_pkg_version set to -3.9.27"
}

PLAY [Set openshift_version for etcd, node, and master hosts] *****************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]

TASK [set_fact] ***************************************************************************************************************************
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]

PLAY [Ensure the requested version packages are available.] *******************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]

TASK [include_role] ***********************************************************************************************************************

TASK [openshift_version : Check openshift_version for rpm installation] *******************************************************************
included: /usr/share/ansible/openshift-ansible/roles/openshift_version/tasks/check_available_rpms.yml for ul-pca-dm-mas02.ul.ca, ul-pca-dm-mas03.ul.ca, ul-pca-dm-inf01.ul.ca, ul-pca-dm-inf02.ul.ca, ul-pca-dm-nod01.ul.ca, ul-pca-dm-nod02.ul.ca

TASK [openshift_version : Get available atomic-openshift version] *************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]

TASK [openshift_version : fail] ***********************************************************************************************************
skipping: [ul-pca-dm-mas02.ul.ca]
skipping: [ul-pca-dm-mas03.ul.ca]
skipping: [ul-pca-dm-inf01.ul.ca]
skipping: [ul-pca-dm-inf02.ul.ca]
skipping: [ul-pca-dm-nod01.ul.ca]
skipping: [ul-pca-dm-nod02.ul.ca]

TASK [openshift_version : Fail if rpm version and docker image version are different] *****************************************************
fatal: [ul-pca-dm-mas02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-mas03.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-inf01.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-inf02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-nod01.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-nod02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
 [WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'.         [Errno 13]
Permission denied: u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'


PLAY RECAP ********************************************************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0   
ul-pca-dm-inf01.ul.ca      : ok=20   changed=0    unreachable=0    failed=1   
ul-pca-dm-inf02.ul.ca      : ok=20   changed=0    unreachable=0    failed=1   
ul-pca-dm-mas01.ul.ca      : ok=34   changed=0    unreachable=0    failed=0   
ul-pca-dm-mas02.ul.ca      : ok=24   changed=0    unreachable=0    failed=1   
ul-pca-dm-mas03.ul.ca      : ok=24   changed=0    unreachable=0    failed=1   
ul-pca-dm-nod01.ul.ca      : ok=20   changed=0    unreachable=0    failed=1   
ul-pca-dm-nod02.ul.ca      : ok=20   changed=0    unreachable=0    failed=1   


INSTALLER STATUS **************************************************************************************************************************
Initialization             : In Progress (0:00:35)



Failure summary:


  1. Hosts:    ul-pca-dm-inf01.ul.ca, ul-pca-dm-inf02.ul.ca, ul-pca-dm-mas02.ul.ca, ul-pca-dm-mas03.ul.ca, ul-pca-dm-nod01.ul.ca, ul-pca-dm-nod02.ul.ca
     Play:     Ensure the requested version packages are available.
     Task:     Fail if rpm version and docker image version are different
     Message:  OCP rpm version 3.9.30 is different from OCP image version 3.9.27

Comment 26 Shah Zobair 2018-06-11 15:07:46 UTC
I found that the parameter "ignore_excluders: true" in /usr/share/ansible/openshift-ansible/roles/openshift_version/tasks/check_available_rpms.yml is not respecting our package exclusions.

I changed it to false and it solved the problem. 


- name: Get available {{ openshift_service_type}} version
  repoquery:
    name: "{{ openshift_service_type}}{{ '-' ~ openshift_release ~ '*' if openshift_release is defined else '' }}"
    ignore_excluders: false
  register: rpm_results


Any comments on that parameter?

Comment 28 Scott Dodson 2018-06-13 16:01:54 UTC
From the sosreport in case 02119689, this is where the packages get updated:

ansible-command[31461]: Invoked with warn=True executable=None _uses_shell=False _raw_params=yum install -y atomic-openshift-3.9.27 atomic-openshift-node-3.9.27 atomic-openshift-sdn-ovs-3.9.27 atomic-openshift-clients-3.9.27 PyYAML removes=None creates=None chdir=None stdin=None
ansible-command[31461]: [WARNING] Consider using yum module rather than running yum 

yum[31462]: Installed: libtalloc-2.1.10-1.el7.x86_64
yum[31462]: Installed: libtdb-1.3.15-1.el7.x86_64
yum[31462]: Installed: libtevent-0.9.33-2.el7.x86_64
groupadd[31473]: group added to /etc/group: name=printadmin, GID=992
groupadd[31473]: group added to /etc/gshadow: name=printadmin
groupadd[31473]: new group: name=printadmin, GID=992
yum[31462]: Installed: samba-common-4.7.1-6.el7.noarch
yum[31462]: Installed: libldb-1.2.2-1.el7.x86_64
yum[31462]: Installed: samba-common-libs-4.7.1-6.el7.x86_64
yum[31462]: Installed: libwbclient-4.7.1-6.el7.x86_64
yum[31462]: Installed: samba-client-libs-4.7.1-6.el7.x86_64
yum[31462]: Installed: cifs-utils-6.2-10.el7.x86_64
yum[31462]: Updated: atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Installed: atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-master-3.9.30-1.git.0.dec1ba7.el7.x86_64
systemd[1]: Reloading.
systemd[1]: Started Flexible Branding Service. 
systemd[1]: Starting Flexible Branding Service...
systemd[1]: Reloading.
systemd[1]: Started Flexible Branding Service. 
systemd[1]: Starting Flexible Branding Service...
yum[31462]: Erased: tuned-profiles-atomic-openshift-node-3.7.42-1.git.0.5a85d33.el7.x86_64

It's either happening because we've obsoleted tuned-profiles-atomic-openshift-node, or because the master package is installed on this host, which is not a master. I'll test these two theories; hopefully it's the latter rather than the former.
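
The second theory can be illustrated with a simulated resolver in plain shell (this is not real yum behavior code, and the package lists are the hypothetical state of the affected node host): packages named on the pinned install line get the requested version, while any other installed related package, such as a stray master package on a node host, is dependency-updated to the newest version in the repo.

```shell
# Simulated dependency resolution, illustrating why a pinned
# `yum install atomic-openshift-3.9.27 ...` can still pull 3.9.30.
pinned="atomic-openshift atomic-openshift-node atomic-openshift-sdn-ovs atomic-openshift-clients"
installed="atomic-openshift atomic-openshift-node atomic-openshift-master"
requested="3.9.27"   # version explicitly given on the install line
latest="3.9.30"      # newest version available in the enabled repos
result=""
for pkg in $installed; do
  case " $pinned " in
    *" $pkg "*) result="$result$pkg-$requested\n" ;;  # pinned: gets requested version
    *)          result="$result$pkg-$latest\n" ;;     # unlisted: pulled to latest
  esac
done
printf "$result"
```

In this sketch the stray atomic-openshift-master package ends up at 3.9.30, dragging the rest of the transaction with it in real yum, which is why removing it (or listing every related package with an explicit version, as the eventual fix does) resolves the problem.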

Comment 29 Scott Dodson 2018-06-13 17:18:57 UTC
Yeah, it's happening because the host has atomic-openshift-master installed but is not a master host. Non-master hosts should not have the master package installed. I imagine this was done manually at some point. Remove that package and the upgrade should behave as expected.

I'll attach logs from where I've tested this to prove my point.

There are a few other unexpected packages on the host that we should clean up too, as they may cause similar problems in the future. These packages are meant to be installed only in the container images that we ship.

atomic-openshift-cluster-capacity-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-dockerregistry-3.7.23-1.git.4.0634b89.el7.x86_64
atomic-openshift-pod-3.7.23-1.git.4.0634b89.el7.x86_64
atomic-openshift-service-catalog-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-template-service-broker-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-tests-3.7.23-1.git.4.0634b89.el7.x86_64

If you want to test what will happen on a host, run this command after enabling the 3.9 repo and verify that all the packages are the versions you desire.

yum install --disableexcludes=* atomic-openshift-3.9.27 atomic-openshift-node-3.9.27 atomic-openshift-sdn-ovs-3.9.27 atomic-openshift-clients-3.9.27 PyYAML

Comment 30 Scott Dodson 2018-06-13 17:22:00 UTC
Created attachment 1451000 [details]
Testing package upgrades with master package installed

Comment 31 liujia 2018-06-19 09:26:41 UTC
Version: openshift-ansible-3.9.31-1.git.34.154617d.el7.noarch

scenario 1:
1. Install ocp v3.7
2. Enable v3.8 and v3.9.30 repos on all hosts
3. Upgrade master to v3.9.30(with openshift_release=v3.9 set in inventory file)
4. Enable v3.9.31(latest) repos on all hosts
5. Upgrade node to v3.9(with openshift_release=v3.9 set in inventory file)
Upgrade succeeded and the node was upgraded to v3.9.30, except for the excluder (a separate issue that will be tracked in another bug).

scenario 2:
1. Install ocp v3.7
2. Enable v3.8 and v3.9.30 repos on all hosts
3. Upgrade master to v3.9.30(with openshift_release=v3.9 set in inventory file)
4. Enable v3.9.31(latest) repos on all hosts
5. Upgrade node to v3.9(with openshift_release=v3.9 and openshift_pkg_version=-3.9.30 set in inventory file)
Upgrade succeeded and the node was upgraded to v3.9.30.

Comment 32 Scott Dodson 2018-07-03 12:23:14 UTC
This was resolved in openshift-ansible-3.9.31.

Comment 33 Matthew Barnes 2018-08-20 18:30:11 UTC
*** Bug 1616439 has been marked as a duplicate of this bug. ***