Bug 1579513
| Summary: | 3.9 Upgrade Fails When Masters Are On Older Version Than Available |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Installer |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Version: | 3.9.0 |
| Target Milestone: | --- |
| Target Release: | 3.9.z |
| Hardware: | All |
| OS: | All |
| Reporter: | Matthew Robson <mrobson> |
| Assignee: | Russell Teague <rteague> |
| QA Contact: | liujia <jiajliu> |
| CC: | acomabon, aos-bugs, bbilgin, boris.ruppert, dmoessne, dzhukous, jack.ottofaro, jiajliu, jkaur, jokerman, mbarnes, mmccomas, mnozell, rhowe, rteague, szobair, wmeng |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Type: | Bug |
| Regression: | --- |
| Last Closed: | 2018-07-03 12:23:14 UTC |

Doc Text:

Cause: The yum command would install the latest available version of any dependent package, which resulted in the latest version of the node packages being installed.
Fix: The Ansible task was updated to include all related node packages with the version specified.
Result: The expected version was installed instead of the latest.
Workaround: exclude undesirable versions by adding entries to /etc/yum.conf. Example of excluding 3.9.27 and 3.9.29 to get 3.9.25:

[root@ose3-master ~]# grep exclude /etc/yum.conf
exclude= docker*1.20* docker*1.19* docker*1.18* docker*1.17* docker*1.16* docker*1.15* docker*1.14* *atomic-openshift*3.9.27* *atomic-openshift*3.9.29*
[root@ose3-master ~]# atomic-openshift-excluder unexclude
[root@ose3-master ~]# repoquery --plugins --quiet atomic-openshift-3.9*
atomic-openshift-0:3.9.25-1.git.0.6bc473e.el7.x86_64

Waiting for a 3.9 build with f8f497e2bcb088553447c36974779a7c43483384 (openshift-ansible-3.9.30-1).

A known issue at https://bugzilla.redhat.com/show_bug.cgi?id=1556740

*** Bug 1556740 has been marked as a duplicate of this bug. ***

*** Bug 1585603 has been marked as a duplicate of this bug. ***

The suggested workaround is not solving the problem. Details below:
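The effect of those exclude patterns can be sketched with shell-style glob matching (a minimal Python illustration, not the actual yum code; the package NEVRAs are taken from the repoquery output in this report):

```python
from fnmatch import fnmatchcase

# Builds offered by the repos (NEVRAs from the repoquery output above)
available = [
    "atomic-openshift-0:3.9.25-1.git.0.6bc473e.el7.x86_64",
    "atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64",
]

# Exclude patterns from the /etc/yum.conf workaround
excludes = ["*atomic-openshift*3.9.27*", "*atomic-openshift*3.9.29*"]

# yum drops any package whose NEVRA matches an exclude glob, so the
# 3.9.25 build becomes the newest one still visible in the repo
remaining = [pkg for pkg in available
             if not any(fnmatchcase(pkg, pat) for pat in excludes)]
print(remaining)
```

This is why, after adding the excludes, repoquery reports only the 3.9.25 build.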
[ansiblesvd@ul-pca-dm-mas01 ~]$ ansible -i inventory/hosts all -m shell -a "repoquery --plugins --quiet atomic-openshift-3.9*"
ul-pca-dm-inf01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-mas01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-inf02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-mas02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-mas03.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-nod02.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
ul-pca-dm-nod01.ul.ca | SUCCESS | rc=0 >>
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64
[Snipped from the deploy_cluster output]
...
TASK [openshift_version : set_fact] *******************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca]
TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
"openshift_release": "3.9"
}
TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
"openshift_image_tag": "v3.9.27"
}
TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
"openshift_pkg_version": "-3.9.27"
}
TASK [openshift_version : debug] **********************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
"openshift_version": "3.9.27"
}
TASK [debug] ******************************************************************************************************************************
ok: [ul-pca-dm-mas01.ul.ca] => {
"msg": "openshift_pkg_version set to -3.9.27"
}
PLAY [Set openshift_version for etcd, node, and master hosts] *****************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]
TASK [set_fact] ***************************************************************************************************************************
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]
PLAY [Ensure the requested version packages are available.] *******************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]
TASK [include_role] ***********************************************************************************************************************
TASK [openshift_version : Check openshift_version for rpm installation] *******************************************************************
included: /usr/share/ansible/openshift-ansible/roles/openshift_version/tasks/check_available_rpms.yml for ul-pca-dm-mas02.ul.ca, ul-pca-dm-mas03.ul.ca, ul-pca-dm-inf01.ul.ca, ul-pca-dm-inf02.ul.ca, ul-pca-dm-nod01.ul.ca, ul-pca-dm-nod02.ul.ca
TASK [openshift_version : Get available atomic-openshift version] *************************************************************************
ok: [ul-pca-dm-inf01.ul.ca]
ok: [ul-pca-dm-mas03.ul.ca]
ok: [ul-pca-dm-inf02.ul.ca]
ok: [ul-pca-dm-mas02.ul.ca]
ok: [ul-pca-dm-nod01.ul.ca]
ok: [ul-pca-dm-nod02.ul.ca]
TASK [openshift_version : fail] ***********************************************************************************************************
skipping: [ul-pca-dm-mas02.ul.ca]
skipping: [ul-pca-dm-mas03.ul.ca]
skipping: [ul-pca-dm-inf01.ul.ca]
skipping: [ul-pca-dm-inf02.ul.ca]
skipping: [ul-pca-dm-nod01.ul.ca]
skipping: [ul-pca-dm-nod02.ul.ca]
TASK [openshift_version : Fail if rpm version and docker image version are different] *****************************************************
fatal: [ul-pca-dm-mas02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-mas03.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-inf01.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-inf02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-nod01.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
fatal: [ul-pca-dm-nod02.ul.ca]: FAILED! => {"changed": false, "failed": true, "msg": "OCP rpm version 3.9.30 is different from OCP image version 3.9.27"}
[WARNING]: Could not create retry file '/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'. [Errno 13]
Permission denied: u'/usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.retry'
PLAY RECAP ********************************************************************************************************************************
localhost : ok=11 changed=0 unreachable=0 failed=0
ul-pca-dm-inf01.ul.ca : ok=20 changed=0 unreachable=0 failed=1
ul-pca-dm-inf02.ul.ca : ok=20 changed=0 unreachable=0 failed=1
ul-pca-dm-mas01.ul.ca : ok=34 changed=0 unreachable=0 failed=0
ul-pca-dm-mas02.ul.ca : ok=24 changed=0 unreachable=0 failed=1
ul-pca-dm-mas03.ul.ca : ok=24 changed=0 unreachable=0 failed=1
ul-pca-dm-nod01.ul.ca : ok=20 changed=0 unreachable=0 failed=1
ul-pca-dm-nod02.ul.ca : ok=20 changed=0 unreachable=0 failed=1
INSTALLER STATUS **************************************************************************************************************************
Initialization : In Progress (0:00:35)
Failure summary:
1. Hosts: ul-pca-dm-inf01.ul.ca, ul-pca-dm-inf02.ul.ca, ul-pca-dm-mas02.ul.ca, ul-pca-dm-mas03.ul.ca, ul-pca-dm-nod01.ul.ca, ul-pca-dm-nod02.ul.ca
Play: Ensure the requested version packages are available.
Task: Fail if rpm version and docker image version are different
Message: OCP rpm version 3.9.30 is different from OCP image version 3.9.27
I found that the parameter "ignore_excluders: true" in /usr/share/ansible/openshift-ansible/roles/openshift_version/tasks/check_available_rpms.yml is not respecting our package exclusions.
I changed it to false and that solved the problem.
- name: Get available {{ openshift_service_type}} version
  repoquery:
    name: "{{ openshift_service_type}}{{ '-' ~ openshift_release ~ '*' if openshift_release is defined else '' }}"
    ignore_excluders: false
  register: rpm_results
Any comments on that parameter?
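For context on why the check sees the newest build at all: with excluders ignored, repoquery reports the highest version available in the enabled repos, regardless of the version pinned in the inventory. A minimal sketch (illustrative only; real RPM ordering uses rpmvercmp and handles epochs and releases, not integer tuples):

```python
def vtuple(version):
    # Naive ordering for plain x.y.z strings; real RPM comparison
    # uses rpmvercmp, this is only to illustrate the selection.
    return tuple(int(part) for part in version.split("."))

available = ["3.9.27", "3.9.30"]  # both repos enabled on the hosts
latest = max(available, key=vtuple)
print(latest)  # 3.9.30 -- the value later compared against openshift_version
```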
From the sosreport in case 02119689 we see this is where the packages get updated:

ansible-command[31461]: Invoked with warn=True executable=None _uses_shell=False _raw_params=yum install -y atomic-openshift-3.9.27 atomic-openshift-node-3.9.27 atomic-openshift-sdn-ovs-3.9.27 atomic-openshift-clients-3.9.27 PyYAML removes=None creates=None chdir=None stdin=None
ansible-command[31461]: [WARNING] Consider using yum module rather than running yum
yum[31462]: Installed: libtalloc-2.1.10-1.el7.x86_64
yum[31462]: Installed: libtdb-1.3.15-1.el7.x86_64
yum[31462]: Installed: libtevent-0.9.33-2.el7.x86_64
groupadd[31473]: group added to /etc/group: name=printadmin, GID=992
groupadd[31473]: group added to /etc/gshadow: name=printadmin
groupadd[31473]: new group: name=printadmin, GID=992
yum[31462]: Installed: samba-common-4.7.1-6.el7.noarch
yum[31462]: Installed: libldb-1.2.2-1.el7.x86_64
yum[31462]: Installed: samba-common-libs-4.7.1-6.el7.x86_64
yum[31462]: Installed: libwbclient-4.7.1-6.el7.x86_64
yum[31462]: Installed: samba-client-libs-4.7.1-6.el7.x86_64
yum[31462]: Installed: cifs-utils-6.2-10.el7.x86_64
yum[31462]: Updated: atomic-openshift-clients-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Installed: atomic-openshift-node-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-sdn-ovs-3.9.30-1.git.0.dec1ba7.el7.x86_64
yum[31462]: Updated: atomic-openshift-master-3.9.30-1.git.0.dec1ba7.el7.x86_64
systemd[1]: Reloading.
systemd[1]: Started Flexible Branding Service.
systemd[1]: Starting Flexible Branding Service...
systemd[1]: Reloading.
systemd[1]: Started Flexible Branding Service.
systemd[1]: Starting Flexible Branding Service...
yum[31462]: Erased: tuned-profiles-atomic-openshift-node-3.7.42-1.git.0.5a85d33.el7.x86_64

It's either happening because we've obsoleted tuned-profiles-atomic-openshift-node, or because the master package is installed on this host, which is not a master.
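Per the Doc Text, the eventual fix was to list every related package with an explicit version on the install command, as in the sosreport invocation above, so yum cannot float any of them to a newer build. A minimal sketch of building such a pinned list (the helper name is hypothetical; package names are taken from the log):

```python
def pinned_packages(version, names):
    """Append the pinned version to each package name so that
    'yum install' resolves exactly that build for every package."""
    return [f"{name}-{version}" for name in names]

pkgs = pinned_packages("3.9.27", [
    "atomic-openshift",
    "atomic-openshift-node",
    "atomic-openshift-sdn-ovs",
    "atomic-openshift-clients",
])
# Illustrative command line only; PyYAML is unpinned in the real invocation
print("yum install -y " + " ".join(pkgs) + " PyYAML")
```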
I'll test out these two theories; hopefully it's the latter rather than the former.

Yeah, it's happening because the host has atomic-openshift-master but it's not a master host. Non-master hosts should not have the master package installed on them. I imagine this happened manually at some point. Remove that package and the upgrade should behave as expected. I'll attach logs from where I've tested this to prove my point.

There are a few other unexpected packages on the host that we should clean up too, as they may cause problems in similar ways in the future. These packages are meant for installation only in the container images that we ship:

atomic-openshift-cluster-capacity-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-dockerregistry-3.7.23-1.git.4.0634b89.el7.x86_64
atomic-openshift-pod-3.7.23-1.git.4.0634b89.el7.x86_64
atomic-openshift-service-catalog-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-template-service-broker-3.7.42-1.git.0.5a85d33.el7.x86_64
atomic-openshift-tests-3.7.23-1.git.4.0634b89.el7.x86_64

If you want to test what will happen on a host, you can run this command after enabling the 3.9 repo and verify that all the packages are the versions you desire:

yum install --disableexcludes=* atomic-openshift-3.9.27 atomic-openshift-node-3.9.27 atomic-openshift-sdn-ovs-3.9.27 atomic-openshift-clients-3.9.27 PyYAML

Created attachment 1451000 [details]
Testing package upgrades with master package installed
Version: openshift-ansible-3.9.31-1.git.34.154617d.el7.noarch

Scenario 1:
1. Install OCP v3.7
2. Enable v3.8 and v3.9.30 repos on all hosts
3. Upgrade masters to v3.9.30 (with openshift_release=v3.9 set in the inventory file)
4. Enable v3.9.31 (latest) repos on all hosts
5. Upgrade nodes to v3.9 (with openshift_release=v3.9 set in the inventory file)

The upgrade succeeded, with the nodes upgraded to v3.9.30 except the excluder (that is another issue and will be tracked in another bug).

Scenario 2:
1. Install OCP v3.7
2. Enable v3.8 and v3.9.30 repos on all hosts
3. Upgrade masters to v3.9.30 (with openshift_release=v3.9 set in the inventory file)
4. Enable v3.9.31 (latest) repos on all hosts
5. Upgrade nodes to v3.9 (with openshift_release=v3.9 and openshift_pkg_version=-3.9.30 set in the inventory file)

The upgrade succeeded, with the nodes upgraded to v3.9.30.

This was resolved in openshift-ansible-3.9.31.

*** Bug 1616439 has been marked as a duplicate of this bug. ***
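Scenario 2 pins the node upgrade explicitly in the inventory. A minimal fragment of the relevant variables (group name as in a standard OSEv3 inventory; host entries omitted):

```ini
[OSEv3:vars]
# Pin both the release stream and the exact package version (scenario 2)
openshift_release=v3.9
openshift_pkg_version=-3.9.30
```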
Description of problem:

Failure summary:
1. Hosts: xxx,xxx,xxx,xxx
   Play: Ensure the requested version packages are available.
   Task: Fail if rpm version and docker image version are different
   Message: OCP rpm version 3.9.27 is different from OCP image version 3.9.25

real 36m9.508s
user 15m52.722s
sys 6m48.987s

grep 3.9 /etc/ansible/hosts
openshift_pkg_version=-3.9.25
openshift_image_tag=v3.9.25
openshift_metrics_image_version=v3.9.25
openshift_logging_image_version=v3.9.25
openshift_release=3.9

Upgraded the masters yesterday to .25 and was continuing with the nodes today. The .27 errata came out in between. check_available_rpms.yml does a repoquery that only returns the latest version:

- name: Get available {{ openshift_service_type}} version
  repoquery:
    name: "{{ openshift_service_type}}{{ '-' ~ openshift_release ~ '*' if openshift_release is defined else '' }}"
    ignore_excluders: true
  register: rpm_results

repoquery --plugins --quiet atomic-openshift-3.9*
atomic-openshift-0:3.9.27-1.git.0.964617d.el7.x86_64

masters_and_nodes.yml compares that against openshift_version, which now differs:

- block:
  - name: Check openshift_version for rpm installation
    include_tasks: check_available_rpms.yml
  - name: Fail if rpm version and docker image version are different
    fail:
      msg: "OCP rpm version {{ rpm_results.results.versions.available_versions.0 }} is different from OCP image version {{ openshift_version }}"
    # Both versions have the same string representation
    when: rpm_results.results.versions.available_versions.0 != openshift_version
  # block when
  when: not openshift_is_atomic | bool

Version-Release number of selected component (if applicable): 3.9.25 / 3.9.27

How reproducible: Always

Steps to Reproduce:
1. Masters on one version
2. New release
3. Try upgrading nodes

Actual results: The upgrade fails.

Expected results: It adheres to the set version we're trying to stick to.
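The failing guard in masters_and_nodes.yml reduces to a plain string comparison between the first available rpm version and the computed openshift_version. A minimal Python sketch of that logic (the function wrapper is illustrative; variable names and the error message follow the playbook):

```python
def check_rpm_vs_image(available_versions, openshift_version):
    """Mirror of the 'Fail if rpm version and docker image version are
    different' task: compare the first available rpm version with the
    computed openshift_version (both plain strings)."""
    rpm_version = available_versions[0]
    if rpm_version != openshift_version:
        raise RuntimeError(
            f"OCP rpm version {rpm_version} is different from "
            f"OCP image version {openshift_version}")

# Masters were installed at 3.9.25, but after the .27 errata the
# repoquery (with excluders ignored) reports only the latest build.
try:
    check_rpm_vs_image(["3.9.27"], "3.9.25")
except RuntimeError as err:
    print(err)
```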
Additional info: