Bug 1440167
| Summary: | Control Plane Upgrade Fails if Nodes Do Not Have Access To Latest Excluder | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Devan Goodwin <dgoodwin> |
| Component: | Cluster Version Operator | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.5.0 | CC: | aos-bugs, jchaloup, jiajliu, jokerman, mmccomas, sdodson |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openshift-ansible-3.5.78-1 | Doc Type: | Bug Fix |
| Doc Text: | During the control plane upgrade, a subset of the pre-check and verification tasks for the upgrade is run. Unfortunately, these tasks were run on non-control plane nodes as well. Some of the tasks need the excluders to be disabled in order to work properly. Because the excluders are disabled on control plane hosts only, running the tasks on the remaining nodes caused a failure. With this fix, all the pre-check and verification tasks are run on control plane nodes only. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-06-29 13:33:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1456093 | ||
| Bug Blocks: | 1436348 | ||
If I enable a 3.5 repo on these old nodes and then re-try, the excluder is upgraded to 3.5, but no other packages are affected. However the excluder is then *disabled* even on the old nodes, which should not be getting upgraded:

    [root@ded-stage-aws-node-compute-a39c2 ~]# atomic-openshift-excluder status
    unexclude -- At least one package not excluded

Control plane upgrade now succeeds. This may be an acceptable workaround for now provided use of the 3.5 excluder will not cause problems on a 3.4 system.

However long term:
- old nodes should not need access to new openshift repos for a control plane upgrade
- old nodes should not have rpms updated during a control plane upgrade
- old nodes should not get their excluder disabled during a control plane upgrade

Upstream PR: https://github.com/openshift/openshift-ansible/pull/3879, possible fix.

With https://github.com/openshift/openshift-ansible/pull/4321 merged, control plane upgrade and nodes upgrade have separate pre-verification tasks now.

Version:
atomic-openshift-utils-3.5.82-1.git.0.e3e25f6.el7.noarch

Step:
1. install ocp3.4(one master/node + one node)
2. ensure atomic-openshift-excluder and atomic-openshift-docker-excluder installed and enabled on all hosts
3. only enable 3.5 repo on master
run upgrade_control_plane.yml to upgrade masters first

    # ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml

Result:
Upgrade master succeed with no failure. Excluders are upgraded only on master host.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1666
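The long-term expectations in the comments above (old nodes keep their packages and their excluder state during a control plane upgrade) could be checked mechanically. The following is only a minimal sketch: the snapshot layout, the `changed_non_masters` helper, and the `"3.4"`/`"3.5"` version labels are hypothetical and are not part of openshift-ansible. It assumes you have recorded, by whatever means, the excluder version and enabled state of each host before and after the run.

```python
def changed_non_masters(before, after, masters):
    """Return the non-master hosts whose recorded state changed.

    `before` and `after` map host name -> state dict (hypothetical layout).
    A host missing from `after` is also reported as changed.
    """
    return sorted(
        host
        for host in before
        if host not in masters and before[host] != after.get(host)
    )


# Hypothetical snapshots; host names are taken from this report, the
# version labels are illustrative placeholders only.
masters = {"ded-stage-aws-master-ec552"}
before = {
    "ded-stage-aws-master-ec552": {"excluder": "3.4", "excluder_enabled": True},
    "ded-stage-aws-node-compute-a39c2": {"excluder": "3.4", "excluder_enabled": True},
    "ded-stage-aws-node-compute-f5ad4": {"excluder": "3.4", "excluder_enabled": True},
}
after = {
    # The master is expected to change during a control plane upgrade.
    "ded-stage-aws-master-ec552": {"excluder": "3.5", "excluder_enabled": True},
    # The workaround case: excluder upgraded and left disabled on an old node.
    "ded-stage-aws-node-compute-a39c2": {"excluder": "3.5", "excluder_enabled": False},
    # An untouched node.
    "ded-stage-aws-node-compute-f5ad4": {"excluder": "3.4", "excluder_enabled": True},
}

violations = changed_non_masters(before, after, masters)
print(violations)
```

An empty result would mean that no old node was touched; any host in the list breaks at least one of the three expectations.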
Description of problem:

It appears that the control plane upgrade will attempt to perform excluder tasks on all nodes in the cluster. If those nodes do not have the new repo enabled (in this case for 3.5), they fail and while the masters are then upgraded relatively as expected, the overall ansible operation reports a failure much later due to the problems on those nodes.

This is a problem particularly for blue green upgrades, old nodes should probably not be modifying their repos to access 3.5 (the new version), as they will not be using it. I suspect if the repos were available it would also try to update to 3.5 packages.

Version-Release number of selected component (if applicable):

openshift-ansible 3.5.48

How reproducible:

I believe 100%.

Steps to Reproduce:
1. Ensure only masters have 3.5 repos enabled, nodes should not. (as they will be replaced by new nodes)
2. Run control plane upgrade.

Actual results:

    2017-04-07 08:07:25,582 p=19738 u=dgoodwin | TASK [openshift_excluder : Evalute if docker excluder is to be enabled] ********
    2017-04-07 08:07:25,610 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
    2017-04-07 08:07:25,621 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
    2017-04-07 08:07:25,643 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
    2017-04-07 08:07:25,655 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2]
    2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4]
    2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7]
    2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99]
    2017-04-07 08:07:25,676 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7]
    2017-04-07 08:07:25,687 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651]
    2017-04-07 08:07:25,692 p=19738 u=dgoodwin | TASK [openshift_excluder : debug] **********************************************
    2017-04-07 08:07:25,720 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,735 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,743 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,753 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,764 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,776 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,777 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,779 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,794 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651] => { "docker_excluder_on": true }
    2017-04-07 08:07:25,799 p=19738 u=dgoodwin | TASK [openshift_excluder : Evalute if openshift excluder is to be enabled] *****
    2017-04-07 08:07:25,827 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
    2017-04-07 08:07:25,838 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
    2017-04-07 08:07:25,859 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
    2017-04-07 08:07:25,870 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2]
    2017-04-07 08:07:25,882 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4]
    2017-04-07 08:07:25,883 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99]
    2017-04-07 08:07:25,884 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7]
    2017-04-07 08:07:25,891 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7]
    2017-04-07 08:07:25,903 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651]
    2017-04-07 08:07:25,907 p=19738 u=dgoodwin | TASK [openshift_excluder : debug] **********************************************
    2017-04-07 08:07:25,943 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,955 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,964 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,973 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,985 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,985 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,986 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671] => { "openshift_excluder_on": true }
    2017-04-07 08:07:25,993 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7] => { "openshift_excluder_on": true }
    2017-04-07 08:07:26,001 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651] => { "openshift_excluder_on": true }
    2017-04-07 08:07:26,005 p=19738 u=dgoodwin | TASK [openshift_excluder : Install docker excluder] ****************************
    2017-04-07 08:07:35,580 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-f5ad4]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:35,670 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-bfd99]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:35,773 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-037a7]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:35,889 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-a39c2]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:36,149 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-infra-01651]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:36,478 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-infra-95fd7]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
    2017-04-07 08:07:44,225 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]
    2017-04-07 08:07:45,199 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
    2017-04-07 08:07:45,491 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
    2017-04-07 08:07:45,496 p=19738 u=dgoodwin | TASK [openshift_excluder : Install openshift excluder] *************************
    2017-04-07 08:08:06,447 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]
    2017-04-07 08:08:06,549 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
    2017-04-07 08:08:06,553 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
    2017-04-07 08:08:06,559 p=19738 u=dgoodwin | TASK [openshift_excluder : Check for docker-excluder] **************************
    2017-04-07 08:08:06,730 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
    2017-04-07 08:08:06,738 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
    2017-04-07 08:08:06,742 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
    2017-04-07 08:08:06,747 p=19738 u=dgoodwin | TASK [openshift_excluder : Enable docker excluder] *****************************
    2017-04-07 08:08:06,944 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
    2017-04-07 08:08:06,951 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
    2017-04-07 08:08:06,962 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]

Much later the end result is:

    2017-04-07 08:29:32,125 p=19738 u=dgoodwin | PLAY RECAP *********************************************************************
    2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-28671 : ok=308 changed=31 unreachable=0 failed=0
    2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-cbc60 : ok=308 changed=31 unreachable=0 failed=0
    2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-ec552 : ok=447 changed=66 unreachable=0 failed=0
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-037a7 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-a39c2 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-bfd99 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-f5ad4 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-infra-01651 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-infra-95fd7 : ok=48 changed=2 unreachable=0 failed=1
    2017-04-07 08:29:32,126 p=19738 u=dgoodwin | localhost : ok=35 changed=0 unreachable=0 failed=0

Expected results:

Nodes should not be touched during a control plane upgrade, and should not require access to the latest excluder rpms.

Additional info:
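Because the run only reports the failure in the PLAY RECAP some twenty minutes after the `fatal` lines, it can help to scan the log for both. The sketch below is a hypothetical helper, not part of openshift-ansible; it assumes the log layout quoted in this report (`<timestamp> p=<pid> u=<user> | <ansible output>`, one entry per line), and the sample fatal payload is abridged from the original.

```python
import re
from collections import defaultdict

# Patterns follow the log layout quoted in this report (an assumption
# about the log_path format; adjust if your log differs).
TASK_RE = re.compile(r"\|\s+TASK \[(?P<task>.+?)\]")
FATAL_RE = re.compile(r"\|\s+fatal: \[(?P<host>[^\]]+)\]: FAILED!")
RECAP_RE = re.compile(
    r"\|\s+(?P<host>\S+)\s+:\s+ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)"
)


def fatal_hosts_by_task(lines):
    """Map each task name to the hosts that logged 'fatal' under it."""
    failures = defaultdict(list)
    current_task = None
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            current_task = task.group("task")
            continue
        fatal = FATAL_RE.search(line)
        if fatal:
            failures[current_task].append(fatal.group("host"))
    return dict(failures)


def failed_in_recap(lines):
    """Return hosts whose PLAY RECAP entry has failed or unreachable > 0."""
    bad = []
    for line in lines:
        m = RECAP_RE.search(line)
        if m and (int(m.group("failed")) or int(m.group("unreachable"))):
            bad.append(m.group("host"))
    return bad


# A few lines from the log above; the fatal payload is abridged.
sample = [
    "2017-04-07 08:07:26,005 p=19738 u=dgoodwin | TASK [openshift_excluder : Install docker excluder] ****",
    '2017-04-07 08:07:35,580 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-f5ad4]: FAILED! => {"changed": false, "failed": true, "rc": 126}',
    "2017-04-07 08:07:44,225 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]",
    "2017-04-07 08:29:32,125 p=19738 u=dgoodwin | PLAY RECAP ****",
    "2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-ec552 : ok=447 changed=66 unreachable=0 failed=0",
    "2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-037a7 : ok=48 changed=2 unreachable=0 failed=1",
]

print(fatal_hosts_by_task(sample))
print(failed_in_recap(sample))
```

Grouping by task makes it visible that every failure in this report comes from the same excluder install task on non-master hosts, while the recap check gives the same verdict the playbook eventually reports.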