Description of problem:
It appears that the control plane upgrade attempts to perform excluder tasks on all nodes in the cluster. If those nodes do not have the new repo enabled (in this case for 3.5), the tasks fail on them. The masters are then upgraded more or less as expected, but the overall ansible operation reports a failure much later because of the problems on those nodes.

This is a problem particularly for blue-green upgrades: old nodes should probably not be modifying their repos to access 3.5 (the new version), as they will not be using it. I suspect that if the repos were available it would also try to update to 3.5 packages.

Version-Release number of selected component (if applicable):
openshift-ansible 3.5.48

How reproducible:
I believe 100%.

Steps to Reproduce:
1. Ensure only masters have 3.5 repos enabled; nodes should not (as they will be replaced by new nodes).
2. Run control plane upgrade.

Actual results:

2017-04-07 08:07:25,582 p=19738 u=dgoodwin | TASK [openshift_excluder : Evalute if docker excluder is to be enabled] ********
2017-04-07 08:07:25,610 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
2017-04-07 08:07:25,621 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
2017-04-07 08:07:25,643 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
2017-04-07 08:07:25,655 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2]
2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4]
2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7]
2017-04-07 08:07:25,667 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99]
2017-04-07 08:07:25,676 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7]
2017-04-07 08:07:25,687 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651]
2017-04-07 08:07:25,692 p=19738 u=dgoodwin | TASK [openshift_excluder : debug] **********************************************
2017-04-07 08:07:25,720 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552] => { "docker_excluder_on": true }
2017-04-07 08:07:25,735 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60] => { "docker_excluder_on": true }
2017-04-07 08:07:25,743 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99] => { "docker_excluder_on": true }
2017-04-07 08:07:25,753 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671] => { "docker_excluder_on": true }
2017-04-07 08:07:25,764 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4] => { "docker_excluder_on": true }
2017-04-07 08:07:25,776 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2] => { "docker_excluder_on": true }
2017-04-07 08:07:25,777 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7] => { "docker_excluder_on": true }
2017-04-07 08:07:25,779 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7] => { "docker_excluder_on": true }
2017-04-07 08:07:25,794 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651] => { "docker_excluder_on": true }
2017-04-07 08:07:25,799 p=19738 u=dgoodwin | TASK [openshift_excluder : Evalute if openshift excluder is to be enabled] *****
2017-04-07 08:07:25,827 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
2017-04-07 08:07:25,838 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
2017-04-07 08:07:25,859 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
2017-04-07 08:07:25,870 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2]
2017-04-07 08:07:25,882 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4]
2017-04-07 08:07:25,883 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99]
2017-04-07 08:07:25,884 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7]
2017-04-07 08:07:25,891 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7]
2017-04-07 08:07:25,903 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651]
2017-04-07 08:07:25,907 p=19738 u=dgoodwin | TASK [openshift_excluder : debug] **********************************************
2017-04-07 08:07:25,943 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,955 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,964 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-a39c2] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,973 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-bfd99] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,985 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-037a7] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,985 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-compute-f5ad4] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,986 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671] => { "openshift_excluder_on": true }
2017-04-07 08:07:25,993 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-95fd7] => { "openshift_excluder_on": true }
2017-04-07 08:07:26,001 p=19738 u=dgoodwin | ok: [ded-stage-aws-node-infra-01651] => { "openshift_excluder_on": true }
2017-04-07 08:07:26,005 p=19738 u=dgoodwin | TASK [openshift_excluder : Install docker excluder] ****************************
2017-04-07 08:07:35,580 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-f5ad4]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:35,670 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-bfd99]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:35,773 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-037a7]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:35,889 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-a39c2]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:36,149 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-infra-01651]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:36,478 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-infra-95fd7]: FAILED! => {"changed": false, "failed": true, "msg": "No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated", "rc": 126, "results": ["No package matching 'atomic-openshift-docker-excluder-3.5.5.3*' found available, installed or updated"]}
2017-04-07 08:07:44,225 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]
2017-04-07 08:07:45,199 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
2017-04-07 08:07:45,491 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
2017-04-07 08:07:45,496 p=19738 u=dgoodwin | TASK [openshift_excluder : Install openshift excluder] *************************
2017-04-07 08:08:06,447 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]
2017-04-07 08:08:06,549 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
2017-04-07 08:08:06,553 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
2017-04-07 08:08:06,559 p=19738 u=dgoodwin | TASK [openshift_excluder : Check for docker-excluder] **************************
2017-04-07 08:08:06,730 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-cbc60]
2017-04-07 08:08:06,738 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-ec552]
2017-04-07 08:08:06,742 p=19738 u=dgoodwin | ok: [ded-stage-aws-master-28671]
2017-04-07 08:08:06,747 p=19738 u=dgoodwin | TASK [openshift_excluder : Enable docker excluder] *****************************
2017-04-07 08:08:06,944 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-cbc60]
2017-04-07 08:08:06,951 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-28671]
2017-04-07 08:08:06,962 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]

Much later the end result is:

2017-04-07 08:29:32,125 p=19738 u=dgoodwin | PLAY RECAP *********************************************************************
2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-28671 : ok=308 changed=31 unreachable=0 failed=0
2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-cbc60 : ok=308 changed=31 unreachable=0 failed=0
2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-ec552 : ok=447 changed=66 unreachable=0 failed=0
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-037a7 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-a39c2 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-bfd99 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-f5ad4 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-infra-01651 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-infra-95fd7 : ok=48 changed=2 unreachable=0 failed=1
2017-04-07 08:29:32,126 p=19738 u=dgoodwin | localhost : ok=35 changed=0 unreachable=0 failed=0

Expected results:
Nodes should not be touched during a control plane upgrade, and should not require access to the latest excluder rpms.

Additional info:
If I enable a 3.5 repo on these old nodes and then re-try, the excluder is upgraded to 3.5, but no other packages are affected. However the excluder is then *disabled* even on the old nodes, which should not be getting upgraded:

[root@ded-stage-aws-node-compute-a39c2 ~]# atomic-openshift-excluder status
unexclude -- At least one package not excluded

Control plane upgrade now succeeds. This may be an acceptable workaround for now provided use of the 3.5 excluder will not cause problems on a 3.4 system. However long term:

- old nodes should not need access to new openshift repos for a control plane upgrade
- old nodes should not have rpms updated during a control plane upgrade
- old nodes should not get their excluder disabled during a control plane upgrade
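
For anyone triaging a similar run, the failure pattern in the log above (a task fails early on some hosts, and the run only reports it in the PLAY RECAP much later) can be pulled out of the ansible log with a short script. This is only a sketch: the function name `summarize` and the regular expressions are assumptions written against the log format quoted in this report, not part of openshift-ansible, and the sample `fatal` line has its JSON payload shortened.

```python
import re
from collections import defaultdict

# Task header, e.g. "... | TASK [openshift_excluder : Install docker excluder] ****"
TASK_RE = re.compile(r"\|\s*TASK \[(?P<task>[^\]]+)\]")
# Failed task on a host, e.g. "... | fatal: [host]: FAILED! => {...}"
FATAL_RE = re.compile(r"\|\s*fatal:\s*\[(?P<host>[^\]]+)\]")
# PLAY RECAP row, e.g. "... | host : ok=48 changed=2 unreachable=0 failed=1"
RECAP_RE = re.compile(
    r"\|\s*(?P<host>\S+)\s*:\s*ok=\d+\s+changed=\d+\s+unreachable=\d+\s+failed=(?P<failed>\d+)"
)


def summarize(log_lines):
    """Return (tasks that failed per host, failed count per host from the recap)."""
    failed_tasks = defaultdict(list)
    recap_failed = {}
    current_task = None
    for line in log_lines:
        match = TASK_RE.search(line)
        if match:
            current_task = match.group("task")
            continue
        match = FATAL_RE.search(line)
        if match:
            # Attribute the failure to the most recent task header seen.
            failed_tasks[match.group("host")].append(current_task)
            continue
        match = RECAP_RE.search(line)
        if match:
            recap_failed[match.group("host")] = int(match.group("failed"))
    return dict(failed_tasks), recap_failed


# Lines taken from the log in this report (fatal payload shortened).
SAMPLE = [
    "2017-04-07 08:07:26,005 p=19738 u=dgoodwin | TASK [openshift_excluder : Install docker excluder] ****************************",
    '2017-04-07 08:07:35,580 p=19738 u=dgoodwin | fatal: [ded-stage-aws-node-compute-f5ad4]: FAILED! => {"changed": false, "failed": true}',
    "2017-04-07 08:07:44,225 p=19738 u=dgoodwin | changed: [ded-stage-aws-master-ec552]",
    "2017-04-07 08:29:32,125 p=19738 u=dgoodwin | ded-stage-aws-master-ec552 : ok=447 changed=66 unreachable=0 failed=0",
    "2017-04-07 08:29:32,126 p=19738 u=dgoodwin | ded-stage-aws-node-compute-f5ad4 : ok=48 changed=2 unreachable=0 failed=1",
]

if __name__ == "__main__":
    tasks, recap = summarize(SAMPLE)
    for host, names in tasks.items():
        print(host, "failed in:", names)
    print("hosts with failures in recap:", [h for h, n in recap.items() if n > 0])
```

Run against the full log, this should show that every failure belongs to the node hosts and to the "Install docker excluder" task, while the masters finish with failed=0.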
Upstream PR with a possible fix: https://github.com/openshift/openshift-ansible/pull/3879
With https://github.com/openshift/openshift-ansible/pull/4321 merged, control plane upgrade and nodes upgrade have separate pre-verification tasks now.
Version:
atomic-openshift-utils-3.5.82-1.git.0.e3e25f6.el7.noarch

Steps:
1. Install OCP 3.4 (one master/node + one node).
2. Ensure atomic-openshift-excluder and atomic-openshift-docker-excluder are installed and enabled on all hosts.
3. Enable the 3.5 repo only on the master, then run upgrade_control_plane.yml to upgrade the masters first:
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade_control_plane.yml

Result:
The master upgrade succeeded with no failures. Excluders were upgraded only on the master host.
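
The check "excluders are upgraded only on the master host" can be scripted once the installed excluder version has been collected per host (for example with `rpm -q --queryformat '%{VERSION}' atomic-openshift-excluder` on each machine). The sketch below is an assumption-laden illustration, not the verification actually used here: the function name, the host names, and the version strings are all hypothetical.

```python
def unexpected_excluder_upgrades(versions, masters, new_release="3.5"):
    """Return the non-master hosts whose excluder is already at new_release.

    versions: mapping of host name to installed excluder version string.
    masters:  set of host names that the control plane upgrade may touch.
    """
    prefix = new_release + "."
    return sorted(
        host
        for host, version in versions.items()
        if host not in masters and version.startswith(prefix)
    )


if __name__ == "__main__":
    # Hypothetical hosts and versions, for illustration only.
    installed = {"master1": "3.5.5.3", "node1": "3.4.1.10"}
    print(unexpected_excluder_upgrades(installed, masters={"master1"}))
```

An empty list means no node had its excluder upgraded by the control plane run, which matches the result reported above.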
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1666