Description of problem:
Install failed at TASK [openshift_control_plane : Ensure that Cluster Monitoring Operator has nodes to run on], which should not happen.

[root@wmen1gr311-master-etcd-1 ~]# /usr/bin/oc get node --selector=node-role.kubernetes.io/infra=true
NAME                      STATUS    ROLES     AGE       VERSION
wmen1gr311-node-infra-1   Ready     infra     14m       v1.11.0+d4cacc0
wmen1gr311-node-infra-2   Ready     infra     14m       v1.11.0+d4cacc0

Version-Release number of the following components:
openshift-ansible-3.11.1

How reproducible:
70% on openstack

Steps to Reproduce:
1. Install OCP v3.11 HA on openstack

Actual results:
Install failed

TASK [openshift_control_plane : Retrieve list of schedulable nodes matching selector] ***
Wednesday 12 September 2018  20:45:12 +0800 (0:00:00.162)       0:22:29.935 ***
ok: [host-8-252-232.host.centralci.eng.rdu2.redhat.com] => {"changed": false, "results": {"cmd": "/usr/bin/oc get node --selector=node-role.kubernetes.io/infra=true --field-selector=spec.unschedulable!=true -o json -n default", "results": [{"apiVersion": "v1", "items": [], "kind": "List", "metadata": {"resourceVersion": "", "selfLink": ""}}], "returncode": 0}, "state": "list"}

TASK [openshift_control_plane : Ensure that Cluster Monitoring Operator has nodes to run on] ***
Wednesday 12 September 2018  20:45:13 +0800 (0:00:00.477)       0:22:30.413 ***
fatal: [host-8-252-232.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "assertion": false,
    "changed": false,
    "evaluated_to": false,
    "msg": "No schedulable nodes found matching node selector for Cluster Monitoring Operator - 'node-role.kubernetes.io/infra=true'"
}

Expected results:
Install succeeds

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
Same root cause as bug #1609019: the sync DS restarts the infra nodes to apply node labels, but openshift-ansible proceeds anyway, finds no schedulable infra nodes, and fails. PR https://github.com/openshift/openshift-ansible/pull/9983 would resolve this; waiting for Clayton to approve the solution there.
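For illustration, a minimal shell sketch of the kind of wait such a fix adds (a hypothetical script, not the actual PR change; assumes oc is configured against the cluster, and reuses the selectors from the failing task above):

#!/bin/bash
# Poll until at least one infra node is schedulable again after the
# sync DS restart, instead of asserting immediately.
until /usr/bin/oc get node \
    --selector=node-role.kubernetes.io/infra=true \
    --field-selector=spec.unschedulable!=true \
    -o name 2>/dev/null | grep -q .; do
  echo "No schedulable infra nodes yet, retrying in 10s..."
  sleep 10
done
echo "Schedulable infra nodes found, safe to proceed."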
*** Bug 1628357 has been marked as a duplicate of this bug. ***
release-3.11 pick: https://github.com/openshift/openshift-ansible/pull/10039
In openshift-ansible-3.11.2-1
Fixed.

openshift-ansible-3.11.4-1.git.0.d727082.el7_5.noarch
Kernel Version: 3.10.0-862.11.6.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
I just hit this even after manually building the openshift-ansible docker image on the release-3.11 branch.

Here's my ansible run output in a gist:
https://gist.github.com/chancez/064ed08c5016513e5c4fd67edb43c6bc

You can see in the output it gets to the tasks added in https://github.com/openshift/openshift-ansible/pull/10039, e.g. "Wait for sync DS to set annotations on all nodes".
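For reference, a quick way to see what the sync DS has (or hasn't) written to a node is to dump its annotations (node name taken from this bug's environment; substitute your own):

# Dump all annotations on one node; the sync DS is expected to set
# its markers here once it has applied the node config.
oc get node wmen1gr311-node-infra-1 -o jsonpath='{.metadata.annotations}'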
(In reply to Chance Zibolski from comment #8)
> I just hit this even after manually building the openshift-ansible docker
> image on the release-3.11 branch.
> 
> Here's my ansible run output in a gist:
> https://gist.github.com/chancez/064ed08c5016513e5c4fd67edb43c6bc
> 
> You can see in the output it gets to the tasks added in
> https://github.com/openshift/openshift-ansible/pull/10039, e.g. "Wait for
> sync DS to set annotations on all nodes".

Make sure you have nodes labeled with node-role.kubernetes.io/infra=true; the error indicates you don't. You can check the labels via:

# oc get node --show-labels | grep node-role.kubernetes.io/infra=true

If you want to use a different nodeSelector, you can set it with the openshift_cluster_monitoring_operator_node_selector parameter, e.g.:

openshift_cluster_monitoring_operator_node_selector={'role': 'node'}
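In an inventory that would look something like this (a hedged sketch; the variable is the one named above, placed under the standard [OSEv3:vars] section):

[OSEv3:vars]
# Run the Cluster Monitoring Operator on nodes labeled role=node
# instead of the default node-role.kubernetes.io/infra=true selector.
openshift_cluster_monitoring_operator_node_selector={'role': 'node'}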
(In reply to Junqi Zhao from comment #9)
> (In reply to Chance Zibolski from comment #8)
> > I just hit this even after manually building the openshift-ansible docker
> > image on the release-3.11 branch.
> > 
> > Here's my ansible run output in a gist:
> > https://gist.github.com/chancez/064ed08c5016513e5c4fd67edb43c6bc
> 
> Make sure you have nodes labeled with node-role.kubernetes.io/infra=true;
> the error indicates you don't. You can check the labels via:
> # oc get node --show-labels | grep node-role.kubernetes.io/infra=true

Sorry, checked again: you do have nodes labeled with node-role.kubernetes.io/infra=true. It seems the issue has reproduced.
Tested today; I did not hit this issue in my env. Please make sure you have nodes labeled with node-role.kubernetes.io/infra=true during the installation process.

openshift-ansible version: openshift-ansible-3.11.21-1.git.0.7dc17ca.el7.noarch
I used the GCP playbook from origin CI, https://github.com/openshift/release/tree/master/cluster/test-deploy, which automatically creates the VMs such that they should have the correct node labels.
Well, I confirmed they're not getting labeled, and I think I see why. It's config-related.
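For anyone comparing configs: in 3.11 the infra label is applied by the sync DS from the node group each host is assigned to, so a hedged sketch of what the inventory needs (host names taken from this bug's environment; node-config-infra is the stock group that carries node-role.kubernetes.io/infra=true):

[nodes]
# Assign infra hosts to the stock infra node group so the sync DS
# labels them node-role.kubernetes.io/infra=true.
wmen1gr311-node-infra-1 openshift_node_group_name='node-config-infra'
wmen1gr311-node-infra-2 openshift_node_group_name='node-config-infra'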
Closing bugs that were verified and targeted for GA but for some reason were not picked up by errata. This bug fix should be present in current 3.11 release content.