During an install of the latest 3.11, the task 'openshift_control_plane : Report control plane errors' fails because the control plane pods did not come up [1]. The openshift-sdn namespace does not exist, and the node service fails to come up due to CNI issues [2]. Looking at the api/controller logs (sosreport-ip-10-31-217-33-02416046-2019-06-28-jbriqyw/sos_commands/origin/*), we do not see much indicating an unhealthy controller or API.

Version: atomic-openshift-3.11.98-1

Steps to reproduce:
1. Run the installer as described in https://docs.openshift.com/container-platform/3.11/install/running_install.html#running-the-advanced-installation-rpm
2. The playbook fails in the task 'openshift_control_plane : Report control plane errors'

Expected Results:
1. Cluster installed with a healthy control plane.

The failing task is defined at https://github.com/openshift/openshift-ansible/blob/release-3.11/roles/openshift_control_plane/tasks/main.yml#L268:

~~~~
- name: Wait for control plane pods to appear
  oc_obj:
    state: list
    kind: pod
    name: "master-{{ item }}-{{ l_kubelet_node_name | lower }}"
    namespace: kube-system
  register: control_plane_pods
  until:
  - control_plane_pods.module_results is defined
  - control_plane_pods.module_results.results is defined
  - control_plane_pods.module_results.results | length > 0
  retries: 60
  delay: 5
  with_items:
  - "{{ 'etcd' if inventory_hostname in groups['oo_etcd_to_config'] else omit }}"
  - api
  - controllers
  ignore_errors: true

- when: control_plane_pods is failed
  block:
  - name: Check status in the kube-system namespace
    command: >
      {{ openshift_client_binary }} status --config={{ openshift.common.config_base }}/master/admin.kubeconfig
      -n kube-system
    register: control_plane_status
    ignore_errors: true
  - debug:
      msg: "{{ control_plane_status.stdout_lines }}"
  - name: Get pods in the kube-system namespace
    command: >
      {{ openshift_client_binary }} get pods --config={{ openshift.common.config_base }}/master/admin.kubeconfig
      -n kube-system -o wide
    register: control_plane_pods_list
    ignore_errors: true
  - debug:
      msg: "{{ control_plane_pods_list.stdout_lines }}"
  - name: Get events in the kube-system namespace
    command: >
      {{ openshift_client_binary }} get events --config={{ openshift.common.config_base }}/master/admin.kubeconfig
      -n kube-system
    register: control_plane_events
    ignore_errors: true
  - debug:
      msg: "{{ control_plane_events.stdout_lines }}"
  - name: Get node logs
    command: journalctl --no-pager -n 300 -u {{ openshift_service_type }}-node
    register: logs_node
    ignore_errors: true
  - debug:
      msg: "{{ logs_node.stdout_lines }}"
  - name: Report control plane errors
    fail:
      msg: Control plane pods didn't come up
~~~~
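For context, the wait loop above retries up to 60 times with a 5-second delay, so each static pod gets roughly five minutes to appear before the debug tasks run and the final failure fires. A manual equivalent of that check, run on an affected master, could look like the sketch below; the kubeconfig path assumes the default RPM layout under /etc/origin, and the pod names follow the master-<component>-<node> pattern from the task:

~~~~
# Hand-rolled version of the 'Wait for control plane pods to appear' task
# (a sketch; assumes /etc/origin/master/admin.kubeconfig exists).
node="$(hostname | tr '[:upper:]' '[:lower:]')"
for component in etcd api controllers; do
  oc get pod "master-${component}-${node}" \
    --config=/etc/origin/master/admin.kubeconfig \
    -n kube-system
done
~~~~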
===============================================================
[1]
~~~~
TASK [openshift_control_plane : Report control plane errors] *******************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:256
fatal: [ip-10-31-217-33.ec2.internal]: FAILED! => {
    "changed": false,
    "msg": "Control plane pods didn't come up"
}
fatal: [ip-10-31-217-81.ec2.internal]: FAILED! => {
    "changed": false,
    "msg": "Control plane pods didn't come up"
}
fatal: [ip-10-31-217-145.ec2.internal]: FAILED! => {
    "changed": false,
    "msg": "Control plane pods didn't come up"
}

..8<

PLAY RECAP *********************************************************************
ip-10-31-217-100.ec2.internal : ok=116  changed=63   unreachable=0  failed=0
ip-10-31-217-145.ec2.internal : ok=284  changed=148  unreachable=0  failed=1
ip-10-31-217-185.ec2.internal : ok=116  changed=63   unreachable=0  failed=0
ip-10-31-217-32.ec2.internal  : ok=116  changed=63   unreachable=0  failed=0
ip-10-31-217-33.ec2.internal  : ok=343  changed=165  unreachable=0  failed=1
ip-10-31-217-56.ec2.internal  : ok=116  changed=63   unreachable=0  failed=0
ip-10-31-217-81.ec2.internal  : ok=284  changed=148  unreachable=0  failed=1
ip-10-31-217-96.ec2.internal  : ok=116  changed=63   unreachable=0  failed=0
localhost                     : ok=11   changed=0    unreachable=0  failed=0

INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:05:54)
Health Check                : Complete (0:01:18)
Node Bootstrap Preparation  : Complete (0:23:05)
etcd Install                : Complete (0:04:37)
Master Install              : In Progress (0:14:36)
        This phase can be restarted by running: playbooks/openshift-master/config.yml

Failure summary:

  1. Hosts:    ip-10-31-217-145.ec2.internal, ip-10-31-217-33.ec2.internal, ip-10-31-217-81.ec2.internal
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up
~~~~

[2]
~~~~
Jun 28 17:08:11 ip-10-31-217-33.ec2.internal atomic-openshift-node[20765]: E0628 17:08:11.624084   20765 kubelet.go:2101] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
~~~~
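The "cni config uninitialized" message in [2] means the kubelet never found a CNI config on the node, which is consistent with the openshift-sdn namespace missing entirely. A few checks that could narrow this down on a failing master (a sketch; it assumes the default openshift-sdn plugin and the RPM paths, so file and service names may differ on other setups):

~~~~
# With openshift-sdn, this directory should eventually contain
# 80-openshift-network.conf once the SDN pods are up (an assumption based
# on the default plugin; other network plugins write different files).
ls -l /etc/cni/net.d/

# The bug notes the openshift-sdn namespace does not exist:
oc get ns openshift-sdn --config=/etc/origin/master/admin.kubeconfig
oc get pods -n openshift-sdn --config=/etc/origin/master/admin.kubeconfig

# Recent node-service logs, mirroring the playbook's 'Get node logs' task:
journalctl --no-pager -n 300 -u atomic-openshift-node | grep -i cni
~~~~

Per the INSTALLER STATUS output in [1], the failed phase can be retried with playbooks/openshift-master/config.yml once the node-level issue is resolved.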
Cannot reproduce with openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7.noarch.rpm; the installation succeeds.

~~~~
...
PLAY [Disable excluders and gather facts] **************************************
PLAY [Create OpenShift certificates for master hosts] **************************
PLAY [Generate or retrieve existing session secrets] ***************************
PLAY [Configure masters] *******************************************************
PLAY [Deploy the central bootstrap configuration] ******************************
PLAY [Ensure inventory labels are assigned to masters] *************************
...
~~~~

The task [openshift_control_plane : Report control plane errors] was skipped without errors.
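For anyone re-checking an install in the same way, a quick health pass over the control plane could look like this (a sketch; the kubeconfig path assumes the default RPM layout):

~~~~
# The master-api, master-controllers, and master-etcd static pods
# should be Running in kube-system on every master:
oc get pods -n kube-system -o wide --config=/etc/origin/master/admin.kubeconfig

# All nodes should report Ready once the CNI config is initialized:
oc get nodes --config=/etc/origin/master/admin.kubeconfig
~~~~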
Verified on openshift-ansible-3.11.154-1.git.0.7a11cbe.el7.noarch.rpm. The installation succeeds, and the task [openshift_control_plane : Report control plane errors] is skipped without errors.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3817