Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1596772

Summary: openshift-ansible installer error "Control plane pods didn't come up"
Product: OpenShift Container Platform
Component: Installer
Version: 3.10.0
Target Release: 3.10.0
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Status: CLOSED DUPLICATE
Reporter: Matt Bruzek <mbruzek>
Assignee: Scott Dodson <sdodson>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, jokerman, mmccomas, somalley
Last Closed: 2018-06-29 18:55:05 UTC
Type: Bug
Attachments:
The journalctl -u atomic-openshift-node output.

Description Matt Bruzek 2018-06-29 17:12:38 UTC
Description of problem:

When running the openshift-ansible installer on OpenStack I see the following error message: "Control plane pods didn't come up"

Version-Release number of the following components:
$ git describe
v3.10.0-rc.0-107-ga5effbd
$ git status
# On branch release-3.10
$ ansible --version
ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jun 12 2018, 10:42:23) [GCC 4.8.5 20150623 (Red Hat 4.8.5-34)]

How reproducible: I can reproduce this often.

Steps to Reproduce:
1. Install OpenStack
2. Install OpenShift on OpenStack https://github.com/openshift/openshift-ansible/tree/master/playbooks/openstack
3. Notice the error after running the install playbook:

source /home/cloud-user/keystonerc; ansible-playbook -vvv --user openshift -i inventory -i openshift-ansible/playbooks/openstack/inventory.py openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

Actual results:

TASK [openshift_control_plane : Report control plane errors] *******************
task path: /home/cloud-user/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:252
Friday 29 June 2018  11:00:19 -0400 (0:00:00.115)       0:20:20.927 *********** 
fatal: [master-0.scale-ci.example.com]: FAILED! => {
    "changed": false, 
    "msg": "Control plane pods didn't come up"
}
fatal: [master-2.scale-ci.example.com]: FAILED! => {
    "changed": false, 
    "msg": "Control plane pods didn't come up"
}
fatal: [master-1.scale-ci.example.com]: FAILED! => {
    "changed": false, 
    "msg": "Control plane pods didn't come up"
}

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
app-node-0.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
app-node-1.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
cns-0.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
cns-1.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
cns-2.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
infra-node-0.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
infra-node-1.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
infra-node-2.scale-ci.example.com : ok=161  changed=60   unreachable=0    failed=0   
lb-0.scale-ci.example.com  : ok=86   changed=21   unreachable=0    failed=0   
localhost                  : ok=30   changed=0    unreachable=0    failed=0   
master-0.scale-ci.example.com : ok=365  changed=147  unreachable=0    failed=1   
master-1.scale-ci.example.com : ok=311  changed=132  unreachable=0    failed=1   
master-2.scale-ci.example.com : ok=311  changed=132  unreachable=0    failed=1   


INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:19)
	[DEPRECATION WARNING]: The following are deprecated variables and will be no longer be used in the next minor release. Please update your inventory accordingly.
	openshift_node_kubelet_args
Health Check                : Complete (0:00:13)
Node Bootstrap Preparation  : Complete (0:01:20)
etcd Install                : Complete (0:00:22)
Load Balancer Install       : Complete (0:00:07)
Master Install              : In Progress (0:15:43)
	This phase can be restarted by running: playbooks/openshift-master/config.yml
Friday 29 June 2018  11:00:19 -0400 (0:00:00.072)       0:20:20.999 *********** 
=============================================================================== 
openshift_control_plane : Wait for control plane pods to appear ------- 880.13s
/home/cloud-user/openshift-ansible/roles/openshift_control_plane/tasks/main.yml:204 
Ensure openshift-ansible installer package deps are installed ---------- 58.89s
/home/cloud-user/openshift-ansible/playbooks/init/base_packages.yml:31 --------
openshift_node : install needed rpm(s) --------------------------------- 17.48s
/home/cloud-user/openshift-ansible/roles/openshift_node/tasks/install_rpms.yml:2 
container_runtime : Fixup SELinux permissions for docker --------------- 13.64s
/home/cloud-user/openshift-ansible/roles/container_runtime/tasks/package_docker.yml:159 
Run health checks (install) - EL --------------------------------------- 12.53s
/home/cloud-user/openshift-ansible/playbooks/openshift-checks/private/install.yml:24 
openshift_openstack : Install required packages ------------------------- 5.26s
/home/cloud-user/openshift-ansible/roles/openshift_openstack/tasks/node-packages.yml:4 
openshift_node : Install node, clients, and conntrack packages ---------- 4.86s
/home/cloud-user/openshift-ansible/roles/openshift_node/tasks/install.yml:2 ---
Run variable sanity checks ---------------------------------------------- 4.24s
/home/cloud-user/openshift-ansible/playbooks/init/sanity_checks.yml:14 --------
Run variable sanity checks ---------------------------------------------- 3.83s
/home/cloud-user/openshift-ansible/playbooks/init/sanity_checks.yml:14 --------
container_runtime : restart container runtime --------------------------- 3.56s
/home/cloud-user/openshift-ansible/roles/container_runtime/handlers/main.yml:3 
openshift_master_certificates : copy ------------------------------------ 3.32s
/home/cloud-user/openshift-ansible/roles/openshift_master_certificates/tasks/main.yml:91 
os_firewall : Start and enable iptables service ------------------------- 2.93s
/home/cloud-user/openshift-ansible/roles/os_firewall/tasks/iptables.yml:30 ----
Gather Cluster facts ---------------------------------------------------- 2.71s
/home/cloud-user/openshift-ansible/playbooks/init/cluster_facts.yml:27 --------
openshift_node : Update journald setup ---------------------------------- 2.35s
/home/cloud-user/openshift-ansible/roles/openshift_node/tasks/journald.yml:11 -
Gather Cluster facts ---------------------------------------------------- 2.21s
/home/cloud-user/openshift-ansible/playbooks/init/cluster_facts.yml:27 --------
openshift_cli : Install clients ----------------------------------------- 2.16s
/home/cloud-user/openshift-ansible/roles/openshift_cli/tasks/main.yml:2 -------
openshift_node : openshift_facts ---------------------------------------- 2.15s
/home/cloud-user/openshift-ansible/roles/openshift_node/tasks/configure-proxy-settings.yml:2 
openshift_ca : Install the base package for admin tooling --------------- 2.09s
/home/cloud-user/openshift-ansible/roles/openshift_ca/tasks/main.yml:6 --------
openshift_node : openshift_facts ---------------------------------------- 2.08s
/home/cloud-user/openshift-ansible/roles/openshift_node/tasks/configure-proxy-settings.yml:2 
get openshift_current_version ------------------------------------------- 2.06s
/home/cloud-user/openshift-ansible/playbooks/init/cluster_facts.yml:10 --------


Failure summary:


  1. Hosts:    master-0.scale-ci.example.com, master-1.scale-ci.example.com, master-2.scale-ci.example.com
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up
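The failure summary only says the pods never appeared. A hedged diagnostic sketch for a failing master (the manifest directory comes from the kubelet's --pod-manifest-path flag shown in comment 2; treat the path as an assumption if your install differs):

```shell
# List the control plane static pod manifests (etcd, api, controllers).
# A missing or empty directory means the installer never wrote them;
# present manifests with no matching containers point at a pull or start failure.
manifest_dir=${manifest_dir:-/etc/origin/node/pods}
listing=$(ls "$manifest_dir" 2>/dev/null || true)
if [ -z "$listing" ]; then
  echo "no static pod manifests found in $manifest_dir"
else
  echo "$listing"
fi
```

If manifests exist, the next step is usually `docker ps -a` on the master and `journalctl -u atomic-openshift-node` (as attached to this bug) to see why the containers did not start.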

Expected results:

This OpenShift install is part of automation that has worked on 3.10 in the recent past. I expected this latest build to work.

Additional info:
Will attach the log file and config

Comment 2 Matt Bruzek 2018-06-29 17:23:50 UTC
[openshift@master-2 ~]$ systemctl status atomic-openshift-node -l
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-06-29 10:45:37 EDT; 2h 36min ago
     Docs: https://github.com/openshift/origin
 Main PID: 11321 (hyperkube)
   Memory: 201.4M
   CGroup: /system.slice/atomic-openshift-node.service
           └─11321 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-cache-authorized-ttl=5m --authorization-webhook-cache-unauthorized-ttl=5m --bootstrap-kubeconfig=/etc/origin/node/bootstrap.kubeconfig --cadvisor-port=0 --cert-dir=/etc/origin/node/certificates --cgroup-driver=systemd --client-ca-file=/etc/origin/node/client-ca.crt --cluster-dns=192.168.0.4 --cluster-domain=cluster.local --container-runtime-endpoint=/var/run/dockershim.sock --containerized=false --enable-controller-attach-detach=true --experimental-dockershim-root-directory=/var/lib/dockershim --fail-swap-on=false --feature-gates=RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true --file-check-frequency=0s --healthz-bind-address= --healthz-port=0 --host-ipc-sources=api --host-ipc-sources=file --host-network-sources=api --host-network-sources=file --host-pid-sources=api --host-pid-sources=file --hostname-override= --http-check-frequency=0s --image-service-endpoint=/var/run/dockershim.sock --iptables-masquerade-bit=0 --kubeconfig=/etc/origin/node/node.kubeconfig --max-pods=250 --network-plugin=cni --node-ip= --pod-infra-container-image=registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.10 --pod-manifest-path=/etc/origin/node/pods --port=10250 --read-only-port=0 --register-node=true --root-dir=/var/lib/origin/openshift.local.volumes --rotate-certificates=true --tls-cert-file= --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 
--tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-cipher-suites=TLS_RSA_WITH_AES_256_GCM_SHA384 --tls-cipher-suites=TLS_RSA_WITH_AES_128_CBC_SHA --tls-cipher-suites=TLS_RSA_WITH_AES_256_CBC_SHA --tls-min-version=VersionTLS12 --tls-private-key-file=

Jun 29 13:22:07 master-2.scale-ci.example.com atomic-openshift-node[11321]: W0629 13:22:07.275065   11321 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 29 13:22:07 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 13:22:07.276323   11321 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jun 29 13:22:12 master-2.scale-ci.example.com atomic-openshift-node[11321]: W0629 13:22:12.277611   11321 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 29 13:22:12 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 13:22:12.277691   11321 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jun 29 13:22:17 master-2.scale-ci.example.com atomic-openshift-node[11321]: W0629 13:22:17.278752   11321 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 29 13:22:17 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 13:22:17.278822   11321 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jun 29 13:22:22 master-2.scale-ci.example.com atomic-openshift-node[11321]: W0629 13:22:22.280175   11321 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 29 13:22:22 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 13:22:22.280238   11321 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Jun 29 13:22:27 master-2.scale-ci.example.com atomic-openshift-node[11321]: W0629 13:22:27.282202   11321 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 29 13:22:27 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 13:22:27.282356   11321 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
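The repeated "No networks found in /etc/cni/net.d" warnings above mean the SDN pod never wrote its CNI config, which in turn blocks every other pod from getting a network. A minimal sketch of the corresponding check (path taken from the log; this is a generic file test, not an installer command):

```shell
# Check whether any CNI network config has been written yet.
# On a healthy master this directory contains e.g. an openshift-sdn config file.
cni_dir=/etc/cni/net.d
if [ -d "$cni_dir" ] && [ -n "$(ls -A "$cni_dir" 2>/dev/null)" ]; then
  cni_status="present"
else
  cni_status="missing"
fi
echo "CNI config in $cni_dir: $cni_status"
```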

Comment 3 Matt Bruzek 2018-06-29 17:33:32 UTC
Created attachment 1455538 [details]
The journalctl -u atomic-openshift-node output.

Comment 4 Sally 2018-06-29 18:00:43 UTC
I also hit this bug. I hit it every time when updating to the openshift-ansible 'release-3.10' branch; when I pin to openshift-ansible-3.10.0-0.41.0 I don't see it.

Comment 5 Scott Dodson 2018-06-29 18:55:05 UTC
The etcd static pod failed to start, and that prevents everything else from starting. It failed because the image it referenced didn't exist in registry.reg-aws.openshift.com, but that has since been corrected.

Jun 29 10:45:42 master-2.scale-ci.example.com atomic-openshift-node[11321]: E0629 10:45:42.285661   11321 remote_image.go:108] PullImage "registry.reg-aws.openshift.com:443/rhel7/etcd:3.2.22" from image service failed: rpc error: code = Unknown desc = Error: image rhel7/etcd:3.2.22 not found
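To verify a report like this against the registry by hand, it helps to split the failing image reference into its parts. A sketch using only POSIX parameter expansion (the reference is copied from the log line above; each part can then be checked with e.g. `docker pull` or `skopeo inspect`, assuming those tools are available):

```shell
# Break the failing image reference into registry, repository, and tag.
image="registry.reg-aws.openshift.com:443/rhel7/etcd:3.2.22"
registry=${image%%/*}      # everything before the first slash
repo_tag=${image#*/}       # rhel7/etcd:3.2.22
repo=${repo_tag%:*}        # rhel7/etcd
tag=${repo_tag##*:}        # 3.2.22
echo "registry=$registry repo=$repo tag=$tag"
```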

*** This bug has been marked as a duplicate of bug 1596635 ***