Description of problem:
Installation/startup in AWS fails: network plugin is not ready, cni config uninitialized.

Pods are failing to start with the following messages in /var/log/messages:

Aug  7 11:29:27 AWGMEUOM01 atomic-openshift-node: W0807 11:29:27.047991 32375 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Aug  7 11:29:27 AWGMEUOM01 atomic-openshift-node: E0807 11:29:27.048134 32375 kubelet.go:2147] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

Version-Release number of the following components:
Customer tried with:

$ rpm -qa | grep -i -e ansible -e atomic
openshift-ansible-roles-3.10.21-1.git.0.6446011.el7.noarch
ansible-2.6.2-1.el7.noarch
openshift-ansible-playbooks-3.10.21-1.git.0.6446011.el7.noarch
openshift-ansible-3.10.21-1.git.0.6446011.el7.noarch
openshift-ansible-docs-3.10.21-1.git.0.6446011.el7.noarch

I advised the customer to use the ansible 2.4 rpms instead; they downgraded the packages and tried again, but hit the same issue.

Actual results:

FAILED - RETRYING: Wait for control plane pods to appear (18 retries left). Result was: {
...
    "msg": {
        "cmd": "/bin/oc get pod master-etcd-awgmeuom02 -o json -n kube-system",
        "results": [
            {}
        ],
        "returncode": 1,
        "stderr": "Unable to connect to the server: EOF\n",
        "stdout": ""
    },

I'll upload the full ansible logs to the bz.

NOTES:
This issue seems similar to https://github.com/openshift/openshift-ansible/issues/7967 and https://bugzilla.redhat.com/show_bug.cgi?id=1592010
However, I'm not certain it's the same issue, because it is a different version and I don't see all the same messages, like "Unable to connect to the server".
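In case it helps while the full ansible logs are being gathered, a minimal diagnostic sketch for confirming the node state on an affected master (assuming root and the standard 3.10 service name and paths referenced in the messages above; adjust to your environment):

# Confirm the CNI config directory really is empty (matches the cni.go warning)
ls -l /etc/cni/net.d

# Check node readiness and the most recent node-service messages
oc get nodes
journalctl -u atomic-openshift-node --no-pager | tail -n 100

# Check whether the control-plane static pods ever appeared
oc get pods -n kube-system

Note that if the API never came up (as in the "Unable to connect to the server: EOF" result above), the oc commands will fail; the journal and /etc/cni/net.d checks are still useful.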
Steven,

We need to get logs from the static pods and the complete journal from the node service on all masters:

`journalctl --no-pager > node.log`
`master-logs etcd etcd &> etcd.log`
`master-logs api api &> api.log`
`master-logs controllers controllers &> controllers.log`

The static pods for the API should come up before CNI and SDN are initialized and the node is marked ready. There should be no need to install atomic-openshift-sdn-ovs in 3.10; this is all handled via a daemonset that's provisioned after the API bootstraps.
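If it helps to collect these in one go, a minimal sketch to run as root on each master (the tarball name is just an illustration):

# Run on each master, then attach the resulting tarball to the bug
journalctl --no-pager > node.log
master-logs etcd etcd &> etcd.log
master-logs api api &> api.log
master-logs controllers controllers &> controllers.log
tar czf "logs-$(hostname -s).tar.gz" node.log etcd.log api.log controllers.log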
Hi,

We previously requested a check for whether the pods were around, e.g.:

oc get pods -n kube-system

But the customer was not able to get output from these because the master was not responding. Do we expect that command to respond if we ask for it in a different namespace? If not, how should we check for these logs? Sorry, the "services running as pods" model is still a bit new to me.

Is "master-logs" a command, or shorthand for getting journalctl output? I don't see it as an option in my 3.10 cluster, so I assume the latter.
Never mind, I see "master-logs" in /usr/local/bin; I'll have the customer grab those.
*** Bug 1615754 has been marked as a duplicate of this bug. ***
*** Bug 1613348 has been marked as a duplicate of this bug. ***
QE also hit a similar issue to this bug; refer to scenario #1 in https://bugzilla.redhat.com/show_bug.cgi?id=1629726#c2.
Getting a similar issue in a bare-metal environment. Any updates?
This should be addressed via https://github.com/openshift/openshift-ansible/pull/10356 on release-3.11.
According to the dev's proposed verification path (https://gist.github.com/michaelgugino/c961476d8be7d160a5e53fe9a9734051), a 3.11 fresh install covering testing scenario #4 also needs a backport similar to what was done for 3.10 in https://github.com/openshift/openshift-ansible/pull/10409.
PR created for 3.11: https://github.com/openshift/openshift-ansible/pull/10447
3.11 merged.
Verified this bug with openshift-ansible-3.11.38-1.git.0.d146f83.el7.noarch, and PASS.

Scenario #1: Try to install a new 3.11 cluster with openshift_kubelet_name_override set. Installs should fail.

############ ANSIBLE RUN: playbooks/prerequisites.yml ############

PLAY [Fail openshift_kubelet_name_override for new hosts] **********************

TASK [Gathering Facts] *********************************************************
Monday 05 November 2018  14:33:22 +0800 (0:00:00.111)       0:00:00.111 *******
ok: [qe-jialiu312-master-etcd-1.1105-0gs.qe.rhcloud.com]
ok: [qe-jialiu312-node-1.1105-0gs.qe.rhcloud.com]
ok: [qe-jialiu312-node-registry-router-1.1105-0gs.qe.rhcloud.com]

TASK [Fail when openshift_kubelet_name_override is defined] ********************
Monday 05 November 2018  14:33:23 +0800 (0:00:01.097)       0:00:01.209 *******
fatal: [qe-jialiu312-master-etcd-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
fatal: [qe-jialiu312-node-registry-router-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
fatal: [qe-jialiu312-node-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
	to retry, use: --limit @/home/slave3/workspace/Launch Environment Flexy Wrapper/private-openshift-ansible/playbooks/prerequisites.retry

Scenario #2: cluster install on OSP (snvl2) without cloudprovider enabled + short hostname, PASS.

Scenario #3: cluster install on OSP (snvl2) with cloudprovider enabled + short hostname, PASS.
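For anyone re-checking scenario #1 against their own environment, a quick sketch (assuming a plain-text inventory at ./inventory; the variable name is the one exercised by the play above):

# If this prints any matches for a fresh install, playbooks/prerequisites.yml
# should now fail fast with "openshift_kubelet_name_override Cannot be defined for new hosts"
grep -rn openshift_kubelet_name_override ./inventory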
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3537