test: operator install console is failing frequently in CI, see search results: https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=operator+install+console

FIXME: Replace this paragraph with a particular job URI from the search results to ground discussion. A given test may fail for several reasons, and this bug should be scoped to one of those reasons. Ideally you'd pick a job showing the most-common reason, but since that's hard to determine, you may also choose to pick a job at random. Release-gating jobs (release-openshift-...) should be preferred over presubmits (pull-ci-...) because they are closer to the released product and less likely to have in-flight code changes that complicate analysis.

FIXME: Provide a snippet of the test failure or error from the job log
Two FIXMEs still in comment 0. The bug was reported against 4.7, but I see no 4.7 OCP release jobs failing with this (although there are some 4.7 OKD jobs failing with this):

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=operator+install+console&maxAge=168h&type=junit&groupBy=job' | grep 'failures match' | sort
periodic-ci-kube-reporting-metering-operator-master-metering-periodic-aws - 7 runs, 100% failed, 14% of failures match
periodic-ci-kube-reporting-metering-operator-release-4.7-metering-periodic-aws - 7 runs, 100% failed, 14% of failures match
periodic-ci-kubernetes-conformance-k8s - 7 runs, 29% failed, 100% of failures match
...
promote-release-openshift-machine-os-content-e2e-aws-4.5 - 435 runs, 6% failed, 4% of failures match
promote-release-openshift-machine-os-content-e2e-aws-4.6 - 445 runs, 7% failed, 6% of failures match
promote-release-openshift-okd-machine-os-content-e2e-gcp-4.6 - 82 runs, 51% failed, 64% of failures match
...
release-openshift-ocp-installer-e2e-aws-4.4 - 12 runs, 17% failed, 50% of failures match
release-openshift-ocp-installer-e2e-aws-csi-4.5 - 7 runs, 71% failed, 20% of failures match
release-openshift-ocp-installer-e2e-aws-mirrors-4.4 - 7 runs, 57% failed, 25% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.6 - 27 runs, 78% failed, 14% of failures match
release-openshift-ocp-installer-e2e-aws-serial-4.1 - 7 runs, 14% failed, 100% of failures match
release-openshift-ocp-installer-e2e-aws-serial-4.4 - 18 runs, 61% failed, 9% of failures match
release-openshift-ocp-installer-e2e-aws-upi-4.5 - 20 runs, 40% failed, 13% of failures match
release-openshift-ocp-installer-e2e-azure-4.4 - 12 runs, 42% failed, 20% of failures match
release-openshift-ocp-installer-e2e-azure-ovn-4.6 - 27 runs, 56% failed, 27% of failures match
release-openshift-ocp-installer-e2e-azure-serial-4.6 - 28 runs, 18% failed, 20% of failures match
release-openshift-ocp-installer-e2e-gcp-4.6 - 31 runs, 13% failed, 25% of failures match
release-openshift-ocp-installer-e2e-gcp-ovn-4.6 - 27 runs, 52% failed, 29% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.4 - 14 runs, 100% failed, 79% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.5 - 3 runs, 100% failed, 100% of failures match
release-openshift-ocp-installer-e2e-gcp-serial-4.5 - 20 runs, 30% failed, 33% of failures match
release-openshift-ocp-installer-e2e-metal-4.6 - 31 runs, 35% failed, 9% of failures match
release-openshift-ocp-installer-e2e-metal-serial-4.5 - 20 runs, 35% failed, 43% of failures match
release-openshift-ocp-installer-e2e-metal-serial-4.6 - 28 runs, 32% failed, 33% of failures match
release-openshift-ocp-installer-e2e-openstack-4.4 - 14 runs, 57% failed, 13% of failures match
release-openshift-ocp-installer-e2e-openstack-4.5 - 14 runs, 50% failed, 14% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.4 - 15 runs, 67% failed, 20% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.5 - 14 runs, 50% failed, 43% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.5 - 38 runs, 50% failed, 11% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 67 runs, 61% failed, 10% of failures match
release-openshift-okd-installer-e2e-aws-4.5 - 16 runs, 25% failed, 25% of failures match
release-openshift-okd-installer-e2e-aws-4.6 - 51 runs, 88% failed, 60% of failures match
release-openshift-origin-installer-e2e-aws-ovn-network-stress-4.5 - 21 runs, 14% failed, 33% of failures match
release-openshift-origin-installer-e2e-aws-sdn-network-stress-4.4 - 21 runs, 10% failed, 50% of failures match
release-openshift-origin-installer-e2e-aws-serial-4.6 - 40 runs, 25% failed, 10% of failures match
release-openshift-origin-installer-e2e-azure-4.6 - 44 runs, 50% failed, 9% of failures match
release-openshift-origin-installer-e2e-azure-shared-vpc-4.5 - 6 runs, 67% failed, 25% of failures match
release-openshift-origin-installer-e2e-gcp-4.5 - 25 runs, 16% failed, 25% of failures match
release-openshift-origin-installer-e2e-gcp-4.7 - 54 runs, 35% failed, 47% of failures match
release-openshift-origin-installer-e2e-gcp-serial-4.4 - 11 runs, 82% failed, 11% of failures match
release-openshift-origin-installer-e2e-gcp-shared-vpc-4.4 - 7 runs, 43% failed, 33% of failures match
release-openshift-origin-installer-launch-aws - 207 runs, 50% failed, 4% of failures match
release-openshift-origin-installer-launch-gcp - 453 runs, 41% failed, 5% of failures match

The last 24h have been better:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=operator+install+console&maxAge=24h&type=junit&groupBy=job&name=release-openshift-ocp-' | grep 'failures match' | sort
release-openshift-ocp-installer-e2e-aws-ovn-4.6 - 4 runs, 75% failed, 33% of failures match
release-openshift-ocp-installer-e2e-aws-upi-4.5 - 6 runs, 33% failed, 50% of failures match
release-openshift-ocp-installer-e2e-gcp-ovn-4.6 - 4 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.4 - 2 runs, 100% failed, 100% of failures match
release-openshift-ocp-installer-e2e-gcp-serial-4.5 - 6 runs, 50% failed, 67% of failures match
release-openshift-ocp-installer-e2e-metal-serial-4.5 - 6 runs, 33% failed, 50% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.5 - 2 runs, 100% failed, 50% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.5 - 7 runs, 29% failed, 50% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 10 runs, 80% failed, 25% of failures match

Also note that this is really a post-test state check, not a post-install state check [1].
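For anyone who wants to slice the summary lines above by release stream, here is a minimal sketch. The function name matching_jobs is hypothetical, and it assumes every input line has the "<job> - <N> runs, <X>% failed, <Y>% of failures match" shape shown in the search output.

```shell
#!/bin/sh
# Hypothetical helper: keep the "failures match" summary lines whose job name
# matches a regex, and print "<match-rate><TAB><job>" sorted by match rate,
# highest first.
# Assumed input shape (one job per line):
#   <job> - <N> runs, <X>% failed, <Y>% of failures match
matching_jobs() {
    awk -v pat="$1" '
        /failures match/ && $1 ~ pat {
            rate = $(NF - 3)      # e.g. "47%", the share of failures that match
            sub(/%/, "", rate)    # strip the percent sign so sort can compare numbers
            print rate "\t" $1
        }' | sort -k1,1nr -k2,2
}
```

Usage would be something like piping the w3m output into matching_jobs '^release-openshift-ocp-.*-4[.]7$'. The dot is written as [.] rather than \. because awk processes backslash escapes in -v assignments.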
Picking on release-openshift-ocp-installer-e2e-gcp-ovn-4.6, here are some jobs:

$ curl -s 'https://search.ci.openshift.org/search?search=operator+install+console&maxAge=24h&type=junit&groupBy=job&name=release-openshift-ocp-installer-e2e-gcp-ovn-4.6' | jq -r 'keys[]'
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.6/1314410856292814848
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.6/1314697955512422400

The latter blew up before bootstrap-complete:

level=error msg="Cluster operator network Degraded is True with RolloutHung: DaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" rollout is not making progress - last change 2020-10-09T22:56:39Z"
level=info msg="Cluster operator network Progressing is True with Deploying: DaemonSet \"openshift-multus/network-metrics-daemon\" is not available (awaiting 3 nodes)\nDaemonSet \"openshift-multus/multus-admission-controller\" is waiting for other operators to become ready\nDaemonSet \"openshift-ovn-kubernetes/ovnkube-node\" is not available (awaiting 3 nodes)"
level=info msg="Cluster operator network Available is False with Startup: The network is starting up"
level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/artifacts/installer/log-bundle-20201009232434.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

And that "awaiting 3 nodes" thing is because the control-plane kubelets are mad:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.6/1314697955512422400/artifacts/e2e-gcp/nodes.json | jq -r '.items[].status.conditions[] | select(.type == "Ready") | .status + " " + .reason + ": " + .message'
False KubeletNotReady: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
False KubeletNotReady: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
False KubeletNotReady: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?

I dunno whose fault _that_ is, but I'm pretty sure it's not the console. Sending it over to the node folks.

[1]: https://github.com/openshift/release/pull/12298
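When triaging other runs, the per-operator conditions can be pulled out of installer output like the log lines quoted above. This is only a sketch: operator_conditions is a hypothetical name, and it assumes one log entry per line in the msg="Cluster operator <name> <condition> is <status> with <reason>: <message>" shape seen in this job.

```shell
#!/bin/sh
# Hypothetical helper: summarize "Cluster operator ..." installer log entries
# as "<operator> <condition>=<status> (<reason>)".
# Assumes one log entry per line; all other lines are dropped.
operator_conditions() {
    sed -n 's/.*msg="Cluster operator \([^ ]*\) \([^ ]*\) is \([^ ]*\) with \([^:]*\): .*/\1 \2=\3 (\4)/p'
}
```

Feeding it the installer output from the failed run should reduce the noise to one short line per operator condition, which makes it easier to see which operator is holding up bootstrap.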
*** Bug 1886840 has been marked as a duplicate of this bug. ***
If SDN doesn't put CNI configs in /etc/kubernetes/cni/net.d, then CRI-O can make no forward progress. Moving to networking to investigate what is going wrong.
Seeing similar behavior in 4.6.0-rc.2, RHCOS version 46.82.202010091720-0. Mine is a baremetal UPI install with a 3-node controller/worker deployment. Journalctl output on the control plane nodes lists a similar NetworkPluginNotReady message. Looking also at the openshift-ovn-kubernetes namespace, the ovnkube-node pods are not able to get subnet annotations.

oc get pods -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-9jwnx   6/6     Running            1          63m
ovnkube-master-ffjbc   6/6     Running            0          63m
ovnkube-master-mxffv   6/6     Running            0          63m
ovnkube-node-bkjzf     2/3     CrashLoopBackOff   9          63m
ovnkube-node-vdmt7     2/3     Error              9          63m
ovnkube-node-xck78     2/3     CrashLoopBackOff   9          63m
ovs-node-5vkk5         1/1     Running            0          63m
ovs-node-jfbk4         1/1     Running            0          63m
ovs-node-mptdm         1/1     Running            0          63m

I1012 19:40:11.113574 73249 node.go:193] Waiting for node master2.sandbox.lab to start, no annotation found on node for subnet: node "master2.sandbox.lab" has no "k8s.ovn.org/node-subnets" annotation
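To spot the unhealthy pods in a listing like the one above without eyeballing it, a small filter along these lines might help. It is a sketch only: unready_pods is a hypothetical name, and it assumes the default `oc get pods` column layout (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` table output on stdin and print
# NAME, READY and STATUS for pods with fewer ready containers than expected.
# Assumes READY is the second column, in "<ready>/<total>" form; the header
# row and anything else without that form is skipped.
unready_pods() {
    awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
        split($2, counts, "/")
        if (counts[1] + 0 < counts[2] + 0)
            print $1, $2, $3
    }'
}
```

For example: oc get pods -n openshift-ovn-kubernetes | unready_pods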
There is a suspicion that this is a dup of bug 1886834.
Peter's got it right: if there's no CNI configuration dropped, it's a strong sign that something with the default network provider (OVN or openshift-sdn) didn't complete properly, so it never dropped its configuration on disk.

Taking a look at this ovnkube log from the runs provided above:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-ovn-4.6/1314410856292814848/artifacts/e2e-gcp/pods/openshift-ovn-kubernetes_ovnkube-master-8mnrn_kube-rbac-proxy.log

I can see that OVN kubernetes is complaining that `ovn-master-metrics-cert not mounted after 20 minutes`.

This is directly related to bug 1886834, which addresses always mounting the certs share, as completed in: https://github.com/openshift/cluster-network-operator/pull/834/files

I believe this should be considered a dupe.
*** This bug has been marked as a duplicate of bug 1886834 ***