Bug 1811530
| Summary: | Install failed due to mdns record changed | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | weiwei jiang <wjiang> |
| Component: | Etcd Operator | Assignee: | Sam Batschelet <sbatsche> |
| Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.4 | CC: | fbrychta, ikarpukh, jzmeskal, m.andre, pprinett, scuppett, smilner, wewang, wsun, yanyang, yprokule |
| Target Milestone: | --- | Keywords: | Regression, TestBlocker |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-11 13:29:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1809238, 1810490 | | |
This is also affecting upstream CI. I'm looking into it.

This is likely caused by https://github.com/openshift/cluster-etcd-operator/pull/233 (backported to 4.4 in https://github.com/openshift/cluster-etcd-operator/pull/239). I'll port https://github.com/openshift/machine-config-operator/commit/2908ca449b46200cbed67ae5a465243a7919f144 to OpenStack; hopefully this is enough to fix our issue.

When installing OCP on GCP, I hit the same issue:

```
level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.nightly-2020-03-09-234759: 76% complete"
level=error msg="Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::RouterCerts_NoRouterCertSecret: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret \"v4-0-config-system-router-certs\" not found\nIngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=error msg="Cluster operator kube-apiserver Degraded is True with InstallerPodContainerWaiting_CreateContainerError::StaticPods_Error: InstallerPodContainerWaitingDegraded: Pod \"installer-2-wewang-vw88w-m-1.c.openshift-qe.internal\" on node \"wewang-vw88w-m-1.c.openshift-qe.internal\" container \"installer\" is waiting for 38m21.732926905s because \"the container name \\\"k8s_installer_installer-2-wewang-vw88w-m-1.c.openshift-qe.internal_openshift-kube-apiserver_243574fe-ebe9-4756-9e7f-6e8a446bf457_0\\\" is already in use by \\\"df46526127e942582cf15846967911f37d4a8db5abd712b0500d561131974176\\\". You have to remove that container to be able to reuse that name.: that name is already in use\"\nStaticPodsDegraded: nodes/wewang-vw88w-m-2.c.openshift-qe.internal pods/kube-apiserver-wewang-vw88w-m-2.c.openshift-qe.internal container=\"kube-apiserver-cert-regeneration-controller\" is not ready\nStaticPodsDegraded: nodes/wewang-vw88w-m-2.c.openshift-qe.internal pods/kube-apiserver-wewang-vw88w-m-2.c.openshift-qe.internal container=\"kube-apiserver-cert-regeneration-controller\" is waiting: \"CrashLoopBackOff\" - \"back-off 5m0s restarting failed container=kube-apiserver-cert-regeneration-controller pod=kube-apiserver-wewang-vw88w-m-2.c.openshift-qe.internal_openshift-kube-apiserver(b3014b7a2f1c6b8515fe65cbb22372bd)\"\nStaticPodsDegraded: pods \"kube-apiserver-wewang-vw88w-m-1.c.openshift-qe.internal\" not found\nStaticPodsDegraded: pods \"kube-apiserver-wewang-vw88w-m-0.c.openshift-qe.internal\" not found"
```

FYI, regarding comment 8: that cluster is an IPI-on-GCP OCP cluster.

Should be fixed with https://github.com/openshift/cluster-kube-apiserver-operator/pull/791.

Today I set up the cluster successfully for 'IPI on GCP with http_proxy & OVN' against 4.4.0-0.nightly-2020-03-10-194324.

I can confirm this issue was also encountered when deploying OCP on RHV with 4.4.0-0.nightly-2020-03-09-175442.

*** Bug 1811855 has been marked as a duplicate of this bug. ***

*** This bug has been marked as a duplicate of bug 1812071 ***
Description of problem:
Checked with a recent OCP-on-OSP installation; the kube-apiserver cannot become ready.

```
# oc get pods -n openshift-kube-apiserver -o wide
NAME                                       READY   STATUS      RESTARTS   AGE    IP             NODE                        NOMINATED NODE   READINESS GATES
installer-2-qe-wjios44-6bf4h-master-2      0/1     Completed   0          116s   10.128.0.26    qe-wjios44-6bf4h-master-2   <none>           <none>
kube-apiserver-qe-wjios44-6bf4h-master-2   3/4     Running     3          104s   192.168.0.13   qe-wjios44-6bf4h-master-2   <none>           <none>

# oc -n openshift-kube-apiserver logs kube-apiserver-qe-wjios44-b8bvg-master-1
W0309 05:30:32.293133       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com on 192.168.0.6:53: no such host". Reconnecting...
W0309 05:30:32.769123       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://etcd-2.qe-wjios44.0309-xtg.qe.rhcloud.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-2.qe-wjios44.0309-xtg.qe.rhcloud.com on 192.168.0.6:53: no such host". Reconnecting...
W0309 05:30:33.301544       1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://etcd-1.qe-wjios44.0309-xtg.qe.rhcloud.com:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp: lookup etcd-1.qe-wjios44.0309-xtg.qe.rhcloud.com on 192.168.0.6:53: no such host". Reconnecting...

[root@qe-wjios44-b8bvg-master-0 core]# dig +short -t SRV @127.0.0.1 _etcd-server-ssl._tcp.qe-wjios44.0309-xtg.qe.rhcloud.com
0 10 2380 qe-wjios44-b8bvg-etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com.
0 10 2380 qe-wjios44-b8bvg-etcd-2.qe-wjios44.0309-xtg.qe.rhcloud.com.
0 10 2380 qe-wjios44-b8bvg-etcd-1.qe-wjios44.0309-xtg.qe.rhcloud.com.
[root@qe-wjios44-b8bvg-master-0 core]# dig +short @127.0.0.1 qe-wjios44-b8bvg-etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com.
192.168.0.17
[root@qe-wjios44-b8bvg-master-0 core]# dig +short @127.0.0.1 etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com.
[root@qe-wjios44-b8bvg-master-0 core]#
```

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-03-08-235004

How reproducible:
Always

Steps to Reproduce:
1. Try to install an IPI-on-OSP cluster

Actual results:
```
INFO Waiting up to 20m0s for the Kubernetes API at https://api.qe-wjios44.0309-xtg.qe.rhcloud.com:6443...
DEBUG Still waiting for the Kubernetes API: Get https://api.qe-wjios44.0309-xtg.qe.rhcloud.com:6443/version?timeout=32s: dial tcp 10.0.98.45:6443: i/o timeout
INFO API v1.17.1 up
INFO Waiting up to 40m0s for bootstrapping to complete...
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::RouterCerts_NoRouterCertSecret: RouterCertsDegraded: secret/v4-0-config-system-router-certs -n openshift-authentication: could not be retrieved: secret "v4-0-config-system-router-certs" not found
IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
INFO Cluster operator authentication Progressing is Unknown with NoData:
INFO Cluster operator authentication Available is Unknown with NoData:
ERROR Cluster operator kube-apiserver Degraded is True with StaticPods_Error: StaticPodsDegraded: nodes/qe-wjios44-b8bvg-master-1 pods/kube-apiserver-qe-wjios44-b8bvg-master-1 container="kube-apiserver" is not ready
StaticPodsDegraded: nodes/qe-wjios44-b8bvg-master-1 pods/kube-apiserver-qe-wjios44-b8bvg-master-1 container="kube-apiserver" is waiting: "CrashLoopBackOff" - "back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-qe-wjios44-b8bvg-master-1_openshift-kube-apiserver(4ad72f4ecbd4b2d85c8988b8b3aa8a4f)"
StaticPodsDegraded: nodes/qe-wjios44-b8bvg-master-1 pods/kube-apiserver-qe-wjios44-b8bvg-master-1 container="kube-apiserver-cert-regeneration-controller" is not ready
StaticPodsDegraded: nodes/qe-wjios44-b8bvg-master-1 pods/kube-apiserver-qe-wjios44-b8bvg-master-1 container="kube-apiserver-cert-regeneration-controller" is waiting: "CrashLoopBackOff" - "back-off 5m0s restarting failed container=kube-apiserver-cert-regeneration-controller pod=kube-apiserver-qe-wjios44-b8bvg-master-1_openshift-kube-apiserver(4ad72f4ecbd4b2d85c8988b8b3aa8a4f)"
StaticPodsDegraded: pods "kube-apiserver-qe-wjios44-b8bvg-master-0" not found
StaticPodsDegraded: pods "kube-apiserver-qe-wjios44-b8bvg-master-2" not found
```

Expected results:
The installation should succeed.

Additional info:
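The `dig` output above makes the mismatch concrete: the SRV record for `_etcd-server-ssl._tcp.<cluster domain>` now advertises per-machine names (`qe-wjios44-b8bvg-etcd-N...`), while the kube-apiserver log shows it still dialing the old `etcd-N.<cluster domain>` names, which no longer resolve. A minimal sketch of that comparison, using the record values copied from this report (the helper logic is illustrative only, not part of any etcd or OpenShift API):

```python
# SRV targets as returned by `dig +short -t SRV` on the master node
# (trailing dots stripped). These are the names mDNS/CoreDNS now serves.
srv_targets = {
    "qe-wjios44-b8bvg-etcd-0.qe-wjios44.0309-xtg.qe.rhcloud.com",
    "qe-wjios44-b8bvg-etcd-1.qe-wjios44.0309-xtg.qe.rhcloud.com",
    "qe-wjios44-b8bvg-etcd-2.qe-wjios44.0309-xtg.qe.rhcloud.com",
}

# Endpoints the kube-apiserver was still configured with,
# taken from its "no such host" log lines.
configured = [
    f"etcd-{i}.qe-wjios44.0309-xtg.qe.rhcloud.com" for i in range(3)
]

# Any configured endpoint absent from the SRV targets will fail DNS
# lookup, which is exactly the "no such host" error in the log.
stale = [name for name in configured if name not in srv_targets]
print(stale)  # all three configured endpoints are stale
```

Under this reading, the fix (cluster-kube-apiserver-operator PR 791 referenced above) amounts to making the configured endpoints match what the record actually serves.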