Bug 1797796
| Summary: | Cluster etcd operator cannot talk to bootstrap pod because of auth failures | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alay Patel <alpatel> |
| Component: | Etcd | Assignee: | Sam Batschelet <sbatsche> |
| Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.4 | CC: | adahiya, augol, eslutsky, mfojtik, mvirgil, rgolan, skolicha, wking, zshi |
| Target Milestone: | --- | ||
| Target Release: | 4.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Release Note |
| Doc Text: | When there are multiple networks, keep the following guidelines in mind: 1) bootkube must be populated with a BOOTSTRAP_IP in the same subnet as the masters; 2) the storage URLs in the kube-apiserver must also use an IP from the same subnet as the masters, or the cert signer must produce certs that include all of the IPs in the SAN. | ||
| Story Points: | --- | ||
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-10 16:22:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1771572 | ||
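The two Doc Text guidelines can be checked mechanically before an install. The sketch below is a minimal illustration using Python's standard `ipaddress` and `urllib.parse` modules; the function names, the example CIDR and IPs, and passing the cert's SAN IPs as a plain list are hypothetical choices for illustration, not installer or cert-signer APIs.

```python
import ipaddress
from urllib.parse import urlsplit


def in_master_subnet(ip: str, master_cidr: str) -> bool:
    """Guideline 1 (sketch): is this IP, e.g. a BOOTSTRAP_IP, inside the masters' subnet?"""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(master_cidr)


def storage_ips_missing_from_san(storage_urls, san_ips):
    """Guideline 2 (sketch): return storage-URL IPs that the cert's IP SANs do not cover.

    Hosts that are DNS names rather than IP literals are skipped; only IP SAN
    entries are compared in this hypothetical check.
    """
    covered = {ipaddress.ip_address(ip) for ip in san_ips}
    missing = []
    for url in storage_urls:
        host = urlsplit(url).hostname
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal; out of scope for this sketch
        if addr not in covered:
            missing.append(str(addr))
    return missing


if __name__ == "__main__":
    # Hypothetical values: masters on 10.0.0.0/24, a second network on 192.168.1.0/24.
    print(in_master_subnet("10.0.0.6", "10.0.0.0/24"))      # True
    print(in_master_subnet("192.168.1.10", "10.0.0.0/24"))  # False
    print(storage_ips_missing_from_san(
        ["https://10.0.0.5:2379", "https://192.168.1.10:2379"],
        ["10.0.0.5"],
    ))  # ['192.168.1.10']
```

Either check failing corresponds to one of the two conditions in the Doc Text: the bootstrap IP sits on the wrong network, or a storage URL points at an IP the serving cert does not list.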

Seems like similar failures:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.4/986
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.4/987
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/3059/pull-ci-openshift-installer-master-e2e-gcp/234

I suspect we see the same for oVirt, and it's blocking us:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/3047/pull-ci-openshift-installer-master-e2e-ovirt/601/artifacts/e2e-ovirt/pods/openshift-etcd-operator_etcd-operator-6fbdf775c5-blkcw_operator.log

*** This bug has been marked as a duplicate of bug 1807169 ***

*** This bug has been marked as a duplicate of bug 1808060 ***

Description of problem:
In 4.4, the cluster-etcd-operator (CEO) scales the etcd cluster from the bootstrap node to a 4-member cluster (the bootstrap member plus one etcd pod on each of the 3 masters). Sometimes the scaling times out because the CEO pod cannot talk to the bootstrap etcd, which it must do in order to add the other etcd nodes as members. The error from the operator logs is:

------
I0201 18:29:56.506190 1 util.go:37] checking against etcd-2.ci-op-1yrd4g86-e4498.origin-ci-int-gce.dev.openshift.com.
W0201 18:29:57.079291 1 clientconn.go:1156] grpc: addrConn.createTransport failed to connect to {https://10.0.0.5:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...

How reproducible:
This is probably a major contributor to bootstrapping failures in CI. grep for "Err :connection" in [1][2][3].

Expected results:
The operator pod should have the correct certs to talk to the bootstrap etcd.

Additional info:
Another quick way to spot this bug in CI is to look at the etcd resource in must-gather. If one member is in the Ready state and the other two are in the Unknown state, the etcd-operator is likely failing on auth errors while adding the members to the cluster. For example:

---------
observedConfig:
  cluster:
    members:
    - name: etcd-bootstrap
      peerURLs: https://10.0.0.6:2380
      status: Unknown
    pending:
    - name: etcd-member-ci-op-kd2mp-m-1.c.openshift-gce-devel-ci.internal
      peerURLs: https://etcd-1.ci-op-9d6rs79x-15937.origin-ci-int-gce.dev.openshift.com:2380
      status: Unknown
    - name: etcd-member-ci-op-kd2mp-m-0.c.openshift-gce-devel-ci.internal
      peerURLs: https://etcd-0.ci-op-9d6rs79x-15937.origin-ci-int-gce.dev.openshift.com:2380
      status: Ready
    - name: etcd-member-ci-op-kd2mp-m-2.c.openshift-gce-devel-ci.internal
      peerURLs: https://etcd-2.ci-op-9d6rs79x-15937.origin-ci-int-gce.dev.openshift.com:2380
      status: Unknown

1. https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/2745/pull-ci-openshift-installer-master-e2e-gcp/222/artifacts/e2e-gcp/pods/openshift-etcd-operator_etcd-operator-f78f5b65c-jzqlz_operator.log
2. https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/68/pull-ci-openshift-cluster-etcd-operator-master-e2e-gcp-upgrade/195/artifacts/e2e-gcp-upgrade/pods/openshift-etcd-operator_etcd-operator-55f94bfd85-hhvck_operator.log
3. https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/6986/rehearse-6986-pull-ci-openshift-origin-master-e2e-conformance-k8s/5/artifacts/e2e-conformance-k8s/pods/openshift-etcd-operator_etcd-operator-bbd958bb7-k476j_operator.log
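The grep suggested under "How reproducible" can be taken one step further to tally which endpoints fail the TLS handshake and why. This is a minimal Python sketch; the regular expression is written against the single grpc log line quoted in the description and is an assumption about that format, which may differ across grpc-go versions. The function names are hypothetical.

```python
import re
from collections import Counter

# Written against the grpc-go reconnect warning quoted in the description:
# ... failed to connect to {https://10.0.0.5:2379 0 <nil>}. Err :connection error: desc = "..."
CONN_ERR = re.compile(
    r'failed to connect to \{(?P<endpoint>\S+) [^}]*\}\. '
    r'Err :connection error: desc = "(?P<desc>[^"]*)"'
)


def tally_connection_errors(lines):
    """Count (endpoint, error description) pairs found in operator log lines."""
    counts = Counter()
    for line in lines:
        m = CONN_ERR.search(line)
        if m:
            counts[(m.group("endpoint"), m.group("desc"))] += 1
    return counts


def is_unknown_ca(desc):
    """True when the handshake failed because the server cert's CA is not trusted."""
    return "x509: certificate signed by unknown authority" in desc


if __name__ == "__main__":
    sample = [
        "I0201 18:29:56.506190 1 util.go:37] checking against "
        "etcd-2.ci-op-1yrd4g86-e4498.origin-ci-int-gce.dev.openshift.com.",
        'W0201 18:29:57.079291 1 clientconn.go:1156] grpc: addrConn.createTransport failed to connect to '
        '{https://10.0.0.5:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake '
        'failed: x509: certificate signed by unknown authority". Reconnecting...',
    ]
    for (endpoint, desc), n in tally_connection_errors(sample).items():
        print(endpoint, n, is_unknown_ca(desc))  # prints: https://10.0.0.5:2379 1 True
```

Grouping by endpoint makes it easy to see whether the failures are confined to the bootstrap etcd address, which is the pattern this bug describes.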
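The must-gather heuristic from "Additional info" can also be expressed as a small check over the etcd resource's observedConfig. This sketch assumes the YAML has already been loaded into plain Python dicts (for example with a YAML library, not shown) and that `members` and `pending` both sit under `cluster`, as in the example above; the function name and the exact rule (ignore the bootstrap entry, then require exactly one Ready master and all others Unknown) are illustrative assumptions.

```python
def looks_like_auth_failure(observed_config):
    """Heuristic: exactly one master member Ready and every other master member Unknown."""
    cluster = observed_config.get("cluster") or {}
    entries = (cluster.get("members") or []) + (cluster.get("pending") or [])
    # The bootstrap member is excluded; the heuristic only looks at master members.
    statuses = [e.get("status") for e in entries if e.get("name") != "etcd-bootstrap"]
    ready = statuses.count("Ready")
    unknown = statuses.count("Unknown")
    return ready == 1 and unknown > 0 and ready + unknown == len(statuses)


if __name__ == "__main__":
    # Mirrors the must-gather example in the description (peerURLs omitted).
    example = {
        "cluster": {
            "members": [{"name": "etcd-bootstrap", "status": "Unknown"}],
            "pending": [
                {"name": "etcd-member-ci-op-kd2mp-m-1.c.openshift-gce-devel-ci.internal", "status": "Unknown"},
                {"name": "etcd-member-ci-op-kd2mp-m-0.c.openshift-gce-devel-ci.internal", "status": "Ready"},
                {"name": "etcd-member-ci-op-kd2mp-m-2.c.openshift-gce-devel-ci.internal", "status": "Unknown"},
            ],
        }
    }
    print(looks_like_auth_failure(example))  # True
```

The check is only a triage hint: a positive result points at the operator's log for the x509 error rather than proving it.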