Bug 1811760
| Summary: | [ovirt] Some cluster operators fail to come up because RHV CA is not trusted by a pod | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Roy Golan <rgolan> |
| Component: | Installer | Assignee: | Roy Golan <rgolan> |
| Installer sub component: | OpenShift on RHV | QA Contact: | Jan Zmeskal <jzmeskal> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | jcall, jialiu, jzmeskal |
| Version: | 4.4 | Keywords: | TestBlockerForLayeredProduct |
| Target Milestone: | --- | ||
| Target Release: | 4.4.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1794313 | Environment: | |
| Last Closed: | 2020-05-13 22:00:59 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1794313 | ||
| Bug Blocks: | |||
Description
Roy Golan
2020-03-09 17:16:30 UTC
I have tried verifying this with openshift-install-linux-4.4.0-0.nightly-2020-03-12-052849, but some operators still did not come up:
oc get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication Unknown Unknown True 64m
cloud-credential 4.4.0-0.nightly-2020-03-12-052849 True False False 69m
cluster-autoscaler 4.4.0-0.nightly-2020-03-12-052849 True False False 53m
console 4.4.0-0.nightly-2020-03-12-052849 Unknown True False 55m
dns 4.4.0-0.nightly-2020-03-12-052849 True False False 61m
etcd 4.4.0-0.nightly-2020-03-12-052849 True False False 60m
image-registry 4.4.0-0.nightly-2020-03-12-052849 True False False 53m
ingress unknown False True True 54m
insights 4.4.0-0.nightly-2020-03-12-052849 True False False 54m
kube-apiserver 4.4.0-0.nightly-2020-03-12-052849 True False False 60m
kube-controller-manager 4.4.0-0.nightly-2020-03-12-052849 True False False 60m
kube-scheduler 4.4.0-0.nightly-2020-03-12-052849 True False False 61m
kube-storage-version-migrator 4.4.0-0.nightly-2020-03-12-052849 False False False 64m
machine-api 4.4.0-0.nightly-2020-03-12-052849 True False False 61m
machine-config 4.4.0-0.nightly-2020-03-12-052849 True False False 61m
marketplace 4.4.0-0.nightly-2020-03-12-052849 True False False 54m
monitoring False True True 48m
network 4.4.0-0.nightly-2020-03-12-052849 True False False 65m
node-tuning 4.4.0-0.nightly-2020-03-12-052849 True False False 64m
openshift-apiserver 4.4.0-0.nightly-2020-03-12-052849 True False False 56m
openshift-controller-manager 4.4.0-0.nightly-2020-03-12-052849 True False False 54m
openshift-samples 4.4.0-0.nightly-2020-03-12-052849 True False False 52m
operator-lifecycle-manager 4.4.0-0.nightly-2020-03-12-052849 True False False 62m
operator-lifecycle-manager-catalog 4.4.0-0.nightly-2020-03-12-052849 True False False 62m
operator-lifecycle-manager-packageserver 4.4.0-0.nightly-2020-03-12-052849 True False False 60m
service-ca 4.4.0-0.nightly-2020-03-12-052849 True False False 64m
service-catalog-apiserver 4.4.0-0.nightly-2020-03-12-052849 True False False 64m
service-catalog-controller-manager 4.4.0-0.nightly-2020-03-12-052849 True False False 64m
storage 4.4.0-0.nightly-2020-03-12-052849 True False False 54m
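A listing like the one above can also be checked mechanically. The following is a minimal sketch, not part of the original report. It assumes the same data has been fetched with `oc get co -o json` and parsed into a dict. The function name `unhealthy_operators` and the hand-written sample are hypothetical. The sample mirrors three rows of the table above.

```python
def unhealthy_operators(co_list):
    """Return (name, reasons) pairs for operators that are not
    Available=True or that report Degraded=True.

    co_list is assumed to be the parsed output of `oc get co -o json`.
    """
    problems = []
    for item in co_list.get("items", []):
        name = item["metadata"]["name"]
        # Map condition type -> status string ("True", "False", "Unknown").
        conds = {
            c["type"]: c["status"]
            for c in item.get("status", {}).get("conditions", [])
        }
        reasons = []
        if conds.get("Available") != "True":
            reasons.append("Available=" + conds.get("Available", "missing"))
        if conds.get("Degraded") == "True":
            reasons.append("Degraded=True")
        if reasons:
            problems.append((name, reasons))
    return problems


def _co(name, available, progressing, degraded):
    # Hypothetical helper that builds one ClusterOperator-shaped dict.
    return {
        "metadata": {"name": name},
        "status": {"conditions": [
            {"type": "Available", "status": available},
            {"type": "Progressing", "status": progressing},
            {"type": "Degraded", "status": degraded},
        ]},
    }


# Hand-written sample mirroring three rows of the listing above.
sample = {"items": [
    _co("authentication", "Unknown", "Unknown", "True"),
    _co("dns", "True", "False", "False"),
    _co("kube-storage-version-migrator", "False", "False", "False"),
]}

result = unhealthy_operators(sample)
for name, reasons in result:
    print(name, ", ".join(reasons))
```

Reading the JSON form avoids guessing at column positions, which is fragile here because some rows in the plain listing have an empty VERSION column.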
The only possibly relevant log message I could find is this:
oc logs pod/ingress-operator-74578cc864-rxtjz -c ingress-operator | grep ERROR | grep certificate_controller
2020-03-12T12:22:09.137Z ERROR operator.init.controller-runtime.controller controller/controller.go:218 Reconciler error {"controller": "certificate_controller", "request": "openshift-ingress-operator/default", "error": "failed to lookup wildcard cert: secrets \"router-certs-default\" not found", "errorCauses": [{"error": "failed to lookup wildcard cert: secrets \"router-certs-default\" not found"}]}
2020-03-12T12:22:43.309Z ERROR operator.init.controller-runtime.controller controller/controller.go:218 Reconciler error {"controller": "certificate_controller", "request": "openshift-ingress-operator/default", "error": "failed to publish router CA: failed to ensure \"default-ingress-cert\" in \"openshift-config-managed\" was published: Post https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/configmaps: read tcp 10.129.0.21:59996->172.30.0.1:443: read: connection reset by peer", "errorCauses": [{"error": "failed to publish router CA: failed to ensure \"default-ingress-cert\" in \"openshift-config-managed\" was published: Post https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/configmaps: read tcp 10.129.0.21:59996->172.30.0.1:443: read: connection reset by peer"}]}
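The two log lines above each end in a JSON payload, which is easier to read once separated from the text prefix. This is a rough sketch, assuming the prefix contains no `{` character (true for the lines shown here, not guaranteed in general). The function name `parse_reconciler_error` is hypothetical. The sample line is the first log line above, shortened by dropping its `errorCauses` field.

```python
import json


def parse_reconciler_error(line):
    """Split a log line into (text prefix, decoded JSON payload).

    Assumes the JSON payload starts at the first '{' on the line.
    Returns (line, None) when no payload is present.
    """
    start = line.find("{")
    if start == -1:
        return line.strip(), None
    return line[:start].strip(), json.loads(line[start:])


# First log line from above, with the errorCauses field dropped for brevity.
sample_line = (
    r'2020-03-12T12:22:09.137Z ERROR operator.init.controller-runtime.controller '
    r'controller/controller.go:218 Reconciler error '
    r'{"controller": "certificate_controller", '
    r'"request": "openshift-ingress-operator/default", '
    r'"error": "failed to lookup wildcard cert: '
    r'secrets \"router-certs-default\" not found"}'
)

prefix, payload = parse_reconciler_error(sample_line)
print(payload["controller"])
print(payload["error"])
```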
However, I deployed a cluster with the same installer in the same environment just one hour before this attempt, and everything went smoothly. The only difference was that for the second attempt I specified ovirt_insecure: false in ~/.ovirt/ovirt-config.yaml.
Once I find a place to store the must-gather logs, I'll post them here.
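Because the failing and passing runs differed only in ovirt_insecure, a quick sanity check of the TLS-related settings can help rule out a missing CA before deploying. The sketch below is hypothetical and is not the installer's own validation. It takes the settings as an already-parsed dict, treats a missing ovirt_insecure as false (an assumption), and only looks at the ovirt_url, ovirt_insecure and ovirt_ca_bundle keys that appear in this report. It ignores any other way of supplying a CA.

```python
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def check_ovirt_tls_settings(config):
    """Return a list of human-readable problems found in the settings.

    config is assumed to be the parsed content of ~/.ovirt/ovirt-config.yaml.
    This is an illustrative check only, not the installer's logic.
    """
    problems = []
    # Assumption: a missing ovirt_insecure key means certificate checks are on.
    insecure = bool(config.get("ovirt_insecure", False))
    bundle = (config.get("ovirt_ca_bundle") or "").strip()
    if not insecure and PEM_MARKER not in bundle:
        problems.append(
            "ovirt_insecure is false but ovirt_ca_bundle contains no PEM certificate"
        )
    if not str(config.get("ovirt_url", "")).startswith("https://"):
        problems.append("ovirt_url is not an https:// URL")
    return problems


# Placeholder values in the style used elsewhere in this report.
secure_without_ca = {
    "ovirt_url": "https://<engine_fqdn>/ovirt-engine/api",
    "ovirt_insecure": False,
}
insecure = dict(secure_without_ca, ovirt_insecure=True)

print(check_ovirt_tls_settings(secure_without_ca))
print(check_ovirt_tls_settings(insecure))
```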
One more piece of information: after the failed attempt, I once again deployed OCP with the same installer into the same environment, this time with ovirt_insecure: true. The deployment was once again successful.

Verified with:
openshift-install-linux-4.4.0-0.nightly-2020-03-22-130538
rhvm-4.3.9.0-0.1.el7.noarch

Verification steps:
1. Have oVirt credentials like this:
ovirt_url: https://<engine_fqdn>/ovirt-engine/api
ovirt_username: admin@internal
ovirt_password: "<engine_password>"
ovirt_ca_bundle: |-
  <content of /etc/pki/ovirt-engine>
2. Run openshift-install create cluster

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0581