Bug 1768051
| Summary: | OpenShift 4.2 Install Fails with error: the server doesn't have a resource type "csr" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Aja Lightner <alightne> |
| Component: | Machine Config Operator | Assignee: | Erica von Buelow <evb> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.0 | CC: | adahiya, amurdaca, aos-bugs, eparis, jcallen, jokerman, kgarriso, mstaeble, rphillips, smilner |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-03-03 18:24:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Aja Lightner
2019-11-02 00:50:55 UTC
control-plane/10.123.13.102/journals/kubelet.log:
Oct 31 14:49:37 etcd-1.o4.dr3.demo.sk hyperkube[1154]: E1031 14:49:37.907026 1154 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot create certificate signing request: the server rejected our request for an unknown reason (post certificatesigningrequests.certificates.k8s.io)

Created attachment 1631710 [details]
Log bundle
> log-bundle-20191031145106/bootstrap/journals/bootkube.log

```
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-0.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.101:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-2.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.103:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-1.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.102:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: Error: unhealthy cluster
Oct 31 14:45:34 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: etcdctl failed. Retrying in 5 seconds...
```

The bootstrap-host is waiting for etcd-cluster formation on control-plane hosts.

> log-bundle-20191031145106/bootstrap/containers/machine-config-server-ae8426373114ed617b03030a747589d9a38efc8a7aa38b07849219995bb86a86.log

```
I1031 14:04:26.885488 1 api.go:97] Pool master requested by 10.123.13.80:46260
I1031 14:04:26.885538 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:04:26.887600 1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:05:47.131961 1 api.go:97] Pool master requested by 10.123.13.80:46890
I1031 14:05:47.132943 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:05:47.133592 1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:07:35.453662 1 api.go:97] Pool master requested by 10.123.13.80:47728
I1031 14:07:35.454651 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
```

The control-plane hosts have requested the ignition from bootstrap-host.
So looking at

> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.101/
> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.102/

there are no containers running on those hosts (empty `containers` directory), and kubelet is also not showing errors for why the etcd static pods are not running.

> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.103/

the init containers for etcd have completed but the etcd-member pods are failing or haven't started yet; there are no logs from kubelet regarding anything.

Moving to node team to help debug.

Thank you, Abhinav, for the update.

Hi Abhinav - Did you have any updates from the node team? Were they able to help with debugging this?

Thank you,
Aja

bootstrap/containers/machine-config-controller-7b89f76874a18448df276b8ecf7a14cef4ad2911a6f1b9f062a20e12dc4ddbaf.log:

```
I1031 14:04:22.198338 1 bootstrap.go:40] Version: v4.2.0-201910101614-dirty (62b0b6d2a751a5f364f2e6d5c9cfe63419668777)
W1031 14:04:22.426844 1 render.go:137] Warning: the controller config referenced an unsupported platform: vsphere
W1031 14:04:22.466008 1 render.go:137] Warning: the controller config referenced an unsupported platform: vsphere
```

Looks like the MCO is reporting vsphere is an unsupported platform. This is strange because the docs show vsphere should be supported [1]. Going to reassign to the MCO team for more input.

[1] https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html#installation-vsphere-config-yaml_installing-vsphere

MCO has been waiting for verification on the status of vsphere: https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

We are happy to merge (and make any other changes) but were told there were issues with kubelet on vsphere, with no update to the contrary. Please let us know.

I have not heard of any kubelet issues on vSphere. Is there something the Node team should look into?
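When triaging the log bundle, it can help to separate warnings from errors mechanically. The container and kubelet logs quoted in this bug use the klog header layout (severity letter, MMDD, timestamp, process ID, `file:line]`), where `W` marks a warning and `E` an error. The following is a minimal sketch, not part of any OpenShift tooling; `parse_klog` and `KlogEntry` are hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional

# klog header: severity letter, MMDD, HH:MM:SS.micros, PID, source file:line, then "] message".
# The pattern is searched (not anchored) so journal-prefixed lines parse as well.
KLOG_RE = re.compile(
    r"\b(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<pid>\d+) "
    r"(?P<src>[^\s\]]+:\d+)\] (?P<msg>.*)$"
)

SEVERITY = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


class KlogEntry(NamedTuple):
    severity: str
    source: str
    message: str


def parse_klog(line: str) -> Optional[KlogEntry]:
    """Return the severity, source location and message of a klog line, or None."""
    match = KLOG_RE.search(line.strip())
    if match is None:
        return None
    return KlogEntry(SEVERITY[match["sev"]], match["src"], match["msg"])


if __name__ == "__main__":
    sample = (
        "W1031 14:04:22.426844 1 render.go:137] Warning: the controller "
        "config referenced an unsupported platform: vsphere"
    )
    print(parse_klog(sample))
```

Applied to the excerpts in this bug, the `render.go:137` lines classify as warnings, while the `certificate_manager.go:385` line from the kubelet journal classifies as an error.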
Antonio, is there anything Ryan/Node needs outside of your comment here?: https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

(In reply to Kirsten Garrison from comment #9)
> Antonio, is there anything Ryan/Node needs outside of your comment here?:
> https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

I don't think so. Also, that's just a warning; how is it causing any issue here?

As https://github.com/openshift/machine-config-operator/pull/998 has merged, is there a further problem we need to investigate in this BZ?

Hi Team,

The problem we were investigating is the failed 4.2 installation on vSphere (see Comment 1 and Comment 3), and not just removing the warning message that stated vSphere was unsupported (Comment 6). Please let me know if there is more I need to gather to help solve this.

Log Errors, from Comment 1:

control-plane/10.123.13.102/journals/kubelet.log:
Oct 31 14:49:37 etcd-1.o4.dr3.demo.sk hyperkube[1154]: E1031 14:49:37.907026 1154 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot create certificate signing request: the server rejected our request for an unknown reason (post certificatesigningrequests.certificates.k8s.io)

Log Errors, from Comment 3:

> log-bundle-20191031145106/bootstrap/journals/bootkube.log

```
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-0.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.101:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-2.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.103:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-1.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.102:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: Error: unhealthy cluster
Oct 31 14:45:34 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: etcdctl failed. Retrying in 5 seconds...
```

The bootstrap-host is waiting for etcd-cluster formation on control-plane hosts.

> log-bundle-20191031145106/bootstrap/containers/machine-config-server-ae8426373114ed617b03030a747589d9a38efc8a7aa38b07849219995bb86a86.log

```
I1031 14:04:26.885488 1 api.go:97] Pool master requested by 10.123.13.80:46260
I1031 14:04:26.885538 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:04:26.887600 1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:05:47.131961 1 api.go:97] Pool master requested by 10.123.13.80:46890
I1031 14:05:47.132943 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:05:47.133592 1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:07:35.453662 1 api.go:97] Pool master requested by 10.123.13.80:47728
I1031 14:07:35.454651 1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
```

The control-plane hosts have requested the ignition from bootstrap-host.

So looking at

> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.101/
> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.102/

there are no containers running on those hosts (empty `containers` directory), and kubelet is also not showing errors for why the etcd static pods are not running.

> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.103/

the init containers for etcd have completed but the etcd-member pods are failing or haven't started yet; there are no logs from kubelet regarding anything.
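The bootkube.log excerpt quoted above reports each etcd endpoint individually, so a quick way to summarise which members the bootstrap host could not reach is to extract the host, address and failure reason from every "is unhealthy" line. The sketch below assumes the exact line wording seen in this bundle; `unhealthy_endpoints` is a hypothetical helper written for this illustration, not an installer or etcdctl feature.

```python
import re
from typing import Dict, Tuple

# Matches the "is unhealthy" lines exactly as they appear in the quoted bootkube.log.
UNHEALTHY_RE = re.compile(
    r"https://(?P<host>[\w.-]+):(?P<port>\d+) is unhealthy: "
    r"failed to connect: dial tcp (?P<ip>[\d.]+):\d+: (?P<reason>.+)$"
)


def unhealthy_endpoints(log_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each unhealthy etcd host name to (IP address, failure reason)."""
    result: Dict[str, Tuple[str, str]] = {}
    for line in log_text.splitlines():
        match = UNHEALTHY_RE.search(line)
        if match:
            result[match["host"]] = (match["ip"], match["reason"].strip())
    return result


if __name__ == "__main__":
    sample = (
        "Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: "
        "https://etcd-0.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: "
        "dial tcp 10.123.13.101:2379: connect: connection refused\n"
        "Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: Error: unhealthy cluster\n"
    )
    for host, (ip, reason) in unhealthy_endpoints(sample).items():
        print(host, ip, reason)
```

A "connection refused" reason for all three members, as in this bundle, indicates that the names resolved and the hosts answered, but nothing was listening on port 2379. That is consistent with the observation that the etcd pods never started.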
Please double-check:
- NTP/time on the ESXi hosts, and confirm the guests have the correct time as well
- Confirm all DNS records, and confirm the RHCOS guests are resolving correctly.

I am asking the customer to confirm these items now. Thanks Joseph.

Any updates on this?

Aja, Any updates?
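For the DNS part of the check requested above, one possible approach is to compare what a machine actually resolves against the addresses seen in the logs. The sketch below is only an illustration: the name-to-address pairs are taken from the bootkube.log excerpt in this bug, `check_dns` is a hypothetical helper, and the script covers forward lookups of the etcd names only. Time synchronisation and any other records the installation requires would need separate checks.

```python
import socket
from typing import Callable, Dict

# Host names and addresses as they appear in the quoted bootkube.log (assumed to be the intended records).
EXPECTED = {
    "etcd-0.o4.dr3.demo.sk": "10.123.13.101",
    "etcd-1.o4.dr3.demo.sk": "10.123.13.102",
    "etcd-2.o4.dr3.demo.sk": "10.123.13.103",
}


def check_dns(
    expected: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Return {hostname: problem} for every name that fails to resolve as expected."""
    problems: Dict[str, str] = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[host] = f"lookup failed: {exc}"
            continue
        if got != want:
            problems[host] = f"resolved to {got}, expected {want}"
    return problems


if __name__ == "__main__":
    issues = check_dns(EXPECTED)
    if not issues:
        print("all records resolve as expected")
    for host, problem in issues.items():
        print(f"{host}: {problem}")
```

The `resolve` parameter exists so the comparison logic can be exercised without a live resolver; running the script unmodified uses the resolver of the machine it runs on, so it is most informative when run from the same network as the RHCOS guests.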