Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1768051

Summary:

OpenShift 4.2 Install Fails with error: the server doesn't have a resource type "csr"

Product:

OpenShift Container Platform

Reporter:

Aja Lightner <alightne>

Component:

Machine Config Operator

Assignee:

Erica von Buelow <evb>

Status:

CLOSED INSUFFICIENT_DATA

QA Contact:

Michael Nguyen <mnguyen>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

4.2.0

CC:

adahiya, amurdaca, aos-bugs, eparis, jcallen, jokerman, kgarriso, mstaeble, rphillips, smilner

Target Milestone:

---

Target Release:

4.4.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-03-03 18:24:03 UTC

Type:

Bug

Regression:

---

Attachments:

Description	Flags
Log bundle	none

Description Aja Lightner 2019-11-02 00:50:55 UTC

Description of problem:
OpenShift 4.2 UPI install fails on CoreOS/VMware
- The bootstrap API server is partially up, but it is returning a 404
- The bootstrap node is unable to approve CSRs

Version-Release number of the following components:
4.2


How reproducible:

Steps to Reproduce:
1. Customer is following instructions per: https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html


Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:
Successful install of 4.2

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Aja Lightner 2019-11-02 00:55:26 UTC

control-plane/10.123.13.102/journals/kubelet.log:Oct 31 14:49:37 etcd-1.o4.dr3.demo.sk hyperkube[1154]: E1031 14:49:37.907026    1154 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot create certificate signing request: the server rejected our request for an unknown reason (post certificatesigningrequests.certificates.k8s.io)

Comment 2 Aja Lightner 2019-11-02 00:56:16 UTC

Created attachment 1631710 [details]
Log bundle

Comment 3 Abhinav Dahiya 2019-11-04 19:43:30 UTC

> log-bundle-20191031145106/bootstrap/journals/bootkube.log


```
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-0.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.101:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-2.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.103:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-1.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.102:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: Error: unhealthy cluster
Oct 31 14:45:34 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: etcdctl failed. Retrying in 5 seconds...
```

The bootstrap-host is waiting for etcd-cluster formation on control-plane hosts.
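
A minimal sketch for pulling the failing etcd endpoints out of a `bootkube.log` excerpt like the one above, so they can be compared against the control-plane host list. The regular expression and the function name `unhealthy_endpoints` are illustrative assumptions based only on the line format shown in this bug; this is not part of the installer tooling.

```python
import re

# Matches the "is unhealthy" lines emitted via bootkube.sh in the excerpt above.
# The pattern is an assumption derived from those sample lines only.
UNHEALTHY = re.compile(
    r"(?P<url>https://\S+) is unhealthy: failed to connect: "
    r"dial tcp (?P<ip>[\d.]+):(?P<port>\d+): (?P<reason>.+)$"
)


def unhealthy_endpoints(log_text):
    """Return {endpoint URL: (ip, port, reason)} for each unhealthy etcd member."""
    found = {}
    for line in log_text.splitlines():
        match = UNHEALTHY.search(line)
        if match:
            found[match.group("url")] = (
                match.group("ip"),
                int(match.group("port")),
                match.group("reason").strip(),
            )
    return found


# Lines copied from the bootkube.log excerpt in this comment.
SAMPLE = """\
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-0.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.101:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-2.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.103:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: https://etcd-1.o4.dr3.demo.sk:2379 is unhealthy: failed to connect: dial tcp 10.123.13.102:2379: connect: connection refused
Oct 31 14:45:33 bootstrap.o4.dr3.demo.sk bootkube.sh[1783]: Error: unhealthy cluster
"""

if __name__ == "__main__":
    for url, (ip, port, reason) in sorted(unhealthy_endpoints(SAMPLE).items()):
        print(f"{url} -> {ip}:{port} ({reason})")
```

All three members report "connection refused" rather than a timeout, which is consistent with the hosts being reachable while nothing is listening on port 2379 yet.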


> log-bundle-20191031145106/bootstrap/containers/machine-config-server-ae8426373114ed617b03030a747589d9a38efc8a7aa38b07849219995bb86a86.log

```
I1031 14:04:26.885488       1 api.go:97] Pool master requested by 10.123.13.80:46260
I1031 14:04:26.885538       1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:04:26.887600       1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:05:47.131961       1 api.go:97] Pool master requested by 10.123.13.80:46890
I1031 14:05:47.132943       1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
I1031 14:05:47.133592       1 bootstrap_server.go:82] reading file "/etc/mcs/bootstrap/machine-configs/rendered-master-b38dadd973a9c0be0f894d3cd69ee8e8.yaml"
I1031 14:07:35.453662       1 api.go:97] Pool master requested by 10.123.13.80:47728
I1031 14:07:35.454651       1 bootstrap_server.go:62] reading file "/etc/mcs/bootstrap/machine-pools/master.yaml"
```

The control-plane hosts have requested the ignition from bootstrap-host.


So looking at 
> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.101/
> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.102/

there are no containers running on those hosts (the `containers` directory is empty), and kubelet is also not showing errors that explain why the etcd static pods are not running.

> /tmp/mozilla_adahiya0/log-bundle-20191031145106/control-plane/10.123.13.103/

the init containers for etcd have completed, but the etcd-member pods are failing or haven't started yet, and there are no logs from kubelet regarding anything.


Moving to node team to help debug.

Comment 4 Aja Lightner 2019-11-04 22:16:50 UTC

Thank you, Abhinav, for the update.

Comment 5 Aja Lightner 2019-11-09 17:19:52 UTC

Hi Abhinav - Did you have any updates from the node team? Were they able to help with debugging this?

Thank you,
Aja

Comment 6 Ryan Phillips 2019-11-11 15:22:17 UTC

bootstrap/containers/machine-config-controller-7b89f76874a18448df276b8ecf7a14cef4ad2911a6f1b9f062a20e12dc4ddbaf.log:
```

I1031 14:04:22.198338       1 bootstrap.go:40] Version: v4.2.0-201910101614-dirty (62b0b6d2a751a5f364f2e6d5c9cfe63419668777)
W1031 14:04:22.426844       1 render.go:137] Warning: the controller config referenced an unsupported platform: vsphere
W1031 14:04:22.466008       1 render.go:137] Warning: the controller config referenced an unsupported platform: vsphere

```

Looks like the MCO is reporting that vSphere is an unsupported platform. This is strange because the docs show vSphere should be supported [1]. Going to reassign to the MCO team for more input.

1. https://docs.openshift.com/container-platform/4.2/installing/installing_vsphere/installing-vsphere.html#installation-vsphere-config-yaml_installing-vsphere
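
For reading the excerpt above: the first character of each line is the klog severity (`I`, `W`, `E`, `F`), so the two vsphere lines are warnings, not errors. A small sketch for splitting such lines by severity follows; the regular expression and the name `parse_klog` are illustrative assumptions, and the parser expects bare klog lines without a journald prefix.

```python
import re

# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
KLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_klog(line):
    """Return (severity, source location, message), or None if the line is not klog-formatted."""
    match = KLOG.match(line.strip())
    if match is None:
        return None
    return SEVERITY[match.group("sev")], match.group("src"), match.group("msg")


if __name__ == "__main__":
    # Line copied from the machine-config-controller log quoted in this comment.
    sample = (
        "W1031 14:04:22.426844       1 render.go:137] Warning: the controller "
        "config referenced an unsupported platform: vsphere"
    )
    print(parse_klog(sample))
```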

Comment 7 Kirsten Garrison 2019-11-12 14:59:16 UTC

The MCO team has been waiting for verification of the status of vSphere:

https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

We are happy to merge (and make any other changes) but were told there were issues with kubelet on vsphere with no update to the contrary.

Please let us know.

Comment 8 Ryan Phillips 2019-11-12 15:06:20 UTC

I have not heard of any kubelet issues on vSphere. Is there something the Node team should look into?

Comment 9 Kirsten Garrison 2019-11-12 16:53:59 UTC

Antonio, is there anything Ryan/Node needs outside of your comment here?: 
https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

Comment 10 Antonio Murdaca 2019-11-21 11:15:52 UTC

(In reply to Kirsten Garrison from comment #9)
> Antonio, is there anything Ryan/Node needs outside of your comment here?:
> https://github.com/openshift/machine-config-operator/pull/998#discussion_r318568006

I don't think so. Also, that's just a warning; how is it causing any issue here?

Comment 11 Erica von Buelow 2019-11-25 16:01:54 UTC

As https://github.com/openshift/machine-config-operator/pull/998 has merged, is there a further problem we need to investigate in this BZ?

Comment 12 Aja Lightner 2019-11-27 21:28:44 UTC

Hi Team,

The problem we were investigating is the failed 4.2 installation on vSphere (see Comment 1 and Comment 3), and not just removing the warning message that stated vSphere was unsupported (Comment 6). 

Please let me know if there is more I need to gather to help solve this.




Comment 13 Joseph Callen 2019-12-05 17:08:59 UTC

Please double-check:

- NTP/time on the ESXi hosts and confirm the guests have the correct time as well
- Confirm all DNS records, confirm the RHCOS guests are resolving correctly.
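
A rough sketch of the DNS half of this check, assuming the cluster domain is `o4.dr3.demo.sk` (inferred from the hostnames in the logs, not confirmed by the customer) and three control-plane hosts. It only resolves the forward records (`api`, `api-int`, `etcd-N`); the `_etcd-server-ssl._tcp` SRV records and the time/NTP check are not covered, since the Python standard library has no SRV lookup. The helper names are hypothetical.

```python
import socket


def required_names(cluster_domain, masters=3):
    """Forward DNS names a UPI install needs; cluster_domain is <cluster_name>.<base_domain>."""
    names = [f"api.{cluster_domain}", f"api-int.{cluster_domain}"]
    names += [f"etcd-{index}.{cluster_domain}" for index in range(masters)]
    return names


def check_dns(names, resolve=socket.gethostbyname):
    """Map each name to its resolved IPv4 address, or None if resolution fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Running `check_dns(required_names("o4.dr3.demo.sk"))` from one of the RHCOS guests, or from a machine using the same resolvers, would show which records are missing as `None`; the `resolve` parameter exists only so the lookup can be replaced in tests.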

Comment 14 Aja Lightner 2019-12-09 17:01:29 UTC

I am asking the customer to confirm these items now. Thanks Joseph.

Comment 15 Erica von Buelow 2019-12-12 16:18:35 UTC

Any updates on this?

Comment 17 Steve Milner 2020-02-03 15:06:30 UTC

Aja,

Any updates?