Bug 1895099 - vsphere-upi and vsphere-upi-serial jobs time out waiting for bootstrap to complete in CI [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Brad P. Crochet
QA Contact: jima
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-05 18:27 UTC by Fabian von Feilitzsch
Modified: 2021-02-24 15:31 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:31:24 UTC
Target Upstream Version:
augol: needinfo? (fabian)




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2207 0 None closed Bug 1895099: Fix VSphere UPI not populating PlatformStatus 2021-01-07 19:42:19 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:31:54 UTC

Description Fabian von Feilitzsch 2020-11-05 18:27:33 UTC
Version: 4.7


Platform: vsphere

#Please specify the platform type: aws, libvirt, openstack or baremetal etc.

Please specify:
* UPI (semi-manual installation on customized infrastructure)

What happened?

This is blocking nightly payloads.

The bootstrap node came up, and then the install failed with the error: "level=fatal msg=failed to wait for bootstrapping to complete: timed out waiting for the condition". I could not find any logs that explained the timeout in more detail.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi/1324353285213130752

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi-serial/1324353271518728192

Comment 1 Joseph Callen 2020-11-05 19:40:56 UTC
Changing the owner; if this is incorrect, please reassign to the appropriate group.

This change:
https://github.com/openshift/machine-config-operator/pull/2079


[root@bootstrap-0 ~]# crictl logs -f 415
I1105 19:35:32.338275       1 bootstrap.go:40] Version: v4.7.0-202011040512.p0-dirty (b25d87d0e1a79d62a92257609b7734f7b63a4d22)
I1105 19:35:32.374219       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-dns-02-config.yml" [1] manifest because of unhandled *v1.DNS
I1105 19:35:32.375825       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-infrastructure-02-config.yml" [1] manifest because of unhandled *v1.Infrastructure
I1105 19:35:32.380835       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-ingress-02-config.yml" [1] manifest because of unhandled *v1.Ingress
I1105 19:35:32.381444       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-network-02-config.yml" [1] manifest because of unhandled *v1.Network
I1105 19:35:32.381840       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-proxy-01-config.yaml" [1] manifest because of unhandled *v1.Proxy
I1105 19:35:32.385303       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-scheduler-02-config.yml" [1] manifest because of unhandled *v1.Scheduler
I1105 19:35:32.392922       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cvo-overrides.yaml" [1] manifest because of unhandled *v1.ClusterVersion
F1105 19:35:32.409566       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: failed to create MachineConfig for role master: failed to execute template: template: /etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml:50:16: executing "/etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml" at <onPremPlatformAPIServerInternalIP .>: error calling onPremPlatformAPIServerInternalIP: runtime error: invalid memory address or nil pointer dereference
[root@bootstrap-0 ~]#
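
The fatal log line above has the shape Go's text/template package produces when a template helper function panics: the panic is recovered and returned as an "error calling <name>" execution error. The following is a minimal sketch of that mechanism, not the machine-config-operator code. The types vsphereStatus and renderConfig, the helper name internalIP, and the template text are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// Hypothetical stand-ins for the platform status passed to the template.
type vsphereStatus struct{ APIServerInternalIP string }

type renderConfig struct{ VSphere *vsphereStatus }

// render executes a one-line template whose helper dereferences
// cfg.VSphere without a nil check, as an illustration of the failure mode.
func render(cfg renderConfig) error {
	funcs := template.FuncMap{
		"internalIP": func(c renderConfig) string {
			// Panics with a nil pointer dereference when VSphere is nil.
			return c.VSphere.APIServerInternalIP
		},
	}
	tmpl, err := template.New("demo").Funcs(funcs).Parse(`nameserver {{internalIP .}}`)
	if err != nil {
		return err
	}
	// text/template recovers the helper's panic and returns it as an error.
	return tmpl.Execute(io.Discard, cfg)
}

func main() {
	// VSphere left nil, as reported for UPI installs in this bug.
	fmt.Println("nil status:", render(renderConfig{}))
	// VSphere populated (example address from the documentation range).
	fmt.Println("populated: ", render(renderConfig{VSphere: &vsphereStatus{APIServerInternalIP: "192.0.2.10"}}))
}
```

Because the panic surfaces as an ordinary error from Execute, the caller only sees a failed render, which is consistent with the installer reporting nothing more specific than a bootstrap timeout.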

Comment 2 Patrick Dillon 2020-11-05 22:11:26 UTC
In UPI installs, infra.Status.PlatformStatus.VSphere is nil, which causes the crash here: https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/template/render.go#L432

Previously, we did not populate the file at all for UPI (that is, when VSphere is nil): https://github.com/openshift/machine-config-operator/pull/2079/files#diff-88b9eeb5fb253707d42dfa34ac2143dda76d949209de64c032a8cd11a0d97c29
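
One way to avoid this class of crash is to check the nested pointer before reading from it and let the caller decide whether to skip the file. The sketch below assumes simplified, hypothetical types (PlatformStatus, VSpherePlatformStatus) and a hypothetical helper apiServerInternalIP. It is not the fix that was merged for this bug, only an illustration of the nil guard described above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, trimmed-down stand-ins for the infrastructure status types.
type VSpherePlatformStatus struct {
	APIServerInternalIP string
}

type PlatformStatus struct {
	VSphere *VSpherePlatformStatus // nil for UPI installs, per this bug
}

// errNoVSphereStatus signals that there is no vSphere status to read from.
var errNoVSphereStatus = errors.New("vsphere platform status is not populated")

// apiServerInternalIP returns the internal API address, or an error
// instead of panicking when either pointer in the chain is nil.
func apiServerInternalIP(ps *PlatformStatus) (string, error) {
	if ps == nil || ps.VSphere == nil {
		return "", errNoVSphereStatus
	}
	return ps.VSphere.APIServerInternalIP, nil
}

func main() {
	cases := []struct {
		name string
		ps   *PlatformStatus
	}{
		{"upi (VSphere nil)", &PlatformStatus{}},
		{"populated", &PlatformStatus{VSphere: &VSpherePlatformStatus{APIServerInternalIP: "192.0.2.10"}}},
	}
	for _, c := range cases {
		ip, err := apiServerInternalIP(c.ps)
		if errors.Is(err, errNoVSphereStatus) {
			// A caller could skip rendering the file in this case.
			fmt.Printf("%s: skipping, %v\n", c.name, err)
			continue
		}
		fmt.Printf("%s: %s\n", c.name, ip)
	}
}
```

Returning a sentinel error keeps the decision (skip the file, or fail the render) with the caller rather than inside the helper.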

Comment 5 jima 2020-11-23 05:51:34 UTC
Verified on upi-on-vsphere with version 4.7.0-0.nightly-2020-11-22-204912. Installation is successful, so moving the bug to VERIFIED.

Comment 8 errata-xmlrpc 2021-02-24 15:31:24 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

