Bug 1895099

Summary: vsphere-upi and vsphere-upi-serial jobs time out waiting for bootstrap to complete in CI
Product: OpenShift Container Platform Reporter: Fabian von Feilitzsch <fabian>
Component: Networking Assignee: Brad P. Crochet <brad>
Networking sub component: mDNS QA Contact: jima
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: augol, beth.white, jcallen, mifiedle, mstaeble, padillon
Version: 4.7 Keywords: OtherQA, Triaged
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:31:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Fabian von Feilitzsch 2020-11-05 18:27:33 UTC
Version: 4.7


Platform: vsphere

Please specify:
* UPI (semi-manual installation on customized infrastructure)

What happened?

This is blocking nightly payloads.

The bootstrap node came up, and then the install failed with the error: "level=fatal msg=failed to wait for bootstrapping to complete: timed out waiting for the condition". I could not find any logs that explained the timeout in more detail than that.

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi/1324353285213130752

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi-serial/1324353271518728192

Comment 1 Joseph Callen 2020-11-05 19:40:56 UTC
Changing the owner. If this is incorrect, please reassign to the appropriate group.

The failure appears to come from this change:
https://github.com/openshift/machine-config-operator/pull/2079


[root@bootstrap-0 ~]# crictl logs -f 415
I1105 19:35:32.338275       1 bootstrap.go:40] Version: v4.7.0-202011040512.p0-dirty (b25d87d0e1a79d62a92257609b7734f7b63a4d22)
I1105 19:35:32.374219       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-dns-02-config.yml" [1] manifest because of unhandled *v1.DNS
I1105 19:35:32.375825       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-infrastructure-02-config.yml" [1] manifest because of unhandled *v1.Infrastructure
I1105 19:35:32.380835       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-ingress-02-config.yml" [1] manifest because of unhandled *v1.Ingress
I1105 19:35:32.381444       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-network-02-config.yml" [1] manifest because of unhandled *v1.Network
I1105 19:35:32.381840       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-proxy-01-config.yaml" [1] manifest because of unhandled *v1.Proxy
I1105 19:35:32.385303       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cluster-scheduler-02-config.yml" [1] manifest because of unhandled *v1.Scheduler
I1105 19:35:32.392922       1 bootstrap.go:116] skipping "/etc/mcc/bootstrap/cvo-overrides.yaml" [1] manifest because of unhandled *v1.ClusterVersion
F1105 19:35:32.409566       1 bootstrap.go:47] error running MCC[BOOTSTRAP]: failed to create MachineConfig for role master: failed to execute template: template: /etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml:50:16: executing "/etc/mcc/templates/common/on-prem/files/NetworkManager-resolv-prepender.yaml" at <onPremPlatformAPIServerInternalIP .>: error calling onPremPlatformAPIServerInternalIP: runtime error: invalid memory address or nil pointer dereference
[root@bootstrap-0 ~]#

Comment 2 Patrick Dillon 2020-11-05 22:11:26 UTC
In UPI installs, infra.Status.PlatformStatus.VSphere is nil, which causes the crash here: https://github.com/openshift/machine-config-operator/blob/master/pkg/controller/template/render.go#L432

Previously, we did not populate the file at all for UPI (when VSphere is nil): https://github.com/openshift/machine-config-operator/pull/2079/files#diff-88b9eeb5fb253707d42dfa34ac2143dda76d949209de64c032a8cd11a0d97c29
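
A nil guard of the kind described here could look roughly like the sketch below. This is an illustration under assumptions, not the actual fix: the types vspherePlatformStatus and platformStatus and the function guardedAPIServerIP are hypothetical names, and the example IP is a documentation address. The idea is to report "no value" when VSphere is nil so the caller can skip the file instead of dereferencing a nil pointer.

```go
package main

import "fmt"

// Hypothetical stand-ins for the infrastructure status types.
// These are NOT the real openshift/api definitions.
type vspherePlatformStatus struct {
	APIServerInternalIP string
}

type platformStatus struct {
	VSphere *vspherePlatformStatus // nil on UPI installs
}

// guardedAPIServerIP returns the API server IP and true when the
// VSphere status is present, or "" and false when it is nil, so the
// caller can decide to skip rendering the file.
func guardedAPIServerIP(ps platformStatus) (string, bool) {
	if ps.VSphere == nil {
		return "", false
	}
	return ps.VSphere.APIServerInternalIP, true
}

func main() {
	cases := []platformStatus{
		{}, // UPI-like case: VSphere is nil
		{VSphere: &vspherePlatformStatus{APIServerInternalIP: "192.0.2.10"}},
	}
	for _, ps := range cases {
		if ip, ok := guardedAPIServerIP(ps); ok {
			fmt.Println("render file with API server IP", ip)
		} else {
			fmt.Println("skip file: VSphere platform status is nil")
		}
	}
}
```

Returning a boolean alongside the value keeps the "skip for UPI" decision with the caller, which matches the earlier behavior of not populating the file at all when VSphere is nil.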

Comment 5 jima 2020-11-23 05:51:34 UTC
Verified on upi-on-vsphere with version 4.7.0-0.nightly-2020-11-22-204912. Installation is successful, so moving the bug to VERIFIED.

Comment 8 errata-xmlrpc 2021-02-24 15:31:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 9 Red Hat Bugzilla 2023-09-15 00:50:45 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days