Bug 1966862
Summary: vsphere IPI - local dns prepender is not prepending nameserver 127.0.0.1
Product: OpenShift Container Platform
Reporter: Oscar Alias <oaliasbo>
Component: Installer
Assignee: Oscar Alias <oaliasbo>
Installer sub component: openshift-installer
QA Contact: jima
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: mabajodu, mstaeble, vmedina
Version: 4.7
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The bootstrap machine when installing to vSphere may not get its /etc/resolv.conf updated to include 127.0.0.1 as a nameserver.
Consequence: The bootstrap machine is unable to access the temporary control plane that it creates. This results in a failed installation.
Fix: Adjust the 30-local-dns-prepender NetworkManager dispatcher so that the sed command more reliably finds the line after which to add the nameserver line.
Result: The bootstrap machine is able to access its temporary control plane, and the installation succeeds.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 23:11:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1967355
Attachments:
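The Doc Text above describes the intended end state: 127.0.0.1 must be the first nameserver in /etc/resolv.conf on the bootstrap machine. The following is a minimal Python sketch of that end state, useful for checking a captured resolv.conf by hand. It is not the dispatcher's sed command, whose exact text does not appear in this report; the function names `first_nameserver` and `prepend_local_nameserver` and the sample addresses are hypothetical.

```python
LOCAL_DNS = "127.0.0.1"  # address the dispatcher is expected to prepend


def first_nameserver(resolv_conf):
    """Return the address on the first `nameserver` line, or None if there is none."""
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            return fields[1]
    return None


def prepend_local_nameserver(resolv_conf, ip=LOCAL_DNS):
    """Return resolv.conf text in which `ip` is the first nameserver.

    Illustration only: this models the desired result, not the sed
    expression used by 30-local-dns-prepender.
    """
    if first_nameserver(resolv_conf) == ip:
        return resolv_conf  # already correct, nothing to change

    # Drop any later entry for the same address so it is not listed twice.
    lines = [
        line for line in resolv_conf.splitlines()
        if line.split()[:2] != ["nameserver", ip]
    ]

    # Insert before the first remaining nameserver line, or at the end
    # when the file has no nameserver line at all.
    for index, line in enumerate(lines):
        if line.split()[:1] == ["nameserver"]:
            break
    else:
        index = len(lines)
    lines.insert(index, "nameserver " + ip)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents, as a DHCP-configured host might have.
    sample = "search example.test\nnameserver 192.0.2.1\n"
    print(prepend_local_nameserver(sample), end="")
```

Running `first_nameserver` against the contents of /etc/resolv.conf on the bootstrap machine gives a quick check of whether the local entry is in place before bootkube.sh times out.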
Description
Oscar Alias
2021-06-02 05:19:39 UTC
I submitted the following PR: https://github.com/openshift/installer/pull/4973. In it I propose removing the pattern '.*$' to prevent expansion. At some point after bootkube.sh times out, the localhost entry appears in /etc/resolv.conf, but it is added too late in the process. As a result, the bootstrap node is not removed and the log shows that the bootstrap failed to complete. This does not prevent the master and worker nodes from being created successfully, but manual intervention is required to complete the installation because the kube-apiserver operator gets stuck.

I installed vSphere IPI with nightly build 4.8.0-0.nightly-2021-06-03-022152 on VMC and was unable to reproduce the issue. The localhost entry 127.0.0.1 was added to /etc/resolv.conf on the bootstrap server, even after I removed the entry from /etc/resolv.conf and restarted the NetworkManager service. I also tried on a CentOS 8 VM as in the Description and could not reproduce the issue there either: "nameserver 127.0.0.1" was added to /etc/resolv.conf after restarting NetworkManager. @oaliasbo, may I know if there is anything else special about your cluster nodes? Anyway, I verified on 4.8.0-0.nightly-2021-06-03-055145, which includes the fix that removes the pattern ".*$". The file /etc/resolv.conf on the bootstrap server has the correct content with the localhost entry, and the bootstrap server was destroyed when bootstrap completed.

Hi @jima, I did my tests with 4.7.7 and 4.7.9. The nodes in vSphere are using pfSense to provide DHCP and DNS services. I also observed that on some occasions the bootstrap ended up with "nameserver 127.0.0.1" but was still up at the end of the process, and a manual fix was required to adjust etcd and remove the node. The test of the dispatcher on the CentOS VM was done by restarting the NetworkManager service and by rebooting the VM. Since your results differ from mine, I think the DHCP service could be causing this. I am looking into that. As you mention, 4.8.0-0.nightly-2021-06-03-055145 removed the pattern ".*$", and I confirm that under the same conditions as before, the bootstrap server is now destroyed properly with no additional errors in the installation log.

@oscar, thanks for your explanation. According to your test result, the issue should be fixed in your case. I am moving the bug to VERIFIED.

*** Bug 1916890 has been marked as a duplicate of this bug. ***

*** Bug 1944196 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438