Bug 2029438
| Summary: | Bootstrap node cannot resolve api-int because NetworkManager replaces resolv.conf |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Jim Ramsay <jramsay> |
| Component: | Installer |
| Assignee: | Ben Nemec <bnemec> |
| Installer sub component: | openshift-installer |
| QA Contact: | jima |
| Status: | CLOSED ERRATA |
| Docs Contact: | |
| Severity: | high |
| Priority: | medium |
| CC: | bgalvani, bnemec, jima, otuchfel, padillon, sasha, vpickard, wsun, yboaron |
| Version: | 4.9 |
| Target Milestone: | --- |
| Target Release: | 4.11.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | Telco; Telco:RAN |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | Cause: The vSphere RHCOS image has no /etc/resolv.conf. Consequence: With its default settings, NetworkManager attempts to access /etc/resolv.conf and throws an error when the file is not found. Fix: Set rc-manager=unmanaged. Result: NetworkManager does not attempt to access /etc/resolv.conf. |
| Story Points: | --- |
| Clone Of: | |
| : | 2083335 |
| Environment: | |
| Last Closed: | 2022-08-10 10:40:31 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 2083335 |
Description
Jim Ramsay
2021-12-06 14:12:03 UTC
Created attachment 1844924: bootkube.service.log
Can you please make the following modifications to the bug description:

- Remove references to "nmstate". nmstate is not being used here; these are just raw nmconnection files generated by the assisted service (using nmstate, but that's beside the point). In your case the following file was generated, and this is what matters:

    [connection]
    id=eno1
    uuid=60a1b8f8-d3de-44cc-a09e-72fd1e76c9c6
    type=ethernet
    interface-name=eno1
    permissions=
    autoconnect=true
    autoconnect-priority=1

    [ethernet]
    mac-address-blacklist=

    [ipv4]
    address1=10.8.34.12/24
    dhcp-client-id=mac
    dns=10.11.5.19;
    dns-priority=40
    dns-search=telco5gran.eng.rdu2.redhat.com;
    method=manual
    route1=0.0.0.0/0,10.8.34.254
    route1_options=table=254

    [ipv6]
    addr-gen-mode=eui64
    dhcp-duid=ll
    dhcp-iaid=mac
    dns-search=
    method=disabled

    [proxy]

- Remove the .interfaces stanza from the yaml under "and with nmstate something like the following:". It is assisted-installer specific and is not relevant to the problem. Only the content under ".config" is the actual nmstate config. Even then, please just specify that the nmconnection file above is simply generated with `nmstate gc <config>` and that nmstate is not running on the node.

- Replace the workaround "I have a workaround: If I manually add "127.0.0.1" to the dns-resolver section of my nmstate, the install succeeds." with this workaround: "sudo nmcli device disconnect eno1; wait ; sudo nmcli device connect eno1". It shows that simply performing a meaningless action on the interfaces triggers the dispatcher script, which then works as intended.

*** Bug 2033550 has been marked as a duplicate of this bug. ***

This issue is not unique to baremetal. See https://bugzilla.redhat.com/show_bug.cgi?id=2033550 where the same issue is happening with vSphere.

*** Bug 2027836 has been marked as a duplicate of this bug. ***

The issue happened several times against 4.10 recently, both on QE CI and in manual installations. Is there any plan to fix the issue on 4.10? Once this happens, the cluster cannot be set up successfully.
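The DNS settings carried by the nmconnection file quoted earlier in this thread can be read out programmatically. The following is only a sketch: it uses Python's standard `configparser` as a stand-in for a real keyfile parser, the line breaks in the embedded file are reconstructed from the flattened comment, and the helper name `dns_settings` is hypothetical. It shows that the connection profile lists only the external resolver, with no 127.0.0.1 entry, which matches the original workaround of adding 127.0.0.1 by hand.

```python
import configparser

# The [ipv4] and [ipv6] sections of the nmconnection file from the comment
# above (line breaks reconstructed; other sections omitted for brevity).
NMCONNECTION = """\
[ipv4]
address1=10.8.34.12/24
dhcp-client-id=mac
dns=10.11.5.19;
dns-priority=40
dns-search=telco5gran.eng.rdu2.redhat.com;
method=manual
route1=0.0.0.0/0,10.8.34.254
route1_options=table=254

[ipv6]
addr-gen-mode=eui64
dhcp-duid=ll
dhcp-iaid=mac
dns-search=
method=disabled
"""


def dns_settings(keyfile_text):
    """Return the IPv4 DNS servers, search domains and priority (hypothetical helper)."""
    # Split only on "=" so values such as "table=254" stay intact, and disable
    # interpolation so "%" in a value would not be treated specially.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(keyfile_text)
    ipv4 = parser["ipv4"]

    def split_list(value):
        # Keyfile lists are ";"-separated and usually end with a trailing ";".
        return [item for item in value.split(";") if item]

    return {
        "servers": split_list(ipv4.get("dns", "")),
        "search": split_list(ipv4.get("dns-search", "")),
        "priority": int(ipv4.get("dns-priority", "0")),
    }


if __name__ == "__main__":
    print(dns_settings(NMCONNECTION))
```

Running the module prints the parsed values; a check such as `"127.0.0.1" in dns_settings(text)["servers"]` would distinguish a profile with the manual workaround applied from the one shown in this report.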
Per comment 13, update the severity to high.

We are researching who the correct assignee for this bz is.

The upi-on-vsphere installation failed at the bootstrap stage when using nightly build 4.11.0-0.nightly-2022-04-24-085400 (containing the fix) or a later payload, whereas it succeeded with 4.11.0-0.nightly-2022-04-23-153426.

Checked on the bootstrap instance: /etc/resolv.conf was not generated.

[root@bootstrap-0 ~]# ls -ltr /etc/resolv.conf
ls: cannot access '/etc/resolv.conf': No such file or directory

And rc-manager is configured as unmanaged.

[root@bootstrap-0 ~]# ls -ltr /etc/NetworkManager/conf.d/99-vsphere.conf
-rw-------. 1 root root 28 Apr 25 03:04 /etc/NetworkManager/conf.d/99-vsphere.conf
[root@bootstrap-0 ~]# cat /etc/NetworkManager/conf.d/99-vsphere.conf
[main]
rc-manager=unmanaged

The UPI bug was fixed by https://github.com/openshift/installer/pull/5842. This should be ready for testing again.

The vSphere UPI installation issue in comment 21 has been fixed in https://github.com/openshift/installer/pull/5842 and verified as passing; the UPI installation is successful without any error.

The original issue described in this bug on ipi-on-vsphere also happened sometimes on QE CI (1-2 times per week). After PR installer#5482 was merged, I monitored QE CI for two weeks and did not hit the issue again in CI or in manual installations.

The issue should be fixed; moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
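The manual check from the verification comment above (confirming that 99-vsphere.conf sets rc-manager=unmanaged) could be automated along the following lines. This is a minimal sketch, not part of the installer: the helper name `rc_manager` is hypothetical, and Python's standard `configparser` is assumed to be close enough to NetworkManager's configuration syntax for a two-line file like this one.

```python
import configparser

# Content of /etc/NetworkManager/conf.d/99-vsphere.conf as shown by `cat`
# in the verification comment (28 bytes, matching the `ls` output).
VSPHERE_CONF = "[main]\nrc-manager=unmanaged\n"


def rc_manager(conf_text):
    """Return the rc-manager value from the [main] section, or None if unset."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    return parser.get("main", "rc-manager", fallback=None)


if __name__ == "__main__":
    # On a real node the text would be read from the file instead, e.g.
    # open("/etc/NetworkManager/conf.d/99-vsphere.conf").read()
    print(rc_manager(VSPHERE_CONF))
```

A return value of `unmanaged` corresponds to the fixed state described in the Doc Text; `None` means the drop-in does not set the option and NetworkManager's default applies.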