Bug 1991928
Summary: | Installation with multiple NIC failed on OCP 4.9 | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Muhammad Adeel (IBM) <madeel> | ||||
Component: | NetworkManager | Assignee: | NetworkManager Development Team <nm-team> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Desktop QE <desktop-qa-list> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 8.4 | CC: | anbhat, atragler, bfournie, bgalvani, christian.lapolt, danili, dslavens, lrintel, mtarsel, rkhan, sukulkar, thaller, till, wolfgang.voesch | ||||
Target Milestone: | beta | Keywords: | Triaged | ||||
Target Release: | 8.4 | ||||||
Hardware: | s390x | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-09-07 08:53:15 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Muhammad Adeel (IBM)
2021-08-10 12:12:35 UTC
A similar problem was observed in BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1974411

Muhammad will add additional details on the 2 problems. There is a workaround for problem #2. After that we can decide whether this bug is related to the networking team.

Also setting the "reviewed-in-sprint" flag, as this bug is still in evaluation and is unlikely to be resolved before the end of the sprint (August 14th).

There are two problems associated with this BZ:

1. Sometimes, even with a single NIC, the network doesn't come up and the CoreOS rootfs can't be fetched from the HTTP server. The logs on the console show that CoreOS keeps retrying to fetch the rootfs but never finishes.

2. Add an additional NIC to the node by using ip= in the param file:

ip=10.13.114.2::10.13.114.1:255.255.255.0::enc1000:none
ip=10.100.214.2::10.100.214.1:255.255.255.0::enP513s129:none

Here, enc1000 is the primary network interface, where the DNS server exists. enP513s129 is an additional network, which has no DNS server.

On some machines NetworkManager (NM) picks up enc1000 and selects gateway IP 10.13.114.1 as the default route, as shown in the NM logs:

policy: set 'enc1000' (enc1000) as default for IPv4 routing and DNS

In this case the cluster installation is successful because the correct default route has been set up. However, on other machines NM selects enP513s129 as the primary interface and sets 10.100.214.1 as the default route. In this case the cluster installation fails, which is expected because there is no DNS on that route.

A workaround in this case is to remove the gateway IP from the ip= param, which results in a single default route via 10.13.114.1.

I think we need to understand two things here:
a. Why does the NIC probe order change between machines, and in particular which rule does NM depend on?
b. How do we set up our default route to be the correct one?

Re-assigning to the Networking team to further evaluate this bug, as Muhammad has provided information and behavior in Comment 3.
Please feel free to evaluate the "Blocker?" status as the team sees fit. Please also feel free to assign to the correct sub-component, as I only took a guess based on Multi-NIC's relation to the NMState Operator.

This doesn't appear to have any involvement from OpenShift networking. It's purely NetworkManager/Dracut behavior. Sending to the NetworkManager team for their input.

I can say they will most likely ask for trace logs from NetworkManager. In this case, you should be able to enable trace logging by passing a machine-config manifest to the installer that creates a file in /etc/NetworkManager/conf.d with the content:

[logging]
level=TRACE

(In reply to Ben Nemec from comment #6)
> I can say they will most likely ask for trace logs from NetworkManager. In
> this case, you should be able to enable trace logging by passing a
> machine-config manifest to the installer that creates a file in
> /etc/NetworkManager/conf.d with the content:
>
> [logging]
> level=TRACE

Yes, please. You can enable debug logging during boot by setting `rd.debug` (from `man dracut.cmdline`). Then provide the complete logs. Thank you.

I was able to identify the root cause of the problem: it was due to multiple default routes. We will update the installation part of the documentation so that it reflects the correct multiple-NIC configuration. Thank you. I will close this one and open a corresponding documentation BZ.

Thomas, is there any AI required from your side?

The corresponding documentation BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2000583

Sorry, I didn't understand your comment 9 and comment 10. Am I reading this correctly, that you think there is no bug here? If yes, then we can indeed just close it...

Yes, there is no bug. Thanks.

Closing due to comment 13. If something is missing, please comment or reopen. Thank you!!
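
For anyone hitting the same symptom, the root cause named above (more than one ip= entry carrying a gateway, hence more than one default route candidate) can be checked in the param file before booting. The following is only a minimal Python sketch under stated assumptions, not something taken from this report: the helper names parse_ip_arg and interfaces_with_gateway are hypothetical, the field order follows the static form documented in dracut.cmdline(7), only IPv4 values without brackets are handled, and the second "workaround" line is an assumed rendering of "remove the gateway ip from the ip= param".

```python
def parse_ip_arg(arg):
    """Split one static IPv4 `ip=` argument into named fields.

    Assumed field order (dracut.cmdline(7)):
    client:peer:gateway:netmask:hostname:interface:autoconf
    """
    value = arg[len("ip="):] if arg.startswith("ip=") else arg
    fields = value.split(":")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 colon-separated fields: {arg!r}")
    client, _peer, gateway, netmask, _hostname, interface, autoconf = fields[:7]
    return {
        "client": client,
        "gateway": gateway,
        "netmask": netmask,
        "interface": interface,
        "autoconf": autoconf,
    }


def interfaces_with_gateway(args):
    """Return the interfaces whose `ip=` entry sets a gateway."""
    return [p["interface"] for p in map(parse_ip_arg, args) if p["gateway"]]


# The two entries from the description: both carry a gateway.
original = [
    "ip=10.13.114.2::10.13.114.1:255.255.255.0::enc1000:none",
    "ip=10.100.214.2::10.100.214.1:255.255.255.0::enP513s129:none",
]

# Assumed form of the workaround: gateway field left empty on the extra NIC.
workaround = [
    original[0],
    "ip=10.100.214.2:::255.255.255.0::enP513s129:none",
]

print(interfaces_with_gateway(original))    # more than one entry: ambiguous default route
print(interfaces_with_gateway(workaround))  # a single entry: one default route
```

If the first call returns more than one interface, the configuration matches the failing case described in this bug, where which gateway ends up as the default route can differ between machines.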