Bug 2079277

Summary: [UPI][Baremetal] RHCOS is not able to configure network interfaces to reach ignition file
Product: Red Hat Enterprise Linux 8
Reporter: Ignacio <igarciam>
Component: NetworkManager
Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA
QA Contact: Vladimir Benes <vbenes>
Severity: high
Docs Contact:
Priority: high
Version: 8.4
CC: acabral, agabriel, augol, bgalvani, dornelas, dustymabe, ealcaniz, eglottma, fcristin, jlebon, jligon, lrintel, lucab, mrussell, nstielau, openshift-bugs-escalate, pibanezr, rkhan, sfaye, sukulkar, till, vbenes, wking
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: NetworkManager-1.39.10-1.el8
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-08 10:10:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Luca BRUNO 2022-04-27 10:29:26 UTC
Thanks for the report and the attached logs.
From the emergency shell, can you please check what `ls -la /dev/disk/by-label/` shows?

Comment 21 Amit Ugol 2022-05-11 07:08:00 UTC
Can we please get the following:
Machine firmware.
NICs, models, and firmware versions.
What is the network layout.
What sort of bond was configured, and between which NICs.

Comment 32 Thomas Haller 2022-05-12 16:08:57 UTC
in comment 11, there is no route to 10.x.x.x/8.
It would be NetworkManager's job to configure that.

Is it possible to get debug logs of NetworkManager (and just the entire boot)?
AFAIS, there are none present. I think you get debug logs by setting "rd.debug" on the kernel command line.

Comment 35 Ignacio 2022-05-13 15:14:15 UTC
(In reply to Thomas Haller from comment #32)
> in comment 11, there is no route to 10.x.x.x/8.
> It would be NetworkManager's job to configure that.
> 
> Is it possible to get debug logs of NetworkManager (and just the entire
> boot)?
> AFAIS, there are none present. I think you get debug logs by setting
> "rd.debug" on the kernel command line.

We have the entire boot logs captured with "systemd.log_level=debug systemd.journald.forward_to_console=1 inst.loglevel=debug"; please see comment 12.

Let us know if those logs are enough. If you still need debug logs captured with "rd.debug" on the kernel command line, please specify which scenario you want us to test: forcing the system into the dracut emergency shell, or letting it boot from disk and loop while trying to fetch the Ignition file.

Comment 36 Dusty Mabe 2022-05-13 19:26:17 UTC
Those options set systemd into debug logging. For NetworkManager, IIUC, rd.debug is needed to set NetworkManager into TRACE logging.

See https://github.com/dracutdevs/dracut/blob/9bef71094eba84a9eac161fc45386ccd73bd2b34/modules.d/35network-manager/nm-config.sh#L9-L18

Comment 37 Dusty Mabe 2022-05-13 20:33:18 UTC
Forgot to answer the last part. Probably the scenario where the system gets stuck in a loop trying to fetch the Ignition config.

Comment 38 Dusty Mabe 2022-05-17 17:21:32 UTC
Any updates here?

Comment 44 Beniamino Galvani 2022-05-25 10:09:05 UTC
Hi, after discussing with Thomas, we think that the problem might be related to the long time that interfaces take to get carrier after they are added to the bond. NetworkManager has a built-in timeout of 6 seconds that doesn't seem enough in this case. Therefore, it quits too early without activating the VLAN.

A possible solution is to add the argument "rd.net.timeout.carrier=60" to the kernel command line, which increases the carrier timeout to 60 seconds.

Ignacio, would it be possible to try again with the new argument?
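
For readers who assemble the kernel command line programmatically (for example when templating boot entries), the suggestion above can be sketched as a small helper. This is only an illustration: the function name `set_kernel_arg` and the sample command line are made up for this example and are not taken from the reporter's setup; only the argument `rd.net.timeout.carrier=60` comes from the comment.

```python
def set_kernel_arg(cmdline: str, key: str, value) -> str:
    """Return cmdline with key=value set exactly once.

    Any existing occurrence of the same key is dropped first, so calling
    this twice does not duplicate the argument.
    """
    kept = [arg for arg in cmdline.split() if arg.split("=", 1)[0] != key]
    kept.append(f"{key}={value}")
    return " ".join(kept)


# Placeholder command line, invented for illustration only.
sample = "rd.neednet=1 console=tty0"
print(set_kernel_arg(sample, "rd.net.timeout.carrier", 60))
```

Replacing an existing value rather than appending blindly keeps the command line unambiguous if the timeout is tuned again later.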

Comment 45 Ignacio 2022-05-25 10:56:01 UTC
Sure, I will let you know the outcome. It may take a few days because they have some days off this week.

Comment 46 Ignacio 2022-05-26 08:02:17 UTC
Good news. The first test looks promising. The installation continues and the node is able to reach the cluster.
I'll upload the log so you can check how much time the system actually needed to get carrier.

Now the question is whether the default carrier timeout in NetworkManager should be revisited and increased. What do you think?

Comment 48 Beniamino Galvani 2022-05-27 14:41:35 UTC
In the new logs, I see that ens2f0 needs 6.53 seconds to get carrier after it's added to the bond:

  [   36.812315] bond0: (slave ens2f0): Enslaving as a backup interface with a down link
  [   43.346666] ixgbe 0000:05:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX

So it's just half a second more than NM's timeout (6 seconds).

>  Now the question would be if the default carrier timeout in Networkmanager should be revisited and increased. What do you think?

You are correct, it would be wiser to increase the default timeout in initrd. The old dracut network module waited for 10 seconds. I submitted a patch to increase it to 15 seconds in NM:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1239
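
The 6.53 second figure above can be reproduced from the two kernel log lines by subtracting their timestamps. The following is a minimal sketch, assuming the usual "[seconds.microseconds]" kernel timestamp prefix; the helper names are made up for this example. The timeout values it compares against are the ones mentioned in this bug (6 s built-in, 10 s in the old dracut network module, 15 s proposed, 60 s workaround).

```python
import re

# The two log lines quoted in the comment above.
ENSLAVED = "[   36.812315] bond0: (slave ens2f0): Enslaving as a backup interface with a down link"
LINK_UP = "[   43.346666] ixgbe 0000:05:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX"

# Matches the leading "[   36.812315]" style timestamp.
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]")


def kernel_timestamp(line: str) -> float:
    """Extract the kernel timestamp (seconds since boot) from a log line."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError(f"no kernel timestamp in line: {line!r}")
    return float(match.group(1))


def carrier_delay(enslaved_line: str, link_up_line: str) -> float:
    """Seconds between enslaving the port and the link coming up."""
    return kernel_timestamp(link_up_line) - kernel_timestamp(enslaved_line)


delay = carrier_delay(ENSLAVED, LINK_UP)
print(f"carrier delay: {delay:.2f} s")

# Timeouts discussed in this bug, in seconds.
for timeout in (6, 10, 15, 60):
    verdict = "enough" if delay <= timeout else "too short"
    print(f"timeout {timeout:>2} s: {verdict}")
```

With these two lines, only the 6 second timeout is shorter than the measured delay, which matches the analysis in the comment.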

Comment 49 Dusty Mabe 2022-06-03 14:31:42 UTC
OK, I'm moving this BZ to NetworkManager. IIUC they have a workaround for now. Regardless of whether NM changes the default timeout in RHEL8, RHEL9 starts NetworkManager via systemd in the initrd (IIUC), which means that the bond will eventually come up on RHEL9 and Ignition will be able to fetch the config.

Comment 56 Vladimir Benes 2022-08-02 10:30:37 UTC
The timeout is set to dracut's original 10s.

Comment 59 errata-xmlrpc 2022-11-08 10:10:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7680