Can we please get the following:
Machine firmware.
NICs, models, and firmware versions.
The network layout.
The type of bond that was configured, and between which NICs.
In comment 11, there is no route to 10.x.x.x/8.
It would be NetworkManager's job to configure that.
Is it possible to get debug logs of NetworkManager (and just the entire boot)?
AFAIS, there are none present. I think you get debug logs by setting "rd.debug" on the kernel command line.
(In reply to Thomas Haller from comment #32)
> in comment 11, there is no route to 10.x.x.x/8.
> It would be NetworkManager's job to configure that.
>
> Is it possible to get debug logs of NetworkManager (and just the entire
> boot)?
> AFAIS, there are none present. I think you get debug logs by setting
> "rd.debug" on the kernel command line.
We have logs of the entire boot taken with "systemd.log_level=debug systemd.journald.forward_to_console=1 inst.loglevel=debug"; please see comment 12.
Let us know if those logs are enough. If you still need debug logs taken with "rd.debug" on the kernel command line, please specify which test you want: forcing the boot into the dracut emergency shell, or letting it boot from disk and capturing the loop that tries to fetch the Ignition file.
Comment 44 Beniamino Galvani
2022-05-25 10:09:05 UTC
Hi, after discussing with Thomas, we think the problem might be related to the long time the interfaces take to get carrier after they are added to the bond. NetworkManager has a built-in timeout of 6 seconds, which doesn't seem to be enough in this case. As a result, it quits too early, without activating the VLAN.
A possible solution is to add the argument "rd.net.timeout.carrier=60" to the kernel command line, which increases the carrier timeout to 60 seconds.
Ignacio, would it be possible to try again with the new argument?
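To double-check that the argument actually made it onto the kernel command line, something like the following Python sketch could be used. The helper name `carrier_timeout` and the sample command line are illustrative assumptions; on a running system the string would typically come from /proc/cmdline. Keeping the last occurrence when the argument is repeated is also an assumption of this sketch and has not been checked against NetworkManager's own parser.

```python
from typing import Optional


def carrier_timeout(cmdline: str) -> Optional[int]:
    """Return the rd.net.timeout.carrier value in seconds, or None if absent."""
    value = None
    for arg in cmdline.split():
        key, sep, val = arg.partition("=")
        if key == "rd.net.timeout.carrier" and sep and val.isdigit():
            # Assumption of this sketch: a later occurrence overrides an earlier one.
            value = int(val)
    return value


# Illustrative command line, not taken from the affected machine.
example = "BOOT_IMAGE=/vmlinuz rd.neednet=1 rd.net.timeout.carrier=60"
print(carrier_timeout(example))  # prints 60
```

A result of None would mean the argument is missing, so the built-in timeout still applies.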
Good news. The first test looks promising: the installation continues and the node is able to reach the cluster.
I'll upload the log so you can check how much time the system really needed to get carrier.
Now the question is whether the default carrier timeout in NetworkManager should be revisited and increased. What do you think?
Comment 48 Beniamino Galvani
2022-05-27 14:41:35 UTC
In the new logs, I see that ens2f0 needs 6.53 seconds to get carrier after it's added to the bond:
[ 36.812315] bond0: (slave ens2f0): Enslaving as a backup interface with a down link
[ 43.346666] ixgbe 0000:05:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
So it's just half a second more than NM's timeout (6 seconds).
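For reference, the delay can be recomputed from the two kernel log lines quoted in this comment. This is a minimal Python sketch, assuming the bracketed prefix is the kernel timestamp in seconds; the function name `kernel_timestamp` and the constant `NM_CARRIER_TIMEOUT` are illustrative, with the 6-second value taken from this thread.

```python
import re

LOG_LINES = [
    "[ 36.812315] bond0: (slave ens2f0): Enslaving as a backup interface with a down link",
    "[ 43.346666] ixgbe 0000:05:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX",
]

NM_CARRIER_TIMEOUT = 6.0  # seconds; NM's built-in timeout as stated in this thread


def kernel_timestamp(line: str) -> float:
    """Extract the leading '[ seconds.micros]' kernel timestamp from a log line."""
    match = re.match(r"\[\s*(\d+\.\d+)\]", line)
    if match is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return float(match.group(1))


enslaved, link_up = (kernel_timestamp(line) for line in LOG_LINES)
delay = link_up - enslaved
print(f"carrier delay: {delay:.2f} s")                        # carrier delay: 6.53 s
print(f"over timeout by: {delay - NM_CARRIER_TIMEOUT:.2f} s")  # over timeout by: 0.53 s
```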
> Now the question would be if the default carrier timeout in Networkmanager should be revisited and increased. What do you think?
You are correct, it would be wiser to increase the default timeout in the initrd. The old dracut network module waited for 10 seconds. I submitted a patch to increase it to 15 seconds in NM:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1239
OK, I'm moving this BZ to NetworkManager. IIUC they have a workaround for now and, regardless of whether NM changes the default timeout in RHEL8, RHEL9 starts NetworkManager via systemd in the initrd (IIUC), which means that the bond will eventually come up on RHEL9 and Ignition will be able to fetch the config.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:7680