Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description of problem:
A customer hits this issue when installing a system over an InfiniBand interface.
On the customer's system, for some reason, the "nm-run.sh" script executes before the kernel has discovered the InfiniBand interface.
At the end of the script, "/tmp/nm.done" is created, so the online hook for the interface never executes and Stage2 is never fetched.
Below is a partial "set -x" sample output. The system is booted with "ip=ibs1f0:dhcp" ("ibs1f0" is initially seen as "ib0" by the kernel):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[ 17.943242] localhost.localdomain dracut-initqueue[2290]: ++ '[' -e /tmp/nm.done ']'
[ 17.943399] localhost.localdomain dracut-initqueue[2290]: ++ '[' -z 1 ']'
[ 17.943399] localhost.localdomain dracut-initqueue[2290]: ++ '[' -s /run/NetworkManager/initrd/hostname ']'
[ 17.943399] localhost.localdomain dracut-initqueue[2290]: ++ for _i in /sys/class/net/*
[ 17.943399] localhost.localdomain dracut-initqueue[2290]: ++ '[' -d /sys/class/net/ens8f0 ']'
[ 17.945130] localhost.localdomain dracut-initqueue[2496]: +++ cat /sys/class/net/ens8f0/ifindex
[ 17.945700] localhost.localdomain dracut-initqueue[2290]: ++ state=/run/NetworkManager/devices/2
[ 17.945700] localhost.localdomain dracut-initqueue[2290]: ++ grep -q '^connection-uuid=' /run/NetworkManager/devices/2
[ 17.945700] localhost.localdomain dracut-initqueue[2290]: ++ continue
[ 17.945700] localhost.localdomain dracut-initqueue[2290]: ++ for _i in /sys/class/net/*
[ 17.945700] localhost.localdomain dracut-initqueue[2290]: ++ '[' -d /sys/class/net/ens8f1 ']'
[ 17.947365] localhost.localdomain dracut-initqueue[2498]: +++ cat /sys/class/net/ens8f1/ifindex
[ 17.947940] localhost.localdomain dracut-initqueue[2290]: ++ state=/run/NetworkManager/devices/3
[ 17.947940] localhost.localdomain dracut-initqueue[2290]: ++ grep -q '^connection-uuid=' /run/NetworkManager/devices/3
[ 17.947940] localhost.localdomain dracut-initqueue[2290]: ++ continue
[ 17.947940] localhost.localdomain dracut-initqueue[2290]: ++ for _i in /sys/class/net/*
[ 17.947940] localhost.localdomain dracut-initqueue[2290]: ++ '[' -d /sys/class/net/ens8f2 ']'
[ 17.949659] localhost.localdomain dracut-initqueue[2500]: +++ cat /sys/class/net/ens8f2/ifindex
[ 17.950163] localhost.localdomain dracut-initqueue[2290]: ++ state=/run/NetworkManager/devices/4
[ 17.950163] localhost.localdomain dracut-initqueue[2290]: ++ grep -q '^connection-uuid=' /run/NetworkManager/devices/4
[ 17.950163] localhost.localdomain dracut-initqueue[2290]: ++ continue
[ 17.950163] localhost.localdomain dracut-initqueue[2290]: ++ for _i in /sys/class/net/*
[ 17.950163] localhost.localdomain dracut-initqueue[2290]: ++ '[' -d /sys/class/net/ens8f3 ']'
[ 17.951827] localhost.localdomain dracut-initqueue[2502]: +++ cat /sys/class/net/ens8f3/ifindex
[ 17.952376] localhost.localdomain dracut-initqueue[2290]: ++ state=/run/NetworkManager/devices/5
[ 17.952376] localhost.localdomain dracut-initqueue[2290]: ++ grep -q '^connection-uuid=' /run/NetworkManager/devices/5
[ 17.952376] localhost.localdomain dracut-initqueue[2290]: ++ continue
[ 17.952376] localhost.localdomain dracut-initqueue[2290]: ++ for _i in /sys/class/net/*
[ 17.952376] localhost.localdomain dracut-initqueue[2290]: ++ '[' -d /sys/class/net/lo ']'
[ 17.953457] localhost.localdomain dracut-initqueue[2504]: +++ cat /sys/class/net/lo/ifindex
[ 17.953910] localhost.localdomain dracut-initqueue[2290]: ++ state=/run/NetworkManager/devices/1
[ 17.953910] localhost.localdomain dracut-initqueue[2290]: ++ grep -q '^connection-uuid=' /run/NetworkManager/devices/1
[ 17.953910] localhost.localdomain dracut-initqueue[2290]: ++ continue
[ 17.953910] localhost.localdomain dracut-initqueue[2290]: ++ :
---> Up to this point, "ibs1f0" does not exist yet
[ 18.217260] localhost.localdomain kernel: mlx5_core 0000:31:00.0 ibs1f0: renamed from ib0
---> The interface is now discovered by the kernel
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
When booting with "rd.debug", the issue does not occur: the slowdown induced by "rd.debug" (especially console output) lets the kernel discover the interface before the script executes.
Version-Release number of selected component (if applicable):
dracut-057-13.git20220816.el9
How reproducible:
Always
Steps to Reproduce: this can be reproduced with a QEMU/KVM guest by hot-plugging ("live-plumbing") the interface:
1. Configure a VM with a network interface that *won't be used* (usually "enp1s0").
2. Configure the VM to boot directly from a kernel/initrd ("Direct Kernel Boot"):
   kernel: RHEL 9.1 DVD kernel
   initrd: RHEL 9.1 DVD initrd
   arguments: console=tty0 console=ttyS0,115200n8 ip=enp5s0:dhcp inst.repo=http://192.168.122.1/rhel91 rd.debug rd.break
3. Boot the system and wait for dracut-initqueue to start.
4. Hot-plug the network interface "enp5s0" with the following definition:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
<interface type="network">
<mac address="52:54:00:ce:e3:e4"/>
<source network="default" portid="f8966d36-8586-430d-8f57-265a878ddc35" bridge="virbr0"/>
<target dev="vnet10"/>
<model type="virtio"/>
<alias name="net1"/>
<address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</interface>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Actual results:
dracut-initqueue times out and Stage2 is never downloaded.
Expected results:
Stage2 is downloaded because the "online" hook for enp5s0 eventually executes.
Additional info:
The root cause of the issue is that line 72 executes unconditionally: once "/tmp/nm.done" exists, lines 5-7 return early on every later run, so the "for" loop on line 62 is never executed again:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
:
5 if [ -e /tmp/nm.done ]; then
6 return
7 fi
:
62 for _i in /sys/class/net/*; do
63 [ -d "$_i" ] || continue
64 state="/run/NetworkManager/devices/$(cat "$_i"/ifindex)"
65 grep -q '^connection-uuid=' "$state" 2> /dev/null || continue
66 ifname="${_i##*/}"
67 dhcpopts_create "$state" > /tmp/dhclient."$ifname".dhcpopts
68 source_hook initqueue/online "$ifname"
69 /sbin/netroot "$ifname"
70 done
71
72 : > /tmp/nm.done
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Because of this, even if no interface was eligible for the "online" hook (lines 68-69), the loop is never entered again.
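To make the failure mode concrete, here is a minimal simulation of lines 5-7 and 62-72, run once before the interface exists and once after. It is a sketch under hypothetical assumptions: a scratch directory stands in for `/`, the made-up `add_iface` helper fakes the kernel and NetworkManager state, and `source_hook`/`/sbin/netroot` are replaced by appending the interface name to `online.log`.

```shell
#!/bin/sh
# Hypothetical harness: ROOT stands in for "/", and the real
# "source_hook initqueue/online" + "/sbin/netroot" calls are replaced by
# appending the interface name to "$ROOT/online.log".
set -u
ROOT=$(mktemp -d)
mkdir -p "$ROOT/sys/class/net" "$ROOT/run/NetworkManager/devices" "$ROOT/tmp"

# add_iface NAME IFINDEX yes|no: fake the kernel exposing NAME; "yes" also
# fakes NetworkManager recording a connection-uuid for that device.
add_iface() {
    mkdir -p "$ROOT/sys/class/net/$1"
    echo "$2" > "$ROOT/sys/class/net/$1/ifindex"
    if [ "$3" = yes ]; then
        echo "connection-uuid=dummy" > "$ROOT/run/NetworkManager/devices/$2"
    fi
}

# Mirrors lines 5-7 and 62-72 of the original script.
nm_run_original() {
    if [ -e "$ROOT/tmp/nm.done" ]; then
        return 0
    fi
    for _i in "$ROOT"/sys/class/net/*; do
        [ -d "$_i" ] || continue
        state="$ROOT/run/NetworkManager/devices/$(cat "$_i"/ifindex)"
        grep -q '^connection-uuid=' "$state" 2> /dev/null || continue
        echo "${_i##*/}" >> "$ROOT/online.log"
    done
    : > "$ROOT/tmp/nm.done"    # created unconditionally
}

add_iface lo 1 no
nm_run_original            # first run: ibs1f0 not discovered yet
add_iface ibs1f0 6 yes     # the kernel discovers the interface later
nm_run_original            # returns early: nm.done already exists
```

After both calls `nm.done` exists but `online.log` was never written, which matches the trace above: the late "ibs1f0" is never handed to the online hook.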
I believe a fix would be to create the "nm.done" file only once "source_hook" has executed, something like this:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
:
62 for _i in /sys/class/net/*; do
63 [ -d "$_i" ] || continue
64 state="/run/NetworkManager/devices/$(cat "$_i"/ifindex)"
65 grep -q '^connection-uuid=' "$state" 2> /dev/null || continue
66 ifname="${_i##*/}"
67 dhcpopts_create "$state" > /tmp/dhclient."$ifname".dhcpopts
68 source_hook initqueue/online "$ifname"
69 /sbin/netroot "$ifname"
70 : > /tmp/nm.done
71 done
72
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
However, I'm not completely sure how this is meant to work, especially whether the "online" hook is expected to execute for multiple interfaces. In that case, the proposed fix will not work, because we cannot be sure that the interface used for netroot is already present.
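Both the proposed change and the multi-interface concern can be checked with a small simulation. Again, this is a hypothetical harness (a scratch directory instead of `/`, the hook replaced by a log line, made-up `reset_root`/`add_iface` helpers), not dracut code. Scenario 1 shows the fix handling a single late interface; scenario 2 shows the case raised above, where a first interface creates `nm.done` and the interface needed for netroot, arriving later, is skipped.

```shell
#!/bin/sh
# Hypothetical harness: ROOT stands in for "/", and the real
# "source_hook initqueue/online" + "/sbin/netroot" calls are replaced by
# appending the interface name to "$ROOT/online.log".
set -u

reset_root() {
    ROOT=$(mktemp -d)
    mkdir -p "$ROOT/sys/class/net" "$ROOT/run/NetworkManager/devices" "$ROOT/tmp"
}

# add_iface NAME IFINDEX yes|no: fake the kernel exposing NAME; "yes" also
# fakes NetworkManager recording a connection-uuid for that device.
add_iface() {
    mkdir -p "$ROOT/sys/class/net/$1"
    echo "$2" > "$ROOT/sys/class/net/$1/ifindex"
    if [ "$3" = yes ]; then
        echo "connection-uuid=dummy" > "$ROOT/run/NetworkManager/devices/$2"
    fi
}

# Proposed fix: nm.done is only created after the hook ran for an interface.
nm_run_fixed() {
    if [ -e "$ROOT/tmp/nm.done" ]; then
        return 0
    fi
    for _i in "$ROOT"/sys/class/net/*; do
        [ -d "$_i" ] || continue
        state="$ROOT/run/NetworkManager/devices/$(cat "$_i"/ifindex)"
        grep -q '^connection-uuid=' "$state" 2> /dev/null || continue
        echo "${_i##*/}" >> "$ROOT/online.log"
        : > "$ROOT/tmp/nm.done"
    done
}

# Scenario 1: a single interface that appears late -> the fix works.
reset_root
add_iface lo 1 no
nm_run_fixed                  # nothing eligible, nm.done is not created
add_iface ibs1f0 6 yes
nm_run_fixed                  # online hook runs for ibs1f0
single_log=$(cat "$ROOT/online.log")

# Scenario 2: two interfaces, the one needed for netroot appears last.
reset_root
add_iface enp1s0 2 yes
nm_run_fixed                  # online hook runs for enp1s0, nm.done created
add_iface enp5s0 3 yes
nm_run_fixed                  # returns early: enp5s0 is never handled
multi_log=$(cat "$ROOT/online.log")
```

In scenario 1 only "ibs1f0" reaches the hook, as intended; in scenario 2 only "enp1s0" does, so a later netroot interface would still be missed.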
I'm OK with having NM wait for the interface (which it currently does not do, or only for a few seconds).
However, this won't solve the case of booting with "ip=dhcp".
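As an illustration of the "wait for the interface" idea, and of why it does not cover "ip=dhcp", here is a hypothetical shell helper that polls sysfs with a timeout. It is not how NetworkManager implements device waiting; `wait_for_iface` and the `NET_DIR` override are made up for this sketch.

```shell
#!/bin/sh
# wait_for_iface IFNAME TIMEOUT: poll once per second until IFNAME appears
# under NET_DIR (default /sys/class/net) or TIMEOUT seconds have elapsed.
# Returns 0 if the interface showed up, 1 otherwise. Hypothetical helper.
wait_for_iface() {
    _dir="${NET_DIR:-/sys/class/net}/$1"
    _left=$2
    while [ "$_left" -gt 0 ]; do
        [ -d "$_dir" ] && return 0
        sleep 1
        _left=$((_left - 1))
    done
    [ -d "$_dir" ]
}

# Demo against a scratch directory: the interface "appears" after about 1 s.
NET_DIR=$(mktemp -d)
( sleep 1; mkdir "$NET_DIR/ibs1f0" ) &
if wait_for_iface ibs1f0 5; then found=yes; else found=no; fi
if wait_for_iface enp5s0 1; then missing=yes; else missing=no; fi
wait
```

The helper needs an interface name. With "ip=ibs1f0:dhcp" that name is known, but with plain "ip=dhcp" there is nothing specific to wait for, which is the limitation noted above.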
Reassigning this to dracut. This is a dracut regression; a pull request for a revert has been filed here: https://github.com/dracutdevs/dracut/pull/2134
Dracut maintainers, please review and apply as appropriate. Thank you!
Hello,
Even when there is a single network interface, 99-nm-run.sh may execute before any interface has been enumerated, which makes it simply impossible to install the system.
This has been seen with IB interfaces (mlx5_core) at a customer site.
So the question now is: what can we do to work around this reliably on 9.0 and 9.1?
9.0 has EUS, so it is even more critical than 9.1 (once this BZ fix is released, hopefully with 9.2).
Renaud.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (dracut bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:2547