Bug 1787620 - ip=dhcp6,dhcp does not work on network without ipv6
Summary: ip=dhcp6,dhcp does not work on network without ipv6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: dracut
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Lukáš Nykrýn
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Depends On:
Blocks: 1771572
Reported: 2020-01-03 17:09 UTC by Steven Hardy
Modified: 2023-02-12 22:19 UTC
CC List: 16 users

Fixed In Version: dracut-049-63.git20200114.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-28 16:06:04 UTC
Type: Bug
Target Upstream Version:
Embargoed:
harald: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-31421 0 None None None 2023-02-12 22:19:20 UTC
Red Hat Product Errata RHBA-2020:1760 0 None None None 2020-04-28 16:06:31 UTC

Description Steven Hardy 2020-01-03 17:09:40 UTC
Description of problem:

When trying to deploy a RHCOS machine for OCP4 usage, we have the requirement to operate in a single-stack ipv6 environment.

The machines have (at least) two nics, and at the point where dracut brings the network up we only expect one to become active, with a dhcp lease served via dhcpv6.

This is problematic, because passing ip=dhcp6 gets a lease on the expected interface, but then blocks waiting for the second nic to come up (which is expected to be inactive at this time, since there is no DHCP on this network).

From the manpage it sounds like ip=any should wait for any (but not all) interface to become active, but this doesn't seem to enable ipv6 (so we get blocked on dhclient waiting forever for an ipv4 lease).

The only way to get the desired behavior appears to be to specify the interface explicitly, e.g. ip=ens3:dhcp6. Then we get a v6 lease on ens3, and the inactive nic (ens4 in this case) doesn't prevent further progress.

Specifying the nic name explicitly probably isn't a workable solution for our use-case, since we are using a common OS image for clusters with potentially heterogeneous hardware, and can't currently specify the kernel cmdline on a per-node basis.

Version-Release number of selected component (if applicable):

Testing a RHEL8 based RHCOS build (rhcos-43.81.201912131630.0-qemu.x86_64.qcow2) - not currently certain what dracut version that contains.

How reproducible:

Always

Steps to Reproduce:
1. Deploy a VM (or baremetal) with two nics, and two virtual networks, enable DHCPv6 on only one of them
2. Boot VM with an RHCOS (or I guess normal RHEL) image
3. Observe that neither ip=dhcp6 nor ip=any enables boot past the dracut dhclient stage

Actual results:

No way to boot OS in a single-stack ipv6 environment except explicit nic configuration

Expected results:

I expected ip=any to try dhcp and dhcp6 and for dracut to continue provided any interface came up.

Comment 1 Colin Walters 2020-01-03 19:06:46 UTC
To xref, from previous discussion we want to move to NM in the initrd.  I pushed up
https://github.com/coreos/fedora-coreos-config/pull/259
as a starting point for this switch for FCOS, which we could then pick up in RHCOS.

Comment 2 Steven Hardy 2020-01-06 15:55:19 UTC
(In reply to Colin Walters from comment #1)
> To xref, from previous discussion we want to move to NM in the initrd.  I
> pushed up
> https://github.com/coreos/fedora-coreos-config/pull/259
> as a starting point for this switch for FCOS, which we could then pick up in
> RHCOS.

We'll need to fix this for 4.3 and not just master RHCOS. I'm guessing such a change won't be a backport candidate, so we still need a fix or workaround when using the legacy network plugin?

Comment 3 Colin Walters 2020-01-06 16:32:59 UTC
> I'm guessing such a change won't be a backport candidate, so we still need a fix or workaround when using the legacy network plugin?

Yeah, I'd agree with that.

One procedural note on this: https://gitlab.cee.redhat.com/coreos/redhat-coreos/#overridingusing-specific-package-versions

(We *can* fork dracut if need be into the OpenShift channel before releasing in RHEL but we've gotten strong pushback against that)

Comment 4 Micah Abbott 2020-01-07 17:04:39 UTC
I had made a one-off RHCOS build that hacked together support for DHCPv4/DHCPv6 in dracut.  Some of the details are here:

https://issues.redhat.com/browse/GRPA-1327

The dracut hack was an introduction of a new `ip=` option called `both_dhcp` that did DHCPv4 and DHCPv6 before making the interface active.

https://github.com/miabbott/dracut/commit/6887e9b4c2c02dbab3304e23b8ecc5d0f9094503


Though maybe the `any` option could be changed to something like:

```
diff --git a/modules.d/35network-legacy/ifup.sh b/modules.d/35network-legacy/ifup.sh
index 5331c461..9971d422 100755
--- a/modules.d/35network-legacy/ifup.sh
+++ b/modules.d/35network-legacy/ifup.sh
@@ -421,7 +421,10 @@ for p in $(getargs ip=); do
 
     for autoopt in $(str_replace "$autoconf" "," " "); do
         case $autoopt in
-            dhcp|on|any)
+            any)
+               load_ipv6
+               do_dhcp -4 || do_dhcp -6 ;;
+            dhcp|on)
                 do_dhcp -4 ;;
             dhcp6)
                 load_ipv6
```

Comment 5 Russell Bryant 2020-01-08 02:54:14 UTC
(In reply to Micah Abbott from comment #4)
> I had made a one-off RHCOS build that hacked together support for
> DHCPv4/DHCPv6 in dracut.  Some of the details are here:
> 
> https://issues.redhat.com/browse/GRPA-1327
> 
> The dracut hack was an introduction of a new `ip=` option called `both_dhcp`
> that did DHCPv4 and DHCPv6 before making the interface active.
> 
> https://github.com/miabbott/dracut/commit/6887e9b4c2c02dbab3304e23b8ecc5d0f9094503
> 
> 
> Though maybe the `any` option could be changed to something like:
> 
> ```
> diff --git a/modules.d/35network-legacy/ifup.sh b/modules.d/35network-legacy/ifup.sh
> index 5331c461..9971d422 100755
> --- a/modules.d/35network-legacy/ifup.sh
> +++ b/modules.d/35network-legacy/ifup.sh
> @@ -421,7 +421,10 @@ for p in $(getargs ip=); do
>  
>      for autoopt in $(str_replace "$autoconf" "," " "); do
>          case $autoopt in
> -            dhcp|on|any)
> +            any)
> +               load_ipv6
> +               do_dhcp -4 || do_dhcp -6 ;;

I think the ideal behavior would be to always do both, but only consider it a failure if both fail.
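
To illustrate the difference between the proposed `do_dhcp -4 || do_dhcp -6` and the "always do both" behavior described here, a minimal shell sketch follows. It rests on assumptions: `do_dhcp` below is a hypothetical stub controlled by the `V4_OK` and `V6_OK` variables, not dracut's real function, and `dhcp_both` and `ATTEMPTS` are names invented for this example.

```shell
#!/bin/sh
# Hypothetical stub standing in for dracut's do_dhcp: it "succeeds" for a
# protocol only when the matching V4_OK / V6_OK variable is set to 1.
do_dhcp() {
    ATTEMPTS="$ATTEMPTS $1"          # record which protocols were tried
    case "$1" in
        -4) [ "${V4_OK:-0}" = 1 ] ;;
        -6) [ "${V6_OK:-0}" = 1 ] ;;
        *)  return 2 ;;
    esac
}

# Always attempt both protocols; report failure only if both failed.
# (With `do_dhcp -4 || do_dhcp -6`, a successful v4 lease would skip v6.)
dhcp_both() {
    rc4=0
    rc6=0
    do_dhcp -4 || rc4=$?
    do_dhcp -6 || rc6=$?
    [ "$rc4" -eq 0 ] || [ "$rc6" -eq 0 ]
}
```

With this shape, a dual-stack network gets both leases, while a v4-only or v6-only network still counts as a success.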

Comment 6 Lukáš Nykrýn 2020-01-08 07:58:05 UTC
(In reply to Micah Abbott from comment #4)
> I had made a one-off RHCOS build that hacked together support for
> DHCPv4/DHCPv6 in dracut.  Some of the details are here:
> 
> https://issues.redhat.com/browse/GRPA-1327
> 
> The dracut hack was an introduction of a new `ip=` option called `both_dhcp`
> that did DHCPv4 and DHCPv6 before making the interface active.
> 
> https://github.com/miabbott/dracut/commit/6887e9b4c2c02dbab3304e23b8ecc5d0f9094503
> 
> 
> Though maybe the `any` option could be changed to something like:
> 
> ```
> diff --git a/modules.d/35network-legacy/ifup.sh b/modules.d/35network-legacy/ifup.sh
> index 5331c461..9971d422 100755
> --- a/modules.d/35network-legacy/ifup.sh
> +++ b/modules.d/35network-legacy/ifup.sh
> @@ -421,7 +421,10 @@ for p in $(getargs ip=); do
>  
>      for autoopt in $(str_replace "$autoconf" "," " "); do
>          case $autoopt in
> -            dhcp|on|any)
> +            any)
> +               load_ipv6
> +               do_dhcp -4 || do_dhcp -6 ;;
> +            dhcp|on)
>                  do_dhcp -4 ;;
>              dhcp6)
>                  load_ipv6
> ```

I don't like the idea of redefining what "any" does. I think it would be better to make ip=dhcp,dhcp6 or ip=dhcp ip=dhcp6 work.

Comment 7 Lukáš Nykrýn 2020-01-08 10:17:28 UTC
Hmm, after reading the bug properly I am a bit confused. Originally I thought the problem was that we have one interface which has either dhcpv4 or dhcpv6. And yes, the current code is broken in that case, since ip=dhcp,dhcp6 always needs v6 to succeed.

But comment 0 is about two interfaces, where one has dhcpv6 and the other should be ignored, which is a completely different issue.

Comment 8 Steven Hardy 2020-01-08 11:12:36 UTC
(In reply to Lukáš Nykrýn from comment #7)
> Hmm after reading the bug properly I am a bit confused. Originally I thought
> that the problem is that we have one interface and either has dhcpv4 or
> dhcpv6 there. And yeah The current code is broken in such case, since
> ip=dhcp,dhcp6 always needs v6 to succeed.
> 
> But the comment 0 is about two interfaces where one has dhcp v6 and other
> should be ignored, which is a completely different issue.

It's basically a variation of the same issue: we have one nic which will always get a DHCP lease, either ipv4 or ipv6 depending on the environment.  This is the nic we need dracut to bring up in order for the deployment to succeed.

However, in typical baremetal deployments the nodes will have additional nics, and sometimes those will not have any external DHCP when dracut runs; for specific networks we know they will be statically configured later, after the OS has booted.

We need some way for dracut to bring up just the one nic that does get a DHCP lease, and not wait for all the additional nics (which IIUC is the default behavior with NetworkManager, but we need a solution for OCP 4.3 which is using the legacy plugin).

Currently the only way to do that AFAICS is to specify the nic explicitly, e.g. ip=ens3:dhcp6. This won't work in OCP customer environments because we expect to share a single OS image across potentially more than one type of hardware, so we can't hard-code the nic name in the image.

We need some way to say get a DHCP lease (including in single-stack ipv6 cases) on *any* interface then continue, instead of blocking for all of them to get a lease (which I think is the "any" behavior?)

I think we have two options, either we make "any" work with ipv6, or we add a new option which indicates we should enable dhcp6 but succeed when any interface gets a lease (any6?)
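
The "succeed when any interface gets a lease" idea can be sketched as a loop that stops at the first interface with a lease. This is only an illustration under stated assumptions: `try_lease`, `bring_up_any`, and `LEASE_ON` are hypothetical names, the stub does not talk to any DHCP server, and real code would also need a timeout per attempt so that a silent network cannot block the loop.

```shell
#!/bin/sh
# Hypothetical stub: pretend only the interfaces listed in LEASE_ON can
# obtain a DHCP lease (v4 or v6). Real code would run a DHCP client here.
try_lease() {
    case " $LEASE_ON " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Try each interface in turn and stop at the first one that gets a lease,
# instead of waiting for every interface to come up.
bring_up_any() {
    for nic in "$@"; do
        if try_lease "$nic"; then
            echo "$nic"
            return 0
        fi
    done
    return 1
}
```

For example, with `LEASE_ON="ens3"`, calling `bring_up_any ens4 ens3` prints `ens3` and returns success even though ens4 never gets a lease.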

Comment 9 Lukáš Nykrýn 2020-01-09 09:53:59 UTC
Can someone try this again with the following patches:
https://github.com/lnykryn/dracut/commit/0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
https://github.com/dracutdevs/dracut/pull/704/commits/e306a5f900f1b93f6b743ac042fb81c06dd461c3

and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3 rd.net.timeout.ipv6dad=3" on kernel cmdline
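
When testing, it may help to confirm that these arguments actually reached the booted kernel. The sketch below is a hypothetical helper, `karg_value` (not part of dracut), that extracts one key from a command-line string; on a running system the string would typically come from /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): print the value of KEY from a
# kernel command line string, using the last occurrence if it is repeated.
# Returns non-zero when KEY is not present at all.
karg_value() {
    key=$1
    shift
    val=""
    found=1
    for word in $*; do               # unquoted on purpose: split on spaces
        case "$word" in
            "$key"=*) val=${word#"$key"=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$val"
}

# Example on a running system:
# karg_value rd.net.timeout.dhcp "$(cat /proc/cmdline)"
```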

Comment 11 Michael Nguyen 2020-01-09 22:13:09 UTC
(In reply to Lukáš Nykrýn from comment #9)
> Can someone try this again, with following patches:
> https://github.com/lnykryn/dracut/commit/0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
> https://github.com/dracutdevs/dracut/pull/704/commits/e306a5f900f1b93f6b743ac042fb81c06dd461c3
> 
> and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3
> rd.net.timeout.ipv6dad=3" on kernel cmdline

I created a custom RHCOS build with the dracut patches.  I was able to successfully boot a RHCOS VM with two NICs and two virtual networks (one with dhcpv6 enabled) with those kernel cmdline arguments.

The NetworkManager-wait-online service failed but succeeded after I restarted it.

Comment 13 Steve Milner 2020-01-13 15:37:26 UTC
Michael, Lukáš,

Any idea why NetworkManager-wait-online fails?

Comment 16 Lukáš Nykrýn 2020-01-14 12:18:56 UTC
(In reply to Steve Milner from comment #13)
> Michael, Lukáš,
> 
> Any idea why NetworkManager-wait-online fails?

No idea, but it should be a separate issue. Perhaps just file a bug for NM.

Comment 17 Steve Milner 2020-01-14 14:52:23 UTC
Michael,

Did the service logs give any information?

Comment 18 Steven Hardy 2020-01-14 16:29:19 UTC
> I created a custom RHCOS build with the dracut patches.

Can anyone either point me to the process for doing this, or provide a version of rhcos-43.81.201912131630.0-openstack.x86_64.qcow2.gz and rhcos-43.81.201912131630.0-qemu.x86_64.qcow2.gz which I can use to test in my environment please?

Comment 22 Steven Hardy 2020-01-20 16:45:38 UTC
(In reply to Lukáš Nykrýn from comment #9)
> Can someone try this again, with following patches:
> https://github.com/lnykryn/dracut/commit/0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
> https://github.com/dracutdevs/dracut/pull/704/commits/e306a5f900f1b93f6b743ac042fb81c06dd461c3
> 
> and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3
> rd.net.timeout.ipv6dad=3" on kernel cmdline

I tested this using images from @mnguyen (thanks!) and can confirm that it works.

While I can understand not wanting to change the default behavior of dracut, I wonder if this is something we should consider changing in the RHCOS image, e.g. apply these kernel command-line arguments by default so this will "just work" in both ipv4 and ipv6 environments?

Comment 24 Colin Walters 2020-01-21 15:59:17 UTC
> I wonder if this is something we should consider changing in the RHCOS image, e.g apply the kernel CLI so this will "just work" in both ipv4 and ipv6 environments?

Yes, we will do that.

Comment 25 Micah Abbott 2020-01-21 17:15:28 UTC
(In reply to Steven Hardy from comment #22)

> While I can understand not wanting to change the default behavior of dracut,
> I wonder if this is something we should consider changing in the RHCOS
> image, e.g apply the kernel CLI so this will "just work" in both ipv4 and
> ipv6 environments?

Tracking that here https://bugzilla.redhat.com/show_bug.cgi?id=1793591

Comment 26 Jonathan Lebon 2020-01-22 02:34:45 UTC
I think we also need this patch: https://github.com/dracutdevs/dracut/pull/710. Without this, in an IPv6-only environment dracut will fail to get a lease in the initramfs.
@Lukáš could you review and backport this patch as well?

Comment 27 Harald Hoyer 2020-01-23 13:35:54 UTC
backported  https://github.com/dracutdevs/dracut/pull/71, waiting for the GATING

Comment 28 Harald Hoyer 2020-01-23 14:41:26 UTC
updated erratum

Comment 31 errata-xmlrpc 2020-04-28 16:06:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1760

