Bug 1787620
Summary: | ip=dhcp6,dhcp does not work on network without ipv6 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Steven Hardy <shardy> |
Component: | dracut | Assignee: | Lukáš Nykrýn <lnykryn> |
Status: | CLOSED ERRATA | QA Contact: | Frantisek Sumsal <fsumsal> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.4 | CC: | augol, dracut-maint-list, fsumsal, harald, jlebon, lnykryn, miabbott, mnguyen, mvirgil, racedoro, rbryant, sasha, sgordon, smilner, walters, yprokule |
Target Milestone: | rc | Flags: | harald: needinfo- |
Target Release: | 8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | dracut-049-63.git20200114.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-04-28 16:06:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1771572 |
Description
Steven Hardy
2020-01-03 17:09:40 UTC
To xref, from previous discussion we want to move to NM in the initrd. I pushed up https://github.com/coreos/fedora-coreos-config/pull/259 as a starting point for this switch for FCOS, which we could then pick up in RHCOS.

(In reply to Colin Walters from comment #1)
> To xref, from previous discussion we want to move to NM in the initrd. I
> pushed up
> https://github.com/coreos/fedora-coreos-config/pull/259
> as a starting point for this switch for FCOS, which we could then pick up in
> RHCOS.

We'll need to fix this for 4.3 and not just master RHCOS, I'm guessing such a change won't be a backport candidate, so we still need a fix or workaround when using the legacy network plugin?

> I'm guessing such a change won't be a backport candidate, so we still need a fix or workaround when using the legacy network plugin?

Yeah, I'd agree with that. One procedural note on this: https://gitlab.cee.redhat.com/coreos/redhat-coreos/#overridingusing-specific-package-versions

(We *can* fork dracut if need be into the OpenShift channel before releasing in RHEL but we've gotten strong pushback against that)

I had made a one-off RHCOS build that hacked together support for DHCPv4/DHCPv6 in dracut. Some of the details are here:

https://issues.redhat.com/browse/GRPA-1327

The dracut hack was an introduction of a new `ip=` option called `both_dhcp` that did DHCPv4 and DHCPv6 before making the interface active.
https://github.com/miabbott/dracut/commit/6887e9b4c2c02dbab3304e23b8ecc5d0f9094503

Though maybe the `any` option could be changed to something like:

```
diff --git a/modules.d/35network-legacy/ifup.sh b/modules.d/35network-legacy/ifup.sh
index 5331c461..9971d422 100755
--- a/modules.d/35network-legacy/ifup.sh
+++ b/modules.d/35network-legacy/ifup.sh
@@ -421,7 +421,10 @@ for p in $(getargs ip=); do
 
     for autoopt in $(str_replace "$autoconf" "," " "); do
         case $autoopt in
-            dhcp|on|any)
+            any)
+                load_ipv6
+                do_dhcp -4 || do_dhcp -6 ;;
+            dhcp|on)
                 do_dhcp -4 ;;
             dhcp6)
                 load_ipv6
```

(In reply to Micah Abbott from comment #4)
> I had made a one-off RHCOS build that hacked together support for
> DHCPv4/DHCPv6 in dracut. Some of the details are here:
>
> https://issues.redhat.com/browse/GRPA-1327
>
> The dracut hack was an introduction of a new `ip=` option called `both_dhcp`
> that did DHCPv4 and DHCPv6 before making the interface active.
>
> https://github.com/miabbott/dracut/commit/
> 6887e9b4c2c02dbab3304e23b8ecc5d0f9094503
>
>
> Though maybe the `any` option could be changed to something like:
>
> ```
> diff --git a/modules.d/35network-legacy/ifup.sh
> b/modules.d/35network-legacy/ifup.sh
> index 5331c461..9971d422 100755
> --- a/modules.d/35network-legacy/ifup.sh
> +++ b/modules.d/35network-legacy/ifup.sh
> @@ -421,7 +421,10 @@ for p in $(getargs ip=); do
>
>      for autoopt in $(str_replace "$autoconf" "," " "); do
>          case $autoopt in
> -            dhcp|on|any)
> +            any)
> +                load_ipv6
> +                do_dhcp -4 || do_dhcp -6 ;;

I think the ideal behavior would be to always do both, but only consider it a failure if both fail.

(In reply to Micah Abbott from comment #4)
> I had made a one-off RHCOS build that hacked together support for
> DHCPv4/DHCPv6 in dracut. Some of the details are here:
>
> https://issues.redhat.com/browse/GRPA-1327
>
> The dracut hack was an introduction of a new `ip=` option called `both_dhcp`
> that did DHCPv4 and DHCPv6 before making the interface active.
>
> https://github.com/miabbott/dracut/commit/
> 6887e9b4c2c02dbab3304e23b8ecc5d0f9094503
>
>
> Though maybe the `any` option could be changed to something like:
>
> ```
> diff --git a/modules.d/35network-legacy/ifup.sh
> b/modules.d/35network-legacy/ifup.sh
> index 5331c461..9971d422 100755
> --- a/modules.d/35network-legacy/ifup.sh
> +++ b/modules.d/35network-legacy/ifup.sh
> @@ -421,7 +421,10 @@ for p in $(getargs ip=); do
>
>      for autoopt in $(str_replace "$autoconf" "," " "); do
>          case $autoopt in
> -            dhcp|on|any)
> +            any)
> +                load_ipv6
> +                do_dhcp -4 || do_dhcp -6 ;;
> +            dhcp|on)
>                  do_dhcp -4 ;;
>              dhcp6)
>                  load_ipv6
> ```

I don't like the idea of redefining what "any" does. I think it would be better to make ip=dhcp,dhcp6 or ip=dhcp ip=dhcp6 work.

Hmm, after reading the bug properly I am a bit confused. Originally I thought that the problem is that we have one interface and either has dhcpv4 or dhcpv6 there. And yeah, the current code is broken in such a case, since ip=dhcp,dhcp6 always needs v6 to succeed.

But comment 0 is about two interfaces where one has dhcp v6 and the other should be ignored, which is a completely different issue.

(In reply to Lukáš Nykrýn from comment #7)
> Hmm, after reading the bug properly I am a bit confused. Originally I thought
> that the problem is that we have one interface and either has dhcpv4 or
> dhcpv6 there. And yeah, the current code is broken in such a case, since
> ip=dhcp,dhcp6 always needs v6 to succeed.
>
> But comment 0 is about two interfaces where one has dhcp v6 and the other
> should be ignored, which is a completely different issue.

It's basically a variation of the same issue: we have one nic which will always get a DHCP lease, either ipv4 or ipv6 depending on the environment. This is the nic we need dracut to bring up in order for the deployment to succeed.
However, in typical baremetal deployments, the nodes will have additional nics, and sometimes those will not have any external DHCP when dracut runs; for specific networks we know they will be statically configured later after the OS has booted.

We need some way for dracut to bring up just the one nic that does get a DHCP lease, and not wait for all the additional nics (which IIUC is the default behavior with NetworkManager, but we need a solution for OCP 4.3 which is using the legacy plugin).

Currently the only way to do that AFAICS is to specify the nic explicitly, e.g. ip=ens3,dhcp6 - this won't work in OCP customer environments because we expect to share a single OS image with potentially more than one type of hardware, so we can't hard-code the nic name in the image.

We need some way to say get a DHCP lease (including in single-stack ipv6 cases) on *any* interface then continue, instead of blocking for all of them to get a lease (which I think is the "any" behavior?)

I think we have two options: either we make "any" work with ipv6, or we add a new option which indicates we should enable dhcp6 but succeed when any interface gets a lease (any6?)

Can someone try this again, with the following patches:
https://github.com/lnykryn/dracut/commit/0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
https://github.com/dracutdevs/dracut/pull/704/commits/e306a5f900f1b93f6b743ac042fb81c06dd461c3

and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3 rd.net.timeout.ipv6dad=3" on kernel cmdline

(In reply to Lukáš Nykrýn from comment #9)
> Can someone try this again, with following patches:
> https://github.com/lnykryn/dracut/commit/
> 0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
> https://github.com/dracutdevs/dracut/pull/704/commits/
> e306a5f900f1b93f6b743ac042fb81c06dd461c3
>
> and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3
> rd.net.timeout.ipv6dad=3" on kernel cmdline

I created a custom RHCOS build with the dracut patches.
I was able to successfully boot a RHCOS VM with two NICs and two virtual networks (one with dhcpv6 enabled) with those kernel cmdline arguments. The NetworkManager-wait-online service failed but succeeded after I restarted it.

Michael, Lukáš,

Any idea why NetworkManager-wait-online fails?

(In reply to Steve Milner from comment #13)
> Michael, Lukáš,
>
> Any idea why NetworkManager-wait-online fails?

No idea, but it should be a separate issue. Perhaps just file a bug for NM.

Michael,

Did the service logs give any information?

> I created a custom RHCOS build with the dracut patches.
Can anyone either point me to the process for doing this, or provide a version of rhcos-43.81.201912131630.0-openstack.x86_64.qcow2.gz and rhcos-43.81.201912131630.0-qemu.x86_64.qcow2.gz which I can use to test in my environment please?
> Can anyone either point me to the process for doing this,

https://github.com/coreos/coreos-assembler/ specifically https://github.com/coreos/coreos-assembler/blob/master/README-devel.md#using-overrides and https://gitlab.cee.redhat.com/coreos/redhat-coreos/#running-a-build-locally

(In reply to Lukáš Nykrýn from comment #9)
> Can someone try this again, with following patches:
> https://github.com/lnykryn/dracut/commit/
> 0067f7b9c4dffa930e15af8e36982a2cd2ec5a17
> https://github.com/dracutdevs/dracut/pull/704/commits/
> e306a5f900f1b93f6b743ac042fb81c06dd461c3
>
> and with "rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3
> rd.net.timeout.ipv6dad=3" on kernel cmdline

I tested this using images from @mnguyen (thanks!) and can confirm that it works.

While I can understand not wanting to change the default behavior of dracut, I wonder if this is something we should consider changing in the RHCOS image, e.g. apply the kernel CLI so this will "just work" in both ipv4 and ipv6 environments?

> I wonder if this is something we should consider changing in the RHCOS image, e.g. apply the kernel CLI so this will "just work" in both ipv4 and ipv6 environments?
Yes, we will do that.
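
For readers following along, the kernel command line tested above carries both DHCP methods in a single `ip=` argument. The sketch below is only an illustration of how such a comma-separated value can be split into individual methods, in the spirit of the `for autoopt in ...` loop in the quoted diff. It is not dracut's parser: `list_ip_methods` is a hypothetical helper, and it assumes the bare `ip=<method>[,<method>]` form with no interface name.

```shell
#!/bin/sh
# Illustrative sketch only, not dracut code.
# list_ip_methods is a hypothetical helper name.

# Print each autoconf method found in ip= arguments, one per line.
# Assumes the bare form ip=<method>[,<method>...] with no interface name.
list_ip_methods() {
    cmdline=$1
    # Unquoted on purpose: split the command line on whitespace.
    for arg in $cmdline; do
        case $arg in
            ip=*)
                value=${arg#ip=}
                # Turn commas into spaces so the shell splits the list.
                for method in $(printf '%s' "$value" | tr ',' ' '); do
                    printf '%s\n' "$method"
                done
                ;;
        esac
    done
}

cmdline="rd.neednet=1 ip=dhcp,dhcp6 rd.net.timeout.dhcp=3 rd.net.timeout.ipv6dad=3"
list_ip_methods "$cmdline"
```

On a running system the same function could be fed the contents of `/proc/cmdline` to check which methods an image was actually booted with.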
(In reply to Steven Hardy from comment #22)
> While I can understand not wanting to change the default behavior of dracut,
> I wonder if this is something we should consider changing in the RHCOS
> image, e.g. apply the kernel CLI so this will "just work" in both ipv4 and
> ipv6 environments?

Tracking that here https://bugzilla.redhat.com/show_bug.cgi?id=1793591

I think we also need this patch: https://github.com/dracutdevs/dracut/pull/710. Without this, in an IPv6-only environment dracut will fail to get a lease in the initramfs.

@Lukáš could you review and backport this patch as well?

backported https://github.com/dracutdevs/dracut/pull/71, waiting for the GATING updated erratum

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1760
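
As a recap of the behavior requested earlier in the thread ("always do both, but only consider it a failure if both fail"), here is a minimal shell sketch. It is not the patch that was merged. `try_dhcp4`, `try_dhcp6`, `dhcp_either`, and the `FAKE_V4`/`FAKE_V6` variables are hypothetical stand-ins so the logic can run without a network.

```shell
#!/bin/sh
# Illustrative sketch only, not the dracut fix.
# try_dhcp4 / try_dhcp6 are hypothetical stubs standing in for real DHCP
# client calls; they succeed when FAKE_V4 / FAKE_V6 is set to "ok".
try_dhcp4() { [ "${FAKE_V4:-fail}" = "ok" ]; }
try_dhcp6() { [ "${FAKE_V6:-fail}" = "ok" ]; }

# Attempt both address families; return 0 if at least one got a lease.
dhcp_either() {
    got_lease=1
    if try_dhcp4; then got_lease=0; fi
    if try_dhcp6; then got_lease=0; fi
    return $got_lease
}

# Simulate an IPv6-only network: DHCPv4 fails, DHCPv6 succeeds.
FAKE_V4=fail FAKE_V6=ok
if dhcp_either; then
    echo "network up"
else
    echo "no lease on either family"
fi
```

Unlike the `do_dhcp -4 || do_dhcp -6` form in the quoted diff, this variant still attempts DHCPv6 when DHCPv4 succeeds, so a dual-stack network would end up with both families configured.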