Bug 1803926 - ip=dhcp,dhcp6 kernel command line doesn't work for ipv4 deployments
Summary: ip=dhcp,dhcp6 kernel command line doesn't work for ipv4 deployments
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Jonathan Lebon
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1807395
Blocks: 1812599
Reported: 2020-02-17 17:34 UTC by Steven Hardy
Modified: 2020-05-13 21:58 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, RHCOS would not correctly wait for all interfaces to get DHCP. Now, RHCOS correctly waits for all interfaces to get DHCP.
Clone Of:
: 1807395 1812599 (view as bug list)
Environment:
Last Closed: 2020-05-13 21:58:20 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Github | dracutdevs dracut pull 729 | None | closed | network-legacy/ifup: fix ip=dhcp,dhcp6 setup_net logic | 2020-05-04 15:38:41 UTC
Red Hat Product Errata | RHBA-2020:0581 | None | None | None | 2020-05-13 21:58:24 UTC

Description Steven Hardy 2020-02-17 17:34:02 UTC
Description of problem:

This is a follow-up to https://bugzilla.redhat.com/show_bug.cgi?id=1787620 and https://bugzilla.redhat.com/show_bug.cgi?id=1793618 which added support for environments where ipv4 and ipv6 are enabled.

Unfortunately with the new setting of ip=dhcp,dhcp6 we are seeing issues with ipv4-only deployments, where the behavior does not seem equivalent to the previous ip=dhcp case.

In our test environment, each master has two nics: the first is connected to a dedicated provisioning network, and the second is connected to the external/controlplane network. We need the second nic active to retrieve the ignition config from the bootstrap VM, but the first nic may also get a DHCP lease.

With ip=dhcp, we see both nics come up, and deployment succeeds.

With ip=dhcp,dhcp6 it seems that we don't attempt to start the second nic. I believe we're successfully getting a lease on the first one but then fail to retrieve the ignition config (because the URL references the network enabled by the second nic).

Note also that in typical baremetal environments, there may be additional nics which won't get any DHCP lease, so we need to attempt to bring up all-the-nics, on ipv4 and ipv6, but only fail if none of them become active.
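The policy requested above (try every nic on both address families, tolerate nics that never get a lease, fail only if no nic becomes active) can be sketched roughly as follows. This is an illustrative model only, not dracut's implementation: `bring_up_all`, `try_lease`, and the nic names are hypothetical names invented for this sketch.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Address families to attempt on every nic (assumption: ipv4 first, then ipv6).
FAMILIES = ("ipv4", "ipv6")


def bring_up_all(
    nics: Iterable[str],
    try_lease: Callable[[str, str], bool],
) -> dict[str, list[str]]:
    """Attempt every family on every nic.

    Returns a mapping of nic -> families that obtained a lease.
    Raises RuntimeError only when no nic obtained any lease at all.
    """
    active: dict[str, list[str]] = {}
    for nic in nics:
        # Do not stop at the first success: every nic gets a chance.
        got = [fam for fam in FAMILIES if try_lease(nic, fam)]
        if got:
            active[nic] = got
    if not active:
        raise RuntimeError("no interface obtained a lease")
    return active


# Hypothetical ipv4-only environment: two nics get a lease, a third gets none.
leases = {("eth0", "ipv4"), ("eth1", "ipv4")}
result = bring_up_all(["eth0", "eth1", "eth2"], lambda nic, fam: (nic, fam) in leases)
print(result)
```

The nic without a lease is simply left out of the result instead of being treated as a failure, which matches the "only fail if none of them become active" requirement.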

Version-Release number of selected component (if applicable):

rhcos-44.81.202002071430-0-openstack.x86_64.qcow2

How reproducible:

Always


Steps to Reproduce:

I've been testing with https://github.com/openshift-metal3/dev-scripts which will reproduce this until https://github.com/openshift/installer/pull/3117 lands reverting the change to ip=dhcp,dhcp6

In this environment we're basically defining two libvirt networks and booting RHCOS on a VM with two nics defined. An ignition config is then passed which references a URL accessible via the second of the two nics.

Comment 1 Micah Abbott 2020-02-17 19:47:03 UTC
I'm not sure this is possible with how `dracut` currently works today.

It looks like you ultimately want a way to tell `dracut` with a default `ip=` setting that says "Please get either IPv4 or IPv6 addresses on all available interfaces before declaring the network successfully configured"

So in your example, interface A gets an IPv4 address via DHCP and then `dracut` continues to wait until interface B gets an IPv6 address via DHCPv6 (also no IPv4 address is offered to interface B).  Is this correct?

Tagging in Harald and Lukas for visibility from the `dracut` side.

Comment 2 Steve Milner 2020-02-18 14:21:39 UTC
Temporary revert: https://github.com/openshift/installer/pull/3117

Comment 3 Steven Hardy 2020-02-18 14:41:45 UTC
> I'm not sure this is possible with how `dracut` currently works today.

For ipv4 ip=dhcp does already seem to behave as required, is that by luck?  In this latest testing I'm only using ipv4, so I expected ip=dhcp,dhcp6 to give exactly the same behavior as before with ip=dhcp?

> It looks like you ultimately want a way to tell `dracut` with a default `ip=` setting that says "Please get either IPv4 or IPv6 addresses on all available interfaces before declaring the network successfully configured"

Yeah basically, try to get IPv4 or IPv6 addresses on all available interfaces, but don't block on *all* being active (some may not get a lease, so give up on those if they don't), and only fail if *no* interface gets a lease.

> So in your example, interface A gets an IPv4 address via DHCP and then `dracut` continues to wait until interface B gets an IPv6 address via DHCPv6 (also no IPv4 address is offered to interface B).  Is this correct?

That was the case with the initial testing that triggered #1787620 - but in this environment everything is ipv4

With ip=dhcp we see interface A and B get an ipv4 address, and interface B is needed to retrieve the ignition config in this environment.

With ip=dhcp,dhcp6 we see interface A get an IP, but interface B does not seem to be activated at all: we don't even attempt to get a lease on this second nic (which is different from the previously observed behavior).

Comment 4 Jonathan Lebon 2020-02-19 16:25:21 UTC
OK, this should be fixed by: https://github.com/dracutdevs/dracut/pull/729
We'll have to get this reviewed and backported to 8.2.

Comment 9 Micah Abbott 2020-02-25 21:24:43 UTC
Harald or Lukáš, with https://github.com/dracutdevs/dracut/pull/729 merged, could we get this backported to 8.2 and attached to an errata?

We'd like to include this fix as part of RHCOS/OCP 4.3 + 4.4 and require that the new build be attached to an 8.2 errata before we can tag it into the RHAOS puddle.

Comment 15 Michael Nguyen 2020-03-12 14:24:07 UTC
Ensured this worked on RHCOS 44.81.202003062006-0 boot image


== initramfs ==
load_video                                                                      
set gfxpayload=keep                                                            
insmod gzio                                                                    
linux ($root)/ostree/rhcos-d9bc661489552b2494aa2e2ee1253150f8ed9a4fd87e2e28c60\
512291852c949/vmlinuz-4.18.0-147.5.1.el8_1.x86_64 rhcos.root=crypt_rootfs cons\
ole=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu rd.break rd.luks.opt\
ions=discard ignition.firstboot rd.neednet=1 ip=dhcp,dhcp6 ostree=/ostree/boot\
.1/rhcos/d9bc661489552b2494aa2e2ee1253150f8ed9a4fd87e2e28c60512291852c949/0    
initrd ($root)/ostree/rhcos-d9bc661489552b2494aa2e2ee1253150f8ed9a4fd87e2e28c6\
0512291852c949/initramfs-4.18.0-147.5.1.el8_1.x86_64.img
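To confirm which `ip=` setting a boot entry like the one above actually carries, the kernel arguments can be inspected with a few lines of Python. This is a simplified sketch: `ip_methods` is a hypothetical helper that only splits the bare `ip=method[,method]` form on commas and does not interpret interface-specific or static forms. The sample string is an abbreviated version of the arguments shown above.

```python
def ip_methods(cmdline: str) -> list:
    """Return one list of comma-separated values per ip= argument found."""
    return [
        arg[len("ip="):].split(",")
        for arg in cmdline.split()
        if arg.startswith("ip=")
    ]


# Abbreviated from the boot entry; on a running system the full string
# could be read from /proc/cmdline instead.
sample = "ignition.firstboot rd.neednet=1 ip=dhcp,dhcp6 console=tty0"
print(ip_methods(sample))
```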

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


switch_root:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:e3:5c:2d brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.11/24 brd 192.168.122.255 scope global dynamic enp1s0
       valid_lft 3458sec preferred_lft 3458sec
    inet6 fe80::5054:ff:fee3:5c2d/64 scope link 
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:09:48:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.129.235/24 brd 192.168.129.255 scope global dynamic enp2s0
       valid_lft 3463sec preferred_lft 3463sec
    inet6 fe80::5054:ff:fe09:480b/64 scope link 
       valid_lft forever preferred_lft forever
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:ff:28:d0 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:dead:beef::1ca9/128 scope global dynamic 
       valid_lft 3527sec preferred_lft 3527sec
    inet6 fe80::5054:ff:feff:28d0/64 scope link 
       valid_lft forever preferred_lft forever



== booted in rhcos ==
[core@localhost ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:70:cb:d9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.194/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 3503sec preferred_lft 3503sec
    inet6 fe80::cab5:cb03:fe6f:8596/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:21:ef:a3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.129.183/24 brd 192.168.129.255 scope global dynamic noprefixroute enp2s0
       valid_lft 3503sec preferred_lft 3503sec
    inet6 fe80::5d73:9a02:53b2:e0fa/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:14:00:81 brd ff:ff:ff:ff:ff:ff
    inet6 2001:db8:dead:beef::1489/128 scope global dynamic noprefixroute 
       valid_lft 3507sec preferred_lft 3507sec
    inet6 fe80::6730:d53a:27b6:975c/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
[core@localhost ~]$ rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* ostree://4e3f5cd998893cc6648888e77515c91e3de87732b220a84b5a6fa12c5d87f8ce
                   Version: 44.81.202003062006-0 (2020-03-06T20:11:30Z)
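The manual check above (every nic ends up with a global address of some family) could also be scripted. The sketch below assumes the JSON layout printed by `ip -j addr` (fields `ifname`, `flags`, `addr_info`, `local`, `scope`) and that the installed `ip` supports JSON output. The helper names are hypothetical, and the sample data is trimmed from the initramfs output in this comment.

```python
import json


def global_addresses(ip_json: str) -> dict:
    """Map each non-loopback interface to its global-scope addresses."""
    result = {}
    for iface in json.loads(ip_json):
        if "LOOPBACK" in iface.get("flags", []):
            continue
        result[iface["ifname"]] = [
            addr["local"]
            for addr in iface.get("addr_info", [])
            if addr.get("scope") == "global"
        ]
    return result


def interfaces_without_address(ip_json: str) -> list:
    """List interfaces that have no global address of either family."""
    return [name for name, addrs in global_addresses(ip_json).items() if not addrs]


# Trimmed sample mirroring the initramfs `ip addr` output above.
sample = json.dumps([
    {"ifname": "lo", "flags": ["LOOPBACK", "UP", "LOWER_UP"],
     "addr_info": [{"family": "inet", "local": "127.0.0.1", "scope": "host"}]},
    {"ifname": "enp1s0", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"],
     "addr_info": [{"family": "inet", "local": "192.168.122.11", "scope": "global"},
                   {"family": "inet6", "local": "fe80::5054:ff:fee3:5c2d", "scope": "link"}]},
    {"ifname": "enp2s0", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"],
     "addr_info": [{"family": "inet", "local": "192.168.129.235", "scope": "global"}]},
    {"ifname": "enp3s0", "flags": ["BROADCAST", "MULTICAST", "UP", "LOWER_UP"],
     "addr_info": [{"family": "inet6", "local": "2001:db8:dead:beef::1ca9", "scope": "global"}]},
])

print(global_addresses(sample))
print(interfaces_without_address(sample))
```

An empty list from `interfaces_without_address` corresponds to the verified state: two nics with an IPv4 lease and one with a DHCPv6 address.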

Comment 16 Micah Abbott 2020-03-12 17:19:45 UTC
Boot image bump merged - https://github.com/openshift/installer/pull/3271

Comment 17 Micah Abbott 2020-03-12 17:27:12 UTC
Per comment #15, this can be moved to VERIFIED

Comment 20 errata-xmlrpc 2020-05-13 21:58:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

