Bug 1877740
Summary: RHCOS unable to get IP address during first boot

| Field | Value |
|---|---|
| Product | OpenShift Container Platform |
| Component | RHCOS |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Version | 4.6 |
| Target Milestone | --- |
| Target Release | 4.7.0 |
| Hardware | ppc64le |
| OS | Linux |
| Reporter | Yussuf Shaikh <yshaikh> |
| Assignee | Timothée Ravier <travier> |
| QA Contact | Michael Nguyen <mnguyen> |
| CC | amccrae, bbreard, bgilbert, dporter, imcleod, jligon, lmcfadde, manokuma, miabbott, mjtarsel, mtarsel, nstielau, pradikum, yshaikh |
| Doc Type | Bug Fix |
| Type | Bug |
| Regression | --- |
| Last Closed | 2021-02-24 15:17:43 UTC |

Doc Text:

Cause: If the DHCP server took too long to respond to DHCP queries, NetworkManager would stop waiting for a DHCP answer and the network would not be configured in the initramfs.

Consequence: Ignition could not fetch a remote configuration over the network.

Fix: In 4.7, the version of NetworkManager has been updated to the RHEL 8.3 package version.

Result: The new version of NetworkManager understands the rd.net.timeout.dhcp=xyz and rd.net.dhcp.retry=xyz options when set as kernel parameters to increase the timeout and the number of retries. Users can now set those options to account for delayed DHCP answers.
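Per the Doc Text above, with the RHEL 8.3 NetworkManager shipped in 4.7 the DHCP timeout and retry count can be raised via dracut options on the kernel command line. A sketch of what the relevant arguments might look like (the values 100 and 10 are illustrative examples from the discussion below, not recommendations):

```
rd.net.timeout.dhcp=100 rd.net.dhcp.retry=10
```

These can be appended at the GRUB prompt or via whatever mechanism is used to set first-boot kernel arguments for the nodes.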
Created attachment 1714413 [details]
rhcos-console.log
Created attachment 1714414 [details]
rhcos-console.log
Sorry for the multiple attachment uploads. Please consider the latest. To add to the information, we are using the DHCP server configuration from https://github.com/RedHatOfficial/ocp4-helpernode/blob/master/templates/dhcpd.conf.j2.

Is this bare metal or virtualized? It looks like a ppc64le system, but the platform ID being used is `openstack`:

```
[    4.136068] dracut-cmdline[299]: Using kernel command line parameters: ip=dhcp,dhcp6 rd.driver.pre=dm_multipath BOOT_IMAGE=(ieee1275//vdevice/vfc-client@30000003/disk@5005076802133f81\\,0000000000000000,gpt2)/ostree/rhcos-a3074d3590afdd901792bc5617390e6e9eb99a7dd58356d33b27572690ef1ae0/vmlinuz-4.18.0-211.el8.ppc64le rhcos.root=crypt_rootfs random.trust_cpu=on console=tty0 console=hvc0,115200n8 rd.luks.options=discard ignition.firstboot rd.neednet=1 ostree=/ostree/boot.1/rhcos/a3074d3590afdd901792bc5617390e6e9eb99a7dd58356d33b27572690ef1ae0/0 ignition.platform.id=openstack
```

It's not clear why the DHCP client went from CONNECTING to expire in less than a few milliseconds. Is this behavior repeatable? Is there any specialized network configuration being supplied to the host?

Ignition looks like it is trying to fetch the config from the OpenStack metadata server, but I would expect that to happen after the network is up. Hmm... maybe `ignition-fetch.service` should be using `After=network-online.target` - https://github.com/coreos/ignition/blob/master/dracut/30ignition/ignition-fetch.service#L22

@Andy any insight from the multi-arch perspective?

(In reply to Micah Abbott from comment #4)
> Is this bare metal or virtualized? It looks like a ppc64le system, but the
> platform ID being used is `openstack`

It is an OpenStack platform.

> It's not clear why the DHCP client went from CONNECTING to expire in less
> than a few milliseconds. Is this behavior repeatable? Is there any
> specialized network configuration being supplied to the host?

The network on all hosts should be the same. This behaviour is intermittent. Sometimes NetworkManager gets the network even after the state is 'expire'. It could be instant or after some seconds, but then it never retries.

> Ignition looks like it is trying to fetch the config from the OpenStack
> metadata server, but I would expect that to happen after network is up.

(Not wanting to divert from the main issue) Does time sync also play a part in this? I do not see chrony run before the network.

Conservatively targeting 4.7 with a medium priority until we are able to perform additional triage.

I don't see anything that is Multi-Arch specific at this point; it looks like Ignition does the right thing and continues to retry, but NetworkManager fails to set up the networking properly, so it can't connect.

Would it be possible to confirm this all works as expected with a RHEL 8.2 host in the OpenStack environment? (ruling out RHCOS and Ignition)

Additionally, could you provide more information on the DHCP side? I'd expect the OpenStack Neutron (Networking) service to manage the DHCP of the hosts in the cluster, rather than a separate DHCP service, but I could be missing how this is configured/set up.

Micah, re comment 4, Ignition intentionally doesn't wait for the network to be up, because there's no way to programmatically know when the config fetch will succeed. Having e.g. an address on one interface, or a route to the Internet, doesn't guarantee that the config is accessible yet.

Re comment 8: on OpenStack, Ignition will give up retrying the config fetch after 30 seconds due to bug 1874329 (https://github.com/coreos/ignition/issues/1081). But if NetworkManager is failing to DHCP at all, this sounds like a different issue.

Can you make sure that you are not hitting a known issue: https://github.com/openshift/installer/blob/master/docs/user/openstack/known-issues.md#external-network-overlap

As you are doing UPI, can you also provide a full example config to enable us to reproduce that on another OpenStack instance?
Thanks!

(In reply to Andy McCrae from comment #7)
> I don't see anything that is Multi-Arch specific at this point, it looks
> like ignition does the right thing and continues to retry, but
> NetworkManager fails to setup the networking properly and so it can't
> connect.
>
> Would it be possible to confirm this all works as expected with a RHEL 8.2
> host in the OpenStack environment? (ruling out RHCOS and ignition)
>
> Additionally, could you provide more information on the dhcp side?

With RHCOS 4.6 we don't see an Ignition timeout when GET <HTTP URL> is failing with the error "network is unreachable". It simply waits for the GET to complete, but NetworkManager will never retry to establish the DHCP network.

Today I performed some tests on a libvirt setup using RHEL and RHCOS, and have seen the issue with NetworkManager's internal client. We are not using the Neutron DHCP service; we set up a DHCP server on RHEL 8. FYR, the server configuration is taken from https://github.com/RedHatOfficial/ocp4-helpernode/blob/master/templates/dhcpd.conf.j2 .

(In reply to Benjamin Gilbert from comment #9)
> Micah, re comment 4, Ignition intentionally doesn't wait for the network to
> be up, because there's no way to programmatically know when config fetch
> will succeed.
>
> Re comment 8: on OpenStack, Ignition will give up retrying the config fetch
> after 30 seconds due to bug 1874329
> (https://github.com/coreos/ignition/issues/1081). But if NetworkManager is
> failing to DHCP at all, this sounds like a different issue.

I understand it is difficult to know the network status while Ignition is fetching the HTTP URL. I can confirm on OpenStack that the metadata fetch expires after 2 minutes:

```
[  132.816176] ignition[702]: failed to fetch config from metadata service: unable to fetch resource in time
```

(In reply to Timothée Ravier from comment #10)
> As you are doing UPI, can you also provide a full example config to enable
> us to reproduce that on another openstack instance?

Yes, as this is UPI we have multiple DHCP servers running on the same network segment. I was able to reproduce the problem on a libvirt setup, as it is easy to validate the problem there. I have tried the following:

First, I tried playing with the clock on the server side (RHEL 8), with no effect on this issue. I was getting the NetworkManager message "state changed unknown -> expire" at random times, but would get the DHCP network within milliseconds to ~60 seconds (for no apparent reason), with the message "state changed expire -> bound" after it. Sometimes it would not enter the expire state but got bound instantly. Once or twice I was able to reproduce this issue where the NetworkManager state would not change from expire, with no further logs.

Second, I created a RHEL machine on libvirt with a static IP and another RHEL machine with DHCP networking, set up a DHCP server, and checked the bootpc port using tcpdump. This time I noticed NACK packets coming from the libvirt network, which tend to change the NetworkManager status to expire.

Third, I created two RHEL machines with static IPs and another with DHCP networking. This was to check whether multiple DHCP servers on the same network segment could be a problem. The 1st DHCP server was configured to deny all clients, the 2nd DHCP server to allow the given client (the DHCP client). With only the 1st DHCP server running, the client status changed to expire and stayed there continuously (no 45-second timeout or retry). When I started the 2nd DHCP server, the client "state changed expire -> bound".

Observations on NetworkManager state changes: when a NACK packet is received on DISCOVER, the state changes to expire; a subsequent ACK is ignored.
When an ACK packet is received first on DISCOVER, the state changes to bound; a subsequent NACK is ignored. When a NACK packet arrives on the 1st DISCOVER and another NACK on the 2nd (or a consecutive one), the state changes to expire (and stays the same, ignoring the timeout). When a NACK packet arrives on the 1st DISCOVER and an ACK on the 2nd (or a consecutive) DISCOVER, the state changes to expire and then to bound. With no ACK/NACK packets, it times out after 45 seconds.

Hi Yussuf,

Can you provide the configs you're using (rather than just the template) as well as the OpenShift install-configs? If it's re-creatable within libvirt we should be able to recreate it and see if we can figure out the issue.

DHCP server config:

```
# cat /etc/dhcp/dhcpd.conf
authoritative;
ddns-update-style interim;
default-lease-time 14400;
max-lease-time 14400;
option routers 192.168.98.1;
option broadcast-address 192.168.98.255;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.98.2;
option domain-name "yus-46-fc06.rh.com";
subnet 192.168.98.0 netmask 255.255.255.0 {
    interface eth0;
    pool {
        range 192.168.98.3 192.168.98.254;
        # Static entries
        host bootstrap { hardware ethernet 52:54:00:ec:65:25; fixed-address 192.168.98.3; }
        host master-0 { hardware ethernet 52:54:00:b3:4d:bd; fixed-address 192.168.98.4; }
        host master-1 { hardware ethernet 52:54:00:35:90:c9; fixed-address 192.168.98.5; }
        host master-2 { hardware ethernet 52:54:00:45:e9:99; fixed-address 192.168.98.6; }
        host worker-0 { hardware ethernet 52:54:00:33:23:de; fixed-address 192.168.98.21; }
        host worker-1 { hardware ethernet 52:54:00:8b:a7:a1; fixed-address 192.168.98.22; }
        # this will not give out addresses to hosts not listed above
        deny unknown-clients;
        next-server 192.168.98.2;
    }
}
```

OCP install-config:

```
apiVersion: v1
baseDomain: rh.com
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: yus-46-fc06
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
sshKey: 'ssh-rsa ....'
pullSecret: '{"auths":{...}}'
```

Andy and team, does the DHCP server config shed any light on the cause of the issue? I realize you may be busy with blocker bugs at present and haven't gotten back to it.

I tried reproducing this issue with the following setup in libvirt:

- Network set with no DHCP:

```
<network connections="3">
  <name>network-nodhcp</name>
  <uuid>96cc4028-f1cd-4495-8a31-7478b6083fe7</uuid>
  <forward mode="nat">
    <nat>
      <port start="1024" end="65535"/>
    </nat>
  </forward>
  <bridge name="virbr3" stp="on" delay="0"/>
  <mac address="52:54:00:23:47:ce"/>
  <domain name="network-nodhcp"/>
  <ip address="192.168.98.1" netmask="255.255.255.0">
  </ip>
</network>
```

- Two CentOS 8.2 systems running dhcp-server with static IPs (192.168.98.10 & 192.168.98.11) with the following configs:

Server 1:

```
authoritative;
ddns-update-style interim;
default-lease-time 14400;
max-lease-time 14400;
option routers 192.168.98.1;
option broadcast-address 192.168.98.255;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.98.1;
option domain-name "yus-46-fc06.rh.com";
subnet 192.168.98.0 netmask 255.255.255.0 {
    interface eth0;
    pool {
        range 192.168.98.3 192.168.98.254;
        # this will not give out addresses to hosts not listed above
        deny unknown-clients;
    }
}
```

Server 2:

```
authoritative;
ddns-update-style interim;
default-lease-time 14400;
max-lease-time 14400;
option routers 192.168.98.1;
option broadcast-address 192.168.98.255;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.98.1;
option domain-name "yus-46-fc06.rh.com";
subnet 192.168.98.0 netmask 255.255.255.0 {
    interface eth0;
    pool {
        range 192.168.98.3 192.168.98.254;
        # Static entries
        host bootstrap { hardware ethernet 52:54:00:e4:04:d9; fixed-address 192.168.98.3; }
        # this will not give out addresses to hosts not listed above
        deny unknown-clients;
        next-server 192.168.98.10;
    }
}
```

- A live RHCOS ISO (rhcos-46.82.202009260308-0-live.x86_64) with the following additional kernel arg set at boot time via GRUB:

```
... rd.break=initqueue
```

- In the emergency shell (you might have to press Enter), I ran (taken from `/usr/lib/dracut/modules.d/35network-manager/nm-run.sh`):

```
$ NetworkManager --configure-and-quit=initrd --no-daemon
```

And with every configuration (1st DHCP server started first, then the 2nd / 2nd first, then the 1st), I get an IP assigned to the interface.

Have I accurately reproduced the network setup?

If you are able to reproduce the bug, can you add `rd.debug` to the kernel command line and post a console log here? Thanks!
> Have I accurately reproduced the network setup?

If you notice, without the 2nd DHCP server the RHCOS node will go to the `expire` state, because DHCP1 denied the request with a NACK. Wait >10 minutes to start DHCP2 and you will notice RHCOS never gets the IP. This issue occurs at random; the node eventually gets the IP after some minutes/hours (but by that time the Ignition fetch has timed out). So the only explanation I can think of is that when both DHCP servers are running, if the DHCP1 response is received first, the state is turned to `expire`; only when the response of DHCP2 is received first does the state go to `bound`. This seems to me to be a race condition.
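The race described above can be sketched as a toy model in Python. This is purely illustrative of the behaviour observed in these tests (the first server to answer a DISCOVER round decides the state, and a lone NACK leaves the client stuck in expire); it is not NetworkManager's actual implementation, and the function name is made up:

```python
# Toy model of the observed behaviour of NetworkManager's internal DHCP
# client -- NOT its real code. Per the observations above, the first
# ACK/NAK seen after a DISCOVER decides the state, and later packets in
# that round are ignored.

def dhcp_round_outcome(first_reply):
    """Outcome of one DISCOVER round, decided by the first reply received.

    first_reply: "ACK", "NAK", or None (no DHCP server answered).
    """
    if first_reply == "ACK":
        return "bound"    # allowing server answered first
    if first_reply == "NAK":
        return "expire"   # denying server answered first; client is stuck
    return "expire"       # no reply at all: the 45 s timeout also ends here

# With two DHCP servers on one segment, whichever answers first wins:
print(dhcp_round_outcome("NAK"))  # DHCP1 (deny) wins the race -> expire
print(dhcp_round_outcome("ACK"))  # DHCP2 (allow) wins the race -> bound
```

This matches the report that the client only reaches `bound` when the allowing server's answer happens to arrive first.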
I have done some tests without the below configurations on the server side:

1. deny unknown-clients
2. pool block
3. range

```
authoritative;
ddns-update-style interim;
default-lease-time 14400;
max-lease-time 14400;
option routers 192.168.98.1;
option broadcast-address 192.168.98.255;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.98.1;
option domain-name "yus-46-fc06.rh.com";
subnet 192.168.98.0 netmask 255.255.255.0 {
    interface eth0;
    # Static entries
    host bootstrap { hardware ethernet 52:54:00:e4:04:d9; fixed-address 192.168.98.3; }
    # this will not give out addresses to hosts not listed above
    # deny unknown-clients;
    next-server 192.168.98.10;
}
```

The above configuration means the DHCP server will never reply with a NACK to REQUESTs from clients not defined in the static hosts list. This time the RHCOS NetworkManager never changed the state to `expired`; it waited for an ACK from the DHCP server and eventually changed the state to `bound`.

Also, setting the Ignition fetch timeout to some value (the default is 0) would ensure that the boot process fails and then reboots to retry getting the DHCP network.

> Also, setting the Ignition fetch timeout to some value (the default is 0) would ensure that the boot process fails and then reboots to retry getting the DHCP network.

Unfortunately we cannot do that, as it is a design decision in Ignition to retry infinitely and never fail: https://coreos.github.io/ignition/rationale/#ignition-produces-the-machine-specified-or-no-machine-at-all

I may have reproduced this one by starting only the first DHCP server (no static entries), waiting for the full NetworkManager timeout, and then starting up the second DHCP server (with static entries). In this case, NetworkManager gives up at some point and stops waiting for a DHCP response.

Right. Ignition tries continuously but NetworkManager does not.
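The design decision linked above (retry forever rather than produce a half-configured machine) can be sketched as follows. This is an illustrative loop with made-up names, not Ignition's actual implementation (which is written in Go):

```python
# Illustrative sketch of "retry forever" config fetching -- not Ignition's
# real code. The fetch loop backs off between attempts but never fails,
# so it can only succeed once networking comes up by some other means.

import time
import urllib.error
import urllib.request


def next_backoff(current, cap=10.0):
    """Double the delay between attempts, up to a fixed cap."""
    return min(current * 2, cap)


def fetch_config_forever(url):
    """Retry an HTTP fetch indefinitely; never give up, never return failure."""
    delay = 1.0
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
            delay = next_backoff(delay)
```

The contrast with the DHCP side is the point of this bug: the fetch loop above never terminates on its own, while the 2020-era NetworkManager DHCP client stopped retrying after its timeout.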
One way we could work around this is by using "ignition.config.timeouts.httpTotal", which reboots the machine after N seconds and tries DHCP and the Ignition fetch again.

I will bring this issue to the NetworkManager team to see how we can fix it.

Waiting on feedback from the NetworkManager team. This is unlikely to be backported to 4.6 but should be better in 4.7 with the RHEL 8.3 NetworkManager. Marking as UpcomingSprint.

Discussion is currently happening upstream to add support for an infinite waiting mode for DHCP.

In the meantime, you should be able to use the kernel command line options from https://bugzilla.redhat.com/show_bug.cgi?id=1879094#c3 (for example rd.net.timeout.dhcp=100 rd.net.dhcp.retry=10), and this will work with the NetworkManager from RHEL 8.3, which is available for OCP 4.7+.

Moving to MODIFIED.

Marking verified in RHCOS 47.83.202012072242-0. I did a similar test of the `rd.net.timeout.dhcp=100` parameter in https://bugzilla.redhat.com/show_bug.cgi?id=1879094#c10

(In reply to Timothée Ravier from comment #24)
> Discussion is currently happening upstream to add support for an infinite
> waiting mode for DHCP.
>
> In the meantime, you should be able to use the kernel command line options
> from https://bugzilla.redhat.com/show_bug.cgi?id=1879094#c3 (for example
> rd.net.timeout.dhcp=100 rd.net.dhcp.retry=10) and this will work with
> NetworkManager from RHEL 8.3 which is available for OCP 4.7+.
>
> Moving to MODIFIED.

As I mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1877740#c21, I have used "ignition.config.timeouts.httpTotal" to retry the HTTP-hosted Ignition file and get DHCP in the next boot after the timeout. This is my current workaround, and personally I prefer it to changing the kernel options.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
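For reference, the "ignition.config.timeouts.httpTotal" workaround mentioned above is expressed in the Ignition config itself. A hedged sketch, assuming Ignition config spec 3.x, where timeouts live under `ignition.timeouts`; the URL is a placeholder, and the exact field path may differ between spec versions:

```
{
  "ignition": {
    "version": "3.1.0",
    "config": {
      "replace": {
        "source": "http://192.168.98.2:8080/bootstrap.ign"
      }
    },
    "timeouts": {
      "httpTotal": 120
    }
  }
}
```

With a non-zero httpTotal, the fetch gives up after that many seconds instead of retrying forever, which (as the commenter describes) lets the node reboot and attempt DHCP again.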
Created attachment 1714412 [details]
rhcos-console-log

Description of problem:

During a UPI install of OCP 4.6 we found that an RHCOS node does not get a network address from DHCP and continuously waits for the Ignition URL hosted on an HTTP server. The RHCOS image used is rhcos-46.82.202009010341-0. The NetworkManager dhcp4 state changes to "expire". The cause could be that the DHCP response is delayed for a time greater than the timeout allocated by NetworkManager.

A couple of lines from the console log showing the connection going to "expire" and never trying to get a connection again:

```
[   13.194085] NetworkManager[731]: <info>  [1599297806.4888] dhcp4 (env32): activation: beginning transaction (timeout in 45 seconds)
[   13.196242] NetworkManager[731]: <info>  [1599297806.4909] dhcp4 (env32): state changed unknown -> expire
```

Version-Release number of selected component (if applicable):

46.82.202009010341-0

How reproducible:

I would say it depends on the DHCP response time.

Steps to Reproduce:
1. Start the DHCP server with host entries.
2. Boot the RHCOS node, providing the Ignition file over an HTTP server.
3. Check the RHCOS (master/worker) console log for failures in NM connection activation.

Actual results:

No IP gets assigned to the RHCOS nodes. The console continuously waits for the network to download the Ignition file via the HTTP URL.

Expected results:

An IP should get assigned after multiple tries by the NM DHCP client. The RHCOS node should be able to download the Ignition file as soon as the network connection is configured.

Additional info:

We had intermittent issues with RHCOS 4.5 as well, but the bootstrap node would never lose the connection when using `dhclient`, as was mentioned in the OpenStack UPI documentation. With RHCOS 4.6 the problem is visible on the bootstrap node as well, occurring on all RHCOS nodes with the `internal` client.