Bug 1879094
Summary: | RHCOS dhcp kernel parameters not working as expected | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Prajyot Parab <pparab> |
Component: | RHCOS | Assignee: | Dusty Mabe <dustymabe> |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.6 | CC: | alogan, bbreard, hhei, imcleod, jligon, miabbott, nstielau, pradikum, travier, yshaikh |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | ppc64le | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Enhancement |
Doc Text: |
Feature: Ability to configure DHCP timeout
Reason: In certain DHCP environments, acquiring a DHCP lease may take longer than the default 45 seconds.
Result: Users now have the ability to configure the timeout value used when trying to acquire a DHCP lease.
|
Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-02-24 15:18:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Prajyot Parab
2020-09-15 12:36:58 UTC
Networking support in RHCOS 4.6 was improved so that more complex configurations can be supported. Kernel args should still be a supported mechanism for configuring the network. Could you add `rd.break` to the kernel args and collect the journal from the system, so that we can see what the networking logs look like?

This bug has not been selected for work in the current sprint.

The NetworkManager team added support for `rd.net.timeout.dhcp` upstream in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/fbf54ab. I did a test with `rd.net.timeout.dhcp=100` in the `next` stream of Fedora CoreOS (based on Fedora 33 with NetworkManager-1.26.2-2.fc33.x86_64) and I see:

```
[ 2.828429] NetworkManager[496]: <info> [1603384675.3589] dhcp4 (ens2): activation: beginning transaction (timeout in 100 seconds)
```

whereas in RHCOS 4.6 right now I see it using the default of 45 seconds:

```
[ 4.247269] NetworkManager[734]: <info> [1603383872.0092] dhcp4 (ens2): activation: beginning transaction (timeout in 45 seconds)
```

Support for the `rd.net.timeout.dhcp` option should exist in OCP/RHCOS 4.7, since the version of NetworkManager in RHEL 8.3 will include it.

As for `rd.net.dhcp.retry`, NM does automatically retry a few times when it times out, but it doesn't look like the number of retries is configurable. If you think support for `rd.net.dhcp.retry` is needed, please open an RFE against NetworkManager.

Since `rd.net.timeout.dhcp` isn't supported by the NetworkManager in 4.6, we'll need to help you work around the problem for now. Currently, in my tests it looks like the default timeout is 45 seconds and the number of retries is 4, so the final timeout will occur at approximately 180 seconds. It sounds like your DHCP server is taking much longer than that to service the request?

This is being worked on, but is currently awaiting more investigation or more information and won't be completed this sprint.
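The timeout arithmetic discussed in this thread (a 45-second default per attempt, 4 attempts, so roughly 180 seconds before NetworkManager gives up) can be reproduced with a small sketch. It reads `rd.net.timeout.dhcp` from a kernel command line string and estimates the total wait. The function and constant names are hypothetical, the attempt count of 4 is the value observed in testing here (not a documented constant), and the sketch assumes the same number of attempts applies when the timeout is overridden, which the thread does not confirm. It also cannot tell whether the installed NetworkManager honors the parameter at all: on RHCOS 4.6 the 45-second default applies regardless.

```python
import shlex

# Per-attempt DHCP timeout NetworkManager reported by default on RHCOS 4.6.
DEFAULT_DHCP_TIMEOUT_S = 45
# Attempt count observed in testing; assumption: not configurable and unchanged
# when rd.net.timeout.dhcp is set.
OBSERVED_ATTEMPTS = 4


def dhcp_timeout_from_cmdline(cmdline: str) -> int:
    """Return the per-attempt DHCP timeout requested via rd.net.timeout.dhcp.

    Falls back to the default when the argument is absent. If the argument
    appears more than once, the last occurrence wins (an assumption).
    Raises ValueError if the value is not an integer.
    """
    timeout = DEFAULT_DHCP_TIMEOUT_S
    # shlex.split splits on whitespace and honors quoted values.
    for arg in shlex.split(cmdline):
        key, sep, value = arg.partition("=")
        if key == "rd.net.timeout.dhcp" and sep:
            timeout = int(value)
    return timeout


def approx_total_wait(timeout_s: int, attempts: int = OBSERVED_ATTEMPTS) -> int:
    """Rough estimate of the time spent before NetworkManager gives up on DHCP."""
    return timeout_s * attempts
```

For example, `approx_total_wait(dhcp_timeout_from_cmdline(open("/proc/cmdline").read()))` on a node without the parameter yields 180, matching the estimate above.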
This bug will be fixed when we rebase RHCOS on top of RHEL 8.3, which will occur in the 4.7 timeframe in a future sprint.

Moving to POST, as we expect to see RHEL 8.3 in RHCOS 4.7 soon.

RHCOS 47.83.202012020056-0 includes RHEL 8.3 and `NetworkManager-1.26.0-9.el8_3`, which should include the fix in comment #3. Moving to MODIFIED.

Verified with RHCOS 47.83.202012072242-0.

As noted in comment #3, only `rd.net.timeout.dhcp` is supported by NetworkManager at this time, so I confirmed that NM correctly used that parameter:

```
[core@cosa-devsh ~]$ rpm-ostree status
State: idle
Deployments:
* ostree://d70e44dde4765c2b59cedae6c399c7255a4bb877cc80b1be5c93cbe614b1d395
                   Version: 47.83.202012072242-0 (2020-12-07T22:46:11Z)
[core@cosa-devsh ~]$ rpm -q NetworkManager
NetworkManager-1.26.0-9.el8_3.x86_64
[core@cosa-devsh ~]$ cat /proc/cmdline | more
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-da2a55fc8655016771f867e78910e69d6ee3b93e3cbc5aad74660e2b8d9c8e19/vmlinuz-4.18.0-240.7.1.el8_3.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=qemu ignition.firstboot ostree=/ostree/boot.1/rhcos/da2a55fc8655016771f867e78910e69d6ee3b93e3cbc5aad74660e2b8d9c8e19/0 rd.net.timeout.dhcp=100
[core@cosa-devsh ~]$ journalctl -b -u NetworkManager --no-pager | grep timeout
Dec 10 17:09:35 localhost NetworkManager[1447]: <info> [1607620175.9195] dhcp4 (ens5): activation: beginning transaction (timeout in 100 seconds)
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
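The QA verification step (grepping the NetworkManager journal for the DHCP transaction timeout) can be scripted along the following lines. This is a minimal sketch: `dhcp_timeouts` and `journal_lines` are hypothetical helper names, and the regular expression is built only from the log wording quoted in this bug, which may differ in other NetworkManager versions.

```python
import re
import subprocess

# Matches the NetworkManager message quoted in this bug, e.g.
#   dhcp4 (ens5): activation: beginning transaction (timeout in 100 seconds)
DHCP_START = re.compile(
    r"dhcp4 \((?P<iface>[^)]+)\): activation: beginning transaction "
    r"\(timeout in (?P<secs>\d+) seconds\)"
)


def dhcp_timeouts(lines):
    """Map each interface to the last DHCPv4 timeout NetworkManager reported for it."""
    result = {}
    for line in lines:
        match = DHCP_START.search(line)
        if match:
            result[match.group("iface")] = int(match.group("secs"))
    return result


def journal_lines():
    """Return this boot's NetworkManager journal lines (requires journalctl on the host)."""
    proc = subprocess.run(
        ["journalctl", "-b", "-u", "NetworkManager", "--no-pager"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout.splitlines()
```

On the verified boot above, `dhcp_timeouts(journal_lines())` would be expected to return `{"ens5": 100}`. Keeping the parsing separate from the `journalctl` call means the same function also works on journal output copied off a node.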