Bug 1961666
| Summary: | In dracut allow enough time for DHCP allocation with dual stack, don't get stuck forever on missing IP family | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | vemporop | |
| Component: | NetworkManager | Assignee: | Beniamino Galvani <bgalvani> | |
| Status: | CLOSED ERRATA | QA Contact: | Filip Pokryvka <fpokryvk> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 8.2 | CC: | benoit, bgalvani, dustymabe, ferferna, fge, fpokryvk, jkonecny, keyoung, lrintel, mhrivnak, mko, rkhan, sfaye, sukulkar, till, vbenes | |
| Target Milestone: | beta | Keywords: | Triaged | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | NetworkManager-1.36.0-0.2.el8 | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1990460 (view as bug list) | Environment: | ||
| Last Closed: | 2022-05-10 14:54:08 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1954580 | |||
| Bug Blocks: | 1990460 | |||
|
Description
vemporop
2021-05-18 13:03:30 UTC
I reported a similar issue that likely has the same root cause. Summary: a minimal ISO created by assisted-installer can't retrieve the rootfs from mirror.openshift.com (which is ipv4-only) if it gets a dhcp6 lease first. https://bugzilla.redhat.com/show_bug.cgi?id=1967632

*** Bug 1928345 has been marked as a duplicate of this bug. ***

*** Bug 1967632 has been marked as a duplicate of this bug. ***

This upstream issue/discussion targets the same problem: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/729

*** Bug 1947832 has been marked as a duplicate of this bug. ***

I added a few dracut tests with delayed DHCP4 and the 'ip=dhcp,dhcp6' kernel argument to NetworkManager-ci (including the 2 NICs setup described above). PASS on the latest RHEL 8.5 build, FAIL on older builds.

I wonder if during fixing this bug the following scenario was kept in mind:

* dual-stack system
* IPv4 is obtained from DHCP as the first one
* DHCPv6 is delayed
* ignition is served from the IPv6-only host

In this scenario we have IPv4 immediately, but we would need to wait for IPv6. Given that the fix mentions only "set required-timeout by default for IPv4 configuration", I wonder if the described scenario is still prone to fail. The test here would be to delay DHCPv6 and see if the machine waits for both IP addresses to be available.

Related BZ in the Assisted Installer project - https://bugzilla.redhat.com/show_bug.cgi?id=2005498

The test failure is not breaking anything, just not fixing enough stuff. This is not important enough to block the 8.5 GA. I am moving this bug to verify state and will continue the fix in 8.6.

Test feedback indicates the bug is only partially fixed. Change this bug to 8.6 and revoke the zstream approval. Once the fix is confirmed via test on a scratch build, we will review zstream for 8.5.0 and 8.4.0 again.
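The dual-stack scenarios raised in the comments above can be explored with a small timing model. This is a minimal sketch, not NetworkManager code. It assumes activation completes as soon as one address family is configured, except that a family with a required timeout is waited for until it either completes or the timeout expires. The function name `ready_time`, its parameters, and the lease times are hypothetical; the 20-second value is the wait discussed later in this thread.

```python
from typing import Optional


def ready_time(
    v4_lease: Optional[float],
    v6_lease: Optional[float],
    v4_required_timeout: float = 0.0,
    v6_required_timeout: float = 0.0,
) -> Optional[float]:
    """Return the time (seconds) at which the modelled connection becomes active.

    v4_lease / v6_lease: when each family gets configured, or None if it never does.
    *_required_timeout: how long to keep waiting for that family (assumed semantics).
    Returns None when neither family is ever configured.
    """
    arrivals = [t for t in (v4_lease, v6_lease) if t is not None]
    if not arrivals:
        return None

    # Baseline: the connection can succeed once the first family completes.
    ready = min(arrivals)

    # A required timeout delays success until that family completes
    # or its timeout expires, whichever comes first.
    for lease, timeout in (
        (v4_lease, v4_required_timeout),
        (v6_lease, v6_required_timeout),
    ):
        waited = timeout if lease is None else min(lease, timeout)
        ready = max(ready, waited)
    return ready


# Slow DHCPv4, fast IPv6, required timeout on IPv4 only: waits for IPv4.
slow_v4 = ready_time(v4_lease=8.0, v6_lease=1.0, v4_required_timeout=20.0)

# Fast DHCPv4, slow DHCPv6, required timeout on IPv4 only:
# the connection is active before IPv6 is configured.
slow_v6 = ready_time(v4_lease=1.0, v6_lease=8.0, v4_required_timeout=20.0)

# IPv6-only network with a required timeout on IPv4: the full timeout is spent.
v6_only = ready_time(v4_lease=None, v6_lease=1.0, v4_required_timeout=20.0)

print(slow_v4, slow_v6, v6_only)
```

In this model the second case is the one asked about above: with a required timeout on IPv4 only, a consumer that needs IPv6 at activation time would not have it yet.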
(In reply to Mat Kowalski from comment #15)
> I wonder if during fixing this bug the following scenario was kept in mind
>
> * dual-stack system
> * IPv4 is obtained from DHCP as the first one
> * DHCPv6 is delayed
> * ignition is served from the IPv6-only host
>
> In this scenario we have IPv4 immediately, but we would need to wait for
> IPv6. Given that the fix mentions only "set required-timeout by default for
> IPv4 configuration", I wonder if the described scenario is still prone to
> fail. The test here would be to delay DHCPv6 and see if the machine waits
> for both IP addresses to be available.
>
> Related BZ in the Assisted Installer project -
> https://bugzilla.redhat.com/show_bug.cgi?id=2005498

Hey Mat. The current default of ip=dhcp,dhcp6 was set to try to make sure that if someone had ipv4 or ipv6 networks the OS would still come up without needing to be configured. Then we hit issues where the first one would win, and we massaged the behavior a bit to make it match the legacy network dracut module so that it would wait for ipv4 a little longer. This was reasonable because it matched the legacy behavior and was likely to match more environments (ipv4 being more common than ipv6).

Unfortunately for you, we decided that forcing an extra wait/timeout for ipv6 wasn't reasonable to do in the default case, since most environments probably don't have an ipv6+DHCP6 setup and would be waiting 20s for nothing.

If you need ipv6, can you add `ip=dhcp6` to your setup?

Talked with Mat. A general workaround for the following use case:

- "I need ipv6 in my initramfs, but both ipv4 and ipv6 in my real root"

is to provide `ip=dhcp6 coreos.no_persist_ip` on the kernel command line. This will give you ipv6 in the initramfs and BOTH ipv4 and ipv6 in your real root (because it won't propagate initramfs networking forward and the default behavior is both ipv6 and ipv4).
In general though, we don't foresee changing the default behavior of RHCOS to add a 20s timeout for DHCPv6, even if the NetworkManager team changes what `ip=dhcp,dhcp6` means in https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/994. See https://github.com/coreos/fedora-coreos-tracker/issues/1000

Note that my comments in comment#19 and comment#20 are RHCOS specific and talk about defaults for RHCOS since that is where the original bug was reported and also where Mat is working.

Please correct me if I get it wrong: The goal for this bug is to ensure the action of `ip=dhcp,dhcp6` and `ip=dhcp6,dhcp` in NetworkManager.

1. `ip=dhcp,dhcp6` and `ip=dhcp6,dhcp` generate identical results.
2. Both DHCPv4 and IPv6 autoconf (DHCPv6 will be run after the router RA indicates so through ipv6-autoconf) will be enabled and run.
3. Only a single IP family is required to pass.
4. Wait 20 seconds for the secondary IP family's DHCP/autoconf.

If users don't want to wait this extra 20 seconds, they could use:

1. ip=dhcp: Both DHCPv4 and DHCPv6 and autoconf enabled, but only DHCPv4 required.
2. ip=dhcp6: Only IPv6 autoconf/DHCPv6 is enabled. IPv4 is disabled.

I have added the test cases with slow ipv6, so now we have covered:

* ip=dhcp,dhcp6, slow IPv4, nfsroot over IPv4
* ip=dhcp,dhcp6, slow IPv6, nfsroot over IPv6
* ip=dhcp,dhcp6, NIC1:IPv4 + IPv6, NIC2:IPv6, nfsroot over IPv4
* ip=dhcp,dhcp6, NIC1:slow IPv4 + IPv6, NIC2:IPv6, nfsroot over IPv4
* ip=dhcp,dhcp6, NIC1:IPv4 + slow IPv6, NIC2:IPv6, nfsroot over IPv6

Hi Beniamino,

I have noticed the tests are passing on 1.36.0-0.1.el8 as well [1][2]. Does it have the fix or the tests do not cover the bug (after FailedQA)?

Also, as you see, there are too many combinations to check (possibility might be also slowing down the IPv6 only NIC, or try only ip=dhcp with IPv6 only NIC, or ip=dhcp6 when IPv4 only nic is present...), but dracut tests are consuming a lot of resources now.
Is it worth adding tests for some of these combinations? Which you consider the most important? Thank you!

[1] https://desktopqe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/beaker-NetworkManager-gitlab-trigger-test-upstream/2074/artifact/artifacts/report_NetworkManager-ci_Test0013_dracut_NM_NFS_root_nfs_ip_dhcp_dhcp6_with_slow_ip64_and_ip6_nic.html
[2] https://desktopqe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/beaker-NetworkManager-gitlab-trigger-test-upstream/2074/artifact/artifacts/report_NetworkManager-ci_Test0010_dracut_NM_NFS_root_nfs_ip_dhcp_dhcp6_slow_ip6.html

(In reply to Filip Pokryvka from comment #30)
> Hi Beniamino,
>
> I have noticed the tests are passing on 1.36.0-0.1.el8 as well [1][2]. Does
> it have the fix or the tests do not cover the bug (after FailedQA)?

The fix was added in 1.33.4, so 1.36.0-0.1.el8 already includes it.

> Which you consider the most important?

These two seem the most important to cover this bz:

* ip=dhcp,dhcp6, slow IPv4, nfsroot over IPv4
* ip=dhcp,dhcp6, slow IPv6, nfsroot over IPv6

The others are combinations of the two above with another NIC... I think they can be dropped to save resources.

> possibility might be also slowing down the IPv6 only NIC

This would test that NM signals completion only after all connections are done. I'm not sure we need it, as there are already other tests with multiple NICs, right? And so, a problem in this area would probably be caught by other tests...

> or try only ip=dhcp with IPv6 only NIC
> or ip=dhcp6 when IPv4 only nic

For these, we already test in NM unit tests that a correct connection is generated:

- for ip=dhcp with IPv4 enabled/required and IPv6 enabled/not-required
- for ip=dhcp6 with IPv6 enabled/required and IPv4 disabled

Once a correct connection is created, I don't expect we need to test anything else in integration tests.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:1985 |
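
The `ip=` variants discussed in this report can be summarized as a small lookup. The following is a minimal Python sketch, not nm-initrd-generator code; `FamilyPolicy` and `policy_for` are hypothetical names. The `ip=dhcp` and `ip=dhcp6` rows follow the unit-test description in the last technical comment. The dual-stack row follows the unconfirmed summary given earlier in the thread (either family suffices, order is irrelevant), so it should be read as an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FamilyPolicy:
    enabled: bool   # the address family is configured at all
    required: bool  # activation fails if this family cannot be configured


def policy_for(ip_arg: str) -> Dict[str, FamilyPolicy]:
    """Map the value of an `ip=` kernel argument to a per-family policy (model only)."""
    methods = frozenset(part.strip() for part in ip_arg.split(","))
    if methods == {"dhcp"}:
        # IPv4 enabled/required, IPv6 enabled/not-required.
        return {"ipv4": FamilyPolicy(True, True), "ipv6": FamilyPolicy(True, False)}
    if methods == {"dhcp6"}:
        # IPv6 enabled/required, IPv4 disabled.
        return {"ipv4": FamilyPolicy(False, False), "ipv6": FamilyPolicy(True, True)}
    if methods == {"dhcp", "dhcp6"}:
        # Assumption from the thread: both enabled, either one is enough,
        # and the order of the two methods does not matter.
        return {"ipv4": FamilyPolicy(True, False), "ipv6": FamilyPolicy(True, False)}
    raise ValueError(f"ip= value not covered by this sketch: {ip_arg!r}")


for arg in ("dhcp", "dhcp6", "dhcp,dhcp6", "dhcp6,dhcp"):
    print(arg, policy_for(arg))
```

The sketch deliberately leaves out timeouts; it only records which families are enabled and which must succeed.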