Bug 2040671 - [Feature:IPv6DualStack] most tests are failing in dualstack ipi
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Tim Rozet
QA Contact: Rio Liu
 
Reported: 2022-01-14 12:38 UTC by Derek Higgins
Modified: 2022-03-10 16:39 UTC
CC List: 12 users

Doc Type: No Doc Update
Last Closed: 2022-03-10 16:39:37 UTC


Links:
Github openshift/cluster-baremetal-operator pull 239 (Merged): Bug 2040671: Fix the way the network stack is determined (2022-01-27)
Github openshift/machine-config-operator pull 2929 (Merged): Bug 2040671: Configure-ovs: Ensure DHCP finishes for both address families (2022-01-27)
Red Hat Product Errata RHSA-2022:0056 (2022-03-10)

Description Derek Higgins 2022-01-14 12:38:14 UTC
Now that bz#2034527 is fixed, the cluster in dualstack jobs is provisioning, but most (not all) of the [Feature:IPv6DualStack] tests are failing.

e.g. (from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack/1481806814159835136)

9 passed and 10 failed

[Feature:IPv6DualStack] should be able to reach pod on ipv4 and ipv6 ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should be able to handle large requests: udp [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for pod-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for node-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] should create a single stack service with cluster ip from primary service range [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for node-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should update endpoints: http [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] should create service with ipv6,v4 cluster ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for endpoint-Service: udp [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for service endpoints using hostNetwork [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] should create service with ipv4,v6 cluster ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for endpoint-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] should create pod, add ipv6 and ipv4 ip to pod ips [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should function for pod-Service: http [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] should have ipv4 and ipv6 internal node ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Failed"
[Feature:IPv6DualStack] should create service with ipv6 cluster ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] should create service with ipv4 cluster ip [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should update endpoints: udp [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"
[Feature:IPv6DualStack] Granular Checks: Services Secondary IP Family [LinuxOnly] should be able to handle large requests: http [Suite:openshift/conformance/parallel] [Suite:k8s] e2e test finished As "Passed"

Comment 1 Arda Guclu 2022-01-14 13:35:45 UTC
I saw a couple of fixes in 1.23. To see the overall failure rate, we might want to wait for this:

https://github.com/openshift/origin/pull/26711

I'm not sure it will totally clear all failures, but then we can be sure whether there is a real problem.

Comment 2 Zane Bitter 2022-01-14 20:57:07 UTC
Looking at the kernel command line, we have:

Jan 14 02:55:33.953747 localhost kernel: Command line: BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-b0c17d2741adcd571b08d69dbae84aa880469604b4ea9d5e7b6ccbe53c3a3cf2/vmlinuz-4.18.0-305.28.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ignition.firstboot ostree=/ostree/boot.1/rhcos/b0c17d2741adcd571b08d69dbae84aa880469604b4ea9d5e7b6ccbe53c3a3cf2/0 ip=dhcp

This is on both control plane and workers.

We don't pass any ip= arg for dual-stack in CBO for workers:
https://github.com/openshift/cluster-baremetal-operator/blob/master/provisioning/baremetal_pod.go#L326-L327
Although we do in the installer for the control plane:
https://github.com/openshift/installer/blob/master/data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh.template#L100-L104

It's not clear where this is coming from. It could be that, because we boot the live ISO with this arg (for IPA), it gets copied automatically to the installed image on disk? Or possibly it's a default arg?

Comment 3 Zane Bitter 2022-01-14 22:19:30 UTC
Trying out setting ip=dhcp,dhcp6: https://github.com/openshift/cluster-baremetal-operator/pull/236

Comment 5 Zane Bitter 2022-01-15 01:01:06 UTC
That had no effect. I wonder if the MCO sets the kernel command line flags after the initial boot+update+reboot.

Comment 6 Derek Higgins 2022-01-17 14:39:26 UTC
(In reply to Zane Bitter from comment #5)
> We don't pass any ip= arg for dual-stack in CBO for workers:

Looking at the image-customization-controller, it's being started with:
              {
                "name": "IP_OPTIONS",
                "value": "ip=dhcp"
              },

Looks to me like this is based on what api-int resolves to:
https://github.com/openshift/cluster-baremetal-operator/blob/2655e07/controllers/provisioning_controller.go#L489-L495

> I0114 02:43:20.103478       1 provisioning_controller.go:495] "Network stack calculation" APIServerInternalHost="api-int.ostest.test.metalkube.org" NetworkStack=1

Are we expecting NetworkStack=3 for dualstack?

Comment 7 Zane Bitter 2022-01-17 15:12:10 UTC
(In reply to Derek Higgins from comment #6)
> > I0114 02:43:20.103478       1 provisioning_controller.go:495] "Network stack calculation" APIServerInternalHost="api-int.ostest.test.metalkube.org" NetworkStack=1
> 
> Are we expecting NetworkStack=3 for dualstack?

I'd have thought so. There's no other time when NetworkStack should be 3, so why else would it exist?
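
For reference, a minimal shell sketch of that kind of check. Assumptions are hedged in the comments: the bitmask values are inferred from NetworkStack=1 in the log above plus the expectation of 3 for dual-stack, and this is not the actual CBO code.

# Hypothetical sketch, not the actual CBO logic: classify the address
# families api-int resolves to. Assumes NetworkStack is a bitmask with
# v4=1 and v6=2, so a dual-stack cluster yields 3.
host="api-int.ostest.test.metalkube.org"
stack=0
while read -r addr _; do
  case "$addr" in
    *:*) stack=$((stack | 2)) ;;  # an IPv6 literal contains ':'
    *)   stack=$((stack | 1)) ;;  # otherwise IPv4
  esac
done < <(getent ahosts "$host")
echo "NetworkStack=$stack"        # expect 3 on a dual-stack cluster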

Comment 8 Ben Nemec 2022-01-17 16:17:49 UTC
I believe we set ip=dhcp in dual stack to ensure all of the nodes used the same IP version to resolve their hostnames. Otherwise we got some nodes using short names and others using FQDNs, depending on which DHCP server responded fastest.

Comment 9 Zane Bitter 2022-01-17 18:12:18 UTC
Original patch adding this was here (for bug 1946079): https://github.com/openshift/cluster-baremetal-operator/pull/148#issue-903275273

> This should be equal to
> "ip=dhcp" (when ipv4 only)
> "ip=dhcp6" (when ipv6 only)
> "" (when dual stack)

It appears that leaving it empty for dual-stack was based on this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1931852#c8, which says that "ip=dhcp,dhcp6" is not a thing. That was probably true at the time, but NetworkManager has since been changed to accept it, beginning with 1.34: https://networkmanager.dev/blog/networkmanager-1-34/

There were several rounds of patches to get it to work:
https://github.com/openshift/cluster-baremetal-operator/pull/151
https://github.com/openshift/cluster-baremetal-operator/pull/158
https://github.com/openshift/cluster-baremetal-operator/pull/163

but none of these state as a goal that we should pass ip=dhcp on dual-stack. In fact they make it more explicit that we should pass "" for dual-stack: https://github.com/openshift/cluster-baremetal-operator/pull/158/commits/4589e4c490154ea20d6b28c69283ce3a6efd278f#diff-1575ce96065be1a97bee923445ae60115c8ce02b4a2736788012df8162407100R273-R274
Maybe @sdasu can comment, since she wrote https://github.com/openshift/cluster-baremetal-operator/pull/158/commits/5d37ad30322b511361c352ade461c96f5540190b


I suspect that if we actually pass ip=dhcp,dhcp6 (now that it's supported) we will probably get consistent hostnames, since we'll wait for DHCP on *both* IPv4 and IPv6 networks, and everything will work.
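
If that's right, the mapping from the quoted patch comment, updated for NM >= 1.34, would look roughly like this (a sketch under the same bitmask assumption as above; NETWORK_STACK is a hypothetical variable, not the actual CBO code):

case "$NETWORK_STACK" in            # assumed bitmask: 1=v4, 2=v6, 3=dual
  1) IP_OPTIONS="ip=dhcp" ;;        # IPv4 only
  2) IP_OPTIONS="ip=dhcp6" ;;       # IPv6 only
  3) IP_OPTIONS="ip=dhcp,dhcp6" ;;  # dual stack, needs NM >= 1.34
esac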

Comment 10 Ben Nemec 2022-01-17 21:11:53 UTC
Interesting. That doesn't match my recollection of the discussions I was involved in, but it's entirely possible I'm just wrong. I'm not sure ip=dhcp,dhcp6 helps here because it's not a question of all interfaces getting addresses, it's a question of what order they get addresses in. That can affect which hostname the system chooses (I think, anyway).

If one system gets a DHCP response first with a shortname, and another gets a DHCP6 response first with an FQDN, then their hostname formats will differ. I think this can be a problem even within a single system if DHCP returns first during inspection and DHCP6 returns first after deployment. The CSR name may not match.

However, this reminds me that ip=dhcp doesn't do what you expect anyway. Example:
$ /usr/libexec/nm-initrd-generator ip=dhcp -s

*** Connection 'default_connection' ***

[connection]
id=Wired Connection
uuid=b4a119b4-ebfb-4ea9-8de2-93b24f276076
type=ethernet
autoconnect-retries=1
multi-connect=3
permissions=

[ipv4]
dhcp-timeout=90
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=eui64
dhcp-timeout=90
dns-search=
method=auto
...


Note that even with ip=dhcp, ipv6 is still set to auto (I guess it just doesn't set may-fail to false?). That matches what I saw on the nodes in my investigation of https://bugzilla.redhat.com/show_bug.cgi?id=1982821. With ip=dhcp6 the behavior is as expected.

So I'm not sure ip=dhcp would help this situation anyway. And I'm hoping someone else still remembers some context about this since I clearly don't. :-)
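
One quick way to compare the behaviors described above is to run the generator for each candidate argument and inspect the [ipv4]/[ipv6] sections it emits (the invocation style follows the example above; the dhcp,dhcp6 variant assumes NM >= 1.34):

# Print the connection profile NM's initrd generator would create for
# each candidate kernel argument; -s writes it to stdout as above.
for arg in ip=dhcp ip=dhcp6 ip=dhcp,dhcp6; do
  echo "=== $arg ==="
  /usr/libexec/nm-initrd-generator "$arg" -s
done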

Comment 11 Zane Bitter 2022-01-18 01:19:26 UTC
(In reply to Ben Nemec from comment #10)
> Interesting. That doesn't match my recollection of the discussions I was
> involved in, but it's entirely possible I'm just wrong. I'm not sure
> ip=dhcp,dhcp6 helps here because it's not a question of all interfaces
> getting addresses, it's a question of what order they get addresses in. That
> can affect which hostname the system chooses (I think, anyway).

Won't https://github.com/openshift/image-customization-controller/blob/main/pkg/ignition/builder.go#L96-L99 ensure that we always use the IPv6 hostname if we get a DHCPv6 response?

> If one system gets a DHCP response first with a shortname, and another gets
> a DHCP6 response first with an FQDN then their hostname formats will differ.
> I think this can be a problem even within a single system if DHCP returns
> first during inspection and the DHCP6 returns first after deployment. The
> CSR name may not match.

That could be a problem if the provisioning network is v4 and we get a different hostname for v6.

> However, this reminds me that ip=dhcp doesn't do what you expect anyway.

I expect it to wait until we have an IPv4 address from DHCP before declaring the network up. I didn't expect it to disable IPv6.

Comment 12 Bob Fournier 2022-01-18 17:26:37 UTC
Sandhya will take a look at https://bugzilla.redhat.com/show_bug.cgi?id=2040671#c9 to see if we can try that change.

Comment 13 Zane Bitter 2022-01-19 16:49:52 UTC
There is some suggestion that the test failures might be nothing to do with the metal platform, and will be resolved by bug 2033751.

Comment 14 Dan Williams 2022-01-20 02:46:54 UTC
(In reply to Zane Bitter from comment #13)
> There is some suggestion that the test failures might be nothing to do with
> the metal platform, and will be resolved by bug 2033751.

At least "[sig-network] [Feature:IPv6DualStack] should have ipv4 and ipv6 internal node ip" is still likely platform-related, because it means the kubelet doesn't know about both node addresses well enough to post them to the apiserver in its node.Status.Addresses.

Comment 15 Derek Higgins 2022-01-20 10:12:33 UTC
(In reply to Zane Bitter from comment #13)
> There is some suggestion that the test failures might be nothing to do with
> the metal platform, and will be resolved by bug 2033751.

2 of the 10 tests that were failing now appear to be passing, I'm guessing due to the bump in the Kubernetes version: https://bugzilla.redhat.com/show_bug.cgi?id=2033751
[Feature:IPv6DualStack] should create service with ipv4,v6 cluster ip [Suite:openshift/conformance/parallel]
[Feature:IPv6DualStack] should create service with ipv6,v4 cluster ip [Suite:openshift/conformance/parallel]
They started passing yesterday evening:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack

Comment 16 Ben Nemec 2022-01-22 01:08:50 UTC
I think this may be a timing thing. Check out these journal logs:

Jan 22 00:04:45 master-0.ostest.test.metalkube.org NetworkManager[3935]: <info>  [1642809885.0510] dhcp4 (br-ex): state changed unknown -> bound, address=192.168.111.20
[snip]
Jan 22 00:04:46 master-0.ostest.test.metalkube.org NetworkManager[3935]: <info>  [1642809886.2684] dhcp6 (br-ex): state changed unknown -> bound, address=fd2e:6f44:5dd8:c956::14

Meanwhile, in nodeip-configuration:

Jan 22 00:04:45 master-0.ostest.test.metalkube.org bash[5361]: time="2022-01-22T00:04:45Z" level=debug msg="retrieved Address map map[0xc0002c1e60:[127.0.0.1/8 lo ::1/128] 0xc0003d4000:[fd00:1101::893:937e:62cd:8cf8/128] 0xc0003d46c0:[1>

We somehow managed to retrieve the address map at the exact moment configure-ovs was reconfiguring the interface, so it only had the v4 address and not the v6 one. This is happening consistently in my local env, and I see the same thing in a journal log from CI.

Theoretically we should be able to fix this by enforcing an ordering on the two services, but I'm having trouble with that locally. When I add a dependency one of the services seems to not run at all. :-/
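
For reference, systemd ordering between two units is normally expressed with a drop-in like the following (a generic sketch using the unit names from this thread, not the actual MCO units). Note that After= only orders; without Wants= the other unit is not pulled in at all, which can look like a unit "not running":

# Hypothetical drop-in ordering nodeip-configuration after
# ovs-configuration; After= orders, Wants= pulls the unit in.
mkdir -p /etc/systemd/system/nodeip-configuration.service.d
cat > /etc/systemd/system/nodeip-configuration.service.d/10-order.conf <<'EOF'
[Unit]
Wants=ovs-configuration.service
After=ovs-configuration.service
EOF
systemctl daemon-reload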

Comment 17 Zane Bitter 2022-01-22 04:32:59 UTC
(In reply to Ben Nemec from comment #16)
> We somehow managed to retrieve the address map at the exact moment
> configure-ovs was reconfiguring the interface so it only had the v4 address
> and not the v6 one. This is happening consistently in my local env, and I
> see the same thing in a journal log from ci.
> 
> Theoretically we should be able to fix this by enforcing an ordering on the
> two services, but I'm having trouble with that locally. When I add a
> dependency one of the services seems to not run at all. :-/

It looks to me like configure-ovs is the last thing to run before network-online.target:
https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/ovs-configuration.service.yaml#L12

and nodeip-configuration is one of the first things to run after network-online.target:
https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/units/nodeip-configuration.service.yaml#L7

So they are ordered. configure-ovs restarts NetworkManager and waits for connections to come up, but if it doesn't wait for DHCP to get both IP addresses again (which certainly appears to be the case), then that would explain why it quite consistently fails to get both IPs.


Comparing to the last successful periodic job run, it seems quite possible that the difference was that the configure-ovs script used to contain an unconditional 5s sleep, which was removed:
https://github.com/openshift/machine-config-operator/commit/9cc7ac42a69474566a6930f80f72190769319f30#diff-afb45a3711a77d94f26471d9d94a7f7a03d931d9e72bdf849f2e26e2711d6fd7L340
by https://github.com/openshift/machine-config-operator/pull/2864

The last successful periodic job used the commit immediately before that PR merged. The metal dualstack job for the PR failed with the same symptoms as we see here ("should have ipv4 and ipv6 internal node ip"): https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/2864/pull-ci-openshift-machine-config-operator-master-e2e-metal-ipi-ovn-dualstack/1470852830548987904. That test ran on 15 Dec, when the periodic job was still passing, and the metal platform changes didn't merge until 17 Dec.

The day after the PR merged, and before the next periodic job ran, the metal platform changes that caused bug 2034527 broke cluster provisioning entirely in the metal-dualstack job, thus masking this problem until it was fixed.

Comment 18 Zane Bitter 2022-01-22 13:54:38 UTC
Testing confirms it: https://github.com/openshift/machine-config-operator/pull/2924

Obviously just adding a 5s sleep isn't really a solution; we should actually wait for DHCP to succeed or time out.
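
The shape of such a wait might look like this. This is a sketch only, not the code that eventually merged in PR 2929; it assumes the br-ex bridge from the logs above and a 90s limit matching the dhcp-timeout in the nm-initrd-generator output:

# Poll until the interface holds a global address from each family,
# or give up after the timeout.
iface=br-ex
timeout=90
for ((i = 0; i < timeout; i++)); do
  v4=$(ip -4 -o addr show dev "$iface" scope global)
  v6=$(ip -6 -o addr show dev "$iface" scope global)
  if [[ -n "$v4" && -n "$v6" ]]; then
    echo "dual-stack addresses present on $iface"
    exit 0
  fi
  sleep 1
done
echo "timed out waiting for dual-stack addresses on $iface" >&2
exit 1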

Comment 19 Yuval Kashtan 2022-01-23 14:22:09 UTC
Will a timeout cause a retry?

Because there might be cases where DHCP is unavailable, even for a few hours, and then returns.

Comment 20 Zane Bitter 2022-01-24 15:46:32 UTC
It should use the same logic NetworkManager uses at startup to determine when the network is ready, instead of just declaring success as soon as the port is up, without waiting for DHCP.
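
For what it's worth, NetworkManager already ships a readiness gate with those semantics:

# nm-online -s waits for NetworkManager startup to complete (all
# activating connections have settled) rather than for full
# connectivity; -t sets the timeout in seconds.
nm-online -s -t 90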

Comment 21 Yuval Kashtan 2022-01-24 15:55:38 UTC
That makes sense, but in the past we used to add ip=dhcp if the API IP is IPv4 and ip=dhcp6 if it's IPv6. Otherwise, what happened in many cases (at least in our lab) is that a node received an IPv6 address (but not an IPv4 one), NM declared the network ready, and the rest of the deployment failed when trying to fetch something from the API over IPv4 (because it was not configured).

Comment 27 zhaozhanqi 2022-01-28 05:38:57 UTC
From the latest job logs (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-metal-ipi-ovn-dualstack/1486826255205535744), these cases pass.
Moving this bug to verified.

Comment 30 errata-xmlrpc 2022-03-10 16:39:37 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

