Bug 1970779
Summary: | [OVN] Unable to assign master-0 for EgressIP even if the egress-assignable label is set | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | huirwang | |
Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> | |
Networking sub component: | ovn-kubernetes | QA Contact: | huirwang | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | aconstan, wking | |
Version: | 4.7 | |||
Target Milestone: | --- | |||
Target Release: | 4.7.z | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1970833 | Environment: | ||
Last Closed: | 2021-06-29 04:20:23 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1970833 | |||
Bug Blocks: |
Description
huirwang
2021-06-11 08:00:28 UTC
TL;DR: we can remove the regression tag. This is not a regression, but an existing problem on 4.7 that I think will be fixed by the commit https://github.com/openshift/ovn-kubernetes/commit/1ba1ce885f089a023b4c803d85a2ffc7206eb98b (which is on 4.8 already).

I've had a look at the cluster and saw the following:

The reason we can't assign the egress IP to master-0 is that the annotation which holds the parsed primary IP address has the following value:

    k8s.ovn.org/node-primary-ifaddr: '{"ipv4":"172.31.248.112/32"}'

which is incorrect. We can see that from the default L3 config annotation that OVN has set:

    k8s.ovn.org/l3-gateway-config: '{"default":{"mode":"shared","interface-id":"br-ex_yinzhou-regre-kk9x9-master-0","mac-address":"00:50:56:ac:35:62","ip-addresses":["172.31.249.51/23"],"ip-address":"172.31.249.51/23","next-hops":["172.31.248.1"],"next-hop":"172.31.248.1","node-port-enable":"true","vlan-id":"0"}}'

The IP 172.31.248.112 is actually the cluster ingress VIP, which was associated with master-0 at one point:

    $ oc get cm -n kube-system cluster-config-v1 -o yaml
    ...
    platform:
      vsphere:
        apiVIP: 172.31.248.111
        cluster: Cluster-1
        datacenter: SDDC-Datacenter
        defaultDatastore: WorkloadDatastore
        ingressVIP: 172.31.248.112
        network: qe-segment
        password: ""
        username: ""
        vCenter: vcenter.sddc-44-236-21-251.vmwarevmc.com
    publish: External
    pullSecret: ""
    ...

If we look at the ovnkube-node logs on master-0, we can see that it found the ingressVIP when it started, and that the egress IP code which parses the primary IP address picks up the ingressVIP address instead of the correct one:

    $ oc logs -c ovnkube-node ovnkube-node-pj2z8 -n openshift-ovn-kubernetes | less
    ...
    I0611 05:24:57.154095 6122 gateway_localnet.go:183] Node local addresses initialized to: map[10.130.0.2:{10.130.0.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} 172.31.248.112:{172.31.248.112 ffffffff} 172.31.249.51:{172.31.248.0 fffffe00} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::90ea:baff:fefd:6dc9:{fe80:: ffffffffffffffff0000000000000000} fe80::c1b:568f:b47c:b043:{fe80:: ffffffffffffffff0000000000000000}]
    ...
    I0611 05:24:57.990189 6122 kube.go:89] Setting annotations map[k8s.ovn.org/l3-gateway-config:{"default":{"mode":"shared","interface-id":"br-ex_yinzhou-regre-kk9x9-master-0","mac-address":"00:50:56:ac:35:62","ip-addresses":["172.31.249.51/23"],"ip-address":"172.31.249.51/23","next-hops":["172.31.248.1"],"next-hop":"172.31.248.1","node-port-enable":"true","vlan-id":"0"}} k8s.ovn.org/node-chassis-id:81f65b0b-619c-4ab5-a69c-6456b218144f k8s.ovn.org/node-mgmt-port-mac-address:92:ea:ba:fd:6d:c9 k8s.ovn.org/node-primary-ifaddr:{"ipv4":"172.31.248.112/32"}] on node yinzhou-regre-kk9x9-master-0

The problem is that the ovnkube-node code on 4.7 which parses the node IP used for egress IP assignments uses the following function to get the default IP address:

https://github.com/openshift/ovn-kubernetes/blob/release-4.7/go-controller/pkg/node/helper_linux.go#L98-L118

whereas the function parsing the L3 annotation uses:

https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/util/net_linux.go#L354-L380

Specifically, the egress IP related code is missing:

https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/util/net_linux.go#L370-L376

That will be fixed by back-porting the commit: https://github.com/openshift/ovn-kubernetes/commit/1ba1ce885f089a023b4c803d85a2ffc7206eb98b

/Alex

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2502
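
For anyone who wants to check whether a node is affected, the mismatch described in the analysis above (k8s.ovn.org/node-primary-ifaddr disagreeing with the ip-address in k8s.ovn.org/l3-gateway-config) can be detected by comparing the two annotation values. The following is only a minimal diagnostic sketch, not code from ovn-kubernetes: the struct types and the annotationsAgree helper are hypothetical names introduced here, and the sketch assumes the annotation values have already been fetched as strings (for example from the node's YAML).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// primaryIfAddr mirrors the JSON shape of the k8s.ovn.org/node-primary-ifaddr
// annotation as quoted in this report (hypothetical type, IPv4 only).
type primaryIfAddr struct {
	IPv4 string `json:"ipv4"`
}

// l3GatewayConfig mirrors only the one field of the
// k8s.ovn.org/l3-gateway-config annotation that this sketch needs.
type l3GatewayConfig struct {
	Default struct {
		IPAddress string `json:"ip-address"`
	} `json:"default"`
}

// annotationsAgree is a hypothetical helper. It reports whether both
// annotations carry the same IPv4 address with the same prefix length.
func annotationsAgree(primaryJSON, l3JSON string) (bool, error) {
	var p primaryIfAddr
	if err := json.Unmarshal([]byte(primaryJSON), &p); err != nil {
		return false, fmt.Errorf("parsing node-primary-ifaddr: %w", err)
	}
	var g l3GatewayConfig
	if err := json.Unmarshal([]byte(l3JSON), &g); err != nil {
		return false, fmt.Errorf("parsing l3-gateway-config: %w", err)
	}

	pIP, pNet, err := net.ParseCIDR(p.IPv4)
	if err != nil {
		return false, fmt.Errorf("node-primary-ifaddr CIDR: %w", err)
	}
	gIP, gNet, err := net.ParseCIDR(g.Default.IPAddress)
	if err != nil {
		return false, fmt.Errorf("l3-gateway-config CIDR: %w", err)
	}

	// Compare both the address and the prefix length.
	pOnes, _ := pNet.Mask.Size()
	gOnes, _ := gNet.Mask.Size()
	return pIP.Equal(gIP) && pOnes == gOnes, nil
}

func main() {
	// Values taken from the master-0 annotations quoted in this report
	// (the l3-gateway-config value is trimmed to the field used here).
	primary := `{"ipv4":"172.31.248.112/32"}`
	l3 := `{"default":{"ip-address":"172.31.249.51/23"}}`

	ok, err := annotationsAgree(primary, l3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("annotations agree:", ok)
}
```

With the values from master-0 the two annotations do not agree, which matches the symptom reported here. A disagreement is only a hint that the primary address was picked from the wrong entry, such as the ingress VIP with its /32 mask; it does not by itself prove this particular bug.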