Description of problem:
[OVN AWS] With multiple EgressIP objects configured, EgressIPs were not working properly.

Version-Release number of selected component (if applicable):
4.10.0-0.ci.test-2021-12-21-010955-ci-ln-qp8x36t-latest
Latest 4.10 build including PR https://github.com/openshift/cloud-network-config-controller/pull/12

How reproducible:

Steps to Reproduce:
1. Tag 3 nodes as egress nodes.
2. Create multiple EgressIP objects:

$ oc get egressip
NAME                EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip-1          10.0.58.100   ip-10-0-58-47.us-east-2.compute.internal    10.0.58.100
egressip-example6   10.0.58.110   ip-10-0-67-155.us-east-2.compute.internal   10.0.67.112
egressip4           10.0.58.101   ip-10-0-61-37.us-east-2.compute.internal    10.0.58.101

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-21T02:29:01Z"
    generation: 2
    name: egressip-1
    resourceVersion: "38607"
    uid: d29a69da-cc65-40ce-98a6-adfa9c8bd300
  spec:
    egressIPs:
    - 10.0.58.100
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.58.100
      node: ip-10-0-58-47.us-east-2.compute.internal
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-21T03:33:45Z"
    generation: 4
    name: egressip-example6
    resourceVersion: "60413"
    uid: 0ba7f6bc-aed7-45f7-b159-de877da0e8be
  spec:
    egressIPs:
    - 10.0.58.110
    - 10.0.58.111
    - 10.0.67.112
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.67.112
      node: ip-10-0-67-155.us-east-2.compute.internal
    - egressIP: 10.0.58.110
      node: ip-10-0-61-37.us-east-2.compute.internal
    - egressIP: 10.0.58.111
      node: ip-10-0-58-47.us-east-2.compute.internal
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-21T03:19:37Z"
    generation: 4
    name: egressip4
    resourceVersion: "56234"
    uid: 86562d81-4e00-427c-98c4-726db3ccf6f9
  spec:
    egressIPs:
    - 10.0.58.101
    - 10.0.58.102
    - 10.0.67.100
    namespaceSelector:
      matchLabels:
        team: red
    podSelector:
      matchLabels:
        team: blue
  status:
    items:
    - egressIP: 10.0.58.101
      node: ip-10-0-61-37.us-east-2.compute.internal
    - egressIP: 10.0.58.102
      node: ip-10-0-58-47.us-east-2.compute.internal
    - egressIP: 10.0.67.100
      node: ip-10-0-67-155.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3. Create 2 namespaces, test2 and hrw, and some pods in those namespaces:

$ oc get ns hrw --show-labels
NAME   STATUS   AGE   LABELS
hrw    Active   17m   kubernetes.io/metadata.name=hrw,team=red

$ oc get pod -n hrw --show-labels
NAME            READY   STATUS    RESTARTS   AGE   LABELS
test-rc-qpt8v   1/1     Running   0          17m   name=test-pods
test-rc-rwsxm   1/1     Running   0          17m   name=test-pods

$ oc get ns test2 --show-labels
NAME    STATUS   AGE   LABELS
test2   Active   82m   kubernetes.io/metadata.name=test2,team=red

$ oc get pod -n test2 --show-labels
NAME            READY   STATUS    RESTARTS   AGE   LABELS
test-rc-65z6n   1/1     Running   0          71m   name=test-pods,team=blue
test-rc-hqpfm   1/1     Running   0          71m   name=test-pods
test-rc-kmjjq   1/1     Running   0          71m   name=test-pods
test-rc-kpjst   1/1     Running   0          71m   name=test-pods
test-rc-mghm8   1/1     Running   0          71m   name=test-pods

4. Check the EgressIP behavior from pods in hrw and in test2.

Actual results:
From pod test-rc-qpt8v in namespace hrw, egress traffic used the EgressIPs from object egressip4, which is not correct.
$ oc rsh -n hrw test-rc-qpt8v
~ $ while true; do curl 10.0.2.196:9095 --connect-timeout 2 -s; echo ""; sleep 2; done
10.0.58.102
10.0.58.102
10.0.58.102
10.0.67.100
10.0.58.102
10.0.58.102
10.0.67.100
10.0.67.100
10.0.67.100
10.0.67.100
10.0.67.100
10.0.58.101
10.0.58.101

From pod test-rc-65z6n in namespace test2, the traffic was load-balanced between only 2 EgressIPs, although 3 were configured:

~ $ while true; do curl 10.0.2.196:9095 --connect-timeout 2 -s; echo ""; sleep 2; done
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.101
10.0.58.102
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.102
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.102
10.0.58.101
10.0.58.102
10.0.58.102
10.0.58.102

Expected results:
The pod should use the EgressIPs from the EgressIP object it matches.

Additional info:
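(For reference: 10.0.2.196:9095 is an external host that replies with the source IP it sees for each connection, so the values printed by the curl loop are the egress IPs actually used for the pod's traffic. Below is a minimal Go sketch of such an echo server; it only illustrates the test setup and is not the exact tool used in this report.)

// main.go - minimal "reply with the caller's source IP" server, illustrating
// the kind of external endpoint assumed to run at 10.0.2.196:9095. Each HTTP
// request gets back the client address as seen by the server, i.e. the SNAT
// (egress) IP chosen by OVN-Kubernetes for the pod.
package main

import (
	"fmt"
	"net"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// r.RemoteAddr has the form "ip:port"; strip the port before replying.
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			host = r.RemoteAddr
		}
		fmt.Fprintln(w, host)
	})
	// Listen on the same port as used in the reproduction steps.
	http.ListenAndServe(":9095", nil)
}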
Can you give me a little more info about what is running on "10.0.2.196:9095"? Is that an external server that prints back the IP of the curl client? I would like to try it! ;)

I am assuming that to reproduce this issue you did not have a specific script and simply added/removed configs until you got the cluster in this bad state, correct?

* Regarding issue 1 of 2: pod test-rc-qpt8v using SNAT from egressip4

There may be a bug in the logic for deciding which EgressIP is usable by a given pod. Since "egressip-example6" is a superset of "egressip4", would you expect any pods from your example -- including "test-rc-65z6n" -- to use "egressip4"? The documentation [1] is not very clear on that, so I wonder if this is some undetermined behavior. Or I may be missing something.

I will look at the code some more, but I can clearly see that ovn-k8s is adding the improper NAT in OVN:

[root@3aa61e97a1fe ~]# ovn-nbctl list logical_switch_port hrw_test-rc-qpt8v
_uuid               : d75b075f-88f5-4bd6-ab4c-636fb5bd908b
addresses           : ["0a:58:0a:80:02:0c 10.128.2.12"]
...

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-58-47.us-east-2.compute.internal | grep 10.128.2.12
snat             10.0.58.102               10.128.2.12

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-61-37.us-east-2.compute.internal | grep 10.128.2.12
snat             10.0.58.101               10.128.2.12

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-67-155.us-east-2.compute.internal | grep 10.128.2.12
snat             10.0.67.100               10.128.2.12

Note from the output above that the pod's IP was not SNATed to any of the egress IPs of "egressip-example6", which is the exact opposite of what it should have done. :P

[1]: https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html

* Regarding issue 2 of 2: only 2 out of the 3 SNAT addresses are being observed

The reason we never see "10.0.67.112" is bug 2029742, where ovn_cluster_router is left with duplicate and wrong re-routes. Can you please retry this test with the fixes in this PR: https://github.com/ovn-org/ovn-kubernetes/pull/2735, or wait for that fix to be merged?

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-58-47.us-east-2.compute.internal
TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
snat             10.0.58.111                         10.129.2.23
...

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-61-37.us-east-2.compute.internal
TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
snat             10.0.58.110                         10.129.2.23
...

[root@a5eae22bcd51 ~]# ovn-nbctl lr-nat-list GR_ip-10-0-67-155.us-east-2.compute.internal
TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP          EXTERNAL_MAC         LOGICAL_PORT
snat             10.0.67.112                         10.129.2.23
...

[root@a5eae22bcd51 ~]# ovn-nbctl show GR_ip-10-0-58-47.us-east-2.compute.internal
router 4cc1d62f-b9cd-4be4-9708-3c3a538bdf9e (GR_ip-10-0-58-47.us-east-2.compute.internal)
    port rtoj-GR_ip-10-0-58-47.us-east-2.compute.internal
        mac: "0a:58:64:40:00:07"
        networks: ["100.64.0.7/16"]
...

[root@a5eae22bcd51 ~]# ovn-nbctl show GR_ip-10-0-61-37.us-east-2.compute.internal
router fde7b52b-e3ec-41ee-89d5-48504ff93cd2 (GR_ip-10-0-61-37.us-east-2.compute.internal)
    port rtoj-GR_ip-10-0-61-37.us-east-2.compute.internal
        mac: "0a:58:64:40:00:05"
        networks: ["100.64.0.5/16"]
...
[root@a5eae22bcd51 ~]# ovn-nbctl show GR_ip-10-0-67-155.us-east-2.compute.internal
router 498fe80d-48ce-42e2-8ad7-d42ee766d657 (GR_ip-10-0-67-155.us-east-2.compute.internal)
    port rtoj-GR_ip-10-0-67-155.us-east-2.compute.internal
        mac: "0a:58:64:40:00:06"
        networks: ["100.64.0.6/16"]

[root@a5eae22bcd51 ~]# ovn-nbctl lr-policy-list ovn_cluster_router
Routing Policies
...
       100 ip4.src == 10.129.2.23   reroute   100.64.0.5, 100.64.0.6, 100.64.0.7   <--- "37", "155", "47"
       100 ip4.src == 10.129.2.23   reroute   100.64.0.5, 100.64.0.6, 100.64.0.7   <--- DUPLICATE
       100 ip4.src == 10.129.2.23   reroute   100.64.0.5, 100.64.0.7               <--- DUPLICATE AND WRONG!!!
...

# pod on nodeA -> node-switch on nodeA -> ovn_cluster_router (hits this 100 reroute policy) -> join switch -> GR (snat) -> external switch -> outside
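(A quick way to spot the duplicated re-route policies described above is to pipe the lr-policy-list output into a small helper that flags match expressions appearing more than once. The following Go sketch is a hypothetical diagnostic only, not part of ovn-kubernetes.)

// dupes.go - hypothetical helper: run
//   ovn-nbctl lr-policy-list ovn_cluster_router | ./dupes
// and it reports priority-100 reroute matches that appear more than once,
// which is the symptom of bug 2029742 shown above.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	seen := map[string][]string{} // match expression -> reroute target sets seen for it
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Egress IP policy lines look like:
		//   100 ip4.src == 10.129.2.23 reroute 100.64.0.5, 100.64.0.6, 100.64.0.7
		if !strings.HasPrefix(line, "100 ") || !strings.Contains(line, " reroute ") {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(line, "100 "), " reroute ", 2)
		match := strings.TrimSpace(parts[0])
		seen[match] = append(seen[match], strings.TrimSpace(parts[1]))
	}
	for match, targets := range seen {
		if len(targets) > 1 {
			fmt.Printf("duplicate reroute policies for %q:\n", match)
			for _, t := range targets {
				fmt.Printf("  reroute %s\n", t)
			}
		}
	}
}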
Found a flaw in the logic where the EgressIP's pod selector was not properly checking the labels of the pod. Potential fix posted upstream: https://github.com/ovn-org/ovn-kubernetes/pull/2742

Alexander asked me to give this bug to him; hopefully that is okay. :^)
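(For context on what the selector check is expected to do: a pod should be matched by an EgressIP object only when the namespaceSelector matches the pod's namespace labels AND the podSelector matches the pod's own labels. The Go sketch below illustrates that semantics with the standard apimachinery helpers; it is not the actual ovn-kubernetes code nor the change in PR 2742. With the labels from this report, hrw/test-rc-qpt8v should match egressip-example6 but not egressip4.)

// match.go - illustrative only: shows the namespaceSelector/podSelector check an
// EgressIP object is expected to perform. Pod hrw/test-rc-qpt8v (name=test-pods)
// in namespace hrw (team=red) should match egressip-example6 (empty podSelector)
// but NOT egressip4 (podSelector team=blue).
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podMatchesEgressIP is a hypothetical helper mirroring the documented semantics.
func podMatchesEgressIP(nsSel, podSel *metav1.LabelSelector, nsLabels, podLabels map[string]string) (bool, error) {
	ns, err := metav1.LabelSelectorAsSelector(nsSel)
	if err != nil {
		return false, err
	}
	pod, err := metav1.LabelSelectorAsSelector(podSel)
	if err != nil {
		return false, err
	}
	// Both selectors must match; an empty podSelector selects every pod in the
	// matched namespaces.
	return ns.Matches(labels.Set(nsLabels)) && pod.Matches(labels.Set(podLabels)), nil
}

func main() {
	nsLabels := map[string]string{"kubernetes.io/metadata.name": "hrw", "team": "red"}
	podLabels := map[string]string{"name": "test-pods"} // hrw/test-rc-qpt8v

	// egressip-example6: namespaceSelector team=red, empty podSelector.
	ex6, _ := podMatchesEgressIP(
		&metav1.LabelSelector{MatchLabels: map[string]string{"team": "red"}},
		&metav1.LabelSelector{},
		nsLabels, podLabels)

	// egressip4: namespaceSelector team=red, podSelector team=blue.
	eip4, _ := podMatchesEgressIP(
		&metav1.LabelSelector{MatchLabels: map[string]string{"team": "red"}},
		&metav1.LabelSelector{MatchLabels: map[string]string{"team": "blue"}},
		nsLabels, podLabels)

	fmt.Println("matches egressip-example6:", ex6) // expected: true
	fmt.Println("matches egressip4:", eip4)        // expected: false
}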
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056