Bug 1958126

Summary: [OVN] EgressIP doesn't take effect
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Assignee: Alexander Constantinescu <aconstan>
Reporter: huirwang
QA Contact: huirwang
CC: aconstan, anusaxen, philipp.dallig
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.8
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-07-27 23:07:23 UTC

Description huirwang 2021-05-07 09:11:45 UTC
Description of problem:
[OVN] EgressIP doesn't take effect when the egress node and the pod's node are the same one


Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-05-06-210840

How reproducible:


Steps to Reproduce:
1. Label two nodes as egress IP nodes
2. Create one EgressIP object (a command sketch for both steps follows the YAML output below)

 oc get egressip
NAME       EGRESSIPS        ASSIGNED NODE                       ASSIGNED EGRESSIPS
egressip   172.31.249.182   huirwang-0507a-t7n9q-worker-46cpf   172.31.249.182

oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-05-07T08:38:46Z"
    generation: 2
    managedFields:
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:podSelector: {}
        f:status:
          .: {}
          f:items: {}
      manager: huirwang-0507a-t7n9q-master-1
      operation: Update
      time: "2021-05-07T08:38:46Z"
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:egressIPs: {}
          f:namespaceSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:org: {}
      manager: kubectl-create
      operation: Update
      time: "2021-05-07T08:38:46Z"
    name: egressip
    resourceVersion: "167730"
    uid: 016a95fd-21d9-4c69-b98e-122da8c82505
  spec:
    egressIPs:
    - 172.31.249.182
    namespaceSelector:
      matchLabels:
        org: qe
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.249.182
      node: huirwang-0507a-t7n9q-worker-46cpf
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
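For reference, steps 1 and 2 would typically look like this (a sketch; the node names are taken from this cluster, and k8s.ovn.org/egress-assignable is the standard OVN-Kubernetes egress node label):

oc label node huirwang-0507a-t7n9q-worker-46cpf k8s.ovn.org/egress-assignable=""
oc label node huirwang-0507a-t7n9q-worker-lcflj k8s.ovn.org/egress-assignable=""

cat <<EOF | oc create -f -
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip
spec:
  egressIPs:
  - 172.31.249.182
  namespaceSelector:
    matchLabels:
      org: qe
EOF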

3. Create a project 0i9xy and a pod in it, and label the project with org=qe
oc get ns 0i9xy --show-labels
NAME    STATUS   AGE   LABELS
0i9xy   Active   15m   kubernetes.io/metadata.name=0i9xy,org=qe
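The project and label would have been set up along these lines (a sketch; per the logs further down, the pod comes from a replication controller named test-rc):

oc new-project 0i9xy
oc label namespace 0i9xy org=qe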

4. Check the source IP of traffic from project 0i9xy (curl from the pod to an external host that echoes the client IP):
while true; do date; curl -s --connect-timeout 5 172.31.249.80:9095; sleep 2; done
Fri May  7 08:41:13 UTC 2021
172.31.249.43Fri May  7 08:41:15 UTC 2021
172.31.249.43Fri May  7 08:41:17 UTC 2021
172.31.249.43Fri May  7 08:41:19 UTC 2021
172.31.249.43Fri May  7 08:41:21 UTC 2021
172.31.249.43Fri May  7 08:41:23 UTC 2021
172.31.249.43Fri May  7 08:41:25 UTC 2021
172.31.249.43Fri May  7 08:41:27 UTC 2021
........
172.31.249.43Fri May  7 08:53:03 UTC 2021
172.31.249.43Fri May  7 08:53:05 UTC 2021
172.31.249.43Fri May  7 08:53:07 UTC 2021
172.31.249.43Fri May  7 08:53:09 UTC 2021
.......
172.31.249.43Fri May  7 09:04:34 UTC 2021
172.31.249.43Fri May  7 09:04:36 UTC 2021

oc get nodes -o wide
NAME                                STATUS   ROLES    AGE     VERSION                INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
huirwang-0507a-t7n9q-master-0       Ready    master   6h56m   v1.21.0-rc.0+291e731   172.31.249.18    172.31.249.18    Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
huirwang-0507a-t7n9q-master-1       Ready    master   6h57m   v1.21.0-rc.0+291e731   172.31.249.193   172.31.249.193   Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
huirwang-0507a-t7n9q-master-2       Ready    master   6h56m   v1.21.0-rc.0+291e731   172.31.249.126   172.31.249.126   Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
huirwang-0507a-t7n9q-worker-46cpf   Ready    worker   6h46m   v1.21.0-rc.0+291e731   172.31.249.43    172.31.249.43    Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
huirwang-0507a-t7n9q-worker-lcflj   Ready    worker   6h46m   v1.21.0-rc.0+291e731   172.31.249.87    172.31.249.87    Red Hat Enterprise Linux CoreOS 48.84.202105061618-0 (Ootpa)   4.18.0-293.el8.x86_64   cri-o://1.21.0-89.rhaos4.8.git3f6209a.el8
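Note that the assigned egress node (huirwang-0507a-t7n9q-worker-46cpf) and the node the test pod runs on are the same, matching the failing case in the summary, and the echoed source IP 172.31.249.43 is that node's IP rather than the egress IP. Pod placement can be confirmed with (a sketch):

oc get pods -n 0i9xy -o wide   # the NODE column should show huirwang-0507a-t7n9q-worker-46cpf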

Actual results:
The EgressIP didn't take effect; outbound traffic is using the node's IP (172.31.249.43).

Expected results:
Outbound traffic should use the configured EgressIP (172.31.249.182).

Additional info:

Comment 1 Alexander Constantinescu 2021-05-07 10:58:34 UTC
This BZ affects egress IP but is not caused by it. I am seeing the following NAT entries listed on every node's gateway router (GR):

[root@huirwang-0507a-t7n9q-master-1 ~]# ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:172.31.249.126:9641,ssl:172.31.249.18:9641,ssl:172.31.249.193:9641 lr-nat-list GR_huirwang-0507a-t7n9q-worker-46cpf
TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP            EXTERNAL_MAC         LOGICAL_PORT
snat             172.31.249.182                      10.128.2.92
snat             172.31.249.182                      10.128.2.106
snat             172.31.249.182                      10.128.2.105
snat             172.31.249.43                       10.128.2.5
snat             172.31.249.43                       10.128.2.6
snat             172.31.249.43                       10.128.2.92
snat             172.31.249.43                       10.128.2.4
snat             172.31.249.43                       10.128.2.49
snat             172.31.249.43                       10.128.2.105
snat             172.31.249.43                       10.128.2.3
snat             172.31.249.43                       10.128.2.28
snat             172.31.249.43                       10.128.2.26
snat             172.31.249.43                       10.128.2.77
snat             172.31.249.43                       10.128.2.106


That is incorrect. The only SNAT entries that should exist on the GR are the egress IP ones. Here, something is assigning a dedicated SNAT to every pod running on the node. This in turn "scrambles" the egress IP configuration and causes OVN to use the incorrect SNAT rather than the egress IP dedicated one - which is why we're not seeing the egress IP on the server's side.
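For comparison, a correctly configured GR here would be expected to carry only the egress IP SNATs, roughly (an illustration based on the statement above, not captured output):

TYPE             EXTERNAL_IP        EXTERNAL_PORT    LOGICAL_IP
snat             172.31.249.182                      10.128.2.92
snat             172.31.249.182                      10.128.2.105
snat             172.31.249.182                      10.128.2.106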

I've looked at the logs to see which command creates these SNAT entries, and found the following:

W0507 09:10:16.872586       1 pods.go:334] Failed to get options for port: 0i9xy_test-rc-vc6hh
I0507 09:10:16.872665       1 kube.go:61] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.128.2.105/23"],"mac_address":"0a:58:0a:80:02:69","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.105/23","gateway_ip":"10.128.2.1"}}] on pod 0i9xy/test-rc-vc6hh
W0507 09:10:16.908185       1 pods.go:334] Failed to get options for port: 0i9xy_test-rc-7tcn6
I0507 09:10:16.908341       1 kube.go:61] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.128.2.106/23"],"mac_address":"0a:58:0a:80:02:6a","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.106/23","gateway_ip":"10.128.2.1"}}] on pod 0i9xy/test-rc-7tcn6
2021-05-07T09:10:16.940Z|04306|nbctl|INFO|Running command run -- add address_set 7a3c0f32-1bf3-4dd2-b1d9-bd157e72410f addresses "\"10.128.2.105\""
2021-05-07T09:10:16.953Z|04307|nbctl|INFO|Running command run --if-exists -- lr-nat-del GR_huirwang-0507a-t7n9q-worker-46cpf snat 10.128.2.105/32
2021-05-07T09:10:16.957Z|04308|nbctl|INFO|Running command run -- lr-nat-add GR_huirwang-0507a-t7n9q-worker-46cpf snat 172.31.249.43 10.128.2.105/32
2021-05-07T09:10:16.969Z|04309|nbctl|INFO|Running command run -- add address_set 7a3c0f32-1bf3-4dd2-b1d9-bd157e72410f addresses "\"10.128.2.106\""
I0507 09:10:16.977094       1 pods.go:289] [0i9xy/test-rc-vc6hh] addLogicalPort took 104.719222ms


It seems that at some point during pod setup in addLogicalPort we started configuring a SNAT for the pod on the GR. This is happening for every pod on every node.
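The per-pod entries can be inspected or removed by hand with the same nbctl invocation as above, e.g. (a sketch mirroring the logged lr-nat-del command; ovnkube will likely just re-add the entry, so this is diagnostic only):

ovn-nbctl -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt --db ssl:172.31.249.126:9641,ssl:172.31.249.18:9641,ssl:172.31.249.193:9641 lr-nat-del GR_huirwang-0507a-t7n9q-worker-46cpf snat 10.128.2.105/32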

The reason this is happening is that commit https://github.com/openshift/cluster-network-operator/commit/14a5e41bb9b8fedaec0037b8551be4888e0ac821 added --disable-snat-multiple-gws to ovnkube-master, which now performs that per-pod SNAT setup in addLogicalPort.

This is also why upstream CI did not pick the problem up (we have E2E tests for egress IP there): that option is OpenShift-specific.

I need to talk to the Platform team about this. But this is clearly a regression that breaks egress IP for OpenShift, and I am thus setting the blocker+ flag. 
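Whether the flag made it into the rendered manifests can be checked directly on a cluster, e.g. (a sketch; resource names per the openshift-ovn-kubernetes namespace above):

oc -n openshift-ovn-kubernetes get daemonset ovnkube-master -o yaml | grep -- '--disable-snat-multiple-gws'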

Moreover, those pod annotations seem completely off to me: "ip_addresses":["10.128.2.106/23"] is not correct.

Another (cosmetic) problem: even though that flag is provided to ovnkube-master, the logged "parsed config" does not indicate that it has been correctly set:

+ exec /usr/bin/ovnkube --init-master huirwang-0507a-t7n9q-master-1 --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --metrics-bind-address 127.0.0.1:29102 --gateway-mode shared --gateway-interface br-ex --sb-address ssl:172.31.249.126:9642,ssl:172.31.249.18:9642,ssl:172.31.249.193:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --nb-address ssl:172.31.249.126:9641,ssl:172.31.249.18:9641,ssl:172.31.249.193:9641 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nbctl-daemon-mode --nb-cert-common-name ovn --enable-multicast --disable-snat-multiple-gws --acl-logging-rate-limit 20
I0507 05:34:25.278043       1 config.go:1437] Parsed config file /run/ovnkube-config/ovnkube.conf
I0507 05:34:25.278112       1 config.go:1438] Parsed config: {Default:{MTU:1400 ConntrackZone:64000 EncapType:geneve EncapIP: EncapPort:6081 InactivityProbe:100000 OpenFlowProbe:180 RawClusterSubnets:10.128.0.0/14/23 ClusterSubnets:[]} Logging:{File: CNIFile: Level:4 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:5 ACLLoggingRateLimit:20} Monitoring:{RawNetFlowTargets: RawSFlowTargets: RawIPFIXTargets: NetFlowTargets:[] SFlowTargets:[] IPFIXTargets:[]} CNI:{ConfDir:/etc/cni/net.d Plugin:ovn-k8s-cni-overlay} OVNKubernetesFeature:{EnableEgressIP:true} Kubernetes:{Kubeconfig: CACert: APIServer:https://api-int.huirwang-0507a.qe.devcluster.openshift.com:6443 Token: CompatServiceCIDR: RawServiceCIDRs:172.30.0.0/16 ServiceCIDRs:[] OVNConfigNamespace:openshift-ovn-kubernetes MetricsBindAddress: OVNMetricsBindAddress: MetricsEnablePprof:false OVNEmptyLbEvents:false PodIP: RawNoHostSubnetNodes: NoHostSubnetNodes:nil HostNetworkNamespace:openshift-host-network} OvnNorth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false exec:<nil>} OvnSouth:{Address: PrivKey: Cert: CACert: CertCommonName: Scheme: northbound:false exec:<nil>} Gateway:{Mode:local Interface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64} MasterHA:{ElectionLeaseDuration:60 ElectionRenewDeadline:30 ElectionRetryPeriod:20} HybridOverlay:{Enabled:false RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full}}

Specifically: DisableSNATMultipleGWs:false, which incorrectly indicates that the flag was not provided. The flag was provided, so that should be DisableSNATMultipleGWs:true.
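A plausible explanation (an assumption, not verified against the code) is that config.go logs the struct right after parsing the config file and before command-line overrides are merged in. The effective value can be cross-checked by comparing the exec line with the logged struct, e.g. (a sketch):

oc -n openshift-ovn-kubernetes logs daemonset/ovnkube-master -c ovnkube-master | grep -o 'DisableSNATMultipleGWs:[a-z]*'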

Comment 6 errata-xmlrpc 2021-07-27 23:07:23 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438