Bug 1766583

Summary: [3.11] EgressIP doesn't work with NetworkPolicy unless traffic from default project is allowed
Product: OpenShift Container Platform Reporter: K Chandra Sekar <csekar>
Component: Networking Assignee: Juan Luis de Sousa-Valadas <jdesousa>
Networking sub component: openshift-sdn QA Contact: huirwang
Status: CLOSED NOTABUG Docs Contact:
Severity: medium    
Priority: unspecified CC: acai, aos-bugs, bbennett, danw, dkulkarn, emahoney, farandac, huirwang, jdesousa, piqin, rvanderp, travi, zzhao
Version: 3.11.0 Keywords: NeedsTestCase
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1741477 Environment:
Last Closed: 2020-09-03 13:47:22 UTC Type: ---
Bug Depends On: 1700431, 1741477, 1741499    
Bug Blocks:    

Description K Chandra Sekar 2019-10-29 13:04:20 UTC
+++ This bug was initially created as a clone of Bug #1741477 +++

+++ This bug was initially created as a clone of Bug #1700431 +++

Description of problem:
Customer reports that when they use networkPolicy combined with egressIP, egress traffic is dropped unless traffic from the default project is allowed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a project X using egressIP
2. Add egressIP to node A
3. Create a pod in project X which is *not* running on node A.
4. Create a networkPolicy which only allows traffic from itself:

- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    name: deny-ingress-from-other-namespaces
  spec:
    ingress:
    - from:
      - podSelector: {}
    podSelector: {}
    policyTypes:
    - Ingress
5. Go to the pod in project X and try to reach a resource outside OpenShift. Traffic is dropped.
6. Create a rule that allows traffic from the default project (assumes the default netnamespace netid equals 0 and the project has a label project=default)
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    name: allow from default
  spec:
    ingress:
    - from:
      - podSelector: {}
      - namespaceSelector:
          matchLabels:
            project: default
    podSelector: {}
    policyTypes:
    - Ingress
Traffic works.
7. Completely remove every networkPolicy. Traffic also works.

Actual results:
Packet is dropped somewhere.

Expected results:
Packet goes through the egressIP and comes back

Additional info:
Additionally the customer has an egressNetworkPolicy which allows traffic to destination and denies traffic by default. I believe this egressNetworkPolicy is unrelated to the issue.

Opening this new bug because the customer is still facing the issue reported in the above Bugzilla; the erratum [1] didn't fix the issue for them.

[1] - https://access.redhat.com/errata/RHBA-2019:2816

Comment 4 zhaozhanqi 2019-12-12 06:23:35 UTC
Hi Huiran, could you help check whether this can be reproduced?

Comment 10 Juan Luis de Sousa-Valadas 2020-01-02 10:39:22 UTC
Hi Chandra,
Because QA cannot reproduce the issue and 3.11.146 should already have the fix, I'm going to need the following information:

1- oc get namespace <project with egressIP>
2- oc get hostsubnet/<node hosting the pod> hostsubnet/<node with the egressIP>
3- oc get pod <affected pod> -o wide
4- oc get clusternetwork
5- In both nodes (the one with the egressIP and the one hosting the pod): oc rsh <pod name> ovs-ofctl -O OpenFlow13 dump-flows br0
6- In both nodes: iptables-save
7- In both nodes: The file /etc/origin/node/node-config.yaml
8- SDN pod logs of both nodes (I don't really expect anything useful here, but let's give it a shot anyway).
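
For convenience, the cluster-level items (1 to 4) could be gathered in one pass with something like the sketch below. The function name `collect_cluster_info`, its arguments, and the output directory are hypothetical; the `oc get` commands are the ones listed above, with `-o yaml` and `-n <project>` added as assumptions so that the full objects are captured. The per-node items (5 to 8) still have to be collected on each node.

```shell
#!/bin/sh
# Sketch: gather the cluster-level items (1-4) requested above into one directory.
# collect_cluster_info and its arguments are hypothetical names for illustration.
collect_cluster_info() {
    project=$1      # project with the egressIP
    pod_node=$2     # node hosting the affected pod
    egress_node=$3  # node holding the egressIP
    pod=$4          # affected pod
    out=${5:-./bz-info}

    mkdir -p "$out" || return 1
    # -o yaml and -n are additions (assumptions) to capture full objects.
    oc get namespace "$project" -o yaml > "$out/namespace.yaml"
    oc get "hostsubnet/$pod_node" "hostsubnet/$egress_node" -o yaml > "$out/hostsubnets.yaml"
    oc get pod "$pod" -n "$project" -o wide > "$out/pod.txt"
    oc get clusternetwork -o yaml > "$out/clusternetwork.yaml"
}
```

Example call: collect_cluster_info <project> <node hosting the pod> <node with the egressIP> <affected pod> ./bz-info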

The problem might be:
1- The conntrack action not being added to the flows
2- The conntrack action being added but not being honored by OVS
3- The fix being fine and we're having an unrelated problem
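
The first hypothesis can be checked roughly against a saved flow dump (item 5 above). The sketch below counts the flows whose action list contains a conntrack action, which `ovs-ofctl` prints as `ct(...)`. The helper name `count_ct_flows` and the idea of saving the dump to a file first are assumptions for illustration; it does not tell which table the flows should be in, only whether any exist.

```shell
# Sketch: count flows in a saved `ovs-ofctl -O OpenFlow13 dump-flows br0` output
# whose action list uses the conntrack action, printed by OVS as ct(...).
# count_ct_flows is a hypothetical helper name.
count_ct_flows() {
    # Match ct( at the start of the action list or after a comma or an
    # opening parenthesis; ct_state=... match fields are not counted.
    # grep -c prints the number of matching lines (0 if none).
    grep -Ec 'actions=(.*[,(])?ct\(' "$1"
}
```

Usage would be along the lines of saving the output of the dump-flows command from item 5 to flows.txt on each node and running `count_ct_flows flows.txt`. A count of 0 would point towards the first hypothesis; a non-zero count leaves the second and third open.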

Comment 19 Juan Luis de Sousa-Valadas 2020-01-17 13:04:05 UTC
Chandra, I cannot reproduce it in my environment.

Attempt to reproduce:
# oc get netnamespace test
test      48985     []

# oc get hostsubnet
NAME                    HOST                    HOST IP      SUBNET          EGRESS CIDRS   EGRESS IPS
openshift-master-node   openshift-master-node   []             []
openshift-node-1        openshift-node-1   []             []
openshift-node-2        openshift-node-2   []             []

# oc get networkpolicy -o yaml -n test
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2020-01-17T12:52:04Z"
    generation: 1
    name: default-deny
    namespace: test
    resourceVersion: "113338"
    selfLink: /apis/extensions/v1beta1/namespaces/test/networkpolicies/default-deny
    uid: 294f1b1b-3928-11ea-842b-0242ac110002
  spec:
    podSelector: {}
    policyTypes:
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

# oc get pod -o wide -n test
NAME                      READY     STATUS    RESTARTS   AGE       IP           NODE               NOMINATED NODE
hello-openshift-3-2cmbf   1/1       Running   0          1m   openshift-node-2   <none>
hello-openshift-3-r9lr6   1/1       Running   0          17m   openshift-node-1   <none>

# oc rsh -n test hello-openshift-3-2cmbf curl  -o /dev/null  -s
# oc rsh -n test hello-openshift-3-r9lr6 curl  -o /dev/null  -s

And the application log of
$ python -m http.server
Serving HTTP on port 8000 ( ...
 - - [17/Jan/2020 13:56:02] "GET / HTTP/1.1" 200 -
 - - [17/Jan/2020 13:56:11] "GET / HTTP/1.1" 200 -

It's working in my environment.