Bug 1766583 - [3.11] EgressIP doesn't work with NetworkPolicy unless traffic from default project is allowed
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
Depends On: 1700431 1741477 1741499
Reported: 2019-10-29 13:04 UTC by K Chandra Sekar
Modified: 2020-09-03 13:47 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1741477
Last Closed: 2020-09-03 13:47:22 UTC
Target Upstream Version:

Description K Chandra Sekar 2019-10-29 13:04:20 UTC
+++ This bug was initially created as a clone of Bug #1741477 +++

+++ This bug was initially created as a clone of Bug #1700431 +++

Description of problem:
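The customer reports that when they combine a NetworkPolicy with an egressIP, outbound traffic from the project is dropped unless traffic from the default project is allowed.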

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a project X that uses an egressIP.
2. Assign the egressIP to node A.
3. Create a pod in project X that is *not* running on node A.
4. Create a NetworkPolicy that only allows ingress from pods in the same namespace:

apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: deny-ingress-from-other-namespaces
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress
5. Go to the pod in project X and try to reach a resource outside OpenShift. Traffic is dropped.
6. Create a rule that allows traffic from the default project (this assumes that the default netnamespace has netid 0 and that the default project has the label project=default):
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: allow-from-default
spec:
  ingress:
  - from:
    - podSelector: {}
    - namespaceSelector:
        matchLabels:
          project: default
  podSelector: {}
  policyTypes:
  - Ingress
Traffic now works.
7. Remove every NetworkPolicy. Traffic also works.
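The workaround in step 6 can be reasoned about with a toy model of the ingress check. This is a minimal sketch, not the openshift-sdn implementation. It assumes, as step 6 does, that the default project has netid 0, and it additionally assumes (this report does not confirm it) that replies coming back to the pod through the egress node reach the pod's node tagged with netid 0, so the policy treats them as traffic from the default project. `Packet`, `ingress_allowed`, `PROJECT_X_NETID` and the `use_conntrack` switch are hypothetical names; the switch stands in for the conntrack-based fix discussed in comment 10.

```python
from dataclasses import dataclass

# Assumption taken from step 6: the default project's netnamespace has netid 0.
DEFAULT_NETID = 0
# Hypothetical netid for project X, for illustration only.
PROJECT_X_NETID = 42


@dataclass(frozen=True)
class Packet:
    src_netid: int     # netid the pod's node attributes to the sender (assumed)
    established: bool  # True if conntrack already saw the original direction


def ingress_allowed(pkt, dst_netid, extra_allowed, use_conntrack):
    """Toy ingress check for a pod whose namespace has netid dst_netid.

    extra_allowed: netids admitted besides the pod's own namespace
    (empty for the policy in step 4, {DEFAULT_NETID} after step 6).
    use_conntrack: hypothetical switch modelling a fix that accepts
    packets of established connections before the policy is evaluated.
    """
    if use_conntrack and pkt.established:
        return True
    return pkt.src_netid == dst_netid or pkt.src_netid in extra_allowed


# Reply to the pod's outbound request, assumed to arrive tagged with netid 0.
reply = Packet(src_netid=DEFAULT_NETID, established=True)

step4 = ingress_allowed(reply, PROJECT_X_NETID, set(), use_conntrack=False)
step6 = ingress_allowed(reply, PROJECT_X_NETID, {DEFAULT_NETID}, use_conntrack=False)
fixed = ingress_allowed(reply, PROJECT_X_NETID, set(), use_conntrack=True)
print(step4, step6, fixed)  # False True True
```

Under these assumptions, only the extra rule from step 6 lets the reply in when conntrack is not consulted; with the switch on, the established reply is accepted before the policy is checked. Step 7 (no policy at all) is outside this model.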

Actual results:
Packets are dropped somewhere along the path.

Expected results:
Packets leave through the egressIP and the replies come back.

Additional info:
Additionally, the customer has an EgressNetworkPolicy that allows traffic to the destination and denies all other traffic by default. I believe this EgressNetworkPolicy is unrelated to the issue.
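A minimal sketch of how an allow-then-deny policy like the customer's is evaluated, assuming rules are checked in order and the first matching rule takes effect. Only `cidrSelector` rules are modelled (no `dnsName`), the CIDRs are placeholders from documentation ranges rather than the customer's values, and `egress_decision` is a hypothetical helper.

```python
import ipaddress

# Hypothetical rules shaped like an EgressNetworkPolicy spec.egress list:
# allow one destination network, deny everything else. Placeholder CIDRs.
RULES = [
    {"type": "Allow", "to": {"cidrSelector": "203.0.113.0/24"}},
    {"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}},
]


def egress_decision(dst, rules):
    """Return the type of the first rule whose cidrSelector contains dst."""
    addr = ipaddress.ip_address(dst)
    for rule in rules:
        if addr in ipaddress.ip_network(rule["to"]["cidrSelector"]):
            return rule["type"]
    # Assumed fallback when nothing matches; never reached with RULES above
    # because of the catch-all Deny.
    return "Allow"


print(egress_decision("203.0.113.7", RULES))   # Allow
print(egress_decision("198.51.100.1", RULES))  # Deny
```

As long as the destination falls inside an Allow rule listed before the catch-all Deny, such a policy lets the traffic through, which fits the belief that it is unrelated to the drop.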

Opening this new bug because the customer is still hitting the issue reported in the bug above; the erratum [1] did not fix it for them.

[1] - https://access.redhat.com/errata/RHBA-2019:2816

Comment 4 zhaozhanqi 2019-12-12 06:23:35 UTC
Hi huiran, could you try to reproduce this?

Comment 10 Juan Luis de Sousa-Valadas 2020-01-02 10:39:22 UTC
Hi Chandra,
Because QA cannot reproduce the issue and 3.11.146 should already include the fix, I'm going to need the following information:

1- oc get namespace <project with egressIP>
2- oc get hostsubnet/<node hosting the pod> hostsubnet/<node with the egressIP>
3- oc get pod <affected pod> -o wide
4- oc get clusternetwork
5- On both nodes (the one with the egressIP and the one hosting the pod): oc rsh <OVS pod on that node> ovs-ofctl -O OpenFlow13 dump-flows br0
6- On both nodes: iptables-save
7- On both nodes: the file /etc/origin/node/node-config.yaml
8- SDN pod logs from both nodes (I don't really expect anything useful here, but let's give it a shot anyway).

The problem might be:
1- The conntrack action is not being added to the flows.
2- The conntrack action is added but not honored by OVS.
3- The fix is fine and we are hitting an unrelated problem.
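To check the first hypothesis against the `dump-flows` output requested in item 5, a small filter can list the flows in a given table that match on `ct_state` or call the `ct()` action. This is a minimal sketch: `conntrack_flows` is a hypothetical helper, and the table number is a parameter because this report does not say which table holds the NetworkPolicy flows (it is believed to be table 80 in 3.11, but check the dump).

```python
import re

TABLE_RE = re.compile(r"\btable=(\d+)")


def conntrack_flows(dump, table):
    """Return the flows of `table` that match on ct_state or use the ct() action.

    `dump` is text from `ovs-ofctl -O OpenFlow13 dump-flows br0`. Lines without
    a table= field are treated as table 0.
    """
    hits = []
    for line in dump.splitlines():
        if "actions=" not in line:
            continue  # skips headers such as "OFPST_FLOW reply (OF1.3) ..."
        # Look for table= only before the actions, since an action such as
        # ct(table=N) also contains "table=".
        head = line.split("actions=", 1)[0]
        m = TABLE_RE.search(head)
        flow_table = int(m.group(1)) if m else 0
        if flow_table == table and ("ct_state" in line or "ct(" in line):
            hits.append(line.strip())
    return hits
```

For example, save each node's dump to a file and pass its contents to `conntrack_flows(text, table)`. An empty result on the pod's node would point to hypothesis 1, while conntrack flows whose `n_packets` counters stay at zero during a failing request might point to hypothesis 2.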

Comment 19 Juan Luis de Sousa-Valadas 2020-01-17 13:04:05 UTC
Chandra, I cannot reproduce it in my environment.

Attempt to reproduce:
# oc get netnamespace test
test      48985     []

# oc get hostsubnet
NAME                    HOST                    HOST IP      SUBNET          EGRESS CIDRS   EGRESS IPS
openshift-master-node   openshift-master-node   []             []
openshift-node-1        openshift-node-1   []             []
openshift-node-2        openshift-node-2   []             []

# oc get networkpolicy -o yaml -n test
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: "2020-01-17T12:52:04Z"
    generation: 1
    name: default-deny
    namespace: test
    resourceVersion: "113338"
    selfLink: /apis/extensions/v1beta1/namespaces/test/networkpolicies/default-deny
    uid: 294f1b1b-3928-11ea-842b-0242ac110002
  spec:
    podSelector: {}
    policyTypes:
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

# oc get pod -o wide -n test
NAME                      READY     STATUS    RESTARTS   AGE       IP           NODE               NOMINATED NODE
hello-openshift-3-2cmbf   1/1       Running   0          1m   openshift-node-2   <none>
hello-openshift-3-r9lr6   1/1       Running   0          17m   openshift-node-1   <none>

# oc rsh -n test hello-openshift-3-2cmbf curl  -o /dev/null  -s
# oc rsh -n test hello-openshift-3-r9lr6 curl  -o /dev/null  -s
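With `-s -o /dev/null`, curl prints nothing whether the request succeeds or fails, so only its exit status tells the cases apart. A minimal Python sketch that classifies the outcome instead, assuming Python 3 is available wherever it runs (it may not be in the hello-openshift image); `probe` is a hypothetical helper.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Classify one HTTP GET as 'ok <status>', 'refused', 'timeout' or 'error: ...'.

    From the client's side, silently dropped packets look like a timeout;
    'refused' means some host actively rejected the TCP connection.
    """
    # Ignore *_proxy environment variables so the direct path is tested.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ok {resp.status}"
    except urllib.error.HTTPError as exc:
        return f"ok {exc.code}"  # the server answered, just not with 2xx
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"
        if isinstance(exc.reason, socket.timeout):
            return "timeout"
        return f"error: {exc.reason}"
    except socket.timeout:
        return "timeout"  # connected, but the response never arrived
    except OSError as exc:
        return f"error: {exc}"
```

Usage would be along the lines of `probe("http://<server>:8000/")` with the server address filled in. A reply dropped by the NetworkPolicy, as in step 5 of the description, should show up as `timeout`.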

And the log of the test HTTP server:
$ python -m http.server
Serving HTTP on port 8000 ( ...
- - [17/Jan/2020 13:56:02] "GET / HTTP/1.1" 200 -
- - [17/Jan/2020 13:56:11] "GET / HTTP/1.1" 200 -
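The expected result is that traffic leaves through the egressIP, and the server log is where that can be confirmed: each access line written by Python's `http.server` starts with the client address, which is missing from the log as pasted here. A minimal sketch that extracts those addresses and checks that they all equal the egress IP; `clients` and `all_from` are hypothetical helpers, and 192.0.2.10 is a documentation address standing in for the real egressIP.

```python
import re

# One access-log line as written by Python's http.server, for example:
# 192.0.2.10 - - [17/Jan/2020 13:56:02] "GET / HTTP/1.1" 200 -
LOG_RE = re.compile(
    r'^(?P<client>\S+) - - \[(?P<when>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def clients(log_text):
    """Return (client address, HTTP status) for every access-log line."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if m:
            hits.append((m.group("client"), int(m.group("status"))))
    return hits


def all_from(log_text, egress_ip):
    """True if the log has at least one request and every one came from egress_ip."""
    seen = clients(log_text)
    return bool(seen) and all(addr == egress_ip for addr, _ in seen)
```

The startup banner does not match the pattern, so the whole server output can be passed in as-is.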

It works in my environment.
