Bug 1919398

Summary: Permissive Egress NetworkPolicy (0.0.0.0/0) is blocking all traffic
Product: OpenShift Container Platform Reporter: Robert Bost <rbost>
Component: Networking Assignee: Michał Dulko <mdulko>
Networking sub component: kuryr QA Contact: Itzik Brown <itbrown>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: unspecified CC: bbennett, chrisw, itbrown, ltomasbo, mdulko, mpatercz, ralonsoh, rlobillo, scohen
Version: 4.6.z   
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 22:36:44 UTC Type: Bug
Bug Blocks: 1938960    

Description Robert Bost 2021-01-22 19:11:54 UTC
Description of problem: The following permissive NetworkPolicy is actually blocking all egress traffic from Pods in the Namespace:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: networkpolicy-example
spec:
  podSelector: {}
  policyTypes:
  - Egress
  - Ingress
  ingress:
  - from:
    - podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0


Version-Release number of selected component (if applicable):
OpenShift 4.6.8
Kuryr CNI


How reproducible: Always for customer


Additional info:
$ oc get kuryrnetworkpolicy -o yaml
(...)
status:
    podSelector: {}
    securityGroupId: 54ccf01c-64f9-4988-872b-dc8a56f68495
    securityGroupRules:
    - description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv4
      id: 99dd4e52-fe7a-4a74-ba64-485cd0f050bb
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
      remote_ip_prefix: 172.20.138.0/23
      security_group_id: 54ccf01c-64f9-4988-872b-dc8a56f68495

Security group rules from OpenStack reflecting the NetworkPolicy above:

    - description: Kuryr-Kubernetes NetPolicy SG rule
      direction: ingress
      ethertype: IPv4
      id: 0e83509a-1f9d-4e36-9eae-9cee40768635
      remote_ip_prefix: 172.40.0.0/16
      security_group_id: 54ccf01c-64f9-4988-872b-dc8a56f68495

    - description: Kuryr-Kubernetes NetPolicy SG rule
      direction: egress
      ethertype: IPv4
      id: e4cd66d7-ea5f-47f4-ae32-68cdadb4a245
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
      remote_ip_prefix: 0.0.0.0/0
      security_group_id: 54ccf01c-64f9-4988-872b-dc8a56f68495

    - description: Kuryr-Kubernetes NetPolicy SG rule
      direction: egress
      ethertype: IPv4
      id: e164a889-d5e6-481f-803d-1399e9052bb7
      port_range_max: 65535
      port_range_min: 1
      protocol: tcp
      remote_ip_prefix: 172.30.22.34
      security_group_id: 54ccf01c-64f9-4988-872b-dc8a56f68495

Comment 1 Luis Tomas Bolivar 2021-01-25 07:33:38 UTC
If security group 54ccf01c-64f9-4988-872b-dc8a56f68495 is applied to the Neutron port associated with the pod where the egress traffic is blocked, then this is a Neutron/OVN issue with enforcing the SG, as the rule seems to be correct (it allows IPv4 TCP traffic to 0.0.0.0/0 on all ports).

Comment 3 Marek Paterczyk 2021-01-26 18:19:10 UTC
I have confirmed that the SecurityGroup created by Kuryr is applied to the port associated with the pod I'm testing connectivity from. If you think this bug report belongs to a different product queue, then please go ahead and move it.

One more interesting piece of information:

* The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule from an explicitly defined egress rule in the NetworkPolicy, as documented in the initial description.
* However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set to Any is not blocking (works as expected). Kuryr creates this SecurityGroup rule when no egress rules are defined in the NetworkPolicy and Egress is NOT among policyTypes:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:  
  name: networkpolicy-example
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector: {}
  policyTypes:
  - Ingress

Comment 4 Marek Paterczyk 2021-01-26 18:21:47 UTC
I should also add that I'm testing connectivity with curl against ports 443 and 8080.

Comment 5 Michał Dulko 2021-01-27 10:04:13 UTC
(In reply to Marek Paterczyk from comment #3)
> I have confirmed that the SecurityGroup created by Kuryr is applied to the
> port associated with the pod I'm testing connectivity from. If you think
> this bug report belongs to a different product queue, then please go ahead
> and move it.

It all comes down to whether the SG rules Kuryr creates are correct, and I believe you mentioned they are. Kuryr relies on Neutron to manage the traffic correctly according to the rules it creates. We can't do much if Neutron is not doing its job correctly, which seems to be the case here? Please also note that we've seen Neutron problems with SG rules in the past, so it's not totally unexpected.

> One more interesting piece of information:
> 
> * The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to
> TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule
> from an explicitly defined egress rule in the NetworkPolicy, as documented
> in the initial description.
> * However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set
> to Any is not blocking (works as expected). Kuryr creates this SecurityGroup
> rule when no egress rules are defined in the NetworkPolicy and Egress is NOT
> among policyTypes:

Is that against K8s Network Policy spec? My understanding is that if there's no Egress in policyTypes, then it's allow-all on egress, which this rule tries to achieve?

> apiVersion: networking.k8s.io/v1
> kind: NetworkPolicy
> metadata:  
>   name: networkpolicy-example
> spec:
>   ingress:
>   - from:
>     - podSelector: {}
>   podSelector: {}
>   policyTypes:
>   - Ingress

Comment 6 Marek Paterczyk 2021-01-27 19:40:38 UTC
> We can't do much if Neutron is not doing its job correctly, which seems to be the case here? Please also note that we've seen Neutron problems with SG rules in the past, so it's not totally unexpected.

What is your recommendation then? I'm OK with moving this Bugzilla to the OpenStack queue, if this is where we want to direct our focus. NetworkPolicies are very important to us and we may decide not to use Kuryr if this functionality is not reliable (regardless of whether the issue lies in Kuryr or OpenStack), so I'd like to make sure this gets a proper follow-up.

> Is that against K8s Network Policy spec? My understanding is that if there's no Egress in policyTypes, then it's allow-all on egress, which this rule tries to achieve?

Yes, when Egress policy is not specified on a NetworkPolicy, allow-all rules for egress are created. No problem here. I just wanted to point out that those work as expected and how they are different from similar rules which don't.
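
The policyTypes behavior discussed above can be sketched in a few lines of Python. This is only an illustration of the defaulting rule described in the Kubernetes NetworkPolicy API reference, not code taken from Kuryr; the function names `effective_policy_types` and `restricts_egress` are hypothetical.

```python
def effective_policy_types(spec):
    """Return the policy types a NetworkPolicy spec applies to.

    Follows the defaulting rule from the Kubernetes API reference:
    when policyTypes is omitted, Ingress is always assumed and Egress
    is assumed only if the spec contains at least one egress rule.
    """
    declared = spec.get("policyTypes")
    if declared:
        return list(declared)
    types = ["Ingress"]
    if spec.get("egress"):
        types.append("Egress")
    return types


def restricts_egress(spec):
    """True when the policy isolates the selected pods for egress."""
    return "Egress" in effective_policy_types(spec)


# Policy from comment 3: Ingress only, so egress stays allow-all.
ingress_only = {
    "podSelector": {},
    "policyTypes": ["Ingress"],
    "ingress": [{"from": [{"podSelector": {}}]}],
}

# Policy from the description: Egress is listed, so only the
# destinations named in the egress rules are allowed.
with_egress = {
    "podSelector": {},
    "policyTypes": ["Egress", "Ingress"],
    "ingress": [{"from": [{"podSelector": {}}]}],
    "egress": [{"to": [{"ipBlock": {"cidr": "0.0.0.0/0"}}]}],
}

print(restricts_egress(ingress_only))  # False
print(restricts_egress(with_egress))   # True
```

The second policy is the interesting one for this bug: once Egress is listed, whatever the egress rules translate to is all that the pods are allowed to reach.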

Comment 7 Robert Bost 2021-01-29 20:44:11 UTC
Moving to OSP networking per conversation in #neutron. We will be capturing a sosreport from the controller node running the Neutron API and uploading it later.

Comment 9 Luis Tomas Bolivar 2021-02-01 08:52:09 UTC
(In reply to Marek Paterczyk from comment #3)
> I have confirmed that the SecurityGroup created by Kuryr is applied to the
> port associated with the pod I'm testing connectivity from. If you think
> this bug report belongs to a different product queue, then please go ahead
> and move it.
> 
> One more interesting piece of information:
> 
> * The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to
> TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule
> from an explicitly defined egress rule in the NetworkPolicy, as documented
> in the initial description.
> * However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set
> to Any is not blocking (works as expected). Kuryr creates this SecurityGroup
> rule when no egress rules are defined in the NetworkPolicy and Egress is NOT
> among policyTypes:
> 
> apiVersion: networking.k8s.io/v1
> kind: NetworkPolicy
> metadata:  
>   name: networkpolicy-example
> spec:
>   ingress:
>   - from:
>     - podSelector: {}
>   podSelector: {}
>   policyTypes:
>   - Ingress

By default, the network policy protocol is TCP, which could be why the egress rule is set to allow only TCP egress when this policy is applied. Perhaps you need to add the protocol for both if what you are testing is not TCP? Or perhaps the translation Kuryr is doing rests on a wrong assumption, and we should add default permissions for both TCP and UDP.

Comment 10 Marek Paterczyk 2021-02-01 14:06:20 UTC
Hello Luis

I am using curl to test connectivity over TCP. The TCP specific rule should not block (somehow it does).

Comment 12 Luis Tomas Bolivar 2021-02-01 15:14:38 UTC
(In reply to Marek Paterczyk from comment #10)
> Hello Luis
> 
> I am using curl to test connectivity over TCP. The TCP specific rule should
> not block (somehow it does).

OK. Another question: if you want to allow all egress, why not use something like the following instead of the ipBlock:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
  - Ingress
  ingress:
  - from:
    - podSelector: {}
  egress:
  - {}

Comment 16 Michał Dulko 2021-02-02 12:39:03 UTC
(In reply to Marek Paterczyk from comment #10)
> Hello Luis
> 
> I am using curl to test connectivity over TCP. The TCP specific rule should
> not block (somehow it does).

I just tried to reproduce this problem using the initial NP you've provided. I am able to curl a specific IP over TCP. I am unable to curl a domain because the rule only allows TCP, so it fails on domain resolution, as DNS uses UDP. Is that the problem you're seeing here?

I'm trying to verify if defaulting to TCP is a correct behavior of Kuryr here. The API reference seems a bit vague, saying only that port's `protocol` defaults to TCP and that without `ports` we should open all ports (but nothing about protocols).
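
To make the TCP-versus-UDP point concrete, here is a minimal Python sketch that evaluates the egress SG rules from the description against two packets. It is a simplified model of allow-list matching, not Neutron's implementation: it ignores ethertype, remote security groups, and connection tracking. The helper names are hypothetical, 203.0.113.10 is a documentation address standing in for any web server, and 172.30.0.10 is the DNS address that appears later in comment 18.

```python
import ipaddress


def rule_allows(rule, protocol, port, dest_ip):
    """Check one egress SG rule against a single outgoing packet."""
    if rule.get("direction") != "egress":
        return False
    # A rule without a protocol matches every protocol.
    if rule.get("protocol") not in (None, protocol):
        return False
    low = rule.get("port_range_min")
    high = rule.get("port_range_max")
    if low is not None and high is not None and not low <= port <= high:
        return False
    # A bare address such as 172.30.22.34 is treated as a /32 network.
    prefix = rule.get("remote_ip_prefix", "0.0.0.0/0")
    network = ipaddress.ip_network(prefix, strict=False)
    return ipaddress.ip_address(dest_ip) in network


def egress_allowed(rules, protocol, port, dest_ip):
    """Security groups are allow-lists: one matching rule is enough."""
    return any(rule_allows(r, protocol, port, dest_ip) for r in rules)


# Egress rules copied from the description (IDs omitted).
rules = [
    {"direction": "egress", "ethertype": "IPv4", "protocol": "tcp",
     "port_range_min": 1, "port_range_max": 65535,
     "remote_ip_prefix": "0.0.0.0/0"},
    {"direction": "egress", "ethertype": "IPv4", "protocol": "tcp",
     "port_range_min": 1, "port_range_max": 65535,
     "remote_ip_prefix": "172.30.22.34"},
]

print(egress_allowed(rules, "tcp", 443, "203.0.113.10"))  # True
print(egress_allowed(rules, "udp", 53, "172.30.0.10"))    # False
```

Under this model, curl to an IP address succeeds while the UDP DNS query that precedes curl to a hostname is dropped, which matches the symptom described in this comment.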

Comment 17 Michał Dulko 2021-02-02 15:56:23 UTC
(In reply to Michał Dulko from comment #16)
> (In reply to Marek Paterczyk from comment #10)
> > Hello Luis
> > 
> > I am using curl to test connectivity over TCP. The TCP specific rule should
> > not block (somehow it does).
> 
> I just tried to reproduce this problem using the initial NP you've provided.
> I am able to curl a specific IP over TCP. I am unable to curl a domain as
> rule only allows TCP, so it fails on domain resolution as DNS uses UDP. Is
> that the problem you're seeing here?
> 
> I'm trying to verify if defaulting to TCP is a correct behavior of Kuryr
> here. The API reference seems a bit vague, saying only that port's
> `protocol` defaults to TCP and that without `ports` we should open all ports
> (but nothing about protocols).

I tried ovn-kubernetes, and when there are no ports defined it just allows all protocols, meaning that Kuryr behaves differently by opening only TCP. So please just confirm whether your `curl` target was a domain-based URL rather than an IP-based one. That would mean the root cause is in Kuryr, not OVN.
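
The difference between the two behaviors can be sketched as follows. This is an illustrative Python model, not Kuryr's actual translation code: the function `translate_egress_rule` and its `tcp_only_default` flag are hypothetical, and only ipBlock peers without a `ports` section are handled. The "TCP only" output mirrors the SG rule shown in the description; the unrestricted output mirrors the rule shape shown in the verification in comment 21.

```python
def translate_egress_rule(np_rule, tcp_only_default):
    """Translate one NetworkPolicy egress rule into SG rule dicts.

    Simplified illustration: supports ipBlock peers only and rules
    that do not define a `ports` section.
    """
    if np_rule.get("ports"):
        raise NotImplementedError("explicit ports are outside this sketch")

    sg_rules = []
    for peer in np_rule.get("to", []):
        cidr = peer.get("ipBlock", {}).get("cidr")
        if cidr is None:
            continue  # pod and namespace selectors are not modeled here
        sg_rule = {
            "direction": "egress",
            "ethertype": "IPv4",
            "remote_ip_prefix": cidr,
        }
        if tcp_only_default:
            # Behavior reported in this bug: no ports meant TCP 1-65535.
            sg_rule.update(protocol="tcp", port_range_min=1,
                           port_range_max=65535)
        # Otherwise the rule carries no protocol, so all protocols match.
        sg_rules.append(sg_rule)
    return sg_rules


np_rule = {"to": [{"ipBlock": {"cidr": "0.0.0.0/0"}}]}

before = translate_egress_rule(np_rule, tcp_only_default=True)
after = translate_egress_rule(np_rule, tcp_only_default=False)

print(before[0].get("protocol"))  # tcp
print(after[0].get("protocol"))   # None
```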

Comment 18 Marek Paterczyk 2021-02-02 16:18:31 UTC
Hello Michał

I confirm that the lack of DNS access is in fact my problem. Thanks for connecting the dots for me.

Adding this NetworkPolicy allowed my tests to pass:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: networkpolicy-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 172.30.0.10/32
    ports:
    - port: 53
      protocol: UDP

This is digressing a little, but in non-Kuryr deployments we use the following NetworkPolicy rule to allow all egress to all cluster destinations:

  egress:
  - to:
    - namespaceSelector: {}

That covers core cluster services, including DNS. Unfortunately, this rule does not work the same way with Kuryr (https://bugzilla.redhat.com/show_bug.cgi?id=1921878). Looks like Kuryr requires service and pod networks to be explicitly covered in NetworkPolicies, which can come as a surprise to some (like me).
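
As a rough sketch of the DNS workaround above, the following Python builds the same NetworkPolicy as a dictionary and prints it as JSON, which `oc apply -f` accepts as well as YAML. The TCP entry for port 53 is an addition that is not in the policy above; it assumes the cluster DNS also answers over TCP, which DNS clients fall back to for large responses. The function name `dns_egress_policy` is hypothetical.

```python
import json


def dns_egress_policy(dns_ip, name="networkpolicy-dns"):
    """Build a NetworkPolicy manifest allowing DNS egress to one IP."""
    return {
        "kind": "NetworkPolicy",
        "apiVersion": "networking.k8s.io/v1",
        "metadata": {"name": name},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [{"ipBlock": {"cidr": f"{dns_ip}/32"}}],
                "ports": [
                    {"port": 53, "protocol": "UDP"},
                    # Assumption: also allow the TCP fallback for DNS.
                    {"port": 53, "protocol": "TCP"},
                ],
            }],
        },
    }


# 172.30.0.10 is the DNS address used in the policy above.
policy = dns_egress_policy("172.30.0.10")
print(json.dumps(policy, indent=2))
```

Listing the protocol explicitly on each port avoids depending on how a given plugin interprets a missing `protocol` or `ports` field.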

Comment 19 Michał Dulko 2021-02-03 08:01:52 UTC
(In reply to Marek Paterczyk from comment #18)
> That covers core cluster services, including DNS. Unfortunately, this rule
> does not work the same way with Kuryr
> (https://bugzilla.redhat.com/show_bug.cgi?id=1921878). Looks like Kuryr
> requires service and pod networks to be explicitly covered in
> NetworkPolicies, which can come as a surprise to some (like me).

Yes, I see it as a bug caused by us misunderstanding vague explanations in the NetworkPolicies API reference. We'll tackle this and I'm fairly confident this is easily backportable to 4.6.

Comment 21 rlobillo 2021-02-22 10:43:51 UTC
Verified on OCP4.8.0-0.nightly-2021-02-21-102854 on OSP13(2021-01-20.1) with Amphora provider.

The SG rules generated by the NP resource definition below allow egress traffic for all protocols, not only TCP:

$ cat np_bz1919398.yaml 
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: np-bz1919398
spec:
  podSelector:
    matchLabels:
      run: demo
  policyTypes:
  - Egress
  - Ingress
  ingress:
  - from:
    - podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0

Steps:

1. Create the test and test2 projects, each with a kuryr/demo pod exposed by a service on port 80:

$ oc new-project test
$ oc run --image kuryr/demo demo
$ oc expose pod/demo --port 80 --target-port 8080

$ oc get all -n test
NAME       READY   STATUS    RESTARTS   AGE
pod/demo   1/1     Running   0          40m

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.138.91   <none>        80/TCP    40m

$ oc new-project test2
$ oc run --image kuryr/demo demo2
$ oc expose pod/demo2 --port 80 --target-port 8080

$ oc get all -n test2
NAME        READY   STATUS    RESTARTS   AGE
pod/demo2   1/1     Running   0          3m

NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/demo2   ClusterIP   172.30.4.47   <none>        80/TCP    2m39s


2. Apply the NP to the demo pod in the test project:

$ cat np_bz1919398.yaml 
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: np-bz1919398
spec:
  podSelector:
    matchLabels:
      run: demo
  policyTypes:
  - Egress
  - Ingress
  ingress:
  - from:
    - podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0

$ oc apply -f np_bz1919398.yaml -n test

# The generated knp resource includes an Egress rule that applies to all IPv4 traffic, not only TCP:

$ oc get knp/np-bz1919398 -o json | jq .spec
{
  "egressSgRules": [
    {
      "sgRule": {
        "description": "Kuryr-Kubernetes NetPolicy SG rule",
        "direction": "egress",
        "ethertype": "IPv4",
        "remote_ip_prefix": "0.0.0.0/0"
      }
    },
    {
      "sgRule": {
        "description": "Kuryr-Kubernetes NetPolicy SG rule",
        "direction": "egress",
        "ethertype": "IPv4",
        "remote_ip_prefix": "172.30.138.91"
      }
    }
  ],
  "ingressSgRules": [
    {
      "namespace": "test",
      "sgRule": {
        "description": "Kuryr-Kubernetes NetPolicy SG rule",
        "direction": "ingress",
        "ethertype": "IPv4",
        "remote_ip_prefix": "10.128.124.0/23"
      }
    },
    {
      "sgRule": {
        "description": "Kuryr-Kubernetes NetPolicy SG rule",
        "direction": "ingress",
        "ethertype": "IPv4",
        "remote_ip_prefix": "172.30.0.0/15"
      }
    },
    {
      "sgRule": {
        "description": "Kuryr-Kubernetes NetPolicy SG rule",
        "direction": "ingress",
        "ethertype": "IPv4",
        "remote_ip_prefix": "10.196.0.0/16"
      }
    }
  ],
  "podSelector": {
    "matchLabels": {
      "run": "demo"
    }
  },
  "policyTypes": [
    "Egress",
    "Ingress"
  ]
}

3. Test connectivity:

$ oc rsh -n test demo

1. Curl an external domain (requires DNS resolution):
~ $ curl -s www.google.com
<!doctype html><html dir="rtl" itemscope="" itemtype="http://schema.org/
WebPage" lang="iw"><head><meta content="text/html; charset=UTF-8" http-e
quiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_s
tandard_color_128dp.png" itemprop="image"><title>Google</title><script n
once="tmGrp9BOgBuSGdMD4i89gA==">(function(){window.google={kEI:'3IUzYKu_
IsmVsAfT8o6oDg',kEXPI:'0,18168,1284265[...]

2. Curl the service in the other namespace:
~ $ curl 172.30.4.47
demo2: HELLO! I AM ALIVE!!!

Furthermore, kuryr-tempest tests, NP tests and conformance tests
passed for this build. Please refer to the attachment on 
https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6
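
The check performed by eye on the knp output in step 2 could also be scripted. The sketch below is an assumption-laden illustration rather than part of the verification: it embeds the egress portion of the `jq .spec` output shown above and reports any egress SG rule that still pins a protocol or port range. The function name `protocol_restricted_rules` is hypothetical.

```python
import json

# Egress portion of the `jq .spec` output from step 2 above.
KNP_SPEC = """
{
  "egressSgRules": [
    {"sgRule": {"description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "egress", "ethertype": "IPv4",
                "remote_ip_prefix": "0.0.0.0/0"}},
    {"sgRule": {"description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "egress", "ethertype": "IPv4",
                "remote_ip_prefix": "172.30.138.91"}}
  ]
}
"""

RESTRICTING_KEYS = ("protocol", "port_range_min", "port_range_max")


def protocol_restricted_rules(spec):
    """Return egress SG rules that pin a protocol or a port range."""
    restricted = []
    for entry in spec.get("egressSgRules", []):
        rule = entry["sgRule"]
        if any(key in rule for key in RESTRICTING_KEYS):
            restricted.append(rule)
    return restricted


spec = json.loads(KNP_SPEC)
print(protocol_restricted_rules(spec))  # []
```

An empty list means no egress rule is limited to TCP, which is the fixed behavior; the rule from the original description would be reported because it sets `protocol: tcp` and a port range.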

Comment 26 errata-xmlrpc 2021-07-27 22:36:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438