Bug 1919398
Summary: | Permissive Egress NetworkPolicy (0.0.0.0/0) is blocking all traffic | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Robert Bost <rbost> |
Component: | Networking | Assignee: | Michał Dulko <mdulko> |
Networking sub component: | kuryr | QA Contact: | Itzik Brown <itbrown> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | bbennett, chrisw, itbrown, ltomasbo, mdulko, mpatercz, ralonsoh, rlobillo, scohen |
Version: | 4.6.z | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-07-27 22:36:44 UTC | Type: | Bug |
Bug Blocks: | 1938960 |
Description
Robert Bost
2021-01-22 19:11:54 UTC
If security group 54ccf01c-64f9-4988-872b-dc8a56f68495 is applied on the Neutron port associated with the pod where the egress traffic is blocked, then this is a Neutron/OVN issue in enforcing the SG, as the rule seems to be correct (allowing 0.0.0.0/0 TCP traffic on all ports, for IPv4).

I have confirmed that the SecurityGroup created by Kuryr is applied to the port associated with the pod I'm testing connectivity from. If you think this bug report belongs to a different product queue, then please go ahead and move it.

One more interesting piece of information:

* The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule from an explicitly defined egress rule in the NetworkPolicy, as documented in the initial description.
* However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set to Any is not blocking (works as expected). Kuryr creates this SecurityGroup rule when no egress rules are defined in the NetworkPolicy and Egress is NOT among policyTypes:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: networkpolicy-example
    spec:
      ingress:
      - from:
        - podSelector: {}
      podSelector: {}
      policyTypes:
      - Ingress

I should also add that I'm testing connectivity with curl against ports 443 and 8080.

(In reply to Marek Paterczyk from comment #3)
> I have confirmed that the SecurityGroup created by Kuryr is applied to the
> port associated with the pod I'm testing connectivity from. If you think
> this bug report belongs to a different product queue, then please go ahead
> and move it.

It's all about whether the SG rules Kuryr creates are correct, and I believe you mentioned they are. Kuryr only relies on Neutron to correctly manage the traffic according to the rules it creates. We can't do much if Neutron is not doing its job correctly, which seems to be the case here? Please also note that we've seen Neutron problems with SG rules in the past, so it's not totally unexpected.

> One more interesting piece of information:
>
> * The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to
> TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule
> from an explicitly defined egress rule in the NetworkPolicy, as documented
> in the initial description.
> * However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set
> to Any is not blocking (works as expected). Kuryr creates this SecurityGroup
> rule when no egress rules are defined in the NetworkPolicy and Egress is NOT
> among policyTypes:

Is that against the K8s Network Policy spec? My understanding is that if there's no Egress in policyTypes, then it's allow-all on egress, which this rule tries to achieve?

> apiVersion: networking.k8s.io/v1
> kind: NetworkPolicy
> metadata:
>   name: networkpolicy-example
> spec:
>   ingress:
>   - from:
>     - podSelector: {}
>   podSelector: {}
>   policyTypes:
>   - Ingress

> We can't do much if Neutron is not doing its job correctly which seems to be the case here? Please also note that we've seen Neutron problems with SG rules in the past, so it's not totally unexpected.

What is your recommendation then? I'm OK with moving this bugzilla to the OpenStack queue, if this is where we want to direct our focus. NetworkPolicies are very important to us and we may decide not to use Kuryr if this functionality is not reliable (regardless of whether the issue lies in Kuryr or OpenStack), so I'd like to make sure this gets a proper follow-up.

> Is that against K8s Network Policy spec? My understanding is that if there's no Egress in policyTypes, then it's allow-all on egress, which this rule tries to achieve?

Yes, when Egress policy is not specified on a NetworkPolicy, allow-all rules for egress are created. No problem here. I just wanted to point out that those work as expected and how they differ from similar rules which don't.
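The difference between the two rule shapes compared above can be sketched in a few lines of Python. This is only an illustrative model, not Kuryr or Neutron code: the dictionary keys follow Neutron's security-group-rule field names, the values are the ones quoted in the comments, and `rule_allows` as well as the destination address 203.0.113.10 (a documentation-range IP) are hypothetical names chosen for the example.

```python
import ipaddress

# Rule shapes as described in the comments above (values from the thread).
# A missing "protocol" / port range stands for Neutron's "Any".
TCP_ONLY_RULE = {
    "direction": "egress",
    "ethertype": "IPv4",
    "remote_ip_prefix": "0.0.0.0/0",
    "protocol": "tcp",
    "port_range_min": 1,
    "port_range_max": 65535,
}
ANY_RULE = {
    "direction": "egress",
    "ethertype": "IPv4",
    "remote_ip_prefix": "0.0.0.0/0",
}


def rule_allows(rule, dst_ip, protocol, port):
    """Hypothetical helper: does this single egress rule match the packet?"""
    net = ipaddress.ip_network(rule["remote_ip_prefix"], strict=False)
    if ipaddress.ip_address(dst_ip) not in net:
        return False
    # A rule without a protocol matches every protocol.
    if rule.get("protocol") not in (None, protocol):
        return False
    # A rule without a port range matches every port.
    lo = rule.get("port_range_min")
    hi = rule.get("port_range_max")
    if lo is not None and not (lo <= port <= hi):
        return False
    return True


if __name__ == "__main__":
    for name, rule in (("tcp-only", TCP_ONLY_RULE), ("any", ANY_RULE)):
        print(
            name,
            "tcp/443:", rule_allows(rule, "203.0.113.10", "tcp", 443),
            "udp/53:", rule_allows(rule, "203.0.113.10", "udp", 53),
        )
```

Under this model both rules pass TCP to ports 443 and 8080, but only the Any/Any rule passes non-TCP traffic such as UDP, which is the one practical difference between them.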
Moving to OSP networking per conversation in #neutron. We will be capturing a sosreport from the controller node running the Neutron API and uploading it later.

(In reply to Marek Paterczyk from comment #3)
> I have confirmed that the SecurityGroup created by Kuryr is applied to the
> port associated with the pod I'm testing connectivity from. If you think
> this bug report belongs to a different product queue, then please go ahead
> and move it.
>
> One more interesting piece of information:
>
> * The 0.0.0.0/0 rule which is incorrectly blocking has IP Protocol set to
> TCP and Port Range set to 1-65535. Kuryr creates this SecurityGroup rule
> from an explicitly defined egress rule in the NetworkPolicy, as documented
> in the initial description.
> * However, a 0.0.0.0/0 rule with IP Protocol set to Any and Port Range set
> to Any is not blocking (works as expected). Kuryr creates this SecurityGroup
> rule when no egress rules are defined in the NetworkPolicy and Egress is NOT
> among policyTypes:
>
> apiVersion: networking.k8s.io/v1
> kind: NetworkPolicy
> metadata:
>   name: networkpolicy-example
> spec:
>   ingress:
>   - from:
>     - podSelector: {}
>   podSelector: {}
>   policyTypes:
>   - Ingress

By default, the network policy protocol is TCP; that could be the reason why, when this rule gets applied, the egress rule gets set to allow all TCP egress. Perhaps you need to add the protocol for both if what you are testing is not TCP? Or perhaps that is a wrong assumption in the translation Kuryr is doing and we should add default permissions for both TCP and UDP.

Hello Luis,

I am using curl to test connectivity over TCP. The TCP-specific rule should not block (somehow it does).

(In reply to Marek Paterczyk from comment #10)
> Hello Luis
>
> I am using curl to test connectivity over TCP. The TCP specific rule should
> not block (somehow it does).

OK. And another question: if you want to enable all egress, why not do something like the following instead of the ipBlock:

    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-all-egress
    spec:
      podSelector: {}
      policyTypes:
      - Egress
      - Ingress
      ingress:
      - from:
        - podSelector: {}
      egress:
      - {}

(In reply to Marek Paterczyk from comment #10)
> Hello Luis
>
> I am using curl to test connectivity over TCP. The TCP specific rule should
> not block (somehow it does).

I just tried to reproduce this problem using the initial NP you've provided. I am able to curl a specific IP over TCP. I am unable to curl a domain, as the rule only allows TCP, so it fails on domain resolution because DNS uses UDP. Is that the problem you're seeing here?

I'm trying to verify whether defaulting to TCP is correct behavior for Kuryr here. The API reference seems a bit vague, saying only that a port's `protocol` defaults to TCP and that without `ports` we should open all ports (but nothing about protocols).

(In reply to Michał Dulko from comment #16)
> (In reply to Marek Paterczyk from comment #10)
> > Hello Luis
> >
> > I am using curl to test connectivity over TCP. The TCP specific rule should
> > not block (somehow it does).
>
> I just tried to reproduce this problem using the initial NP you've provided.
> I am able to curl a specific IP over TCP. I am unable to curl a domain as
> rule only allows TCP, so it fails on domain resolution as DNS uses UDP. Is
> that the problem you're seeing here?
>
> I'm trying to verify if defaulting to TCP is a correct behavior of Kuryr
> here. The API reference seems a bit vague, saying only that port's
> `protocol` defaults to TCP and that without `ports` we should open all ports
> (but nothing about protocols).

I tried ovn-kubernetes, and when there are no ports defined it just allows all protocols, meaning that Kuryr, by opening only TCP, is behaving differently. So please just confirm whether the `curl` target you tested was a domain-based URL, not an IP-based one.
That would mean the root cause is in Kuryr, not OVN.

Hello Michał,

I confirm lack of DNS access is in fact my problem. Thanks for connecting the dots for me. Adding this NetworkPolicy allowed my tests to pass:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: networkpolicy-dns
    spec:
      podSelector: {}
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 172.30.0.10/32
        ports:
        - port: 53
          protocol: UDP

This is digressing a little, but in non-Kuryr deployments we use the following NetworkPolicy rule to allow all egress to all cluster destinations:

    egress:
    - to:
      - namespaceSelector: {}

That covers core cluster services, including DNS. Unfortunately, this rule does not work the same way with Kuryr (https://bugzilla.redhat.com/show_bug.cgi?id=1921878). Looks like Kuryr requires service and pod networks to be explicitly covered in NetworkPolicies, which can come as a surprise to some (like me).

(In reply to Marek Paterczyk from comment #18)
> That covers core cluster services, including DNS. Unfortunately, this rule
> does not work the same way with Kuryr
> (https://bugzilla.redhat.com/show_bug.cgi?id=1921878). Looks like Kuryr
> requires service and pod networks to be explicitly covered in
> NetworkPolicies, which can come as a surprise to some (like me).

Yes, I see it as a bug caused by us misunderstanding vague explanations in the NetworkPolicies API reference. We'll tackle this and I'm fairly confident this is easily backportable to 4.6.

Verified on OCP4.8.0-0.nightly-2021-02-21-102854 on OSP13(2021-01-20.1) with Amphora provider.

SG rules generated by the NP resource definition below allow egress traffic for all protocols, not only TCP:

    $ cat np_bz1919398.yaml
    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: np-bz1919398
    spec:
      podSelector:
        matchLabels:
          run: demo
      policyTypes:
      - Egress
      - Ingress
      ingress:
      - from:
        - podSelector: {}
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0

Steps:

1. Create test and test2 projects, both with a kuryr/demo pod exposed by a service on port 80:

        $ oc new-project test
        $ oc run --image kuryr/demo demo
        $ oc expose pod/demo --port 80 --target-port 8080
        $ oc get all -n test
        NAME       READY   STATUS    RESTARTS   AGE
        pod/demo   1/1     Running   0          40m

        NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
        service/demo   ClusterIP   172.30.138.91   <none>        80/TCP    40m

        $ oc new-project test2
        $ oc run --image kuryr/demo demo2
        $ oc expose pod/demo2 --port 80 --target-port 8080
        $ oc get all -n test2
        NAME        READY   STATUS    RESTARTS   AGE
        pod/demo2   1/1     Running   0          3m

        NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
        service/demo2   ClusterIP   172.30.4.47   <none>        80/TCP    2m39s

2. Apply the NP (the np_bz1919398.yaml shown above) on the demo pod in the test project:

        $ oc apply -f np_bz1919398.yaml -n test

        # The generated knp resource includes an Egress rule that applies to all IPv4 traffic, not only TCP:
        $ oc get knp/np-bz1919398 -o json | jq .spec
        {
          "egressSgRules": [
            {
              "sgRule": {
                "description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "egress",
                "ethertype": "IPv4",
                "remote_ip_prefix": "0.0.0.0/0"
              }
            },
            {
              "sgRule": {
                "description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "egress",
                "ethertype": "IPv4",
                "remote_ip_prefix": "172.30.138.91"
              }
            }
          ],
          "ingressSgRules": [
            {
              "namespace": "test",
              "sgRule": {
                "description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "ingress",
                "ethertype": "IPv4",
                "remote_ip_prefix": "10.128.124.0/23"
              }
            },
            {
              "sgRule": {
                "description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "ingress",
                "ethertype": "IPv4",
                "remote_ip_prefix": "172.30.0.0/15"
              }
            },
            {
              "sgRule": {
                "description": "Kuryr-Kubernetes NetPolicy SG rule",
                "direction": "ingress",
                "ethertype": "IPv4",
                "remote_ip_prefix": "10.196.0.0/16"
              }
            }
          ],
          "podSelector": {
            "matchLabels": {
              "run": "demo"
            }
          },
          "policyTypes": [
            "Egress",
            "Ingress"
          ]
        }

3. Test connectivity:

        $ oc rsh -n test demo

   1. Curl an external domain:

            ~ $ curl -s www.google.com
            <!doctype html><html dir="rtl" itemscope="" itemtype="http://schema.org/
            WebPage" lang="iw"><head><meta content="text/html; charset=UTF-8" http-e
            quiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_s
            tandard_color_128dp.png" itemprop="image"><title>Google</title><script n
            once="tmGrp9BOgBuSGdMD4i89gA==">(function(){window.google={kEI:'3IUzYKu_
            IsmVsAfT8o6oDg',kEXPI:'0,18168,1284265[...]

   2. Curl the service in the other namespace:

            ~ $ curl 172.30.4.47
            demo2: HELLO! I AM ALIVE!!!

Furthermore, kuryr-tempest tests, NP tests and conformance tests passed for this build. Please refer to the attachment on https://bugzilla.redhat.com/show_bug.cgi?id=1927244#c6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438