+++ This bug was initially created as a clone of Bug #2061916 +++

If a namespace contained no policies that applied to the entire namespace, then pods that were selected by at least one ingress-only policy and at least one egress-only policy could end up only half-isolated, such that we would either allow ingress to the pod that should have been blocked, or allow egress from it that should have been blocked.

Note that unlike bug 2060553, this never _blocks_ traffic that should have been _allowed_, so it does not immediately break previously-working configurations. Also, "egress policies don't get applied" is a bug, but not a regression, since egress policies didn't get applied in 4.9 either.

Thus, the only possible regression here is:

- If a user upgrading from 4.9 has egress policies applied to some (but not all) pods in a namespace (even though those policies did nothing in 4.9)...

- and they also have ingress policies applied to the same pods (but which, again, don't apply to _all_ pods)...

- AND they don't have a "default-deny" ingress policy, an "emulate multitenant mode" policy, an "allow traffic from the router to all pods" policy, or *any other* policy with `podSelector: {}`

- then there's a 50/50 chance that those pods will end up allowing ingress traffic that they had wanted to block.

The requirement that they have no ingress policies that apply to the whole namespace means this is unlikely to _actually_ affect anyone, since NetworkPolicy is extremely difficult to use correctly if you don't lay down a baseline policy that applies to all pods first.
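For reference, the "baseline policy" referred to above is any policy whose podSelector matches every pod in the namespace. A minimal sketch of such a default-deny ingress policy (the name is illustrative); a namespace carrying a policy like this is not subject to the half-isolation described here, because the policy applies to the whole namespace:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny-ingress   # illustrative name
spec:
  podSelector: {}              # selects all pods in the namespace
  policyTypes:
  - Ingress                    # no ingress rules listed, so all ingress is denied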
--- Additional comment from zzhao on 2022-03-09 11:47:59 UTC ---

Hi Dan, following your description above, I did not hit the issue when upgrading from 4.9.23 to 4.10.3. Could you help check whether this networkpolicy behavior is expected?

1. There are test pods with different labels:

# oc get pod -o wide -n z1 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-7x5sk   1/1     Running   0          3h43m   10.129.2.15   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-bq4wp   1/1     Running   0          3h40m   10.131.0.18   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn
test-rc-f9gbq     1/1     Running   0          3h43m   10.129.2.14   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=test-pods
test-rc-gw7nn     1/1     Running   0          3h40m   10.131.0.15   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=test-pods

2. Apply one egress-type policy for z1 as below:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-all-otherpod
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - {}
  policyTypes:
  - Egress

3. Apply an ingress-type policy for z1 as below:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
    - podSelector:
        matchLabels:
          name: test-pods
  policyTypes:
  - Ingress

4. Create another namespace z2 and label it with 'team=operations':

# oc get pod -n z2 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
hello-sdn-8j8jx   1/1     Running   0          3h44m   name=hellosdn
hello-sdn-zqr47   1/1     Running   0          3h47m   name=hellosdn
test-rc-247r8     1/1     Running   0          3h44m   name=test-pods
test-rc-5slzr     1/1     Running   0          3h47m   name=test-pods

5. Create another namespace z3, but without the 'team=operations' label:

# oc get namespace z2 z3 --show-labels
NAME   STATUS   AGE     LABELS
z2     Active   8h      kubernetes.io/metadata.name=z2,team=operations
z3     Active   4h38m   kubernetes.io/metadata.name=z3

# oc get pod -n z3 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
hello-sdn-ld572   1/1     Running   0          3h49m   name=hellosdn
hello-sdn-ntm5c   1/1     Running   0          3h45m   name=hellosdn

Test results:

###### The z2 test-pods pod can access the z1 hello-sdn pod; this is expected ######

# oc exec -n z2 test-rc-247r8 -- curl 10.129.2.15:8080 2>/dev/null
Hello OpenShift!

###### The z2 hellosdn pod can also access the z1 hello-sdn pod; I think this is not expected, so I filed https://bugzilla.redhat.com/show_bug.cgi?id=2062084 for that issue ######

# oc exec -n z2 hello-sdn-8j8jx -- curl 10.129.2.15:8080 2>/dev/null
Hello OpenShift!

###### The z3 hellosdn pod cannot access the z1 hello-sdn pod; this is expected ######

# oc exec -n z3 hello-sdn-ld572 -- curl --connect-timeout 5 10.129.2.15:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl: (28) Connection timeout after 5000 ms
command terminated with exit code 28

Then I upgraded the cluster to 4.10.3 and got the same test results as before the upgrade. I did not see what you mentioned:

> - then there's a 50/50 chance that those pods will end up allowing ingress traffic that they had wanted to block

Could you help check the networkpolicy and the steps? Thanks.

--- Additional comment from danw on 2022-03-09 15:13:21 UTC ---

> I did not hit the issue when upgrading from 4.9.23 to 4.10.3

You don't need to do an upgrade to test this (or either of the earlier 4.10 sdn NP bugs). They can all be reproduced by starting with a 4.10 cluster and creating egress policies. The two earlier bugs were reported by people doing upgrades, but that's just because there aren't many people doing new 4.10 installs and experimenting with egress policies yet.

> I did not see what you mentioned:
>
> > - then there's a 50/50 chance that those pods will end up allowing
> > ingress traffic that they had wanted to block

Yeah, so the bug is "half the time the ingress policy doesn't block the things it should block, and half the time the egress policy doesn't block the things it should block". In the part you quoted I was only talking about the ingress half, because I was saying that if the egress policy doesn't work, that's not technically a _regression_, since egress policies didn't work in 4.9 anyway.

But anyway, to test this, you need to change the egress policy so that it allows some egress but not all egress, and then test that (a) the ingress policy blocks the ingress it's supposed to block, and (b) the egress policy blocks the egress it's supposed to block.

So e.g., make the z1 egress policy allow egress to z1, but not to z2 and z3. Then, if the z3 -> z1.hello-sdn connection fails correctly, check whether a z1.hello-sdn -> z3 connection works. That ought to be blocked by the z1 egress policy, but if this bug occurs it won't be blocked.
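One way to express "allow egress to z1 but not to z2 and z3" is an egress rule whose `to` clause uses a bare podSelector, which matches only pods in the policy's own namespace. A sketch of that approach (the policy name is illustrative; the reproduction below instead uses a namespaceSelector on a label no namespace carries, which restricts egress just as effectively):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-same-namespace-only   # illustrative name
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - to:
    - podSelector: {}   # a podSelector with no namespaceSelector matches only pods in this namespace
  policyTypes:
  - Egress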
--- Additional comment from zzhao on 2022-03-10 02:51:20 UTC ---

Thanks Dan, I got the steps to reproduce this issue following your comment:

1. Check the test pods and namespaces:

# oc get pod -o wide -n z1 --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-7x5sk   1/1     Running   0          18h   10.129.2.15   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-bq4wp   1/1     Running   0          18h   10.131.0.18   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn
test-rc-f9gbq     1/1     Running   0          18h   10.129.2.14   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=test-pods
test-rc-gw7nn     1/1     Running   0          18h   10.131.0.15   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=test-pods

# oc get pod -o wide -n z3 --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-ld572   1/1     Running   0          18h   10.129.2.13   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-ntm5c   1/1     Running   0          18h   10.131.0.8    ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn

# oc get namespace z1 z3 --show-labels
NAME   STATUS   AGE   LABELS
z1     Active   23h   kubernetes.io/metadata.name=z1
z3     Active   19h   kubernetes.io/metadata.name=z3

2. Apply one egress-type policy for z1:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-all-otherpod
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          team: openshift
  policyTypes:
  - Egress

3. The z1 hellosdn pods cannot access the z3 pods; this is expected:

# oc exec -n z1 hello-sdn-7x5sk -- curl --connect-timeout 5 10.129.2.13:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl: (28) Connection timeout after 5000 ms
command terminated with exit code 28

4. Then apply another ingress policy for z1 as below:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
      podSelector:
        matchLabels:
          name: test-pods
  policyTypes:
  - Ingress

# oc get networkpolicy -n z1
NAME                  POD-SELECTOR    AGE
allow-all-ingress     name=hellosdn   29s
egress-all-otherpod   name=hellosdn   16m

5. Now the z1 hellosdn pods ARE able to access the z3 pods; this is NOT expected:

# oc exec -n z1 hello-sdn-7x5sk -- curl --connect-timeout 5 10.129.2.13:8080 2>/dev/null
Hello OpenShift!
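With the fix in place, both halves of the isolation should hold at once, per danw's comment above: the egress policy should block z1 -> z3, and the ingress policy should block z3 -> z1. A quick check along those lines, reusing the pods and IPs from the transcript above (both commands should now time out):

# egress half: z1 hello-sdn -> z3 pod should be blocked by egress-all-otherpod
oc exec -n z1 hello-sdn-7x5sk -- curl --connect-timeout 5 10.129.2.13:8080

# ingress half: z3 pod -> z1 hello-sdn should be blocked by allow-all-ingress
oc exec -n z3 hello-sdn-ld572 -- curl --connect-timeout 5 10.129.2.15:8080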
Verified this bug on 4.10.0-0.nightly-2022-04-05-063640
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.9 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1241