Bug 2061916
| Summary: | mixed ingress and egress policies can result in half-isolated pods | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Dan Winship <danw> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Severity: | high |
| Priority: | high | Keywords: | NeedsTestCase |
| Version: | 4.11 | Target Release: | 4.11.0 |
| Last Closed: | 2022-08-10 10:52:44 UTC | Type: | Bug |
| Bug Blocks: | 2062859 | | |
Description
Dan Winship
2022-03-08 17:10:24 UTC
Hi Dan, following your description above, I did not hit the issue when upgrading from 4.9.23 to 4.10.3. Could you help check whether the network policy behavior below is expected?
1. There are test pods with different labels:
```
# oc get pod -o wide -n z1 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-7x5sk   1/1     Running   0          3h43m   10.129.2.15   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-bq4wp   1/1     Running   0          3h40m   10.131.0.18   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn
test-rc-f9gbq     1/1     Running   0          3h43m   10.129.2.14   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=test-pods
test-rc-gw7nn     1/1     Running   0          3h40m   10.131.0.15   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=test-pods
```
2. Apply an egress-type policy in z1, as below:
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-all-otherpod
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - {}
  policyTypes:
  - Egress
```
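The empty rule `- {}` above allows all egress from the selected pods. For contrast, here is a sketch (not part of the reproduction; the policy name is hypothetical) of the opposite: declaring the Egress type with no egress rules at all, which denies all egress from the same pods:

```yaml
# Hypothetical contrast policy: same pod selector, but no egress rules,
# so the selected pods are isolated for egress (all egress denied).
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all-egress   # hypothetical name, for illustration only
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  policyTypes:
  - Egress
```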
3. Apply an ingress-type policy in z1, as below:
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
    - podSelector:
        matchLabels:
          name: test-pods
  policyTypes:
  - Ingress
```
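Note that `namespaceSelector` and `podSelector` here are two separate items in the `from` list, so per standard Kubernetes semantics they are ORed: traffic is allowed from any pod in a `team=operations` namespace, or from `test-pods` pods in z1 itself. Combining them in a single `from` item ANDs them instead, as in this sketch (policy name hypothetical):

```yaml
# Hypothetical ANDed variant: one `from` item whose two selectors must
# both match, i.e. only test-pods pods in team=operations namespaces.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-ingress-anded   # hypothetical name, for illustration only
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
      podSelector:
        matchLabels:
          name: test-pods
  policyTypes:
  - Ingress
```

The second reproduction later in this report uses this ANDed form.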
4. Create another namespace z2 and label it with 'team=operations':
```
# oc get pod -n z2 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
hello-sdn-8j8jx   1/1     Running   0          3h44m   name=hellosdn
hello-sdn-zqr47   1/1     Running   0          3h47m   name=hellosdn
test-rc-247r8     1/1     Running   0          3h44m   name=test-pods
test-rc-5slzr     1/1     Running   0          3h47m   name=test-pods
```
5. Create another namespace z3, but without the label 'team=operations':
```
# oc get namespace z2 z3 --show-labels
NAME   STATUS   AGE     LABELS
z2     Active   8h      kubernetes.io/metadata.name=z2,team=operations
z3     Active   4h38m   kubernetes.io/metadata.name=z3

# oc get pod -n z3 --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
hello-sdn-ld572   1/1     Running   0          3h49m   name=hellosdn
hello-sdn-ntm5c   1/1     Running   0          3h45m   name=hellosdn
```
The test results:

###### The z2 test-pods pod can access the z1 hello-sdn pod; this is expected ####

```
# oc exec -n z2 test-rc-247r8 -- curl 10.129.2.15:8080 2>/dev/null
Hello OpenShift!
```

###### The z2 hellosdn pod can also access the z1 hello-sdn pod; I think this is not expected, so I filed https://bugzilla.redhat.com/show_bug.cgi?id=2062084 for that issue ####

```
# oc exec -n z2 hello-sdn-8j8jx -- curl 10.129.2.15:8080 2>/dev/null
Hello OpenShift!
```

###### The z3 hellosdn pod cannot access the z1 hello-sdn pod; this is expected ####

```
# oc exec -n z3 hello-sdn-ld572 -- curl --connect-timeout 5 10.129.2.15:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl: (28) Connection timeout after 5000 ms
command terminated with exit code 28
```
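For convenience, a small shell sketch (the pod names and the z1 target IP above are assumptions from this particular cluster) that repeats the three connectivity checks; a curl exit code of 28 (connect timeout) indicates the connection was blocked:

```sh
# Repeat the three ingress checks against the z1 hello-sdn pod (10.129.2.15).
# curl exit code 28 (connect timeout) means the policy blocked the connection.
for src in "z2 test-rc-247r8" "z2 hello-sdn-8j8jx" "z3 hello-sdn-ld572"; do
  set -- $src   # split "namespace pod" into $1 and $2
  printf '%s/%s -> z1/hello-sdn: ' "$1" "$2"
  oc exec -n "$1" "$2" -- curl -s --connect-timeout 5 10.129.2.15:8080 \
    || echo "blocked (curl exit $?)"
done
```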
Then I upgraded the cluster to 4.10.3 and got the same test results as before the upgrade. I did not see what you mentioned:

> - then there's a 50/50 chance that those pods will end up allowing
>   ingress traffic that they had wanted to block

Could you help with the network policy and the steps? Thanks.
> I did not hit the issue when upgrade from 4.9.23 to 4.10.3

You don't need to do an upgrade to test this (or either of the earlier 4.10 sdn NP bugs). They can all be reproduced starting with a 4.10 cluster and creating egress policies. The two earlier bugs were reported by people doing upgrades, but that's just because there aren't many people doing new 4.10 installs and experimenting with egress policies yet.

> I did not get you mentioned:
>
> > - then there's a 50/50 chance that those pods will end up allowing
> > ingress traffic that they had wanted to block

Yeah, so the bug is "half the time the ingress policy doesn't block the things it should block, and half the time the egress policy doesn't block the things it should block". In the part you quoted I was only talking about the ingress half, because I was saying that if the egress policy doesn't work, that's not technically a _regression_, since egress policies didn't work in 4.9 anyway.

But anyway, to test this, you need to change the egress policy so that it allows some egress but not all egress, and then test that (a) the ingress policy blocks the ingress it's supposed to block, and (b) the egress policy blocks the egress that it's supposed to block.

So e.g., make the z1 egress policy allow egress to z1, but not to z2 and z3. Then if the z3 -> z1.hello-sdn connection fails correctly, check if a z1.hello-sdn -> z3 connection works. That ought to be blocked by the z1 egress policy, but if this bug occurs then it won't be blocked.

Thanks Dan, I got the steps to reproduce this issue following your comment (a sketch of the suggested partial-egress policy is shown below, followed by my steps):
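One way to write the "allow egress to z1, but not to z2 and z3" policy Dan suggests might look like the following sketch (the policy name is hypothetical; it relies on the `kubernetes.io/metadata.name` label visible in the namespace listings above). The actual reproduction below instead selects a `team=openshift` namespace label that no namespace carries, which blocks all egress from the selected pods:

```yaml
# Hypothetical sketch: allow egress from z1's hellosdn pods only to pods
# in namespace z1 itself, via the well-known metadata.name namespace label.
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-to-z1-only   # hypothetical name, for illustration only
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: z1
  policyTypes:
  - Egress
```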
1. Check the pods and namespace labels:

```
# oc get pod -o wide -n z1 --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-7x5sk   1/1     Running   0          18h   10.129.2.15   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-bq4wp   1/1     Running   0          18h   10.131.0.18   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn
test-rc-f9gbq     1/1     Running   0          18h   10.129.2.14   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=test-pods
test-rc-gw7nn     1/1     Running   0          18h   10.131.0.15   ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=test-pods

# oc get pod -o wide -n z3 --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES   LABELS
hello-sdn-ld572   1/1     Running   0          18h   10.129.2.13   ip-10-0-168-211.us-east-2.compute.internal   <none>           <none>            name=hellosdn
hello-sdn-ntm5c   1/1     Running   0          18h   10.131.0.8    ip-10-0-203-92.us-east-2.compute.internal    <none>           <none>            name=hellosdn

# oc get namespace z1 z3 --show-labels
NAME   STATUS   AGE   LABELS
z1     Active   23h   kubernetes.io/metadata.name=z1
z3     Active   19h   kubernetes.io/metadata.name=z3
```
2. Apply an egress-type policy for z1:
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: egress-all-otherpod
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          team: openshift
  policyTypes:
  - Egress
```
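Since no namespace in this cluster carries the `team=openshift` label, the rule above matches nothing, so the policy denies all egress from the hellosdn pods, which is enough for this test. A quick way to confirm the label matches no namespace (a sketch; the comment shows the expected empty-result message):

```sh
# List namespaces carrying team=openshift; here we expect no matches.
oc get namespace -l team=openshift
# No resources found
```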
3. ### The z1 hellosdn pods cannot access the z3 pods; this is expected ###
```
# oc exec -n z1 hello-sdn-7x5sk -- curl --connect-timeout 5 10.129.2.13:8080
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
curl: (28) Connection timeout after 5000 ms
command terminated with exit code 28
```
4. Then apply another ingress policy for z1 as below (note that here `namespaceSelector` and `podSelector` are combined in a single `from` item, so they are ANDed):
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all-ingress
spec:
  podSelector:
    matchLabels:
      name: hellosdn
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: operations
      podSelector:
        matchLabels:
          name: test-pods
  policyTypes:
  - Ingress
```
```
# oc get networkpolicy -n z1
NAME                  POD-SELECTOR    AGE
allow-all-ingress     name=hellosdn   29s
egress-all-otherpod   name=hellosdn   16m
```
5. Now the z1 hellosdn pods CAN access the z3 pods; this is NOT expected, since the egress policy should still block it:
```
# oc exec -n z1 hello-sdn-7x5sk -- curl --connect-timeout 5 10.129.2.13:8080 2>/dev/null
Hello OpenShift!
```
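Per Dan's comment, the complementary half of the check is that the ingress policy keeps blocking what it should. A sketch of that check (pod names and IPs as in the listings above); while ingress isolation is intact, it should still time out with exit code 28:

```sh
# z3 is not labeled team=operations, so its pods should remain unable to
# reach the z1 hello-sdn pod (10.129.2.15) under the ingress policy.
oc exec -n z3 hello-sdn-ld572 -- curl --connect-timeout 5 10.129.2.15:8080
# expected: curl: (28) Connection timeout after 5000 ms
```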
Verified this bug on 4.11.0-0.nightly-2022-03-14-063112 with the steps from comment 3.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069