Bug 1748034
| Summary: | [3.11] SDN - migrating from multitenant to networkpolicy does not work | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Samuel <smoro> | ||||||
| Component: | Networking | Assignee: | Jason Boxman <jboxman> | ||||||
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> | ||||||
| Status: | CLOSED NOTABUG | Docs Contact: | |||||||
| Severity: | urgent | ||||||||
| Priority: | unspecified | CC: | aos-bugs, arghosh, bbeaudoi, bbennett, dcaldwel, jboxman, jolee, mharri, ph.hutter, weliang | ||||||
| Version: | 3.11.0 | Flags: | jboxman: needinfo-, jboxman: needinfo- | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | 3.11.z | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2020-05-08 18:37:28 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Samuel
2019-09-02 14:03:39 UTC
I see a needinfo flag: is there anything I can provide you with? What do you need? Thanks.

I can't discuss customer cases in public Bugzilla comments. Please ensure you have the correct account permissions.

As a partner, I don't see private messages here. I would see them on https://access.redhat.com/support/cases/#/case/02458754 though. Thanks.

The cluster has been upgraded to 3.11.135. The same issue is present.

I reproduced the same issue in 3.11.98 when deploying a networkpolicy cluster directly. I will work with jtanenba to debug this issue.

Tested in the latest v3.11.142 cluster with networkpolicy installed directly; I see the same failure.

So the issue is not about migrating from multitenant to networkpolicy.

Other simple networkpolicy tests passed.

It looks like the namespace selector in networkpolicy does not function correctly.

(In reply to Weibin Liang from comment #10)
> Tested in latest v3.11.142 cluster with networkpolicy installation directly,
> see the same failure issue.
>
> So it is not issue about the migrating from multitenant to networkpolicy.
>
> Check other simple networkpolicy testing, the testing passed.
>
> Look the namespace selector in networkpolicy does not function correctly.

Correction: it looks like namespace + pod selector in networkpolicy does not function correctly.

(In reply to Weibin Liang from comment #12)
> (In reply to Weibin Liang from comment #10)
> > Tested in latest v3.11.142 cluster with networkpolicy installation directly,
> > see the same failure issue.
> >
> > So it is not issue about the migrating from multitenant to networkpolicy.
> >
> > Check other simple networkpolicy testing, the testing passed.
> >
> > Look the namespace selector in networkpolicy does not function correctly.
>
> Correction: Look like namespace + pod selector in networkpolicy does not
> function correctly.
namespace + pod selector was added after 3.11, but namespace selector should work in 3.11.

Please disregard my comments 10 and 12; sorry for the confusion. The latest test results show that the problem only happens when migrating from multitenant to networkpolicy (tested in v3.11.141).

namespace + pod selector should be added from 4.0, so it should be worked in 3.11 I remember.

(In reply to zhaozhanqi from comment #17)
> namespace + pod selector should be added from 4.0, so it should be worked in
> 3.11 I remember.

sorry, typo 'it should NOT be worked in 3.11'

(In reply to zhaozhanqi from comment #17)
> namespace + pod selector should be added from 4.0, so it should be worked in
> 3.11 I remember.

Indeed, the following snippet, taken from the k8s docs, won't work in OCP 3.x:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-default-namespace
spec:
  podSelector:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default
      podSelector:
        matchLabels:
          name: pod-in-default-namespace

And I think that's what our doc means when it says podSelector cannot be used in combination with namespaceSelector.

Now, I'm pretty sure the following should work, as it is mentioned in the 3.11 docs, and I have already used it with other customers running 3.10+:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-default-namespace
spec:
  podSelector:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default

Regards.

@Samuel zhaozhanqi corrected that to 'it should NOT be worked in 3.11'

Understood, and I'd rather make sure it is clear that the sample we're having issues with should work. Despite using podSelector alongside namespaceSelector, this is a case that should work in 3.11.

This is working for me as expected in OpenShift 3.11.141. The API schema in Comment #20 is incorrect.
The `spec` object should look like this:

spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default
    - podSelector:
        matchLabels:
          name: pod-in-this-namespace

From what I understand, the "from" rules are not evaluated as "AND" but as "OR". The podSelector is not related to the namespaceSelector, so "pod-in-default-namespace" is really "pod-in-this-namespace". Either the namespaceSelector or the podSelector may match; the first rule that matches in the from block will allow the traffic.

The documentation states: "ingress: Each NetworkPolicy may include a list of whitelist ingress rules."
https://kubernetes.io/docs/concepts/services-networking/network-policies/#the-networkpolicy-resource

Blocks evaluated together (AND logic):
- ingress
- ports

Blocks evaluated separately (OR logic):
- ingress.ipBlock
- ingress.namespaceSelector
- ingress.podSelector

I tested the following in the openshift-console project to show that the podSelector, when specified properly, does not block traffic and was treated with "OR" when the namespace was defined. (Note: the podSelector is treated as being within the current namespace, not within the aforementioned namespace.)
apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  creationTimestamp: 2019-09-06T22:15:23Z
  generation: 15
  name: allow-from-default-namespace
  namespace: openshift-console
  resourceVersion: "365202"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-console/networkpolicies/allow-from-default-namespace
  uid: d28fff2c-d0f3-11e9-94eb-000c2924178d
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: default
    - podSelector:
        matchLabels:
          router: foo
    ports:
    - port: 8443
      protocol: TCP
  podSelector:
    matchLabels:
      app: openshift-console
  policyTypes:
  - Ingress

[cloud-user@master1 ~]$ oc get -o yaml networkpolicy
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: 2019-09-06T22:15:23Z
    generation: 15
    name: allow-from-default-namespace
    namespace: openshift-console
    resourceVersion: "365202"
    selfLink: /apis/extensions/v1beta1/namespaces/openshift-console/networkpolicies/allow-from-default-namespace
    uid: d28fff2c-d0f3-11e9-94eb-000c2924178d
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            name: default
      - podSelector:
          matchLabels:
            router: foo
      ports:
      - port: 8443
        protocol: TCP
    podSelector:
      matchLabels:
        app: openshift-console
    policyTypes:
    - Ingress
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: 2019-09-06T22:15:58Z
    generation: 1
    name: allow-same-namespace
    namespace: openshift-console
    resourceVersion: "359411"
    selfLink: /apis/extensions/v1beta1/namespaces/openshift-console/networkpolicies/allow-same-namespace
    uid: e6f77d15-d0f3-11e9-94eb-000c2924178d
  spec:
    ingress:
    - from:
      - podSelector: {}
    podSelector: {}
    policyTypes:
    - Ingress
- apiVersion: extensions/v1beta1
  kind: NetworkPolicy
  metadata:
    creationTimestamp: 2019-09-06T22:17:54Z
    generation: 1
    name: deny-by-default
    namespace: openshift-console
    resourceVersion: "359617"
    selfLink: /apis/extensions/v1beta1/namespaces/openshift-console/networkpolicies/deny-by-default
    uid: 2c510d23-d0f4-11e9-94eb-000c2924178d
  spec:
    podSelector: {}
    policyTypes:
    - Ingress
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Simple testing:

[cloud-user@master1 ~]$ oc get pods -n default -l router=router
NAME             READY     STATUS    RESTARTS   AGE
router-1-d2rkq   1/1       Running   0          2d
[cloud-user@master1 ~]$ oc get pods -n default -l router=foo
No resources found.
[cloud-user@master1 ~]$ oc -n default rsh dc/router curl --insecure --head https://console.openshift-console.endpoints:8443
HTTP/1.1 200 OK
[...]
[cloud-user@master1 ~]$ oc -n default rsh dc/registry-console curl --insecure --head https://console.openshift-console.endpoints:8443
HTTP/1.1 200 OK
[...]
[cloud-user@master1 ~]$ oc get -n default -o jsonpath='{.metadata.labels}' pod router-1-d2rkq
map[deployment:router-1 deploymentconfig:router router:router]
[cloud-user@master1 ~]$ oc get -n default -o jsonpath='{.metadata.labels}' pod registry-console-1-v9275
map[deployment:registry-console-1 deploymentconfig:registry-console name:registry-console]

Negative testing with confirmation, also showing :

[cloud-user@master1 ~]$ oc -n openshift-web-console rsh deployment/webconsole curl --insecure --head https://console.openshift-console.endpoints:8443 --verbose
* About to connect() to console.openshift-console.endpoints port 8443 (#0)
* Trying 10.128.0.164...
* Connection timed out
* Failed connect to console.openshift-console.endpoints:8443; Connection timed out
* Closing connection 0
curl: (7) Failed connect to console.openshift-console.endpoints:8443; Connection timed out
command terminated with exit code 7
[cloud-user@master1 ~]$ oc label namespace openshift-web-console name=default
namespace/openshift-web-console labeled
[cloud-user@master1 ~]$ oc -n openshift-web-console rsh deployment/webconsole curl --insecure --head https://console.openshift-console.endpoints:8443
HTTP/1.1 200 OK
[...]
[cloud-user@master1 ~]$ oc get -n openshift-web-console -o jsonpath='{.metadata.labels}' pod webconsole-7f7f679596-zngg5
map[app:openshift-web-console pod-template-hash:3939235152 webconsole:true]

So, thanks to @bbeaudoi's suggestion on rocketchat, ...

Now that I start my routers without hostnetwork, my initial networkpolicies work just fine, as expected, without requiring any labels on the default namespace.

Basically, when a Pod in hostnetwork reaches something in the SDN, OVS considers that its traffic came from an unknown netid, which translates to netid 0.

Now, say you:
- deploy a router (or any other hostnetwork-based Pod) in a Project whose netid is non-0
- want to set up cross-namespace networkpolicies allowing that hostnetwork-based Pod to communicate with some application

Then there is no use in labeling the router Pod's namespace. Instead, we should label the default namespace, or any namespace whose netid is 0. Which isn't really intuitive, ...

Now, say we deploy several routers in different Projects, trying to segregate traffic using networkpolicies and host projects in "groups" that should not, in any way, see each other. Then we should get rid of hostNetwork. An alternative could be based on NodePort Services, and EgressNetworkPolicies preventing Pods in our SDN from bypassing customers' corporate firewalls.

Not sure there is an actual bug, then. Although that's something our docs could clarify, hopefully.

Thanks again to Brian.

Agreed, this isn't a bug; we should update our docs.

I've cloned this bug and created a docs bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1750429

Thanks. I might have been too soft. And sure, we can fix the doc, at the very least.
But then again, we found here that Pods using hostnetwork enter the SDN with netid 0, regardless of their Project.
How isn't this a bug?!
Consider the following:
os_sdn_network_plugin_name: redhat/openshift-ovs-networkpolicy
openshift_additional_projects:
  routers-mgmt:
    default_node_selector: environment=mgmt
  routers-dev:
    default_node_selector: environment=dev
  routers-prod:
    default_node_selector: environment=prod
  routers-stage:
    default_node_selector: environment=stage
openshift_hosted_routers:
- name: routers-mgmt
  certificate:
    certfile: xxx
    keyfile: xxx
    cafile: xxx
  replicas: "{{ groups['ingress-mgmt'] | length }}"
  serviceaccount: router
  namespace: routers-mgmt
  edits:
  - action: append
    key: spec.template.spec.containers[0].env
    value:
      name: ROUTE_LABELS
      value: environment=mgmt
  selector: environment=mgmt,node-role.kubernetes.io/ingress=true
  [...]
- name: routers-dev
  [...]
- name: routers-stage
  [...]
- name: routers-prod
  [...]
The customer was expecting its production routers to have exclusive access to prod apps, its stage routers to stage apps, and so on.
And I'm just here deploying the cluster, there's been enough architects already coming up with this, ...
Now, regardless of how many router projects we set up, everything goes through netid 0.
Networkpolicies matching labels on the router projects won't work.
Networkpolicies matching labels on the default namespace would open access for any hostnetwork Pod.
No doc fix could excuse this. There is a bug to be fixed.
OVS needs to know which namespace is sending traffic, regardless of how Pods are configured.
Otherwise, networkpolicies are just useless.
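
The behaviour described in this comment can be sketched as a toy model. This is not openshift-sdn or OVS code: `Pod`, `Cluster`, `effective_netid`, `namespace_selector_allows`, the netid values 101 and 102, and the pod names are all hypothetical, chosen only for illustration. It assumes, as reported in this bug, that traffic from a hostNetwork Pod is seen as netid 0, and that a namespaceSelector peer matches when some namespace sharing the source netid carries the selected labels.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pod:
    name: str
    namespace: str
    host_network: bool = False


@dataclass
class Cluster:
    # namespace name -> netid; "default" is assumed to have netid 0
    netids: Dict[str, int]
    # namespace name -> namespace labels
    labels: Dict[str, Dict[str, str]]


def effective_netid(cluster: Cluster, pod: Pod) -> int:
    """Netid the SDN attributes to traffic sent by this pod (toy model)."""
    # Assumption taken from this thread: hostNetwork traffic shows up as
    # netid 0, whatever Project the pod was deployed in.
    if pod.host_network:
        return 0
    return cluster.netids[pod.namespace]


def namespace_selector_allows(
    cluster: Cluster, pod: Pod, match_labels: Dict[str, str]
) -> bool:
    """True if a namespaceSelector with match_labels admits traffic from pod."""
    source = effective_netid(cluster, pod)
    for namespace, netid in cluster.netids.items():
        if netid != source:
            continue
        ns_labels = cluster.labels.get(namespace, {})
        if all(ns_labels.get(k) == v for k, v in match_labels.items()):
            return True
    return False


# Hypothetical cluster mirroring the inventory above (netids are made up).
cluster = Cluster(
    netids={"default": 0, "routers-prod": 101, "routers-dev": 102},
    labels={
        "default": {"name": "default"},
        "routers-prod": {"environment": "prod"},
        "routers-dev": {"environment": "dev"},
    },
)
prod_router = Pod("router-prod", "routers-prod", host_network=True)
dev_router = Pod("router-dev", "routers-dev", host_network=True)
sdn_router = Pod("router-prod-sdn", "routers-prod", host_network=False)

for pod in (prod_router, dev_router, sdn_router):
    for selector in ({"environment": "prod"}, {"name": "default"}):
        allowed = namespace_selector_allows(cluster, pod, selector)
        print(pod.name, selector, "allowed" if allowed else "denied")
```

In this model, both hostNetwork routers pass a policy selecting `name=default`, and neither passes a policy selecting the labels of its own Project. Only the router that is not on hostNetwork is matched by `environment=prod`, which is consistent with the workaround of redeploying the routers without hostNetwork.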
Created attachment 1613330: Testing passed
Created attachment 1613331: Testing failed
Hi Samuel, I ran the same networkpolicy test steps in a cluster with openshift-ovs-networkpolicy installed from the beginning, and in a cluster migrated from multitenant to openshift-ovs-networkpolicy. curl from a router to an application pod fails in the cluster migrated from multitenant to openshift-ovs-networkpolicy. Could you check my two attached logs to see if my testing steps are similar to what our customer used? Thanks!

Indeed, that's interesting. Last week, I would have said yes, that's pretty much what we did with the customer. And that's exactly how I reproduced the issue on my own OKD.

Now, as of last weekend, I think there's something more. Actually, the customer is not using the default namespace. Our routers are in dedicated Projects (routers-dev, routers-stage, ...) which have their own netid. And it turns out that when a Pod uses hostNetwork, OVS assumes its traffic belongs to netid 0, regardless of its namespace. So installing a policy in projX allowing traffic from routerX won't work, while installing a policy in projX allowing traffic from the default namespace would allow in traffic from any hostnetwork Pod (all my routers, but also etcd, ...).

Regarding the issue you've reproduced, I'm still unsure how to solve it. The best I could say is: try rebooting everything. Somehow, we got through this here; I couldn't tell how for sure. But if your routers are in the default namespace, they already belong to netid 0. Policies matching a label on your default namespace should then appear to work, until you try setting up routers in non-default namespaces.

As of yesterday, our networkpolicies work "as expected": we realized that after re-deploying our routers without hostnetwork, and exposing them with a NodePort Service, all our policies started working as planned.

Thanks for looking into it,
Regards.

Closing because there is a docs bug tracking it: https://bugzilla.redhat.com/show_bug.cgi?id=1750429 |