Bug 1928851
| Summary: | manually creating NetNamespaces will break things and this is not obvious | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
| Component: | Networking | Assignee: | Mohamed Mahmoud <mmahmoud> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aconstan, astoycos, zzhao |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: Users manually modify network namespaces in use by SDN.
Consequence: If this namespace is already in use by network policy, the network policy behavior will be changed causing undesired behaviors.
Fix: While customers are discouraged from manually modifying namespaces in general, the code has been updated to prevent such behavior from corrupting deployed network policies and log a warning with this happens.
Result: test shows the warning has been logged and network policy behavior hasn't been changed.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-07-27 22:44:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Dan Winship
2021-02-15 16:32:44 UTC
Have a test on 4.8.0-0.nightly-2021-05-11-025533
seems user still can modify the netID,
1. oc new-project z4
# oc get netnamespace z4
NAME NETID EGRESS IPS
z4 15085403
2. using `oc edit netnamespace z4` to update the netid to 0
# oc edit netnamespace z4
error: netnamespaces.network.openshift.io "z4" could not be patched: the body of the request was in an unknown format - accepted media types include: application/json-patch+json, application/merge-patch+json, application/apply-patch+yaml
You can run `oc replace -f /tmp/oc-edit-o92w3.yaml` to try this update again.
# cat /tmp/oc-edit-o92w3.yaml
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: network.openshift.io/v1
kind: NetNamespace
metadata:
creationTimestamp: "2021-05-11T10:59:03Z"
generation: 1
managedFields:
- apiVersion: network.openshift.io/v1
fieldsType: FieldsV1
fieldsV1:
f:netid: {}
f:netname: {}
manager: openshift-sdn-controller
operation: Update
time: "2021-05-11T10:59:03Z"
name: z4
resourceVersion: "189337"
uid: 664c9db1-36c2-446a-8017-63e0142d3c7d
netid: 0
netname: z4
3. # oc replace -f /tmp/oc-edit-o92w3.yaml
netnamespace.network.openshift.io/z4 replaced
4. # oc get netnamespace z4
NAME NETID EGRESS IPS
z4 0 ### Find the netid already updated to 0 same with default
# oc get netnamespace default
NAME NETID EGRESS IPS
default 0
@zzhao 1- how you sure the above build contains my fix ? 2- do u see any log like this "Netid 0 for namespace z4 already exists under different namespace default" 3- I confirmed the fix by adding UT testcases for duplicate and none duplicate cases (In reply to Mohamed Mahmoud from comment #2) > @zzhao > 1- how you sure the above build contains my fix ? > 2- do u see any log like this "Netid 0 for namespace z4 already exists under > different namespace default" > 3- I confirmed the fix by adding UT testcases for duplicate and none > duplicate cases just we are clear we aren't preventing customers from changing netns manually what we are doing in this PR is to protect SDN when this happens so we won't attempt to add/delete policy for this modified netns. in your test pls look for the warning message I shared above to confirm the new behavior we also need to doc in this BZ that customers are discouraged from manually modifying namespaces so we are covered. (In reply to Mohamed Mahmoud from comment #2) > @zzhao > 1- how you sure the above build contains my fix ? we can check this build already including the fix in https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.8.0-0.nightly/release/4.8.0-0.nightly-2021-05-11-025533 > 2- do u see any log like this "Netid 0 for namespace z4 already exists under > different namespace default" yes, I can see `W0512 01:15:37.790443 1730 vnids.go:219] Netid 0 for namespace z4 already exists under different namespace default` > 3- I confirmed the fix by adding UT testcases for duplicate and none > duplicate cases (In reply to Mohamed Mahmoud from comment #3) > (In reply to Mohamed Mahmoud from comment #2) > > @zzhao > > 1- how you sure the above build contains my fix ? > > 2- do u see any log like this "Netid 0 for namespace z4 already exists under > > different namespace default" > > 3- I confirmed the fix by adding UT testcases for duplicate and none > > duplicate cases > > just we are clear we aren't preventing customers from changing netns > manually what we are doing in this PR is to protect SDN when this happens so > we won't attempt to add/delete policy for this modified netns. in your test > pls look for the warning message I shared above to confirm the new behavior > > we also need to doc in this BZ that customers are discouraged from manually > modifying namespaces so we are covered. ok, if only prevent add/delete modified netid, this issue already be fixed when I attempted add deny policy in z4 namespace. it did not affect the default namespace pod communication # cat deny.yaml apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: default-deny-all spec: podSelector: null # oc get networkpolicy -n z4 NAME POD-SELECTOR AGE default-deny-all <none> 7s # oc exec -n z4 test-rc-7gswb -- curl --connect-timeout 4 10.128.2.24:8080 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0 curl: (28) Connection timed out after 4001 milliseconds command terminated with exit code 28 ###check the default pod### # oc get pod -n default -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES test-rc-jp2kq 1/1 Running 0 10m 10.131.0.24 ip-10-0-215-250.ap-northeast-1.compute.internal <none> <none> test-rc-sdtq6 1/1 Running 0 10m 10.129.3.73 ip-10-0-186-183.ap-northeast-1.compute.internal <none> <none> ##check the pods can communicate each other in default namespace # oc exec -n default test-rc-jp2kq -- curl 10.129.3.73:8080 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 17 100 17 0 0 5666 0 --:--:-- --:--:-- --:--:-- 5666 Hello OpenShift! Move this bug to verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438 |