Bug 1892376
Summary: | Deleted netnamespace could not be re-created | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Rejeeb <rabdulra> |
Component: | Networking | Assignee: | Surya Seetharaman <surya> |
Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | aconstan, apurty, bbennett, eparis, surya |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The network namespace (netns) was deleted before the namespace.
Consequence: When the namespace was then deleted, the SDN returned an error saying it could not find the netns, and it did not remove the netns from its internal cache. As a result, the user could not re-create the netns later, because according to its local cache the SDN still considered the netns to be in use.
Fix: When deleting a namespace whose network namespace has already been deleted, the SDN now logs a warning instead of returning an error, and it proceeds to remove the netns from its local cache.
Result: The user can now re-create the netns. The order in which the netns and the namespace are deleted no longer matters.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:28:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1894155 |
Description
Rejeeb
2020-10-28 15:20:20 UTC
Reproduced on a 4.5.14 GCP cluster.

Deleting the netns before the project:

$ oc new-project test-amma
$ oc get netnamespace | grep test-amma
test-amma 2927755
$ oc delete netnamespace test-amma
$ oc delete project test-amma
$ oc get project | grep test-amma
$ oc get netnamespace | grep test-amma
$ oc new-project test-amma
$ oc get project | grep test-amma
test-amma Active
$ oc get netnamespace | grep test-amma
$

Deleting the project before the netns:

$ oc new-project test-maam
$ oc get netnamespace | grep test-maam
test-maam 13424157
$ oc delete project test-maam
project.project.openshift.io "test-maam" deleted
$ oc get netnamespace | grep test-maam
$ oc get project | grep test-maam
$ oc new-project test-maam
$ oc get project | grep test-maam
test-maam Active
$ oc get netnamespace | grep test-maam
test-maam 6935484

My suspicion is that the expected order of deletion is to delete the project, which also wipes out the netns. When the netns is deleted separately, it is probably not tracked or removed properly from some cache. Then, when the project is re-created with the same name, no netns is created, probably because of a stale entry in some watch cache. Looking into the code to confirm what is happening.

OK, so I looked into the code, and it is as I said in the previous comment. When the namespace/project deletion is triggered on the watcher, it immediately deletes the netnamespace and calls the corresponding revokeVNID. It also calls "releaseNetID" to remove this netns from the vmap (*masterVNIDMap) cache. The logic doesn't expect the netns to be already deleted, so it errors out and returns without removing the NetID from the vmap. That is why, when the project is re-created with the same name, the NetID/netns is not re-created: according to the vmap cache, that netns should already exist. I'll put up a PR to fix this.

Verified this bug on 4.7.0-0.nightly-2020-11-04-224753:

1. oc new-project z1
2. oc delete netnamespace z1
3. oc delete project z1
4. oc new-project z1
5. Check that the netnamespace is created:
   oc get netnamespace | grep z1
   z1 13130068

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
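The failure mode traced in the code analysis above can be sketched as a toy in-memory model in Go. This is not the openshift-sdn source. The `controller` type, its `api` and `cache` maps, `errNotFound`, and the functions `namespaceAdded`, `deleteNetNamespace`, `namespaceDeleted`, and `recreateAfterManualDelete` are all hypothetical stand-ins. Only their roles come from the report: `cache` plays the part of the master's `*masterVNIDMap`, and the line that deletes from it plays the part of `releaseNetID`. The sketch replays the `test-amma` reproducer twice: once with a pre-fix handler that errors out on "not found" before touching the cache, and once with a post-fix handler that warns and cleans the cache anyway.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errNotFound is a hypothetical stand-in for the API server's NotFound error.
var errNotFound = errors.New("netnamespace not found")

// controller is a toy model, NOT the openshift-sdn code: api stands in for the
// stored NetNamespace objects, cache for the master's in-memory NetID map.
type controller struct {
	api    map[string]uint32
	cache  map[string]uint32
	nextID uint32
	fixed  bool // selects pre-fix (false) or post-fix (true) delete handling
}

func newController(fixed bool) *controller {
	return &controller{api: map[string]uint32{}, cache: map[string]uint32{}, fixed: fixed}
}

// namespaceAdded allocates a NetID and creates the NetNamespace, unless the
// cache already holds an entry for this name.
func (c *controller) namespaceAdded(name string) {
	if _, ok := c.cache[name]; ok {
		return // stale cache entry: no NetNamespace gets created
	}
	c.nextID++
	c.cache[name] = c.nextID
	c.api[name] = c.nextID
}

// deleteNetNamespace removes the NetNamespace object; it fails if it is already gone.
func (c *controller) deleteNetNamespace(name string) error {
	if _, ok := c.api[name]; !ok {
		return errNotFound
	}
	delete(c.api, name)
	return nil
}

// namespaceDeleted models the watcher's handler for a deleted namespace.
func (c *controller) namespaceDeleted(name string) error {
	if err := c.deleteNetNamespace(name); err != nil {
		if !c.fixed || !errors.Is(err, errNotFound) {
			// Pre-fix behavior: return before the cached NetID is released.
			return fmt.Errorf("deleting netnamespace %q: %w", name, err)
		}
		// Post-fix behavior: warn and fall through to the cache cleanup.
		log.Printf("warning: netnamespace %q already deleted", name)
	}
	delete(c.cache, name) // the role releaseNetID plays in the report
	return nil
}

// recreateAfterManualDelete replays: new-project, delete netnamespace,
// delete project, new-project. It reports whether a NetNamespace exists at the end.
func recreateAfterManualDelete(fixed bool) bool {
	c := newController(fixed)
	c.namespaceAdded("test-amma")
	_ = c.deleteNetNamespace("test-amma") // the manual `oc delete netnamespace`
	if err := c.namespaceDeleted("test-amma"); err != nil {
		log.Printf("error: %v", err)
	}
	c.namespaceAdded("test-amma")
	_, ok := c.api["test-amma"]
	return ok
}

func main() {
	// prints: before fix, netnamespace recreated: false
	fmt.Println("before fix, netnamespace recreated:", recreateAfterManualDelete(false))
	// prints: after fix, netnamespace recreated: true
	fmt.Println("after fix, netnamespace recreated:", recreateAfterManualDelete(true))
}
```

Treating "not found" as success during cleanup is a common way to make delete handlers idempotent. In a real Kubernetes controller the check would typically use `IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`; this sketch uses a local sentinel error only to stay dependency-free.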