Bug 1892376 - Deleted netnamespace could not be re-created
Summary: Deleted netnamespace could not be re-created
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Surya Seetharaman
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1894155
Reported: 2020-10-28 15:20 UTC by Rejeeb
Modified: 2021-02-24 15:29 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The network namespace (netns) is deleted before the namespace.
Consequence: Deleting the namespace afterwards returns an error because the netns cannot be found, and the netns is not removed from the internal SDN cache. As a result, the user cannot recreate the netns later, because the SDN believes, per its local cache, that the netns still exists.
Fix: While deleting the namespace, if the network namespace is found to be already deleted, a warning is emitted instead of an error, and the netns is removed from the local cache.
Result: The user can now recreate the netns. The order of deletion between the netns and the namespace no longer matters.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:28:37 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift sdn pull 214 0 None closed Bug 1892376: Ignore if netns is already deleted while deleting ns 2021-02-17 23:09:10 UTC
Red Hat Knowledge Base (Solution) 5525061 0 None None None 2020-11-09 12:40:28 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:29:05 UTC

Description Rejeeb 2020-10-28 15:20:20 UTC
Description of problem:
If the netnamespace is deleted before the project, the netnamespace is not re-created when the project is re-created with the same name.

Version-Release number of selected component (if applicable):
Reproduced in OpenShift v4.5.14

How reproducible:
100%

Steps to Reproduce:

1. Create a new project:

[test45cluster@upi-0 ~]$ oc new-project testproject

2. Check the created project and corresponding netnamespace entry:

[test45cluster@upi-0 ~]$ oc get projects | grep testproject
testproject                                                                 Active
[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject
testproject                                        16699832   

3. Now delete the netnamespace first and then the project:

[test45cluster@upi-0 ~]$ oc delete netnamespace testproject
netnamespace.network.openshift.io "testproject" deleted

[quicklab@upi-0 ~]$ oc delete project testproject
project.project.openshift.io "testproject" deleted

[test45cluster@upi-0 ~]$ oc get projects | grep testproject
[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject

4. Now re-create the project with the same name:

[test45cluster@upi-0 ~]$ oc new-project testproject

[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject
[test45cluster@upi-0 ~]$ oc get projects | grep testproject
testproject                                                                 Active


Actual results: While re-creating the project with the same name, the corresponding netnamespace is not created.


Expected results: Re-creating a project after deleting should also re-create the corresponding netnamespace.


Additional info: If we delete the project first, it works as expected.
----------
[test45cluster@upi-0 ~]$ oc new-project testproject2

[test45cluster@upi-0 ~]$ oc get projects | grep testproject2
testproject2                                                                         Active
[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject2
testproject2                                                16474270   

[test45cluster@upi-0 ~]$ oc delete project testproject2
project.project.openshift.io "testproject2" deleted

[test45cluster@upi-0 ~]$  oc get projects | grep testproject2
[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject2

[test45cluster@upi-0 ~]$ oc new-project testproject2

[test45cluster@upi-0 ~]$ oc get projects | grep testproject2
testproject2                                                                         Active
[test45cluster@upi-0 ~]$ oc get netnamespace | grep testproject2
testproject2                                                6028713    

----------

Comment 5 Surya Seetharaman 2020-11-02 19:41:27 UTC
Reproduced on 4.5.14 gcp cluster:

Deleting netns before project:

$ oc new-project test-amma
$ oc get netnamespace | grep test-amma
test-amma                                          2927755


$ oc delete netnamespace test-amma
$ oc delete project test-amma
$ oc get project | grep test-amma
$ oc get netnamespace | grep test-amma



$ oc new-project test-amma
$ oc get project | grep test-amma
test-amma                                                         Active
$ oc get netnamespace | grep test-amma
$ 

Deleting project before netns:

$ oc new-project test-maam
$ oc get netnamespace | grep test-maam
test-maam                                          13424157   
$ oc delete project test-maam
project.project.openshift.io "test-maam" deleted
$ oc get netnamespace | grep test-maam
$ oc get project | grep test-maam

$ oc new-project test-maam

$ oc get project | grep test-maam
test-maam                                                         Active
$ oc get netnamespace | grep test-maam
test-maam                                          6935484    


My suspicion is that the expected order of deletion is to delete the project, which wipes out the netns as well. When the netns is deleted separately, it is probably not removed from some cache, so when the project is re-created with the same name the netns is not created, probably because of a stale entry in some watch cache.

Looking into the code to confirm what is happening.

Comment 6 Surya Seetharaman 2020-11-02 20:35:58 UTC
OK, I looked into the code.

It is as I said in the previous comment. When the namespace/project deletion is triggered on the watcher, it immediately goes on to delete the netnamespace and run the corresponding revokeVNID. It also calls "releaseNetID" to remove this netns from the vmap (*masterVNIDMap) cache. The logic doesn't expect the netns to be already deleted, so it errors out and returns without removing the netid from the vmap. That is why, when the project is re-created with the same name, the netid/netns is not re-created: according to the vmap cache, that netns already exists.

I'll put up a PR that can fix this.
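
For readers who want to see the failure mode in isolation, here is a minimal Go sketch of the behaviour described above. It is not the openshift-sdn code: the vnidMap type, its fields, the errNotFound value, and the assign/revoke/deleteNetNamespace helpers are all hypothetical stand-ins, and the API server is modelled as a plain map. It only illustrates how returning early on a "not found" error leaves a stale cache entry, and how tolerating that error lets the entry be released.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the API server's "not found" error (hypothetical).
var errNotFound = errors.New("netnamespace not found")

// vnidMap is a simplified, hypothetical stand-in for the master's VNID cache.
type vnidMap struct {
	// ids maps a namespace name to the NetID held in the local cache.
	ids map[string]uint32
	// api records which NetNamespace objects exist in the (modelled) API server.
	api map[string]bool
}

func newVNIDMap() *vnidMap {
	return &vnidMap{ids: map[string]uint32{}, api: map[string]bool{}}
}

// assign creates the NetNamespace unless the cache already holds an entry.
// It reports whether a NetNamespace object was created.
func (m *vnidMap) assign(ns string, id uint32) bool {
	if _, cached := m.ids[ns]; cached {
		// The cache says the netns exists, so nothing is created.
		return false
	}
	m.ids[ns] = id
	m.api[ns] = true
	return true
}

// deleteNetNamespace models "oc delete netnamespace": only the API object goes away.
func (m *vnidMap) deleteNetNamespace(ns string) error {
	if !m.api[ns] {
		return errNotFound
	}
	delete(m.api, ns)
	return nil
}

// revoke models namespace deletion. With tolerateMissing=false (the pre-fix
// behaviour), a missing NetNamespace aborts before the cache entry is released.
func (m *vnidMap) revoke(ns string, tolerateMissing bool) error {
	if err := m.deleteNetNamespace(ns); err != nil {
		if !tolerateMissing || !errors.Is(err, errNotFound) {
			// Early return: the cache entry is left behind.
			return err
		}
		fmt.Printf("warning: netnamespace %q already deleted\n", ns)
	}
	// Release the NetID from the local cache.
	delete(m.ids, ns)
	return nil
}

// recreateAfterOutOfOrderDelete replays the reproduction steps and reports
// whether the NetNamespace comes back when the project is re-created.
func recreateAfterOutOfOrderDelete(tolerateMissing bool) bool {
	m := newVNIDMap()
	m.assign("testproject", 1)
	// The netnamespace is deleted first, then the project.
	_ = m.deleteNetNamespace("testproject")
	_ = m.revoke("testproject", tolerateMissing)
	// The project is re-created with the same name.
	return m.assign("testproject", 2)
}

func main() {
	fmt.Printf("pre-fix recreated: %v\n", recreateAfterOutOfOrderDelete(false))
	fmt.Printf("post-fix recreated: %v\n", recreateAfterOutOfOrderDelete(true))
}
```

In this model the only difference between the two runs is whether a "not found" error is treated as fatal, which matches the title of the linked pull request ("Ignore if netns is already deleted while deleting ns").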

Comment 12 zhaozhanqi 2020-11-05 11:52:13 UTC
Verified this bug on 4.7.0-0.nightly-2020-11-04-224753

1. oc new-project z1
2. oc delete netnamespace z1
3. oc delete project z1
4. oc new-project z1
5. Check the netnamespace is created 
 oc get netnamespace | grep z1
z1                                                 13130068

Comment 17 errata-xmlrpc 2021-02-24 15:28:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0
security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

