Description of problem:

Typically through automation, when a project is deleted and recreated, or a specific service is deleted and recreated, we see an error that the NodePort is already in use, even though the service has been deleted and the port no longer exists in iptables:

Thu 19 Sep 2019 09:24:27 EDT
service "nodeport-test" deleted
The Service "nodeport-test" is invalid: spec.ports[0].nodePort: Invalid value: 30006: provided port is already allocated
Thu 19 Sep 2019 09:24:30 EDT
The Service "nodeport-test" is invalid: spec.ports[0].nodePort: Invalid value: 30006: provided port is already allocated
Thu 19 Sep 2019 09:24:32 EDT
service/nodeport-test created

Depending on the environment (possibly load related), you may see no delay, or a significant delay running into the minutes.

It looks like cleanup of the service's allocated resources is handled asynchronously via rs.releaseAllocatedResources(svc). This is probably fine for ClusterIPs, but for something usually statically assigned, like a NodePort, you see issues like this:
https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/storage/rest.go#L267

How long it takes for the Release to be committed into etcd determines the gap between service deletion and the ability to actually reuse that port:

https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/allocator/storage/storage.go#L143
https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/allocator/storage/storage.go#L163

Version-Release number of selected component (if applicable):

Customer: 3.10
Internal: 3.11
Internal: 4.2 nightly

How reproducible:

Pretty reproducible against the Sept 16th 4.2 nightly.

Steps to Reproduce:

Example NodePort service (nodeport.yml):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: nodeport-admin
  name: nodeport-test
spec:
  type: NodePort
  externalTrafficPolicy: Cluster
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    nodePort: 30006
  selector:
    deploymentconfig: nodeport-app

Example script:

#!/bin/bash
date
oc delete svc nodeport-test
#while [ -n "$(oc create -f nodeport.yml 2>&1 > /dev/null)" ]; do
while ! oc create -f nodeport.yml; do
  date
done

Actual results:

The Service "nodeport-test" is invalid: spec.ports[0].nodePort: Invalid value: 30006: provided port is already allocated

Expected results:

Deleting the service should release the NodePort before the deletion returns. Right now it feels like the service is deleted, but the release of the NodePort happens asynchronously and, depending on the environment, takes a long time.

Additional info:

This has been discussed and reported upstream:

https://bugzilla.redhat.com/show_bug.cgi?id=1571752
https://github.com/kubernetes/kubernetes/issues/32987
https://github.com/kubernetes/kubernetes/issues/73140
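The race can be modeled outside a cluster. The sketch below is purely illustrative and is not the actual Kubernetes allocator code (the PortAllocator class and its methods are hypothetical names): if the release of a port is committed asynchronously after the delete call has returned to the client, an immediate re-create of the same NodePort fails until the release lands.

```python
import threading
import time

class PortAllocator:
    """Toy model of a port allocator whose release runs asynchronously."""

    def __init__(self):
        self.allocated = set()
        self.lock = threading.Lock()

    def allocate(self, port):
        # Fails if the port is still marked allocated, mirroring
        # "provided port is already allocated" from the apiserver.
        with self.lock:
            if port in self.allocated:
                raise ValueError(f"provided port is already allocated: {port}")
            self.allocated.add(port)

    def release_async(self, port, delay):
        # Mimics the release being committed some time after the
        # delete call has already returned to the client.
        def _release():
            time.sleep(delay)
            with self.lock:
                self.allocated.discard(port)
        threading.Thread(target=_release).start()

alloc = PortAllocator()
alloc.allocate(30006)            # service created
alloc.release_async(30006, 0.1)  # service "deleted"; release lags behind

try:
    alloc.allocate(30006)        # immediate re-create races the release
    print("re-create succeeded immediately")
except ValueError as e:
    print(e)                     # provided port is already allocated: 30006

time.sleep(0.5)                  # wait out the asynchronous release
alloc.allocate(30006)
print("re-create succeeded after waiting")
```

This matches the observed behavior: the re-create eventually succeeds, but only after an environment-dependent delay.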
This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
The bug is still relevant. There is a PR open looking to address it: https://github.com/kubernetes/kubernetes/pull/89937
This bug hasn't had any activity 7 days after it was marked as LifecycleStale, so we are closing this bug as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.
https://github.com/kubernetes/kubernetes/pull/89937 is in the merge queue upstream finally.
Created a pick for 3.11. If it works without huge effort, we are fine; if not, the backport is probably not going to happen.
Per https://github.com/kubernetes/kubernetes/pull/89937, verification needs a 3.11 cluster with 3 masters. The 3.11 environments currently on hand are single-master clusters. Will launch an HA cluster tomorrow to verify.
Verified in a 3.11.248 HA cluster:

$ cat test.sh
echo "`date` begins"
./oc delete svc nodeport-test
while ! ./oc create -f nodeport.yml; do
  echo "`date` failed"
done

$ i=0
$ while true; do
    bash test.sh
    let i+=1
    echo "time: $i"
    echo
  done |& tee test.log

After many iterations of the loop above, searching test.log for "failed" finds no occurrences, which means the bug is no longer reproduced.

$ vi test.log
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2990