Description of problem:
Deleting a service that specifies a nodePort does not release the port, and the port cannot be allocated again until 5-10 minutes later. Checking whether the port is still in use shows no evidence:

$ netstat -puntal | grep 31414
$ iptables-save | grep 31414
$ ss -tulpn | grep 31414

This Kubernetes bug is related: https://github.com/kubernetes/kubernetes/issues/32987
As users pointed out in that bug, this still happens in k8s v1.7.4, v1.8.7, and v1.9.4.

Version-Release number of selected component (if applicable):
OCP 3.6

How reproducible:
It does not always happen, and not in all clusters. The time to wait until the port is released also varies. It seems that the Kubernetes API / proxy is somehow caching the nodePort for some grace period although it is no longer allocated.

Steps to Reproduce:
1. Create a service specifying a nodePort.
2. Delete the service with the nodePort.
3. Check with iptables or netstat; neither shows the port in use.
4. Immediately try to create the same service with the same port.
5. Creation is refused until some minutes have passed, with the following error:

error: Service "gateway" is invalid: spec.ports[0].nodePort: Invalid value: 30120: provided port is already allocated
serviceaccount "gateway" created

Actual results:
Automated deployments that need to remove and recreate a service for different customers break.

Expected results:
To be able to create the service with the same nodePort immediately after it was deleted.

Additional info:
This Kubernetes bug is related: https://github.com/kubernetes/kubernetes/issues/32987
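The steps to reproduce above could be timed with a small helper like the following. This is only a sketch under stated assumptions: the variable names (OC, SVC_FILE, SVC_NAME, MAX_WAIT, INTERVAL) and the function names are hypothetical and not part of the original report, and the only string it relies on is the "provided port is already allocated" text from the error shown in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: measure how long a nodePort stays "already allocated"
# after its service is deleted. All names and defaults below are assumptions
# made for illustration; adjust them to the service being tested.

OC="${OC:-oc}"                     # CLI to invoke (assumed to be oc)
SVC_FILE="${SVC_FILE:-svc.yaml}"   # service definition with a fixed nodePort
SVC_NAME="${SVC_NAME:-nginx}"      # name of the service in SVC_FILE
MAX_WAIT="${MAX_WAIT:-900}"        # give up after this many seconds
INTERVAL="${INTERVAL:-5}"          # seconds between creation attempts

# Return 0 if the given text contains the allocation error from the report.
is_port_allocated_error() {
    case "$1" in
        *"provided port is already allocated"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Delete the service, then retry creation until it succeeds.
# Prints the number of seconds slept before creation succeeded.
# Returns 1 on timeout and 2 on any error other than the nodePort one.
measure_release_delay() {
    local waited=0 out
    "$OC" delete svc "$SVC_NAME" >/dev/null 2>&1
    while [ "$waited" -le "$MAX_WAIT" ]; do
        if out=$("$OC" create -f "$SVC_FILE" 2>&1); then
            echo "$waited"
            return 0
        fi
        if ! is_port_allocated_error "$out"; then
            echo "unexpected error: $out" >&2
            return 2
        fi
        sleep "$INTERVAL"
        waited=$((waited + INTERVAL))
    done
    echo "nodePort still allocated after ${MAX_WAIT}s" >&2
    return 1
}
```

Running measure_release_delay right after the service has been created once prints how many seconds passed before the same nodePort could be allocated again. A value of 0 means the port was released immediately, which is the expected behavior.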
@Ravi: Please comment with what you have found so far.
@Gonzalo, I tested on an OCP 3.6 HA environment with 3 masters and two nodes and could not reproduce the problem. My testing steps and logs are below. Is my testing environment too small to see the problem?

[root@ip-172-18-7-208 ~]# oc version
oc v3.6.173.0.118
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ec2-54-243-2-232.compute-1.amazonaws.com
openshift v3.6.173.0.118
kubernetes v1.6.1+5115d708d7

[root@ip-172-18-7-208 ~]# oc get nodes
NAME                            STATUS                     AGE       VERSION
ip-172-18-10-218.ec2.internal   Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7
ip-172-18-12-193.ec2.internal   Ready                      2h        v1.6.1+5115d708d7
ip-172-18-4-101.ec2.internal    Ready                      2h        v1.6.1+5115d708d7
ip-172-18-5-165.ec2.internal    Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7
ip-172-18-7-208.ec2.internal    Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7

[root@ip-172-18-7-208 ~]# cat svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30080
    name: http
  - port: 443
    nodePort: 31414
    name: https
  selector:
    name: nginx

[root@ip-172-18-7-208 ~]# for i in {1..100}; do oc create -f svc.yaml ; sleep 1; netstat -puntal | grep 31414; iptables-save | grep 31414; ss -tulpn | grep 31414; oc delete svc nginx; sleep 1; netstat -puntal | grep 31414; iptables-save | grep 31414; ss -tulpn | grep 31414; done
service "nginx" created
tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
-A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
service "nginx" deleted
service "nginx" created
tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
-A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
service "nginx" deleted
service "nginx" created
tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
-A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
service "nginx" deleted
service "nginx" created
tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
-A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
service "nginx" deleted
^C
[root@ip-172-18-7-208 ~]#
@gmarcote I tried to reproduce this on a 3.10 HA setup (3 API servers, 3-member etcd cluster) with the given reproduction steps, but I was not successful. Weibin tried on a 3.6 HA setup (as mentioned in the previous comment) and couldn't reproduce it either. I'm pretty sure there is an issue here, and my hunch is that it is related to how the nodePort allocator handles a mismatch between its in-memory map and the data in etcd. If that is the case, I have a patch ready: https://github.com/openshift/origin/commit/f465e8ceff12d4c58a76a480c4d34461eaf4cdbe If you could provide more details about the HA setup and the exact reproduction steps, we could test our patch.
We are unable to reproduce this. If you can provide more information then please re-open it.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days