Bug 1571752
Summary: | Adding an already deleted nodePort to service shows already acquired error | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Gonzalo Marcote <gmarcote> |
Component: | Networking | Assignee: | Ravi Sankar <rpenta> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Meng Bo <bmeng> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.6.0 | CC: | aos-bugs, bbennett, gmarcote, rpenta, weliang |
Target Milestone: | --- | | |
Target Release: | 3.11.0 | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-06-15 18:30:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Gonzalo Marcote
2018-04-25 11:07:23 UTC
@Ravi: Please comment with what you have found so far.

@Gonzalo, I tested on an OCP 3.6 HA environment with 3 masters and two nodes and cannot reproduce the problem. My test steps and logs are below. Is my test environment too small to show the problem?

    [root@ip-172-18-7-208 ~]# oc version
    oc v3.6.173.0.118
    kubernetes v1.6.1+5115d708d7
    features: Basic-Auth GSSAPI Kerberos SPNEGO

    Server https://ec2-54-243-2-232.compute-1.amazonaws.com
    openshift v3.6.173.0.118
    kubernetes v1.6.1+5115d708d7

    [root@ip-172-18-7-208 ~]# oc get nodes
    NAME                            STATUS                     AGE       VERSION
    ip-172-18-10-218.ec2.internal   Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7
    ip-172-18-12-193.ec2.internal   Ready                      2h        v1.6.1+5115d708d7
    ip-172-18-4-101.ec2.internal    Ready                      2h        v1.6.1+5115d708d7
    ip-172-18-5-165.ec2.internal    Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7
    ip-172-18-7-208.ec2.internal    Ready,SchedulingDisabled   2h        v1.6.1+5115d708d7

    [root@ip-172-18-7-208 ~]# cat svc.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
      labels:
        name: nginx
    spec:
      type: NodePort
      ports:
      - port: 80
        nodePort: 30080
        name: http
      - port: 443
        nodePort: 31414
        name: https
      selector:
        name: nginx

    [root@ip-172-18-7-208 ~]# for i in {1..100}; do oc create -f svc.yaml ; sleep 1; netstat -puntal | grep 31414; iptables-save | grep 31414; ss -tulpn | grep 31414; oc delete svc nginx; sleep 1; netstat -puntal | grep 31414; iptables-save | grep 31414; ss -tulpn | grep 31414; done
    service "nginx" created
    tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
    -A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
    tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
    service "nginx" deleted
    service "nginx" created
    tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
    -A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
    tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
    service "nginx" deleted
    service "nginx" created
    tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
    -A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
    tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
    service "nginx" deleted
    service "nginx" created
    tcp6 0 0 :::31414 :::* LISTEN 14733/openshift
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-MARK-MASQ
    -A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:https" -m tcp --dport 31414 -j KUBE-SVC-N3YB2VSZWW2B76BJ
    -A KUBE-SERVICES -p tcp -m comment --comment "default/nginx:https has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31414 -j REJECT --reject-with icmp-port-unreachable
    tcp LISTEN 0 128 :::31414 :::* users:(("openshift",pid=14733,fd=21))
    service "nginx" deleted
    ^C
    [root@ip-172-18-7-208 ~]#

@gmarcote I have tried to reproduce this on a 3.10 HA setup (3 API servers, 3-member etcd cluster) with the given reproduction steps, but I was not successful. Weibin tried on a 3.6 HA setup (as mentioned in the previous comment) and could not reproduce it either.

I'm pretty sure there is an issue here, and my hunch is that it is related to how the nodePort allocator handles a mismatch between its in-memory map and the data in etcd. If that is the case, I have a patch ready: https://github.com/openshift/origin/commit/f465e8ceff12d4c58a76a480c4d34461eaf4cdbe

If you could provide more details about the HA setup and the exact reproduction steps, then we could test our patch.

We are unable to reproduce this. If you can provide more information, please re-open it.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.
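
Since the bug was closed for lack of exact reproduction steps, the create/delete loop from the test comment could be wrapped so that it stops at the first failed create and reports the iteration and error text. This is only a sketch under assumptions: the function name `repro_nodeport` and the variables `OC_BIN` and `REPRO_SLEEP` are hypothetical and do not come from the bug report; the only `oc` invocations used are `oc create -f` and `oc delete svc`, as in the loop above.

```shell
#!/bin/sh
# Sketch: repeat create/delete of a NodePort service and stop at the first
# failed create, so the failing iteration and the error text are captured.
# OC_BIN, REPRO_SLEEP and repro_nodeport are hypothetical names.

OC_BIN="${OC_BIN:-oc}"

# usage: repro_nodeport <manifest> <service-name> [iterations]
repro_nodeport() {
    local manifest="$1" svc="$2" iterations="${3:-100}"
    local i=1 out
    while [ "$i" -le "$iterations" ]; do
        # Capture stdout and stderr so the API error message is preserved.
        if ! out="$("$OC_BIN" create -f "$manifest" 2>&1)"; then
            echo "create failed on iteration $i: $out"
            return 1
        fi
        sleep "${REPRO_SLEEP:-1}"
        if ! "$OC_BIN" delete svc "$svc" >/dev/null 2>&1; then
            echo "delete failed on iteration $i"
            return 2
        fi
        sleep "${REPRO_SLEEP:-1}"
        i=$((i + 1))
    done
    echo "no failure in $iterations iterations"
}
```

For example, `repro_nodeport svc.yaml nginx 100` would mirror the 100-iteration loop above. Running it against each master in turn might help show whether the failure depends on which API server handles the request, though that is speculation based on the allocator hunch rather than anything confirmed in this bug.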