Bug 1322942
Summary: Service with active endpoints not routing traffic, returns connection refused

| Field | Value |
|---|---|
| Product | OpenShift Container Platform |
| Component | Networking |
| Version | 3.2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Reporter | Mike Fiedler <mifiedle> |
| Assignee | Ben Bennett <bbennett> |
| QA Contact | Mike Fiedler <mifiedle> |
| CC | agoldste, aos-bugs, bmeng, ccoleman, jeder, mifiedle, spinolacastro, sross, tdawson |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2016-05-12 16:35:20 UTC |

Description
Mike Fiedler
2016-03-31 17:54:17 UTC
Can you access the registry pods at their pod IPs? curl http://<ip>:5000/ ?
Can you access the registry via the service? curl http://172.25.149.234:5000/ ?

The registry pod logs indicate that the health check the node performs is able to retrieve /healthz after the time the build failed. I'm wondering if this is some other networking issue that's unrelated to the registry itself.

Curl to service IP:

    root@ip-172-31-15-66: ~ # curl http://172.25.149.234:5000/
    curl: (7) Failed connect to 172.25.149.234:5000; Connection refused

Curl to pod IP:

docker-registry #1

    root@ip-172-31-15-66: ~ # curl http://172.20.7.2:5000
    root@ip-172-31-15-66: ~ #

docker-registry #2

    root@ip-172-31-15-66: ~ # curl http://172.20.3.2:5000
    root@ip-172-31-15-66: ~ #

Chatted on IRC. This is not a registry issue. The registry pods are responding to requests. For some reason, the service is not routing packets to the registry pods. Continuing to debug on IRC.

Hmmm... This is odd: "Service 'docker-registry' in namespace 'default' has an Endpoint pointing to pod 172.20.3.2 in namespace 't23'" -- looks like there are some failed builder pods sticking around, and their IP addresses have been reused.

It looks like all the affected services have the same issue (reused IP address). This appears to be the likely culprit, then, but I'm unsure of the root cause. We'll continue to investigate.

Hello, I have a very similar issue: suddenly I can't talk to the ClusterIP, only to the pod IP. For example:

    # oc get svc
    NAME      CLUSTER-IP       EXTERNAL-IP   PORT(S)    SELECTOR                  AGE
    anitta    172.30.228.178   <none>        8080/TCP   deploymentconfig=anitta   1h
    mysql     172.30.11.181    <none>        3306/TCP   name=mysql                1h

    # oc get endpoints
    NAME      ENDPOINTS        AGE
    anitta    10.1.3.15:8080   1h
    mysql     10.1.6.8:3306    1h

    # telnet 172.30.11.181 3306
    Trying 172.30.11.181...
    telnet: connect to address 172.30.11.181: Connection refused

    # telnet 10.1.6.8 3306
    Trying 10.1.6.8...
    Connected to 10.1.6.8.
    Escape character is '^]'.
    J 5.6.26+Yba@X{&ic5z-iO/:

I've found the following iptables rules pointing to the REJECT target:

    iptables -nv -L KUBE-SERVICES --line-numbers
    Chain KUBE-SERVICES (1 references)
    num  pkts bytes target  prot opt in  out  source     destination
    1    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.236.207  /* mateus/mysqlpizza:mysql has no endpoints */ tcp dpt:3306 reject-with icmp-port-unreachable
    2    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.35.6     /* pensatica/postgresql:postgresql has no endpoints */ tcp dpt:5432 reject-with icmp-port-unreachable
    3    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.226.230  /* pensatica/web:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    4    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.65.219   /* mybff/wordpress:web has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    5    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.207.105  /* pensatica/elasticsearch:elasticsearch has no endpoints */ tcp dpt:9200 reject-with icmp-port-unreachable
    6    5    300   REJECT  tcp  --  *   *    0.0.0.0/0  172.30.11.181   /* anittaoficial/mysql:mysql has no endpoints */ tcp dpt:3306 reject-with icmp-port-unreachable
    7    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.137.82   /* bluegreen/green:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    8    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.231.72   /* orlandowebtravel/site:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    9    0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.186.185  /* mateus/api:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    10   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.183.213  /* facoeaconteco/mysql:mysql has no endpoints */ tcp dpt:3306 reject-with icmp-port-unreachable
    11   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.67.91    /* panda/appphp:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    12   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.24.171   /* abdeployment/app-b:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    13   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.148.58   /* getup/console:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    14   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.248.49   /* bluegreen/blue:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    15   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.157.252  /* cdablog/wordpress:web has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
    16   0    0     REJECT  tcp  --  *   *    0.0.0.0/0  172.30.9.41     /* cdablog/mysql:mysql has no endpoints */ tcp dpt:3306 reject-with icmp-port-unreachable

Note that if I manually delete the rule, the error becomes:

    # telnet 172.30.11.181 3306
    Trying 172.30.11.181...
    telnet: connect to address 172.30.11.181: No route to host

This was merged to origin in https://github.com/openshift/origin/pull/8468

This should be in atomic-openshift-3.2.0.16-1.git.0.738b760.el7, which has been built and readied for QE.

Checked on atomic-openshift-3.2.0.17 with these steps:

1. Set up an env with 1 master, 1 node and the multitenant plugin
2. Create two namespaces
3. Create a service in ns1 which will match the selector "name=test-pods"
4. Create a pod with label "name=test-pods" in ns2 which will be in error state
5. After the above pod has failed, create a normal pod with label "name=test-pods" in ns1
6. Delete the error pod in ns2
7. Check the endpoints in ns1: they have the correct pod IP and port.

Check the node log:

    Apr 19 16:01:19 ose-node1.bmeng.local atomic-openshift-node[38217]: W0419 16:01:19.091269 38217 registry.go:508] IP '10.128.2.2' was marked as used by namespace 'u1p1' (pod '7d851838-0604-11e6-a8fc-525400dd3698')... updating to namespace 'u1p1' (pod 'e0e1516f-0604-11e6-965c-525400dd3698')

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
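
For anyone triaging the same symptom, the mismatch reported in the comments (a KUBE-SERVICES REJECT rule annotated "has no endpoints" for a service whose endpoints are in fact populated) can be checked mechanically. The following is only a minimal sketch in Python, not part of the fix: the function names are hypothetical, it works on pasted text rather than calling `iptables` or `oc` itself, and it assumes the rule comment and `oc get endpoints` column layouts match the output quoted in this report. The namespace `anittaoficial` in the usage lines is inferred from the rule comment for 172.30.11.181.

```python
import re

# Matches a REJECT rule as listed by `iptables -nv -L KUBE-SERVICES`, assuming
# the comment has the form seen in this report:
#   /* namespace/service:port has no endpoints */
REJECT_RE = re.compile(
    r"REJECT\s+\S+\s+--\s+\S+\s+\S+\s+\S+\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"/\*\s+(?P<ns>[^/\s]+)/(?P<svc>[^:\s]+):(?P<port>\S*)\s+has no endpoints\s+\*/"
)


def rejected_services(iptables_listing):
    """Return {(namespace, service): cluster_ip} for each 'no endpoints' rule."""
    found = {}
    for line in iptables_listing.splitlines():
        match = REJECT_RE.search(line)
        if match:
            found[(match.group("ns"), match.group("svc"))] = match.group("ip")
    return found


def populated_endpoints(oc_get_endpoints_output):
    """Return service names whose ENDPOINTS column is not <none>."""
    names = set()
    for line in oc_get_endpoints_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2 and fields[1] != "<none>":
            names.add(fields[0])
    return names


def stale_rejects(iptables_listing, namespace, oc_get_endpoints_output):
    """Return {service: cluster_ip} for services in `namespace` that are
    rejected as having no endpoints although endpoints are listed for them."""
    live = populated_endpoints(oc_get_endpoints_output)
    return {
        svc: ip
        for (ns, svc), ip in rejected_services(iptables_listing).items()
        if ns == namespace and svc in live
    }


# Usage with two lines copied from this report.
SAMPLE_RULES = (
    "6 5 300 REJECT tcp -- * * 0.0.0.0/0 172.30.11.181 "
    "/* anittaoficial/mysql:mysql has no endpoints */ tcp dpt:3306 "
    "reject-with icmp-port-unreachable"
)
SAMPLE_ENDPOINTS = "NAME ENDPOINTS AGE\nanitta 10.1.3.15:8080 1h\nmysql 10.1.6.8:3306 1h"

print(stale_rejects(SAMPLE_RULES, "anittaoficial", SAMPLE_ENDPOINTS))
```

With the sample input, the sketch flags `mysql` at 172.30.11.181, the same service the telnet test could not reach through its ClusterIP. An empty result does not rule out the bug; it only means no contradiction was found in the text supplied.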