Bug 1293251
| Summary: | Can not access service endpoint between different nodes. |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Networking |
| Version: | 3.1.0 |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Keywords: | Regression, TestBlocker |
| Reporter: | Johnny Liu <jialiu> |
| Assignee: | Dan Winship <danw> |
| QA Contact: | Meng Bo <bmeng> |
| CC: | aos-bugs, bleanhar, eparis, haowang, jkrieger, jokerman |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Last Closed: | 2016-01-26 19:20:25 UTC |
Description
Johnny Liu
2015-12-21 08:46:06 UTC
Should be fixed by https://github.com/openshift/openshift-sdn/pull/236

Checked on origin with the latest origin and openshift-sdn code; the issue can still be reproduced.

```
$ oc get po -o wide
NAME            READY     STATUS    RESTARTS   AGE       NODE
test-rc-vra8y   1/1       Running   0          10m       node2.bmeng.local
test-rc-z2hv3   1/1       Running   0          10m       node3.bmeng.local
```

All the pods can be accessed directly from the nodes.

```
$ oc get svc
NAME           CLUSTER_IP      EXTERNAL_IP   PORT(S)     SELECTOR         AGE
test-service   172.30.19.173   <none>        27017/TCP   name=test-pods   11m
```

The service can be accessed only from node2 and node3, and even there 50% of attempts fail, possibly because the failing attempts are sent to the pod on the other node.

The following OpenFlow rule appears on all the nodes:

```
cookie=0x0, duration=848.950s, table=3, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
```

hm... works for me... can you get the debug.sh output? (https://raw.githubusercontent.com/openshift/openshift-sdn/master/hack/debug.sh)

Created attachment 1112328: debug log

Here is the debug log from my environment.
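To make the OpenFlow rule quoted in the report easier to read, here is a simplified Python model of what it matches and what its actions do. It is only a sketch, not how Open vSwitch actually evaluates flow tables: it assumes a packet can be reduced to an "is IP" flag, an IPv4 destination and a tunnel ID, and the names `Packet` and `apply_table3_rule` are hypothetical. The rule matches IP packets destined for 10.1.0.0/24, copies bits 0..31 of the tunnel ID (`NXM_NX_TUN_ID[0..31]`) into register 0 (`NXM_NX_REG0`), and continues processing in table 6.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple

# Match field from the rule: nw_dst=10.1.0.0/24
RULE_SUBNET = ipaddress.ip_network("10.1.0.0/24")
# Action from the rule: goto_table:6
NEXT_TABLE = 6


@dataclass
class Packet:
    """Hypothetical, simplified packet holding only the fields the rule uses."""
    is_ip: bool
    nw_dst: str
    tun_id: int  # OVS tunnel IDs can be up to 64 bits wide
    reg0: int = 0


def apply_table3_rule(pkt: Packet) -> Optional[Tuple[int, int]]:
    """Return (reg0, next_table) if the rule matches, otherwise None.

    Models: priority=100,ip,nw_dst=10.1.0.0/24
            actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
    """
    # The "ip" match: non-IP packets are not handled by this rule.
    if not pkt.is_ip:
        return None
    # The "nw_dst=10.1.0.0/24" match.
    if ipaddress.ip_address(pkt.nw_dst) not in RULE_SUBNET:
        return None
    # move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[]: copy the low 32 bits of
    # the tunnel ID into the 32-bit register 0.
    pkt.reg0 = pkt.tun_id & 0xFFFFFFFF
    # goto_table:6: processing would continue in table 6.
    return pkt.reg0, NEXT_TABLE
```

For example, `apply_table3_rule(Packet(True, "10.1.0.5", 0x1_0000_002A))` returns `(42, 6)`, while a packet to 10.1.1.5 returns `None` because it falls outside the /24. Note that in the report the rule's counters read `n_packets=0, n_bytes=0`, meaning no packet had matched it at the time the flows were dumped.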
I cannot reproduce this bug anymore with origin build:

```
# openshift version
openshift v1.1-730-gad80e1f
kubernetes v1.1.0-origin-1107-g4c8e6f4
```

openshift-sdn build: da8ad5dc5c94012eb222221d909b2b6fa678500f

Should be fixed by https://github.com/openshift/openshift-sdn/pull/237 ?

Re-tested this bug with atomic-openshift-sdn-ovs-3.1.1.1-1.git.0.dba03a7.el7aos.x86_64 in AtomicOpenShift/3.1/2016-01-11.1; it still does NOT work. It seems the fix PR has not been merged from upstream into OSE yet, so moving the status to MODIFIED.

To make this bug's status easier to track, moving it to ASSIGNED.

Moving to ON_QA, as I'm told there was a new OSE build this morning.

Verified this bug with the AtomicOpenShift/3.1/2016-01-13.1 puddle: PASS.

```
[root@openshift-125 ~]# curl 172.30.240.45:5000
[root@openshift-125 ~]#
```

No hang is seen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070
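Both checks in this report, the 50% failure rate when hitting the service and the final verification that `curl` against the service IP returns without hanging, amount to connecting to the service repeatedly with a timeout. The Python sketch below does that with a plain TCP connect. The function name `probe_failure_rate`, the default of 10 attempts and the 3-second timeout are illustrative assumptions, and a TCP connect only shows that something accepted the connection, not that the application behind it answered correctly.

```python
import socket


def probe_failure_rate(host: str, port: int, attempts: int = 10,
                       timeout: float = 3.0) -> float:
    """Return the fraction of TCP connection attempts that fail.

    An attempt counts as failed if it is refused, unreachable, or does not
    complete within `timeout` seconds (the "hang" seen in the report).
    """
    failures = 0
    for _ in range(attempts):
        try:
            # Each attempt opens a fresh connection, so the service proxy
            # may route it to a different backend pod each time.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:  # covers timeouts, refusals and unreachable hosts
            failures += 1
    return failures / attempts
```

As a usage example, running `probe_failure_rate("172.30.19.173", 27017)` on node2 or node3 while the bug was present would be expected to return a value near 0.5 for the two-pod service shown earlier, if only the connections routed to the local pod succeed; after the fix it should return 0.0 from any node.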