Bug 1567532
| Summary: | Web server works fine, but is inaccessible via a route (Application is not available...) | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Tomáš Glozar <tglozar> |
| Component: | Networking | Assignee: | Ravi Sankar <rpenta> |
| Networking sub component: | router | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | abhgupta, aos-bugs, bbennett, eparis, hongli, jfiala, rpenta, scuppett |
| Version: | 3.9.0 | ||
| Target Milestone: | --- | ||
| Target Release: | 3.10.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: A headless service has service.Spec.ClusterIP=None, and the router did not ignore it during unidle handling. Consequence: The router generated an incorrect endpoint ID in the HAProxy config, the config failed to load, and the router stopped servicing any routes. Fix: Ignore headless services during unidle handling. Result: The HAProxy config loads without problems and the router services routes as expected. | ||
| Story Points: | --- | ||
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-30 19:12:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Tomáš Glozar
2018-04-14 19:35:38 UTC
I can confirm that my routes have stopped working over the weekend too. Related SO question: https://stackoverflow.com/questions/49852367/route-to-application-stopped-working-in-openshift-online-3-9

We received a few reports from users on `starter-ca-central-1` describing the same issue. Routes for some old and newly created apps do not work, although the app server itself appears to be running and responding normally when opening a remote shell and curling localhost. One user reported that replacing the service name with an IP works for them, but I was unable to verify this: neither the service name nor the cluster IP worked for me when curling a newly created (not yet idled) PHP example app on `starter-ca-central-1`:

------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:47 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63404
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:54 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63605
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I phpt2.nodejs6.svc:8080
curl: (7) Failed connect to phpt2.nodejs6.svc:8080; No route to host

sh-4.2$ curl -I $PHPT2_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.44.221:8080; No route to host
------

Apparently the same is true for me (neither the service name nor the IP works), so the problem seems to lie with the service itself.
--------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:09 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:19 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I apache-test.apache-test.svc:8080
curl: (7) Failed connect to apache-test.apache-test.svc:8080; No route to host

sh-4.2$ curl -I $APACHE_TEST_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.55.209:8080; No route to host
--------

Today, both the service name and the cluster IP have returned the expected response for me since ~11:11 UTC for the already deployed test app on `starter-ca-central-1` (it was scaled down overnight and scaled up to 1 replica about 6h ago). The respective route is still not working, even though the service has been reachable internally via IP and name for almost 3h. Newly created apps eventually become reachable via service IP and name (I noticed a 0-15 minute delay after deployment), but they are likewise not reachable externally via their route.

Stefanie and I debugged this issue. It turned out to be a bad route (clusterIP was None) that caused the router to fail while parsing and loading its config, so no new routes were being loaded. While the specific issue has been fixed, we need to dig into why the router did not validate this route and/or drop or ignore it.

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/e339d7305618a6b596f14e69dd47f8f9aa50a69f
Bug 1567532 - Unidle handling in router should ignore headless services.
- Currently, service with empty ClusterIP is ignored but headless services can have ClusterIP="None" and those need to be ignored as well.

https://github.com/openshift/origin/commit/55b4666ac696293b369258fd6a2fd72c1e33ba7e
Merge pull request #19416 from pravisankar/fix-router-endpoint-id
Bug 1567532 - Unidle handling in router should ignore headless services

Verified this bug on v3.10.0-0.47.0.

Steps:
1. Create one pod:
   oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/header-test/dc.json
2. Create a headless service:
   oc create service clusterip header-test --clusterip="None" --tcp=8080:8080
3. Idle the headless service:
   oc idle header-test
4. Check the haproxy logs:
   in.go:265] headless service z1/header-test was marked as idled, but cannot setup unidling without a cluster IP

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
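For readers who want to see the shape of the check described in the commit message above, here is a minimal Go sketch. It is an illustration only, not the code from openshift/origin: the `service` type, the `canSetupUnidling` helper, and the sample services are hypothetical names introduced here. The only details taken from this report are that a service with an empty ClusterIP was already ignored and that a headless service carries ClusterIP "None" and needs to be ignored as well.

```go
package main

import "fmt"

// headlessClusterIP is the literal value Kubernetes uses for the cluster IP
// of a headless service (exposed upstream as v1.ClusterIPNone).
const headlessClusterIP = "None"

// service is a hypothetical, stripped-down stand-in for a Kubernetes Service.
type service struct {
	Namespace string
	Name      string
	ClusterIP string
}

// canSetupUnidling reports whether an idled service has a usable cluster IP.
// Both an empty cluster IP and the headless marker "None" are rejected, so
// neither can end up as an endpoint address in the generated config.
func canSetupUnidling(svc service) bool {
	return svc.ClusterIP != "" && svc.ClusterIP != headlessClusterIP
}

func main() {
	// Sample data for illustration; names and IPs echo the report but the
	// list itself is an assumption.
	services := []service{
		{Namespace: "nodejs6", Name: "phpt2", ClusterIP: "172.30.44.221"},
		{Namespace: "z1", Name: "header-test", ClusterIP: headlessClusterIP},
		{Namespace: "z1", Name: "no-ip", ClusterIP: ""},
	}

	for _, svc := range services {
		if !canSetupUnidling(svc) {
			fmt.Printf("skipping %s/%s: no usable cluster IP (%q)\n",
				svc.Namespace, svc.Name, svc.ClusterIP)
			continue
		}
		fmt.Printf("unidle target for %s/%s: %s\n",
			svc.Namespace, svc.Name, svc.ClusterIP)
	}
}
```

The point of the sketch is that the headless case is rejected at the same place as the empty case, so a single malformed entry is skipped instead of preventing the whole config from loading.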