Description of problem:
An application created from the "Apache HTTP Server (httpd)" template using the example Git repository (and presumably any other application) works fine: the readiness probe passes and curling localhost from a terminal inside the pod returns the page normally. However, when I try to access it via the route (http://testrote-apache-test.193b.starter-ca-central-1.openshiftapps.com/), I get an error page ("Application is not available..."). The same thing happens over both HTTP and HTTPS.

Version-Release number of selected component (if applicable):
v3.9.14 (online version 3.6.0.45.1)

How reproducible:
Always.

Steps to Reproduce:
1. Create a project and an application using the "Apache HTTP Server (httpd)" template and the sample repository (https://github.com/openshift/httpd-ex.git).
2. Open the URL of the generated route in a browser (in my case http://testrote-apache-test.193b.starter-ca-central-1.openshiftapps.com/).

Actual results:
The error "The application is currently not serving requests at this endpoint. It may not have been started or is still starting."

Expected results:
The default OpenShift Apache page.

Additional info:
This may be a duplicate of bug 1449438.
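For reference, a roughly equivalent way to reproduce the same setup from the CLI. This is a sketch using names I chose myself (project apache-test, app name testrote); the web console template may generate slightly different object names:

------
# Sketch: reproduce the setup from the command line (assumed project/app names).
oc new-project apache-test
# Build the sample repo on the httpd builder image; creates a DeploymentConfig and a Service.
oc new-app httpd~https://github.com/openshift/httpd-ex.git --name=testrote
# Expose the service via a route and check the generated hostname.
oc expose svc/testrote
oc get route testrote
------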
I can confirm that my routes have stopped working over the weekend too. Related SO question: https://stackoverflow.com/questions/49852367/route-to-application-stopped-working-in-openshift-online-3-9
We received a few reports from users on `starter-ca-central-1` describing the same issue. Routes for some old and some newly created apps do not work, while the app server itself appears to be running and responding normally when opening a remote shell and curling localhost. One user reported that replacing the service name with an IP works for them, but I was unable to verify this - neither the service name nor the cluster IP worked for me when curling a newly created (and not yet idled) PHP example app on `starter-ca-central-1`:

------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:47 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63404
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:54 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63605
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I phpt2.nodejs6.svc:8080
curl: (7) Failed connect to phpt2.nodejs6.svc:8080; No route to host

sh-4.2$ curl -I $PHPT2_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.44.221:8080; No route to host
------
Apparently the same is true for me (neither the service name nor the cluster IP works), so the problem does indeed seem to be at the service level.

--------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:09 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:19 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I apache-test.apache-test.svc:8080
curl: (7) Failed connect to apache-test.apache-test.svc:8080; No route to host

sh-4.2$ curl -I $APACHE_TEST_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.55.209:8080; No route to host
--------
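For anyone hitting the same symptoms, a minimal way to check whether the service is wired to the pod at all (a sketch assuming the project and service are both named apache-test; adjust to your own names) could be:

------
# Sketch: verify the service and its endpoints from outside the pod (assumed names).
oc project apache-test
# The service should list a cluster IP and the target port.
oc get svc apache-test
# If ENDPOINTS is empty here, the pod is not attached to the service (e.g. selector/port mismatch).
oc get endpoints apache-test
# Open a shell in the running pod to repeat the in-pod curl checks shown above.
oc rsh dc/apache-test
------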
Today, both the service name and the cluster IP have been returning the expected response for me since ~ 11:11 UTC for the already deployed test app (which was scaled down overnight and scaled back up to 1 replica about 6 hours ago) on starter-ca-central-1. The respective route is still not working, though, even though the service has been reachable internally via both IP and name for almost 3 hours. Newly created apps eventually become reachable via the service IP and name (I noticed a 0-15 minute delay after deployment), but again they are not reachable externally via the respective route.
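When only the route is failing while the service works, it helps to confirm that the route was actually admitted by a router and that the error page really comes from the router. A rough sketch (the route name testrote is my assumption):

------
# Sketch: check route admission and what the router answers (assumed route name).
# status.ingress should show a router name and an Admitted condition.
oc get route testrote -o yaml
# The "Application is not available" page is served by the router itself with a 503,
# so a 503 here means the router has no working backend for this hostname.
curl -I http://testrote-apache-test.193b.starter-ca-central-1.openshiftapps.com/
------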
Stefanie and I debugged this issue, and it turned out to be a bad route whose backing service was headless (clusterIP was None). This caused the router to fail while parsing/loading its config and to error out, so no new routes were being loaded. While the specific issue has been fixed, we need to dig into why the router did not validate against this route and/or drop/ignore it.
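For anyone who needs to hunt for a similar offender, a quick way to list headless services (clusterIP set to None) across projects might look like this (the jsonpath filter is my sketch, not taken from the bug, and requires cluster-wide read access):

------
# Sketch: list services whose clusterIP is "None" (headless services).
oc get svc --all-namespaces \
  -o jsonpath='{range .items[?(@.spec.clusterIP=="None")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'
------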
Fixed by https://github.com/openshift/origin/pull/19416
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/e339d7305618a6b596f14e69dd47f8f9aa50a69f
Bug 1567532 - Unidle handling in router should ignore headless services.
- Currently, service with empty ClusterIP is ignored but headless services can have ClusterIP="None" and those need to be ignored as well.

https://github.com/openshift/origin/commit/55b4666ac696293b369258fd6a2fd72c1e33ba7e
Merge pull request #19416 from pravisankar/fix-router-endpoint-id
Bug 1567532 - Unidle handling in router should ignore headless services
Verified this bug on v3.10.0-0.47.0.

Steps:
1. Create one pod:
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/header-test/dc.json
2. Create a headless service:
oc create service clusterip header-test --clusterip="None" --tcp=8080:8080
3. Idle the headless service:
oc idle header-test
4. Check the haproxy logs:
in.go:265] headless service z1/header-test was marked as idled, but cannot setup unidling without a cluster IP
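With the fix in place, the router just logs the message above and skips the headless service instead of failing to load its config. A quick sanity check that other routes are unaffected might be (assuming the default router runs as dc/router in the default project, which is my assumption about this cluster):

------
# Sketch: confirm the router keeps reloading config cleanly (assumed router location).
oc -n default get pods -l deploymentconfig=router
oc -n default logs dc/router | tail -n 50
# Any normal app route in another project should still return its page rather than the 503 error page.
------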
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816