1567532 – Web server works fine, but is inaccessible via a route (Application is not available...)

Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Version:	3.9.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Ravi Sankar
QA Contact:	zhaozhanqi

Reported:	2018-04-14 19:35 UTC by Tomáš Glozar
Modified:	2022-08-04 22:20 UTC
CC List:	8 users
Doc Type:	Bug Fix
Doc Text:	Cause: A headless service has service.Spec.ClusterIP set to "None", and this value was not ignored during unidle handling. Consequence: An incorrect endpoint ID was generated in the HAProxy config and the config failed to load, which led to the router not servicing any routes. Fix: Headless services are ignored during unidle handling. Result: The HAProxy config loads without problems and the router services routes as expected.
Last Closed:	2018-07-30 19:12:49 UTC

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Origin (Github)	19416	0	None	None	None	2018-04-26 16:57:02 UTC
Red Hat Product Errata	RHBA-2018:1816	0	None	None	None	2018-07-30 19:13:33 UTC

Description Tomáš Glozar 2018-04-14 19:35:38 UTC

Description of problem:
An application created from the "Apache HTTP Server (httpd)" template using the example Git repository (and presumably any other application) works fine: the readiness probe and curl against localhost from the terminal can both access it normally. However, when I try to access it via the route (http://testrote-apache-test.193b.starter-ca-central-1.openshiftapps.com/), it shows an error (Application is not available...).

The same thing happens with both HTTP and HTTPS.

Version-Release number of selected component (if applicable):
v3.9.14 (online version 3.6.0.45.1) 

How reproducible:
Always.

Steps to Reproduce:
1. Create a project (and an application) using the "Apache HTTP Server (httpd)" template and the sample repository (https://github.com/openshift/httpd-ex.git).
2. Open the URL of the generated route in a browser (in my case http://testrote-apache-test.193b.starter-ca-central-1.openshiftapps.com/).

Actual results:
The browser shows the error "The application is currently not serving requests at this endpoint. It may not have been started or is still starting."

Expected results:
Default OpenShift Apache page.

Additional info:
This may be a duplicate of bug ID 1449438.

Comment 1 Max DW 2018-04-16 08:37:56 UTC

I can confirm that my routes have stopped working over the weekend too.

Related SO question: https://stackoverflow.com/questions/49852367/route-to-application-stopped-working-in-openshift-online-3-9

Comment 2 Jiří Fiala 2018-04-17 12:20:55 UTC

We received a few reports from users on `starter-ca-central-1` describing the same issue. Routes for some old and newly created apps do not work, although the app server itself appears to be running and responds normally when opening a remote shell and curling localhost. One user reported that replacing the service name with an IP works for them, but I was unable to verify this: neither the service name nor the cluster IP worked for me when curling a newly created (not yet idled) PHP example app on `starter-ca-central-1`:

------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:47 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63404
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 12:04:54 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Content-Length: 63605
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I phpt2.nodejs6.svc:8080
curl: (7) Failed connect to phpt2.nodejs6.svc:8080; No route to host
sh-4.2$ curl -I $PHPT2_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.44.221:8080; No route to host
------

Comment 3 Tomáš Glozar 2018-04-17 13:29:36 UTC

Apparently the same is true for me (neither the service name nor the IP works), so the problem really does seem to be in the service.

--------
sh-4.2$ curl -I 127.0.0.1:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:09 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I localhost:8080
HTTP/1.1 200 OK
Date: Tue, 17 Apr 2018 13:13:19 GMT
Server: Apache/2.4.27 (Red Hat) OpenSSL/1.0.1e-fips
Last-Modified: Sat, 14 Apr 2018 17:45:14 GMT
ETag: "924b-569d28d0c5e80"
Accept-Ranges: bytes
Content-Length: 37451
Content-Type: text/html; charset=UTF-8

sh-4.2$ curl -I apache-test.apache-test.svc:8080
curl: (7) Failed connect to apache-test.apache-test.svc:8080; No route to host
sh-4.2$ curl -I $APACHE_TEST_SERVICE_HOST:8080
curl: (7) Failed connect to 172.30.55.209:8080; No route to host
--------

Comment 4 Jiří Fiala 2018-04-18 14:12:19 UTC

Since about 11:11 UTC today, both the service name and the cluster IP return the expected response for me for the already deployed test app on starter-ca-central-1 (it was scaled down overnight and scaled up to 1 replica about 6 hours ago).
The respective route still does not work, even though the service has been reachable internally via IP and name for almost 3 hours.

Newly created apps eventually become reachable via service IP and name (I noticed a delay of 0-15 minutes after deployment), but again they are not reachable externally via the respective route.

Comment 5 Abhishek Gupta 2018-04-18 19:58:06 UTC

Stefanie and I debugged this issue. It turned out to be a bad route (clusterIP was None) that caused the router to fail to parse the config and error out while loading it. As a result, no new routes were being loaded.

While the specific issue has been fixed, we need to dig into why the router did not validate against this route and/or drop/ignore it.
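
One possible shape for such validation is to filter out unusable entries before the config is rendered, so that a single bad entry cannot block every other route. The Go sketch below is illustrative only: the `backend` type, the `splitValid` function, and the `bad-route` name are hypothetical and are not taken from the router source.

```go
package main

import (
	"fmt"
	"net"
)

// backend is a hypothetical, simplified view of what would be rendered for one route.
type backend struct {
	Route     string
	ClusterIP string
}

// splitValid separates backends whose ClusterIP parses as an IP address
// from those whose ClusterIP does not (for example "" or "None").
func splitValid(in []backend) (valid, rejected []backend) {
	for _, b := range in {
		if net.ParseIP(b.ClusterIP) == nil {
			rejected = append(rejected, b)
			continue
		}
		valid = append(valid, b)
	}
	return valid, rejected
}

func main() {
	valid, rejected := splitValid([]backend{
		{Route: "apache-test", ClusterIP: "172.30.55.209"},
		{Route: "bad-route", ClusterIP: "None"},
	})
	fmt.Println("rendered:", len(valid), "skipped:", len(rejected))
}
```

Rejected entries would be logged and skipped rather than written into the config, which keeps the remaining routes loadable.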

Comment 6 Ravi Sankar 2018-04-18 21:22:07 UTC

Fixed by https://github.com/openshift/origin/pull/19416

Comment 7 openshift-github-bot 2018-04-23 20:29:50 UTC

Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/e339d7305618a6b596f14e69dd47f8f9aa50a69f
Bug 1567532 - Unidle handling in router should ignore headless services.

- Currently, service with empty ClusterIP is ignored but headless services
can have ClusterIP="None" and those need to be ignored as well.

https://github.com/openshift/origin/commit/55b4666ac696293b369258fd6a2fd72c1e33ba7e
Merge pull request #19416 from pravisankar/fix-router-endpoint-id

Bug 1567532 - Unidle handling in router should ignore headless services
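
The check described in the commit message can be sketched in Go as follows. This is an illustration, not the code from the pull request: `canUnidle` and its signature are hypothetical. The only fact relied on is that a headless service carries the literal ClusterIP value "None", while an unset ClusterIP is the empty string.

```go
package main

import "fmt"

// clusterIPNone is the literal ClusterIP value of a headless service.
const clusterIPNone = "None"

// canUnidle reports whether a service has a cluster IP that unidling could use.
// Hypothetical helper: both the empty value and "None" are treated as unusable.
func canUnidle(clusterIP string) bool {
	return clusterIP != "" && clusterIP != clusterIPNone
}

func main() {
	for _, ip := range []string{"172.30.55.209", "", "None"} {
		fmt.Printf("clusterIP=%q canUnidle=%v\n", ip, canUnidle(ip))
	}
}
```

Before the fix, only the empty-string case was skipped, so a value of "None" was treated as if it were an address.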

Comment 9 zhaozhanqi 2018-05-17 06:56:37 UTC

Verified this bug on v3.10.0-0.47.0

Steps:

1. Create one pod:
   oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/header-test/dc.json

2. Create a headless service:
   oc create service clusterip header-test --clusterip="None" --tcp=8080:8080

3. Idle the headless service:
   oc idle header-test

4. Check the haproxy logs:
   in.go:265] headless service z1/header-test was marked as idled, but cannot setup unidling without a cluster IP

Comment 11 errata-xmlrpc 2018-07-30 19:12:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
