Bug 1562184 - Only first connection against an app with "oc idle" is connected
Summary: Only first connection against an app with "oc idle" is connected
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Dan Mace
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-29 17:50 UTC by Steven Walter
Modified: 2022-08-04 22:20 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-04 22:04:11 UTC
Target Upstream Version:
Embargoed:




Links
System   ID                                               Private  Priority  Status  Summary  Last Updated
Github   https://github.com/openshift/origin/pull/19205   0        None      None    None     2019-11-27 09:08:09 UTC

Description Steven Walter 2018-03-29 17:50:01 UTC
Description of problem:
If you idle a simple web application, e.g. cakephp sample, the following behavior occurs.

The first HTTP request is "held", in accordance with the idling documentation, until the pod spins up and becomes active, and then the request is passed to the service.
However, if you make multiple HTTP requests, only the first is held; while the application is spinning up, the subsequent requests receive the "Application is not available" page.

We believe this is because the idling annotation/process recognizes that the first request needs to be held, while the subsequent requests simply hit a pod that has not yet started or passed its readiness checks. They are therefore treated as normal requests to a backend with no ready pods, which yields the "Application is not available" page.
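
A quick way to observe this from the CLI (a sketch only; it assumes the idling.alpha.openshift.io annotations that "oc idle" normally writes, and reuses the ruby-ex service from the reproduction below):

# "oc idle" records when the service was idled and which scalables to wake up
$ oc get endpoints ruby-ex -o yaml | grep idling.alpha.openshift.io
# Watch the pod while the first request unidles the app: while it is Running
# but not yet Ready (0/1), the router has no ready endpoints and serves the
# "Application is not available" page to any other request
$ oc get pods -w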


Version-Release number of selected component (if applicable):
3.7

How reproducible:
Confirmed

Steps to Reproduce/Actual results:

# oc idle ruby-ex
The service "scc/ruby-ex" has been marked as idled 
...
$ oc get po
NAME              READY     STATUS      RESTARTS   AGE
ruby-ex-1-build   0/1       Completed   0          22m

Now, in two terminal windows I run this same command:
# curl -kv ruby-ex-scc.apps.mamonaku.quicklab.rdu2.cee.redhat.com
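
(Equivalently, one shell can fire both requests at nearly the same time; just a convenience sketch against the same route:)

$ for i in 1 2; do curl -kv ruby-ex-scc.apps.mamonaku.quicklab.rdu2.cee.redhat.com & done; wait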

When using TCP readiness checks, in the first window I get:
> Accept: */*
> 
* Empty reply from server
* Connection #1 to host ruby-ex-scc.apps.mamonaku.quicklab.rdu2.cee.redhat.com left intact
curl: (52) Empty reply from server

It then returns. While it's waiting, though, curls in the other terminal get "Application is not available".

Using HTTP readiness checks instead, the held request in the first terminal gets the actual application output.
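
For comparison, the probe type can be switched on the deployment config with "oc set probe" (a sketch; dc/ruby-ex, port 8080, and the "/" path are assumptions for this sample app):

# TCP readiness check (the case that returned the empty reply above)
$ oc set probe dc/ruby-ex --readiness --open-tcp=8080
# HTTP readiness check (the case where the held request got real output)
$ oc set probe dc/ruby-ex --readiness --get-url=http://:8080/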

Expected results:
This is sort of expected, since from the router's perspective there are simply zero pods running. But better behavior would be to hold an "idled" flag until a pod is fully spun up, so that ALL connections are held.

Additional info:

Comment 2 Ben Bennett 2018-04-27 18:58:10 UTC
*** Bug 1567043 has been marked as a duplicate of this bug. ***

Comment 3 Ben Bennett 2018-05-24 15:18:27 UTC
This can't be fixed until the new object-based unidling code lands, since there is no way to distinguish the states during unidling.  It is targeted for 3.11.

See: https://github.com/openshift/origin/pull/19205

Comment 4 Dan Mace 2018-06-07 14:38:17 UTC
(In reply to Ben Bennett from comment #3)
> This can't be fixed until the new object-based unidling code lands, since
> there is no way to distinguish the states during unidling.  It is targeted
> for 3.11.
> 
> See: https://github.com/openshift/origin/pull/19205

Does this mean the bug is fixed by the merge of the linked refactor, or that 19205 must land before a fix can be implemented as a follow-up?

Comment 5 Ben Bennett 2018-06-08 14:00:56 UTC
This bug will be fixed when https://github.com/openshift/origin/pull/19205 lands.

Comment 6 Dan Mace 2018-08-09 14:22:11 UTC
The API changes required to support this fix are not going to be available until 4.0 at the earliest. As a result, I'm targeting a fix for version 4.1.

Comment 8 Dan Mace 2019-11-04 22:04:11 UTC
We don't think this is going to be addressed in OpenShift 3. I'm going to close the bug. If circumstances change, we can of course re-open it.

