1900989 – accessing the route cannot wake up the idled resources

Summary: accessing the route cannot wake up the idled resources

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Andrew McDermott
QA Contact:	Hongan Li
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1920454 1937667
Depends On:
Blocks:	1900991

Reported:	2020-11-24 07:57 UTC by Hongan Li
Modified:	2024-06-13 23:29 UTC
CC List:	10 users
Fixed In Version:
Doc Text:
Clone Of:
Clones:	1900991
Environment:
Last Closed:	2021-03-15 05:39:04 UTC
Target Upstream Version:
Embargoed:
Flags:	aygarg: needinfo-

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift oc pull 720	None	closed	Bug 1900989: idle: also annotate the service	2021-02-16 02:57:34 UTC
Github	openshift openshift-apiserver pull 180	None	closed	Bug 1900989: policy/unidling-controller: allow get/update on services	2021-02-16 02:57:33 UTC
Github	openshift openshift-controller-manager pull 165	None	closed	Bug 1900989: unidling: switch away from endpoints to the service	2021-02-16 02:57:34 UTC
Github	openshift origin pull 25844	None	closed	Bug 1900989: idle.sh: Drop endpoints mocks from idle tests	2021-02-16 02:57:33 UTC
Github	openshift origin pull 25850	None	closed	Bug 1900989: test/extended/router: add idle/unidle e2e test	2021-02-16 02:57:34 UTC
Red Hat Product Errata	RHSA-2020:5633	None	None	None	2021-02-24 15:35:55 UTC

Description Hongan Li 2020-11-24 07:57:13 UTC

Description of problem:
accessing the route cannot wake up the idled resources

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-22-204912

How reproducible:
100%

Steps to Reproduce:
1. create a replication controller with services
   $ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
2. create a route
   $ oc expose svc service-unsecure
3. idle the service
   $ oc idle service-unsecure
4. generate traffic for the route
   $ curl service-unsecure-hongli.apps.cluster.xxx.com -I

Actual results:
The idled resource is not woken up; curl against the route always returns "503".

Expected results:
Accessing the route should wake up the idled resource and return "200 OK".

Additional info:
This issue does not occur in v4.5, where I can see the following backend server in haproxy.config:
server ept:service-unsecure:172.30.62.129:27017 172.30.62.129:27017 cookie a4c4cde9584afc00a56110f52cae8a5e weight 256

Workaround: curling the service from inside the cluster wakes up the idled resource.
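
The check in step 4 can be automated by polling the route until it stops returning 503. The following is a minimal sketch, not part of the original report: the function names `http_status` and `wait_for_unidle`, the attempt count, and the delay are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: helper names and defaults are hypothetical.

# Print the numeric status code from the first line of HTTP response
# headers read on stdin, e.g. "HTTP/1.0 503 Service Unavailable" -> 503.
http_status() {
    awk 'NR == 1 { print $2 }'
}

# Poll a route with HEAD requests until it answers 200.
#   $1: route URL
#   $2: maximum number of attempts (assumed default: 30)
#   $3: seconds to sleep between attempts (assumed default: 2)
wait_for_unidle() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    code=
    while [ "$i" -lt "$attempts" ]; do
        code=$(curl -sI "$url" | http_status)
        if [ "$code" = "200" ]; then
            echo "route answered 200 after $((i + 1)) attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "route still not serving after $attempts attempt(s), last status: ${code:-none}" >&2
    return 1
}
```

Usage, with the route host from step 4: `wait_for_unidle http://service-unsecure-hongli.apps.cluster.xxx.com`. On an affected cluster the function is expected to exhaust its attempts and return non-zero, since every request yields 503.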

Comment 1 Andrew McDermott 2020-11-25 16:39:27 UTC

This was broken by the switch to endpointslices. https://github.com/openshift/router/pull/154

Investigating a fix.

Comment 2 Andrew McDermott 2020-12-04 17:05:52 UTC

Tagging with UpcomingSprint while investigation is either ongoing or
pending. Will be considered for earlier release versions when
diagnosed and resolved.

Comment 3 Andrew McDermott 2021-01-13 17:20:17 UTC

This is a regression in the product; marking it as a blocker.

Comment 5 Andrew McDermott 2021-01-19 12:17:38 UTC

Going to verify and test: https://github.com/openshift/openshift-controller-manager/pull/159

Comment 6 Andrew McDermott 2021-01-28 16:35:30 UTC

*** Bug 1920454 has been marked as a duplicate of this bug. ***

Comment 7 Andrew McDermott 2021-01-29 12:08:31 UTC

Update: currently testing the PRs linked.

Comment 9 Andrew McDermott 2021-02-04 08:05:10 UTC

A note for QA testing: you'll need to use an oc binary that comes with either nightly/ci installs as the oc command has been updated to annotate services with the idle annotations.
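
One way to confirm that the oc binary in use annotated the service is to inspect the service after idling it. This is a sketch under assumptions: the annotation keys `idling.alpha.openshift.io/idled-at` and `idling.alpha.openshift.io/unidle-targets` are not named anywhere in this report and are taken from general knowledge of OpenShift idling, so verify them against your cluster. The function name `has_idle_annotations` is hypothetical.

```shell
#!/bin/sh
# Sketch only: the annotation keys below are an assumption, not quoted
# from this bug report.

# Read a service manifest (for example the output of `oc get svc NAME -o yaml`)
# on stdin and succeed only if both idle annotation keys appear in it.
has_idle_annotations() {
    manifest=$(cat)
    printf '%s\n' "$manifest" | grep -qF 'idling.alpha.openshift.io/idled-at' &&
        printf '%s\n' "$manifest" | grep -qF 'idling.alpha.openshift.io/unidle-targets'
}
```

Usage: `oc get svc service-unsecure -o yaml | has_idle_annotations && echo "service is annotated"`. If the function fails right after `oc idle`, the client is probably too old to annotate the service.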

Comment 10 Hongan Li 2021-02-04 08:47:32 UTC

Thank you for the note, Andrew. The latest oc command is required.

And the bug is verified with 4.7.0-0.nightly-2021-02-03-165316 and passed.

$ oc version 
Client Version: 4.7.0-0.nightly-2021-02-03-165316
Server Version: 4.7.0-0.nightly-2021-02-03-165316
Kubernetes Version: v1.20.0+e761892

## backend as below after idling service
  server ept:service-unsecure:http:172.30.2.40:27017 172.30.2.40:27017 cookie 5d8741b5d3530bc4a31a0ef6a9705a25 weight 256

Comment 15 errata-xmlrpc 2021-02-24 15:35:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 16 peter ducai 2021-03-11 12:04:45 UTC

*** Bug 1937667 has been marked as a duplicate of this bug. ***

Comment 17 aygarg 2021-03-15 05:18:39 UTC

Hello Team,

In one of my cases, the customer is facing the same issue on OpenShift 4.5 where the pod comes up when the SVC is curled from another pod but fails with the route. However, I tried to replicate this on OpenShift 4.7 as the bug is fixed in 4.7 but the issue is still there as per my testing. The pods are not coming up even after using SVC.

~~~
❯ oc version
Client Version: 4.6.16
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442

❯ oc new-project test

❯ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
replicationcontroller/caddy-rc created
service/service-secure created
service/service-unsecure created

❯ oc get pod
NAME             READY   STATUS    RESTARTS   AGE
caddy-rc-526jz   1/1     Running   0          11s
caddy-rc-nqsd6   1/1     Running   0          11s

❯ oc get svc
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     ClusterIP   172.30.253.201   <none>        27443/TCP   23s
service-unsecure   ClusterIP   172.30.56.43     <none>        27017/TCP   23s

❯ oc expose svc service-unsecure
route.route.openshift.io/service-unsecure exposed

❯ oc idle service-unsecure
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test/service-unsecure" has been marked as idled
The service will unidle ReplicationController "test/caddy-rc" to 2 replicas once it receives traffic
ReplicationController "test/caddy-rc" has been idled

❯ oc get pod
No resources found in test namespace.

❯ oc get routes
NAME               HOST/PORT                                              PATH   SERVICES           PORT   TERMINATION   WILDCARD
service-unsecure   service-unsecure-test.apps.ayush.example.com                  service-unsecure   http                 None

❯ curl http://service-unsecure-test.apps.ayush.example.com -I
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

❯ oc get pod
No resources found in test namespace.
~~~

I am not sure if I am doing something wrong or the issue is still there.

Comment 18 aygarg 2021-03-15 05:39:04 UTC

Hello Team,

Apologies for my previous comment.

I was facing the same issue in OpenShift 4.7 because the oc binary was version 4.6. After using the correct oc client to idle the services, the pods came up properly when the route was accessed.

❯ oc version
Client Version: 4.7.1
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442

I am changing the Bugzilla status back to CLOSED ERRATA now.
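
The client/server mismatch described in this comment can be caught before idling by comparing the versions that `oc version` prints. The sketch below compares only the major.minor part, which is a heuristic chosen for illustration; the function names `version_minor` and `client_matches_server` are hypothetical.

```shell
#!/bin/sh
# Sketch only: comparing major.minor is an illustrative heuristic.

# Print "major.minor" from the `oc version` line that starts with the
# given label, e.g. "Client Version: 4.6.16" -> 4.6.
#   $1: label such as "Client Version" or "Server Version"
version_minor() {
    sed -n "s/^$1: \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p"
}

# Read `oc version` output on stdin and succeed only if the client and
# server report the same major.minor version.
client_matches_server() {
    out=$(cat)
    client=$(printf '%s\n' "$out" | version_minor 'Client Version')
    server=$(printf '%s\n' "$out" | version_minor 'Server Version')
    [ -n "$client" ] && [ "$client" = "$server" ]
}
```

Usage: `oc version | client_matches_server || echo "oc client and server versions differ"`. With the output from comment 17 (client 4.6.16, server 4.7.0) the check fails; with the output above (client 4.7.1, server 4.7.0) it passes.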
