Bug 1900989 - accessing the route cannot wake up the idled resources
Summary: accessing the route cannot wake up the idled resources
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Andrew McDermott
QA Contact: Hongan Li
URL:
Whiteboard:
Duplicates: 1920454 1937667
Depends On:
Blocks: 1900991
Reported: 2020-11-24 07:57 UTC by Hongan Li
Modified: 2021-04-06 10:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1900991
Environment:
Last Closed: 2021-03-15 05:39:04 UTC
Target Upstream Version:
aygarg: needinfo-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift oc pull 720 | 0 | None | closed | Bug 1900989: idle: also annotate the service | 2021-02-16 02:57:34 UTC
Github | openshift openshift-apiserver pull 180 | 0 | None | closed | Bug 1900989: policy/unidling-controller: allow get/update on services | 2021-02-16 02:57:33 UTC
Github | openshift openshift-controller-manager pull 165 | 0 | None | closed | Bug 1900989: unidling: switch away from endpoints to the service | 2021-02-16 02:57:34 UTC
Github | openshift origin pull 25844 | 0 | None | closed | Bug 1900989: idle.sh: Drop endpoints mocks from idle tests | 2021-02-16 02:57:33 UTC
Github | openshift origin pull 25850 | 0 | None | closed | Bug 1900989: test/extended/router: add idle/unidle e2e test | 2021-02-16 02:57:34 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:35:55 UTC

Description Hongan Li 2020-11-24 07:57:13 UTC
Description of problem:
Accessing the route does not wake up the idled resources.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-22-204912

How reproducible:
100%

Steps to Reproduce:
1. Create a replication controller with a service
   $ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
2. Create a route
   $ oc expose svc service-unsecure
3. Idle the service
   $ oc idle service-unsecure
4. Generate traffic for the route
   $ curl service-unsecure-hongli.apps.cluster.xxx.com -I
Actual results:
The idled resource is not woken up. curl against the route always returns "503".

Expected results:
The idled resource should be woken up and the route should return "200 OK".

Additional info:
This issue does not occur in v4.5, where I can see the following backend server in haproxy.config:
server ept:service-unsecure:172.30.62.129:27017 172.30.62.129:27017 cookie a4c4cde9584afc00a56110f52cae8a5e weight 256

Workaround: curling the service from inside the cluster wakes up the idled resource.
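
The expected result in this report (the route answering 200 once the idled workload is back) can be checked with a small polling loop. The following is only a sketch: wait_for_200 is a hypothetical helper name, the retry count and delay are arbitrary defaults, and how long unidling takes on a given cluster is not stated in this report.

```shell
# Sketch only: poll a URL until it answers HTTP 200 or the attempts run out.
# wait_for_200 is a hypothetical helper, not part of oc or the router.
#   $1 = URL, $2 = max attempts (default 30), $3 = delay in seconds (default 2)
wait_for_200() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -w '%{http_code}' prints only the status code; the body is discarded.
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
        [ "$code" = "200" ] && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage after "oc idle service-unsecure" (the host name is a placeholder):
#   wait_for_200 "http://<route-host>" && echo "unidled" || echo "still 503"
```

With the regression described here the loop would be expected to exhaust its attempts, since the route keeps returning 503.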

Comment 1 Andrew McDermott 2020-11-25 16:39:27 UTC
This was broken by the switch to endpointslices. https://github.com/openshift/router/pull/154

Investigating a fix.

Comment 2 Andrew McDermott 2020-12-04 17:05:52 UTC
Tagging with UpcomingSprint while investigation is either ongoing or
pending. Will be considered for earlier release versions when
diagnosed and resolved.

Comment 3 Andrew McDermott 2021-01-13 17:20:17 UTC
This is a regression in the product; marking it as a blocker.

Comment 5 Andrew McDermott 2021-01-19 12:17:38 UTC
Going to verify and test: https://github.com/openshift/openshift-controller-manager/pull/159

Comment 6 Andrew McDermott 2021-01-28 16:35:30 UTC
*** Bug 1920454 has been marked as a duplicate of this bug. ***

Comment 7 Andrew McDermott 2021-01-29 12:08:31 UTC
Update: currently testing the PRs linked.

Comment 9 Andrew McDermott 2021-02-04 08:05:10 UTC
A note for QA testing: you'll need to use an oc binary that comes with a nightly or CI install, because the oc command has been updated to annotate services with the idle annotations.
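
One way to confirm that a given oc binary annotates the service when idling is to inspect the service manifest afterwards. The sketch below assumes the annotation keys are idling.alpha.openshift.io/idled-at and idling.alpha.openshift.io/unidle-targets; neither key is quoted in this bug, so treat them as an assumption. has_idle_annotations is a hypothetical helper name.

```shell
# Sketch only: succeed if a manifest read from stdin mentions both idle
# annotation keys. The key names are an assumption, not taken from this bug.
has_idle_annotations() {
    manifest=$(cat)
    for key in idling.alpha.openshift.io/idled-at \
               idling.alpha.openshift.io/unidle-targets; do
        # -F matches the key literally, -q suppresses output.
        printf '%s\n' "$manifest" | grep -qF "$key" || return 1
    done
}

# Usage against a live cluster, after "oc idle service-unsecure":
#   oc get svc service-unsecure -o yaml | has_idle_annotations \
#       && echo "service carries the idle annotations" \
#       || echo "service not annotated; the oc client is probably too old"
```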

Comment 10 Hongan Li 2021-02-04 08:47:32 UTC
Thank you for the note, Andrew. The latest oc command is required.

The bug is verified with 4.7.0-0.nightly-2021-02-03-165316; the test passed.

$ oc version 
Client Version: 4.7.0-0.nightly-2021-02-03-165316
Server Version: 4.7.0-0.nightly-2021-02-03-165316
Kubernetes Version: v1.20.0+e761892

## backend as below after idling service
  server ept:service-unsecure:http:172.30.2.40:27017 172.30.2.40:27017 cookie 5d8741b5d3530bc4a31a0ef6a9705a25 weight 256

Comment 15 errata-xmlrpc 2021-02-24 15:35:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 16 peter ducai 2021-03-11 12:04:45 UTC
*** Bug 1937667 has been marked as a duplicate of this bug. ***

Comment 17 aygarg 2021-03-15 05:18:39 UTC
Hello Team,

In one of my cases, the customer is facing the same issue on OpenShift 4.5: the pod comes up when the SVC is curled from another pod, but not when the route is accessed. I tried to reproduce this on OpenShift 4.7, where the bug is fixed, but in my testing the issue is still there. The pods do not come up even when using the SVC.

~~~
❯ oc version
Client Version: 4.6.16
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442

❯ oc new-project test

❯ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
replicationcontroller/caddy-rc created
service/service-secure created
service/service-unsecure created

❯ oc get pod
NAME             READY   STATUS    RESTARTS   AGE
caddy-rc-526jz   1/1     Running   0          11s
caddy-rc-nqsd6   1/1     Running   0          11s

❯ oc get svc
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     ClusterIP   172.30.253.201   <none>        27443/TCP   23s
service-unsecure   ClusterIP   172.30.56.43     <none>        27017/TCP   23s

❯ oc expose svc service-unsecure
route.route.openshift.io/service-unsecure exposed

❯ oc idle service-unsecure
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test/service-unsecure" has been marked as idled
The service will unidle ReplicationController "test/caddy-rc" to 2 replicas once it receives traffic
ReplicationController "test/caddy-rc" has been idled

❯ oc get pod
No resources found in test namespace.

❯ oc get routes
NAME               HOST/PORT                                              PATH   SERVICES           PORT   TERMINATION   WILDCARD
service-unsecure   service-unsecure-test.apps.ayush.example.com                  service-unsecure   http                 None

❯ curl http://service-unsecure-test.apps.ayush.example.com -I
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

❯ oc get pod
No resources found in test namespace.
~~~

I am not sure if I am doing something wrong or the issue is still there.

Comment 18 aygarg 2021-03-15 05:39:04 UTC
Hello Team,

Apologies for my previous comment.

I was facing the same issue on OpenShift 4.7 because my oc binary was version 4.6. After using the correct oc client to idle the services, the pods came up properly when the route was accessed.

❯ oc version
Client Version: 4.7.1
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442
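
The mismatch described in this comment (a 4.6 client idling services on a 4.7 server) could be caught before running oc idle by comparing the two versions that oc version prints. This is a minimal sketch: version_field, minor_of and client_is_current are hypothetical helper names, and the rule "client major.minor must be at least the server's" is an assumption drawn from this bug rather than a documented compatibility policy.

```shell
# Sketch only: compare oc client and server versions before idling.
# All function names here are hypothetical helpers.

# Print the value of one "Key: value" line from "oc version" output on stdin.
version_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Print the minor component of a version such as "4.6.16" or
# "4.7.0-0.nightly-2021-02-03-165316".
minor_of() {
    rest=${1#*.}                        # drop the major component and its dot
    printf '%s\n' "${rest%%[!0-9]*}"    # keep only the leading digits
}

# Succeed when the client major.minor is at least the server major.minor.
client_is_current() {
    client=$1
    server=$2
    [ "${client%%.*}" -gt "${server%%.*}" ] && return 0
    [ "${client%%.*}" -eq "${server%%.*}" ] &&
        [ "$(minor_of "$client")" -ge "$(minor_of "$server")" ]
}

# Usage:
#   out=$(oc version)
#   c=$(printf '%s\n' "$out" | version_field "Client Version")
#   s=$(printf '%s\n' "$out" | version_field "Server Version")
#   client_is_current "$c" "$s" || echo "oc client $c is older than server $s"
```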

I am changing the Bugzilla status back to CLOSED ERRATA now.

