Bug 1900989

Summary: accessing the route cannot wake up the idled resources
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Version: 4.6
Target Release: 4.7.0
Status: CLOSED ERRATA
Type: Bug
Severity: high
Priority: high
Keywords: Reopened
Flags: aygarg: needinfo-
Hardware: Unspecified
OS: Unspecified
Reporter: Hongan Li <hongli>
Assignee: Andrew McDermott <amcdermo>
QA Contact: Hongan Li <hongli>
CC: abraj, amcdermo, aos-bugs, aygarg, mjoseph, pducai, pmuller, rabdulra, rpuccini, sgreene
Clones: 1900991
Bug Blocks: 1900991
Last Closed: 2021-03-15 05:39:04 UTC

Description Hongan Li 2020-11-24 07:57:13 UTC
Description of problem:
Accessing the route does not wake up the idled resources.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-11-22-204912

How reproducible:
100%

Steps to Reproduce:
1. Create a replication controller and its services:
   $ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
2. Expose the service as a route:
   $ oc expose svc service-unsecure
3. Idle the service:
   $ oc idle service-unsecure
4. Generate traffic to the route:
   $ curl service-unsecure-hongli.apps.cluster.xxx.com -I

Actual results:
The idled resources are not woken up; curl against the route always returns "503".

Expected results:
Accessing the route should wake up the idled resources and return "200 OK".
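To check the expected result repeatedly rather than with a single `curl -I`, a minimal polling sketch might look like this. The helper `wait_for_status` and its `timeout`/`interval` defaults are hypothetical choices, not part of oc or the router; it sends HEAD requests (like `curl -I`) until the route returns the expected status or the time limit passes, and records every status it saw.

```python
import time
import urllib.error
import urllib.request


def wait_for_status(url, expected=200, timeout=120.0, interval=2.0):
    """Poll `url` with HEAD requests until it returns `expected` or time runs out.

    Returns the list of status codes observed (None for connection failures).
    """
    seen = []
    deadline = time.monotonic() + timeout
    while True:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:
            # Non-2xx answers such as the router's 503 arrive as HTTPError.
            status = err.code
            err.close()
        except urllib.error.URLError:
            status = None  # DNS or connection-level failure
        seen.append(status)
        if status == expected or time.monotonic() >= deadline:
            return seen
        time.sleep(interval)
```

For example, `wait_for_status("http://service-unsecure-hongli.apps.cluster.xxx.com/")` should end with 200 once the idled pods come back; with the behavior described under "Actual results" it would collect only 503s until the timeout.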

Additional info:
This issue does not occur in 4.5, where the following backend server appears in haproxy.config:
server ept:service-unsecure:172.30.62.129:27017 172.30.62.129:27017 cookie a4c4cde9584afc00a56110f52cae8a5e weight 256

Workaround: curling the service from inside the cluster does wake up the idled resources.

Comment 1 Andrew McDermott 2020-11-25 16:39:27 UTC
This was broken by the router's switch to EndpointSlices: https://github.com/openshift/router/pull/154

Investigating a fix.

Comment 2 Andrew McDermott 2020-12-04 17:05:52 UTC
Tagging with UpcomingSprint while investigation is either ongoing or
pending. Will be considered for earlier release versions when
diagnosed and resolved.

Comment 3 Andrew McDermott 2021-01-13 17:20:17 UTC
This is a regression in the product; marking it as a blocker.

Comment 5 Andrew McDermott 2021-01-19 12:17:38 UTC
Going to verify and test: https://github.com/openshift/openshift-controller-manager/pull/159

Comment 6 Andrew McDermott 2021-01-28 16:35:30 UTC
*** Bug 1920454 has been marked as a duplicate of this bug. ***

Comment 7 Andrew McDermott 2021-01-29 12:08:31 UTC
Update: currently testing the linked PRs.

Comment 9 Andrew McDermott 2021-02-04 08:05:10 UTC
A note for QA testing: you need to use an oc binary from a nightly or CI install, because the oc command has been updated to annotate services with the idle annotations.
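Following that note, a tester could confirm whether the oc binary in use actually annotated the service. A minimal sketch, assuming the keys are `idling.alpha.openshift.io/idled-at` and `idling.alpha.openshift.io/unidle-targets` (the keys `oc idle` is believed to write; verify on your cluster) and that the input is the JSON printed by `oc get svc service-unsecure -o json`. The helper `missing_idle_annotations` is hypothetical.

```python
import json
from typing import List

# Assumed annotation keys set by `oc idle`; verify against your cluster.
IDLE_ANNOTATIONS = (
    "idling.alpha.openshift.io/idled-at",
    "idling.alpha.openshift.io/unidle-targets",
)


def missing_idle_annotations(svc_json: str) -> List[str]:
    """Return the assumed idle annotation keys that the Service lacks.

    `svc_json` is the text printed by `oc get svc <name> -o json`.
    An empty list means both assumed keys are present.
    """
    svc = json.loads(svc_json)
    # metadata.annotations is omitted entirely when a Service has none.
    annotations = svc.get("metadata", {}).get("annotations") or {}
    return [key for key in IDLE_ANNOTATIONS if key not in annotations]
```

A non-empty result right after `oc idle` would point at an older client that does not annotate the service, which is the situation later described in Comments 17 and 18.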

Comment 10 Hongan Li 2021-02-04 08:47:32 UTC
Thank you for the note, Andrew. The latest oc command is required.

The bug is verified with 4.7.0-0.nightly-2021-02-03-165316 and passed.

$ oc version 
Client Version: 4.7.0-0.nightly-2021-02-03-165316
Server Version: 4.7.0-0.nightly-2021-02-03-165316
Kubernetes Version: v1.20.0+e761892

## backend after idling the service:
  server ept:service-unsecure:http:172.30.2.40:27017 172.30.2.40:27017 cookie 5d8741b5d3530bc4a31a0ef6a9705a25 weight 256
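Both backend `server` lines in this bug (4.5 in the description, 4.7 here after the fix) point the idled route at a 172.30.x.x address, the same range as the service ClusterIPs shown in Comment 17. A minimal sketch of a check for that, assuming the default OpenShift service network 172.30.0.0/16 and the `server <name> <ip>:<port> ...` layout seen in these lines; the helpers `backend_address` and `backend_targets_service_network` are hypothetical.

```python
import ipaddress
from typing import Optional, Union

# Assumption: default OpenShift service network; substitute your cluster's
# serviceNetwork CIDR if it differs.
SERVICE_NETWORK = ipaddress.ip_network("172.30.0.0/16")

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def backend_address(server_line: str) -> Optional[Address]:
    """Extract the IP from an haproxy `server <name> <ip>:<port> ...` line."""
    fields = server_line.split()
    if len(fields) < 3 or fields[0] != "server":
        return None  # not a server line
    host, _, _port = fields[2].rpartition(":")
    # Strip brackets in case the address is written as [v6]:port.
    return ipaddress.ip_address(host.strip("[]"))


def backend_targets_service_network(server_line: str) -> bool:
    """True if the backend server address lies inside SERVICE_NETWORK."""
    addr = backend_address(server_line)
    return addr is not None and addr in SERVICE_NETWORK
```

It could be run over the `server` lines copied from a router pod's haproxy.config; both lines quoted in this bug return True.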

Comment 15 errata-xmlrpc 2021-02-24 15:35:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 16 peter ducai 2021-03-11 12:04:45 UTC
*** Bug 1937667 has been marked as a duplicate of this bug. ***

Comment 17 aygarg 2021-03-15 05:18:39 UTC
Hello Team,

In one of my cases, the customer is facing the same issue on OpenShift 4.5: the pods come up when the service is curled from another pod, but not when the route is accessed. Since the bug is fixed in 4.7, I tried to reproduce it on OpenShift 4.7, but in my testing the issue is still there. The pods do not come up even when the service is used.

~~~
❯ oc version
Client Version: 4.6.16
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442

❯ oc new-project test

❯ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
replicationcontroller/caddy-rc created
service/service-secure created
service/service-unsecure created

❯ oc get pod
NAME             READY   STATUS    RESTARTS   AGE
caddy-rc-526jz   1/1     Running   0          11s
caddy-rc-nqsd6   1/1     Running   0          11s

❯ oc get svc
NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service-secure     ClusterIP   172.30.253.201   <none>        27443/TCP   23s
service-unsecure   ClusterIP   172.30.56.43     <none>        27017/TCP   23s

❯ oc expose svc service-unsecure
route.route.openshift.io/service-unsecure exposed

❯ oc idle service-unsecure
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test/service-unsecure" has been marked as idled
The service will unidle ReplicationController "test/caddy-rc" to 2 replicas once it receives traffic
ReplicationController "test/caddy-rc" has been idled

❯ oc get pod
No resources found in test namespace.

❯ oc get routes
NAME               HOST/PORT                                              PATH   SERVICES           PORT   TERMINATION   WILDCARD
service-unsecure   service-unsecure-test.apps.ayush.example.com                  service-unsecure   http                 None

❯ curl http://service-unsecure-test.apps.ayush.example.com -I
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

❯ oc get pod
No resources found in test namespace.
~~~

I am not sure whether I am doing something wrong or the issue is still present.

Comment 18 aygarg 2021-03-15 05:39:04 UTC
Hello Team,

Apologies for my previous comment.

I was hitting the same issue on OpenShift 4.7 because my oc binary was version 4.6. After using the correct oc client to idle the services, the pods came up properly when the route was accessed.

❯ oc version
Client Version: 4.7.1
Server Version: 4.7.0
Kubernetes Version: v1.20.0+bd9e442

I am changing the Bugzilla status back to CLOSED ERRATA now.
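Comments 17 and 18 show how easy it is to idle a service with a client older than the cluster. A minimal sketch that compares the major.minor parts of the `Client Version` and `Server Version` lines from `oc version`, assuming the plain-text output format shown above; nightly strings such as `4.7.0-0.nightly-2021-02-03-165316` are compared by their leading `4.7`. The helpers `parse_oc_version` and `client_older_than_server` are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "Client Version: 4.6.16" or
# "Server Version: 4.7.0-0.nightly-2021-02-03-165316".
_VERSION_LINE = re.compile(r"^(Client|Server) Version:\s*(\d+)\.(\d+)", re.MULTILINE)


def parse_oc_version(output: str) -> Dict[str, Tuple[int, int]]:
    """Map "Client"/"Server" to (major, minor) parsed from `oc version` output."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in _VERSION_LINE.finditer(output)
    }


def client_older_than_server(output: str) -> bool:
    """True if the oc client's major.minor is lower than the cluster's."""
    versions = parse_oc_version(output)
    if "Client" not in versions or "Server" not in versions:
        raise ValueError("expected both Client Version and Server Version lines")
    return versions["Client"] < versions["Server"]
```

Matching minor versions is necessary but may not be sufficient: per Comment 9, the service annotation change landed in 4.7 nightly/CI builds, so an early 4.7 client could still predate it.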