Bug 1900991 - accessing the route cannot wake up the idled resources [NEEDINFO]
Summary: accessing the route cannot wake up the idled resources
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Andrew McDermott
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 1900989
Blocks:
Reported: 2020-11-24 07:59 UTC by Hongan Li
Modified: 2022-12-19 05:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1900989
Environment:
Last Closed: 2021-04-16 16:13:14 UTC
Target Upstream Version:
amcdermo: needinfo? (pducai)


Links
System ID Private Priority Status Summary Last Updated
Github openshift oc pull 727 0 None closed [release-4.6] Bug 1900991: idle: also annotate the service 2021-02-11 13:43:02 UTC
Github openshift openshift-apiserver pull 183 0 None closed [release-4.6] Bug 1900991: policy/unidling-controller: allow get/update on services 2021-02-11 13:43:02 UTC
Github openshift openshift-controller-manager pull 167 0 None closed [release-4.6] Bug 1900991: unidling: switch away from endpoints to the service 2021-02-11 13:43:02 UTC
Github openshift origin pull 25855 0 None closed [release-4.6] Bug 1900991: idle.sh: Drop endpoints mocks from idle tests 2021-02-11 13:43:02 UTC
Github openshift origin pull 25858 0 None closed [release-4.6] Bug 1900991: test/extended/router: add idle/unidle e2e test 2021-02-11 13:43:02 UTC
Github openshift router pull 253 0 None closed [release-4.6] Bug 1900991: Move idle check from endpoints to service 2021-02-11 13:43:03 UTC
Red Hat Product Errata RHBA-2021:0510 0 None None None 2021-02-22 13:54:49 UTC

Comment 2 Andrew McDermott 2020-12-04 17:06:26 UTC
Tagging with UpcomingSprint while investigation is either ongoing or
pending. Will be considered for earlier release versions when
diagnosed and resolved.

Comment 4 Andrew McDermott 2021-02-04 08:30:36 UTC
A note for QA testing: use the oc binary that ships with the installer, because the oc command has been updated to annotate services with the idle annotations.
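
One way to confirm that the oc client in use actually annotated the service is to look for the idle annotation on the Service object after running oc idle. The sketch below is a crude text check, not the controller's logic. The annotation key idling.alpha.openshift.io/idled-at is an assumption (it is not quoted anywhere in this bug), and service_is_idled is a hypothetical helper name.

```shell
# ASSUMPTION: this annotation key is not quoted in the bug report.
IDLED_AT_KEY='idling.alpha.openshift.io/idled-at'

# Reads a Service manifest (YAML or JSON) on stdin and succeeds if the
# idled-at annotation key appears anywhere in it. This is a plain text
# match, so it does not validate the annotation's value.
service_is_idled() {
  grep -q -F "$IDLED_AT_KEY"
}

# Against a live cluster (not executed here):
#   oc get service service-unsecure -o yaml | service_is_idled && echo "annotated"
```

If the check fails right after oc idle reports the service as idled, the client is probably an older build that only annotated the endpoints.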

Comment 6 Arvind iyengar 2021-02-12 06:56:21 UTC
Verified in "4.6.0-0.nightly-2021-02-11-040306" with the same version of the oc client. With this payload, an idled service wakes up and its backend becomes active again when traffic is sent to the corresponding route:
-----
$ oc version
Client Version: 4.6.0-0.nightly-2021-02-11-040306
Server Version: 4.6.0-0.nightly-2021-02-11-040306
Kubernetes Version: v1.19.0+6e846d7

$ oc expose svc service-unsecure
route.route.openshift.io/service-unsecure exposed


$ oc get all                    
NAME                 READY   STATUS    RESTARTS   AGE
pod/caddy-rc-6cmm4   1/1     Running   0          47s
pod/caddy-rc-8lfl4   1/1     Running   0          47s

NAME                             DESIRED   CURRENT   READY   AGE
replicationcontroller/caddy-rc   2         2         2       47s

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.130.117   <none>        27443/TCP   47s
service/service-unsecure   ClusterIP   172.30.45.156    <none>        27017/TCP   46s

NAME                                        HOST/PORT                                                                           PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com          service-unsecure   http                 None


$ curl service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com -I
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 28
Content-Type: text/html; charset=utf-8
Last-Modified: Tue, 27 Feb 2018 02:43:29 GMT
Server: Caddy
Date: Fri, 12 Feb 2021 06:48:33 GMT
Set-Cookie: e96c07fa08f2609cadf847f019750244=b71de34503fbaacdd109926c9e8c5af9; path=/; HttpOnly
Cache-control: private

$ oc  idle service-unsecure                                                  
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test1/service-unsecure" has been marked as idled 
The service will unidle ReplicationController "test1/caddy-rc" to 2 replicas once it receives traffic 
ReplicationController "test1/caddy-rc" has been idled 

$ oc get all
NAME                             DESIRED   CURRENT   READY   AGE
replicationcontroller/caddy-rc   0         0         0       5m50s

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.130.117   <none>        27443/TCP   5m50s
service/service-unsecure   ClusterIP   172.30.45.156    <none>        27017/TCP   5m49s

NAME                                        HOST/PORT                                                                           PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com          service-unsecure   http                 None


$ curl service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com -I
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 28
Content-Type: text/html; charset=utf-8
Last-Modified: Tue, 27 Feb 2018 02:43:29 GMT
Server: Caddy
Date: Fri, 12 Feb 2021 06:49:26 GMT
Set-Cookie: e96c07fa08f2609cadf847f019750244=a5ce95ebd48d1fc747c37c00fe549a6b; path=/; HttpOnly
Cache-control: private

$ oc get all
NAME                 READY   STATUS    RESTARTS   AGE
pod/caddy-rc-cdqnl   1/1     Running   0          12s
pod/caddy-rc-gg85h   1/1     Running   0          12s

NAME                             DESIRED   CURRENT   READY   AGE
replicationcontroller/caddy-rc   2         2         2       3m5s

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.130.117   <none>        27443/TCP   3m5s
service/service-unsecure   ClusterIP   172.30.45.156    <none>        27017/TCP   3m4s

NAME                                        HOST/PORT                                                                           PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com          service-unsecure   http                 None
-----
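
The manual check above (idle, send traffic, look at oc get all again) can be scripted with a small polling helper. This is only a sketch under assumptions: wait_for_value is a hypothetical helper, and the retry count and delay are arbitrary values, not taken from the e2e test referenced in the links.

```shell
# wait_for_value EXPECTED TRIES DELAY CMD [ARGS...]
# Runs CMD repeatedly until its output equals EXPECTED, or until TRIES
# attempts have been made. Sleeps DELAY seconds between attempts.
wait_for_value() {
  expected=$1
  tries=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$tries" ]; do
    # A failing command is treated as "no value yet".
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Against a live cluster (not executed here), using the names from the
# transcript above:
#   oc idle service-unsecure
#   curl -sI service-unsecure-test1.apps.ci-ln-3nr7xwk-d5d6b.origin-ci-int-aws.dev.rhcloud.com >/dev/null
#   wait_for_value 2 30 2 oc get rc caddy-rc -o jsonpath='{.status.readyReplicas}'
```

Polling is used because, as the pod ages in the transcript show, the replication controller needs some seconds to scale back up after the first request.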

Comment 9 errata-xmlrpc 2021-02-22 13:54:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.18 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0510

Comment 10 peter ducai 2021-03-25 09:01:56 UTC
Customer upgraded to 4.6.18 (using 4.6.20 client) but the issue still persists on some routes.

Here's a route that worked with the idle feature:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    openshift.io/host.generated: "true"
  name: nginx
  namespace: jpecora-test
spec:
  host: nginx-test.apps.osesbx.mtb.com
  path: /test
  port:
    targetPort: 8080
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: nginx
    weight: 100
  wildcardPolicy: None


And one that didn't:

oc neat get -- route login-ui -n development
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    meta.helm.sh/release-name: login-ui
    meta.helm.sh/release-namespace: development
  labels:
    app.kubernetes.io/managed-by: Helm
  name: login-ui
  namespace: development
spec:
  host: digitalbanking-sbx.apps.osesbx.mtb.com
  path: /login
  port:
    targetPort: 8080
  tls:
    insecureEdgeTerminationPolicy: Redirect
    termination: edge
  to:
    kind: Service
    name: login-ui
    weight: 100
  wildcardPolicy: None

FYI: adding label openshift.io/host.generated: "true" didn't change anything.

Comment 11 Andrew McDermott 2021-03-25 10:59:13 UTC
(In reply to peter ducai from comment #10)

Are we saying that both routes/services were idled and only one was automatically unidled when it received traffic?

There is a separate fix which ensures that, on upgrade, services that were already idled get the annotation added when the cluster-ingress-operator is upgraded. That fix was first available in 4.6.22.

BZ#1927364 - oc idle: Clusters upgrading with an idled workload do not have annotations on the workload's service
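
Since the second fix depends on the cluster version, a quick comparison against 4.6.22 can tell whether a given cluster should already have it. A minimal sketch, assuming GNU sort (the -V version-sort option is not POSIX); version_at_least is a hypothetical helper name.

```shell
# version_at_least VERSION MINIMUM
# Succeeds if VERSION sorts at or after MINIMUM in version order.
# ASSUMPTION: GNU sort is available (-V is a GNU extension).
version_at_least() {
  # -C checks silently whether the input is already sorted; the input
  # is sorted exactly when MINIMUM <= VERSION.
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Example with the versions mentioned in this bug:
#   version_at_least 4.6.18 4.6.22 || echo "upgrade annotation fix not included"
```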

Comment 12 Andrew McDermott 2021-04-16 16:13:14 UTC
If you still see this issue in 4.6.22 then please create a new bug.

