Bug 1953705 - Idled service is not wakened up
Summary: Idled service is not wakened up
Keywords:
Status: CLOSED DUPLICATE of bug 1947836
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Dan Winship
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 1954894
Depends On:
Blocks:
Reported: 2021-04-26 17:10 UTC by Swadeep Asthana
Modified: 2021-06-08 14:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-08 14:21:00 UTC
Target Upstream Version:
Embargoed:


Links
System  ID                           Private  Priority  Status  Summary                                              Last Updated
Github  openshift origin pull 26155  0        None      open    Bug 1953705: rewrite idling tests to not be [Local]  2021-05-17 13:28:41 UTC

Comment 1 zhaozhanqi 2021-04-29 04:34:22 UTC
This seems to be an oc version issue. Using an old oc version (4.6.0-202010150713.p0-074039a) reproduces this bug. However, with a newer oc version (4.8.0-0.nightly-2021-04-23-131610), unidling works well.

Comment 2 zhaozhanqi 2021-04-29 04:36:01 UTC
*** Bug 1954894 has been marked as a duplicate of this bug. ***

Comment 3 chenchenchen 2021-05-03 14:15:00 UTC
We used oc 4.7.5 and see the same issue.

[root@e53a932d9fff 503]# oc delete project idle1
project.project.openshift.io "idle1" deleted
[root@e53a932d9fff 503]# oc new-project idle1
Now using project "idle1" on server "xxxxxx".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=k8s.gcr.io/serve_hostname

[root@e53a932d9fff 503]# oc new-app httpd
--> Found image 9efdfdd (5 months old) in image stream "openshift/httpd" under tag "2.4-el8" for "httpd"

    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.

    Tags: builder, httpd, httpd-24


--> Creating resources ...
    imagestreamtag.image.openshift.io "httpd:2.4-el8" created
    deployment.apps "httpd" created
    service "httpd" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd'
    Run 'oc status' to view your app.
[root@e53a932d9fff 503]# oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-779cdbc4c5-kjh5b   1/1     Running   0          6s
[root@e53a932d9fff 503]# oc scale deployment httpd --replicas=3
deployment.apps/httpd scaled
[root@e53a932d9fff 503]# oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-779cdbc4c5-kjh5b   1/1     Running   0          17s
httpd-779cdbc4c5-kzjck   1/1     Running   0          5s
httpd-779cdbc4c5-t5pks   1/1     Running   0          5s
[root@e53a932d9fff 503]# oc expose svc httpd
route.route.openshift.io/httpd exposed
[root@e53a932d9fff 503]# oc get route
NAME    HOST/PORT                                          PATH   SERVICES   PORT       TERMINATION   WILDCARD
httpd   httpd-idle1.xxxxxx          httpd      8080-tcp                 None
[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.1 403 Forbidden
Date: Mon, 26 Apr 2021 13:40:42 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: 8c9688a6a6a694b2222c08c1ad1c84ca=40f19a4d620cbcedceaebdcbc9175351; path=/; HttpOnly

[root@e53a932d9fff 503]# oc idle httpd+
error: no valid scalable resources found to idle: endpoints "httpd+" not found
[root@e53a932d9fff 503]# oc idle httpd
The service "idle1/httpd" has been marked as idled
The service will unidle Deployment "idle1/httpd" to 3 replicas once it receives traffic
Deployment "idle1/httpd" has been idled
[root@e53a932d9fff 503]# oc get pods
No resources found in idle1 namespace.
[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

[root@e53a932d9fff 503]# oc version
Client Version: 4.7.5
Server Version: 4.6.25
Kubernetes Version: v1.19.0+a5a0987
[root@e53a932d9fff 503]#

Comment 4 Alexander Constantinescu 2021-05-05 11:20:56 UTC
*** Bug 1956535 has been marked as a duplicate of this bug. ***

Comment 9 zhaozhanqi 2021-05-06 07:30:22 UTC
(In reply to chenchenchen from comment #3)
> we used oc 4.7.5, same issue.
> 


Hi, this issue cannot be reproduced with the same versions as yours:

$ ./oc version
Client Version: 4.7.5
Server Version: 4.6.25
Kubernetes Version: v1.19.0+a5a0987

--> Creating resources ...
    imagestreamtag.image.openshift.io "httpd:2.4-el8" created
    deployment.apps "httpd" created
    service "httpd" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd' 
    Run 'oc status' to view your app.
$ ./oc expose svc httpd
route.route.openshift.io/httpd exposed
$ ./oc get route
NAME    HOST/PORT                                              PATH   SERVICES   PORT       TERMINATION   WILDCARD
httpd   httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com          httpd      8080-tcp                 None
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
HTTP/1.1 403 Forbidden
Date: Thu, 06 May 2021 07:20:50 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: d6e8a8764030a606aea63916ad583785=406bd2fdfd0efbfa3850fd84e5460c0a; path=/; HttpOnly

$ ./oc idle httpd
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "z2/httpd" has been marked as idled 
The service will unidle Deployment "z2/httpd" to 1 replicas once it receives traffic 
Deployment "z2/httpd" has been idled 
$ ./oc get pod
NAME                    READY   STATUS        RESTARTS   AGE
httpd-855d57cb7-w7fdr   0/1     Terminating   0          55s
$ ./oc get pod
No resources found in z2 namespace.
$ ./oc get pod
No resources found in z2 namespace.
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
curl: (52) Empty reply from server
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
HTTP/1.1 403 Forbidden
Date: Thu, 06 May 2021 07:21:32 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: d6e8a8764030a606aea63916ad583785=478bffffa3ca2b35ff7140fc91c89723; path=/; HttpOnly

$ ./oc get pod
NAME                    READY   STATUS    RESTARTS   AGE
httpd-855d57cb7-6tcc6   1/1     Running   0          17s


@chen.gong is this issue reproducible on your side? If so, could you tell us how many services there are in your cluster?
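
The transcript above shows one request getting no reply while the pod is being recreated, and a later request succeeding. A minimal sketch of a polling loop for that check follows. The function names, the `CURL` override, and the choice to treat 000 (no reply) and 503 as "not yet unidled" are assumptions made for illustration; they are not part of `oc` or of this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a route until the idled service answers.
CURL="${CURL:-curl}"   # overridable so the loop can be exercised without a cluster

# Print the HTTP status code of a HEAD request, or 000 when no reply arrives.
route_status() {
  local url="$1" code
  code="$("$CURL" -s -o /dev/null -I -w '%{http_code}' "$url")" || true
  echo "${code:-000}"
}

# Poll URL until it answers with something other than 000 or 503.
# Returns 0 once the backend answered, 1 when ATTEMPTS ran out.
wait_for_unidle() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i code
  for (( i = 1; i <= attempts; i++ )); do
    code="$(route_status "$url")"
    echo "attempt $i: HTTP $code"
    case "$code" in
      000|503) sleep "$delay" ;;   # no reply yet, or the router has no backend
      *) return 0 ;;               # any other status means a pod answered
    esac
  done
  return 1
}

# Example (ROUTE_HOST is a placeholder for the host shown by "oc get route"):
#   wait_for_unidle "http://ROUTE_HOST" 12 5 || echo "service did not unidle"
```

On a cluster showing the bug, the loop would be expected to exhaust its attempts with repeated 503 responses, as in comment 3.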

Comment 11 chenchenchen 2021-05-06 13:39:19 UTC
This issue has been present since 4.6.17 and persists all the way to 4.6.25.
In our cluster, there are 102 services.
@zhaozhanqi

Comment 12 Dan Winship 2021-05-11 12:23:30 UTC
It appears that if you idle a service with an "old" oc binary (4.6.16 or earlier, or most 4.7 alpha/beta builds) in a "new" cluster (4.6.17 or later, 4.7.0-rc.1 and later, or any 4.8 nightly) then it will not unidle correctly when it receives traffic. (openshift-sdn will emit the NeedPods event but the controller will not scale it up.)

@swasthan, can you confirm the version of OCP you are using and the version of the "oc" binary that you are using to idle the pods? ("oc version" will tell you both.)
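
As a rough sketch of that check, the functions below read the `Client Version` and `Server Version` lines from "oc version" output and test them against the 4.6.z part of the rule described above (oc 4.6.16 or earlier, cluster 4.6.17 or later). The function names are hypothetical, and the 4.7 pre-release and 4.8 nightly cases are deliberately not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: classify an "oc version" report for 4.6.z only.

# Print the value of a field ("Client Version" or "Server Version")
# from "oc version" output read on stdin.
oc_version_field() {
  local field="$1"
  sed -n "s/^${field}: *//p" | head -n 1
}

# Succeed when VERSION is a 4.6.z string whose z is below THRESHOLD.
# Build suffixes such as "-202010150713.p0-074039a" are ignored.
is_46_before() {
  local version="$1" threshold="$2" z
  [[ "$version" =~ ^4\.6\.([0-9]+) ]] || return 1
  z="${BASH_REMATCH[1]}"
  (( 10#$z < threshold ))
}

# Report whether the client/server pair matches "old oc, new cluster".
check_idle_pair() {
  local client="$1" server="$2"
  if is_46_before "$client" 17 && [[ "$server" =~ ^4\.6\. ]] \
     && ! is_46_before "$server" 17; then
    echo "mismatch: old oc ($client) against new cluster ($server)"
  else
    echo "no 4.6.z mismatch detected for oc $client / cluster $server"
  fi
}

# Example against a live cluster:
#   report="$(oc version)"
#   client="$(printf '%s\n' "$report" | oc_version_field 'Client Version')"
#   server="$(printf '%s\n' "$report" | oc_version_field 'Server Version')"
#   check_idle_pair "$client" "$server"
```

A "no mismatch" result only rules out this one combination; it does not show that idling works on the cluster.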

If you are using an "old" oc binary, then getting an updated binary should fix the bug.

If not, then please create a new deployment and service (i.e., one that has not been previously idled) and:

1. idle the service
2. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml"
3. try to connect to the service
4. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml" again
5. get the output of "oc get events -o yaml"
6. get the output of "oc get pods -n NAMESPACE" (to confirm whether pods have been recreated for the deployment)
7. tar/zip up all the files and attach them to this bug
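
The collection steps above could be scripted roughly as follows. This is only a sketch: the function names, the output file names, and the `OC` override are assumptions, and the events are fetched with an explicit `-n NAMESPACE` rather than from the current project. Steps 1 and 3 (idling the service and connecting to it) are left manual.

```shell
#!/usr/bin/env bash
# Hypothetical helpers covering steps 2 and 4-7 of the list above.
OC="${OC:-oc}"   # override to try the functions without a cluster, e.g. OC=echo

# Steps 2 and 4: save the Service and Endpoints YAML under a phase label.
snapshot_service() {
  local ns="$1" name="$2" phase="$3" outdir="$4"
  "$OC" get service "$name" -n "$ns" -o yaml > "$outdir/service-$phase.yaml"
  "$OC" get ep "$name" -n "$ns" -o yaml > "$outdir/endpoints-$phase.yaml"
}

# Steps 5-7: save events and the pod list, then bundle the directory.
collect_after() {
  local ns="$1" outdir="$2"
  "$OC" get events -n "$ns" -o yaml > "$outdir/events.yaml"
  "$OC" get pods -n "$ns" > "$outdir/pods.txt"
  tar -czf "$outdir.tar.gz" -C "$(dirname "$outdir")" "$(basename "$outdir")"
}

# Example (NAMESPACE, NAME and ROUTE_HOST are placeholders):
#   mkdir -p idle-debug
#   oc idle NAME -n NAMESPACE                            # step 1
#   snapshot_service NAMESPACE NAME before idle-debug    # step 2
#   curl -I ROUTE_HOST                                   # step 3
#   snapshot_service NAMESPACE NAME after idle-debug     # step 4
#   collect_after NAMESPACE idle-debug                   # steps 5-7
```

The resulting `idle-debug.tar.gz` is the file to attach to the bug.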

Comment 15 Dan Winship 2021-05-12 19:34:23 UTC
Ah... the must-gather attached to the case shows that this cluster is using ovn-kubernetes, not openshift-sdn

Comment 16 Dan Winship 2021-05-12 19:38:20 UTC
@swasthan please confirm whether the customer is using openshift-sdn or ovn-kubernetes; this bug was originally filed with "Subcomponent: openshift-sdn", and you attached iptables output that clearly came from an openshift-sdn-based cluster, but the must-gather attached to the case shows that the cluster is using ovn-kubernetes.

(Additionally, I am not able to reproduce the bug on 4.6.25 with openshift-sdn, using the exact sequence of commands you used.)

Comment 18 Dan Winship 2021-05-19 12:51:37 UTC
I cannot reproduce this bug with 4.6 and ovn-kubernetes either, using the commands from comment 14.

Please reproduce the bug again, on the customer's cluster, and then get a must-gather after the service fails to unidle.

Comment 20 Dan Winship 2021-06-08 14:21:00 UTC
Sorry for the delay. This appears to be fixed in 4.6.28. (The symptom here is different from the symptom that the original bug report was about, so it wasn't initially obvious that it was the same bug.)

*** This bug has been marked as a duplicate of bug 1947836 ***

