This seems to be an oc version issue. Using an old oc version (4.6.0-202010150713.p0-074039a) reproduces this bug. However, with another oc version (4.8.0-0.nightly-2021-04-23-131610), idling works well.
*** Bug 1954894 has been marked as a duplicate of this bug. ***
we used oc 4.7.5, same issue.

[root@e53a932d9fff 503]# oc delete project idle1
project.project.openshift.io "idle1" deleted
[root@e53a932d9fff 503]# oc new-project idle1
Now using project "idle1" on server "xxxxxx".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=k8s.gcr.io/serve_hostname

[root@e53a932d9fff 503]# oc new-app httpd
--> Found image 9efdfdd (5 months old) in image stream "openshift/httpd" under tag "2.4-el8" for "httpd"

    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.

    Tags: builder, httpd, httpd-24

--> Creating resources ...
    imagestreamtag.image.openshift.io "httpd:2.4-el8" created
    deployment.apps "httpd" created
    service "httpd" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd'
    Run 'oc status' to view your app.
[root@e53a932d9fff 503]# oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-779cdbc4c5-kjh5b   1/1     Running   0          6s
[root@e53a932d9fff 503]# oc scale deployment httpd --replicas=3
deployment.apps/httpd scaled
[root@e53a932d9fff 503]# oc get pods
NAME                     READY   STATUS    RESTARTS   AGE
httpd-779cdbc4c5-kjh5b   1/1     Running   0          17s
httpd-779cdbc4c5-kzjck   1/1     Running   0          5s
httpd-779cdbc4c5-t5pks   1/1     Running   0          5s
[root@e53a932d9fff 503]# oc expose svc httpd
route.route.openshift.io/httpd exposed
[root@e53a932d9fff 503]# oc get route
NAME    HOST/PORT            PATH   SERVICES   PORT       TERMINATION   WILDCARD
httpd   httpd-idle1.xxxxxx          httpd      8080-tcp                 None
[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.1 403 Forbidden
Date: Mon, 26 Apr 2021 13:40:42 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: 8c9688a6a6a694b2222c08c1ad1c84ca=40f19a4d620cbcedceaebdcbc9175351; path=/; HttpOnly

[root@e53a932d9fff 503]# oc idle httpd+
error: no valid scalable resources found to idle: endpoints "httpd+" not found
[root@e53a932d9fff 503]# oc idle httpd
The service "idle1/httpd" has been marked as idled
The service will unidle Deployment "idle1/httpd" to 3 replicas once it receives traffic
Deployment "idle1/httpd" has been idled
[root@e53a932d9fff 503]# oc get pods
No resources found in idle1 namespace.
[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

[root@e53a932d9fff 503]# curl -I httpd-idle1.xxxxxx
HTTP/1.0 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Connection: close
Content-Type: text/html

[root@e53a932d9fff 503]# oc version
Client Version: 4.7.5
Server Version: 4.6.25
Kubernetes Version: v1.19.0+a5a0987
[root@e53a932d9fff 503]#
*** Bug 1956535 has been marked as a duplicate of this bug. ***
(In reply to chenchenchen from comment #3)
> we used oc 4.7.5, same issue.
>
> [root@e53a932d9fff 503]# oc version
> Client Version: 4.7.5
> Server Version: 4.6.25
> Kubernetes Version: v1.19.0+a5a0987

Hi, I cannot reproduce this issue with the same versions as yours:

$ ./oc version
Client Version: 4.7.5
Server Version: 4.6.25
Kubernetes Version: v1.19.0+a5a0987

--> Creating resources ...
    imagestreamtag.image.openshift.io "httpd:2.4-el8" created
    deployment.apps "httpd" created
    service "httpd" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd'
    Run 'oc status' to view your app.
$ ./oc expose svc httpd
route.route.openshift.io/httpd exposed
$ ./oc get route
NAME    HOST/PORT                                              PATH   SERVICES   PORT       TERMINATION   WILDCARD
httpd   httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com          httpd      8080-tcp                 None
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
HTTP/1.1 403 Forbidden
Date: Thu, 06 May 2021 07:20:50 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: d6e8a8764030a606aea63916ad583785=406bd2fdfd0efbfa3850fd84e5460c0a; path=/; HttpOnly

$ ./oc idle httpd
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "z2/httpd" has been marked as idled
The service will unidle Deployment "z2/httpd" to 1 replicas once it receives traffic
Deployment "z2/httpd" has been idled
$ ./oc get pod
NAME                    READY   STATUS        RESTARTS   AGE
httpd-855d57cb7-w7fdr   0/1     Terminating   0          55s
$ ./oc get pod
No resources found in z2 namespace.
$ ./oc get pod
No resources found in z2 namespace.
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
curl: (52) Empty reply from server
$ curl -I httpd-z2.apps.zzhao46033.qe.devcluster.openshift.com
HTTP/1.1 403 Forbidden
Date: Thu, 06 May 2021 07:21:32 GMT
Server: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1g
Last-Modified: Mon, 15 Jun 2020 11:49:07 GMT
ETag: "f91-5a81e03a232c0"
Accept-Ranges: bytes
Content-Length: 3985
Content-Type: text/html; charset=UTF-8
Set-Cookie: d6e8a8764030a606aea63916ad583785=478bffffa3ca2b35ff7140fc91c89723; path=/; HttpOnly

$ ./oc get pod
NAME                    READY   STATUS    RESTARTS   AGE
httpd-855d57cb7-6tcc6   1/1     Running   0          17s

@chen.gong can this issue be reproduced on your side? If so, could you show how many services there are in your cluster?
@zhaozhanqi This issue has been present in our cluster since 4.6.17 and still occurs on 4.6.25. There are 102 services in the cluster.
It appears that if you idle a service with an "old" oc binary (4.6.16 or earlier, or most 4.7 alpha/beta builds) in a "new" cluster (4.6.17 or later, 4.7.0-rc.1 or later, or any 4.8 nightly), then it will not unidle correctly when it receives traffic. (openshift-sdn will emit the NeedPods event but the controller will not scale it up.)

@swasthan, can you confirm the version of OCP you are using and the version of the "oc" binary that you are using to idle the pods? ("oc version" will tell you both.)

If you are using an "old" oc binary, then getting an updated binary should fix the bug. If not, then please create a new deployment and service (i.e., one that has not been previously idled) and:

1. idle the service
2. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml"
3. try to connect to the service
4. get the output of "oc get service NAME -o yaml" and "oc get ep NAME -o yaml" again
5. get the output of "oc get events -o yaml"
6. get the output of "oc get pods -n NAMESPACE" (to confirm whether pods have been recreated for the deployment)
7. tar/zip up all the files and attach them to this bug
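For convenience, the numbered steps above could be scripted roughly as follows. This is only a sketch under a few assumptions that are not part of the request: the function name collect_idle_diagnostics, the output file names, and the IDLE_DEBUG_WAIT variable are made up for illustration; URL is assumed to be a route that reaches the service; the 30-second wait before the second capture is an arbitrary guess; and "oc version" is captured as an extra because the versions were asked about.

```shell
#!/bin/sh
# Sketch: capture the state of a service before and after an unidle attempt.
# Hypothetical helper, not an official tool. Arguments:
#   $1 = service (and endpoints) NAME      $2 = NAMESPACE
#   $3 = URL that reaches the service      $4 = output directory (optional)
collect_idle_diagnostics() {
    name=$1; namespace=$2; url=$3; out=${4:-idle-debug}
    mkdir -p "$out" || return 1

    # Extra (assumption): record client and server versions.
    oc version > "$out/oc-version.txt" 2>&1

    # 1. Idle the service.
    oc idle "$name" -n "$namespace" > "$out/idle.txt" 2>&1

    # 2. Service and endpoints right after idling.
    oc get service "$name" -n "$namespace" -o yaml > "$out/service-before.yaml" 2>&1
    oc get ep "$name" -n "$namespace" -o yaml > "$out/ep-before.yaml" 2>&1

    # 3. Try to connect, then wait (arbitrary delay) for any scale-up.
    curl -sS -I --max-time 30 "$url" > "$out/curl.txt" 2>&1
    sleep "${IDLE_DEBUG_WAIT:-30}"

    # 4. Service and endpoints again, after the connection attempt.
    oc get service "$name" -n "$namespace" -o yaml > "$out/service-after.yaml" 2>&1
    oc get ep "$name" -n "$namespace" -o yaml > "$out/ep-after.yaml" 2>&1

    # 5. Events (the NeedPods event should show up here if it was emitted).
    oc get events -n "$namespace" -o yaml > "$out/events.yaml" 2>&1

    # 6. Pods, to confirm whether the deployment was scaled back up.
    oc get pods -n "$namespace" > "$out/pods.txt" 2>&1

    # 7. Bundle everything into one archive next to the output directory.
    tar -czf "$out.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Example call, using the names from comment 3: collect_idle_diagnostics httpd idle1 http://httpd-idle1.xxxxxx ./idle-debug (then attach idle-debug.tar.gz).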
Ah... the must-gather attached to the case shows that this cluster is using ovn-kubernetes, not openshift-sdn
@swasthan please confirm whether the customer is using openshift-sdn or ovn-kubernetes. This bug was originally filed with "Subcomponent: openshift-sdn", and you attached iptables output that clearly came from an openshift-sdn-based cluster, but the must-gather attached to the case shows that the cluster is using ovn-kubernetes. (Additionally, I am not able to reproduce the bug on 4.6.25 with openshift-sdn using the exact sequence of commands you used.)
I cannot reproduce this bug with 4.6 and ovn-kubernetes either, using the commands from comment 14. Please reproduce the bug again, on the customer's cluster, and then get a must-gather after the service fails to unidle.
Sorry for the delay. This appears to be fixed in 4.6.28. (The symptom here is different from the symptom that the original bug report was about, so it wasn't initially obvious that it was the same bug.)

*** This bug has been marked as a duplicate of bug 1947836 ***