+++ This bug was initially created as a clone of Bug #1927080 +++

+++ This bug was initially created as a clone of Bug #1925245 +++

Description of problem:
Bug 1900989 fixes `oc idle` in 4.6 and 4.7 by, among other things, annotating a workload's service with the proper idle annotations in addition to the workload's endpoints. Clusters that have idled workloads and upgrade to a cluster version containing the fixes for Bug 1900989 will run into issues with unidling: the idled workload will not unidle without manual user intervention, because the service idle annotations are needed for unidling to work going forward.

Steps to Reproduce:
1. Idle a workload (ex: run `oc idle` on a service + deployment + route)
2. Upgrade the cluster to a cluster version containing the fixes for Bug 1900989

Actual results:
Curling the idled route does not "wake it up".

Expected results:
Unidling a route after an upgrade should always work without user intervention.

Additional info:

--- Additional comment from sgreene on 2021-02-04 16:54:10 UTC ---

Note that the fix for this bug should only be available in 4.6 and 4.7, since any clusters upgrading to 4.8 and beyond would already have the idle annotations mirrored over from 4.6.z/4.7.z (we can shave a couple of seconds off operator start time by not performing the idle annotations check in future releases).

--- Additional comment from sgreene on 2021-02-04 20:47:00 UTC ---

Workaround for customers upgrading with idled workloads to a version of 4.6.z/4.7.z with the new idle changes from Bug 1900989:
0) Wait for the upgrade to complete.
1) Remove the idle annotations from the idled endpoints (oc edit ...), noting the idled scalable resources and their prior replica count.
2) Manually scale the idled scalable resources back up to the desired number of replicas (oc scale ...).
3) The route should now be unidled.
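The workaround steps above could be scripted roughly as follows. This is only a sketch, not the fix shipped for this bug. It assumes the idle annotations use the keys `idling.alpha.openshift.io/idled-at` and `idling.alpha.openshift.io/unidle-targets`, and that the latter holds a JSON list of objects with `kind`, `name` and `replicas` fields. The helper name `workaround_commands` is hypothetical. It takes the parsed output of `oc get endpoints <name> -o json` and only builds the command strings, so they can be reviewed before anything is run against the cluster.

```python
import json
import shlex
from typing import List

# Assumed annotation keys written by `oc idle` (not quoted in this report).
IDLED_AT = "idling.alpha.openshift.io/idled-at"
UNIDLE_TARGETS = "idling.alpha.openshift.io/unidle-targets"


def workaround_commands(endpoints: dict) -> List[str]:
    """Build the `oc` commands for workaround steps 1 and 2.

    `endpoints` is one Endpoints object as parsed JSON. Returns an empty
    list when the object does not look idled.
    """
    meta = endpoints.get("metadata") or {}
    annotations = meta.get("annotations") or {}
    if IDLED_AT not in annotations:
        return []

    name = shlex.quote(meta["name"])
    namespace = shlex.quote(meta["namespace"])

    # Record the scalable resources and prior replica counts before the
    # annotations are removed (assumed format: JSON list of targets).
    targets = json.loads(annotations.get(UNIDLE_TARGETS, "[]"))

    # Step 1: a trailing "-" on a key removes that annotation.
    commands = [
        f"oc annotate endpoints {name} -n {namespace} "
        f"{IDLED_AT}- {UNIDLE_TARGETS}-"
    ]

    # Step 2: scale every recorded target back to its prior replica count.
    for target in targets:
        resource = shlex.quote(f"{target['kind'].lower()}/{target['name']}")
        commands.append(
            f"oc scale {resource} -n {namespace} "
            f"--replicas={int(target['replicas'])}"
        )
    return commands
```

For an idled `service-unsecure` endpoints object in namespace `test1` whose targets list names the ReplicationController `web-server-rc` with 1 replica, this would yield one `oc annotate` command followed by one `oc scale replicationcontroller/web-server-rc` command.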
awaiting cherry pick
Tested with the upgrade from "4.5.34" to the "4.6.0-0.nightly-2021-03-13-204449" payload, which is the latest as of writing. With the upgrade to this payload, the idled routes get woken up and become accessible via curl without any manual intervention:
------
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.34    True        False         4m56s   Cluster version is 4.5.34

$ oc version
Client Version: 4.5.34
Server Version: 4.5.34
Kubernetes Version: v1.18.3+cdb0358

Create project resources and idle the route:

$ oc get all
NAME                      READY   STATUS    RESTARTS   AGE
pod/web-server-rc-x6g8h   1/1     Running   0          59s

NAME                                  DESIRED   CURRENT   READY   AGE
replicationcontroller/web-server-rc   1         1         1       60s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.26.24    <none>        27443/TCP   60s
service/service-unsecure   ClusterIP   172.30.233.68   <none>        27017/TCP   60s

NAME                                        HOST/PORT                                                                 PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.aiyengar-oc4534.qe.devcluster.openshift.com          service-unsecure   http                 None

$ oc idle service-unsecure
WARNING: idling when network policies are in place may cause connections to bypass network policy entirely
The service "test1/service-unsecure" has been marked as idled
The service will unidle ReplicationController "test1/web-server-rc" to 1 replicas once it receives traffic
ReplicationController "test1/web-server-rc" has been idled

$ oc get all
NAME                                  DESIRED   CURRENT   READY   AGE
replicationcontroller/web-server-rc   0         0         0       116s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.26.24    <none>        27443/TCP   116s
service/service-unsecure   ClusterIP   172.30.233.68   <none>        27017/TCP   116s

NAME                                        HOST/PORT                                                                 PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.aiyengar-oc4534.qe.devcluster.openshift.com          service-unsecure   http                 None

* Triggering an upgrade results in success:

$ oc adm upgrade --to=4.6.0-0.nightly-2021-03-13-204449 --force
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to 4.6.0-0.nightly-2021-03-13-204449

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.34    True        True          49m     Working towards 4.6.0-0.nightly-2021-03-13-204449: 29% complete
....
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-03-13-204449   True        False         2m52s   Cluster version is 4.6.0-0.nightly-2021-03-13-204449

* Curling the idled route yields success where the backend pods are woken up:

$ oc get all
NAME                                  DESIRED   CURRENT   READY   AGE
replicationcontroller/web-server-rc   0         0         0       98m

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.26.24    <none>        27443/TCP   98m
service/service-unsecure   ClusterIP   172.30.233.68   <none>        27017/TCP   98m

NAME                                        HOST/PORT                                                                 PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.aiyengar-oc4534.qe.devcluster.openshift.com          service-unsecure   http                 None

$ curl service-unsecure-test1.apps.aiyengar-oc4534.qe.devcluster.openshift.com
Hello-OpenShift web-server-rc-5772w http-8080

$ oc get all
NAME                      READY   STATUS    RESTARTS   AGE
pod/web-server-rc-5772w   1/1     Running   0          8s

NAME                                  DESIRED   CURRENT   READY   AGE
replicationcontroller/web-server-rc   1         1         1       99m

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.26.24    <none>        27443/TCP   99m
service/service-unsecure   ClusterIP   172.30.233.68   <none>        27017/TCP   99m

NAME                                        HOST/PORT                                                                 PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test1.apps.aiyengar-oc4534.qe.devcluster.openshift.com          service-unsecure   http                 None
------
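As a complement to the curl check, one could also inspect whether the idle annotations were mirrored from the endpoints onto the service after the upgrade, since the description says the service annotations are what unidling relies on going forward. The sketch below is an illustration only. The annotation keys are assumed (they are not quoted in this report), the function name `missing_service_idle_annotations` is hypothetical, and both arguments are expected to be parsed JSON from `oc get endpoints <name> -o json` and `oc get service <name> -o json`.

```python
from typing import Dict

# Assumed idle annotation keys; treat these as an assumption, not a spec.
IDLE_KEYS = (
    "idling.alpha.openshift.io/idled-at",
    "idling.alpha.openshift.io/unidle-targets",
)


def missing_service_idle_annotations(endpoints: dict, service: dict) -> Dict[str, str]:
    """Return idle annotations present on the endpoints but absent on the service.

    An empty result means either the workload is not idled or every idle
    annotation found on the endpoints also exists on the service.
    """
    ep_annotations = (endpoints.get("metadata") or {}).get("annotations") or {}
    svc_annotations = (service.get("metadata") or {}).get("annotations") or {}
    return {
        key: ep_annotations[key]
        for key in IDLE_KEYS
        if key in ep_annotations and key not in svc_annotations
    }
```

A non-empty result for an idled workload would suggest the mirroring has not happened, in which case curling the route may not wake the workload up. The check only compares key presence, because whether the mirrored values are byte-identical is not stated in this report.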
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.22 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0825