Bug 2107657 - oc idle: Curl is not able to wake the idled workload after clusters upgrade
Summary: oc idle: Curl is not able to wake the idled workload after clusters upgrade
Keywords:
Status: CLOSED DUPLICATE of bug 2041307
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: aos-network-edge-staff
QA Contact: Melvin Joseph
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-15 15:28 UTC by Melvin Joseph
Modified: 2022-08-04 21:58 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-19 16:30:33 UTC
Target Upstream Version:
Embargoed:


Description Melvin Joseph 2022-07-15 15:28:07 UTC
Description of problem:

This bug is a regression of https://bugzilla.redhat.com/show_bug.cgi?id=1927364, found during an upgrade from 4.6.59-x86_64 to 4.6.0-0.nightly-2022-07-13-184746.

OpenShift release version:
4.6.0-0.nightly-2022-07-13-184746

Cluster Platform:


How reproducible:


Steps to Reproduce (in detail):
melvinjoseph@mjoseph-mac Downloads %  oc new-project test
Now using project "test" on server "https://api.mjoseph-459551.qe.devcluster.openshift.com:6443".

melvinjoseph@mjoseph-mac Downloads % oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/routing/list_for_caddy.json
replicationcontroller/caddy-rc created
service/service-secure created
service/service-unsecure created
melvinjoseph@mjoseph-mac Downloads % oc expose svc service-unsecure
route.route.openshift.io/service-unsecure exposed
melvinjoseph@mjoseph-mac Downloads % oc get all
NAME                 READY   STATUS    RESTARTS   AGE
pod/caddy-rc-k9zz7   1/1     Running   0          13s
pod/caddy-rc-wxqk5   1/1     Running   0          13s

NAME                             DESIRED   CURRENT   READY   AGE
replicationcontroller/caddy-rc   2         2         2       13s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.228.11   <none>        27443/TCP   13s
service/service-unsecure   ClusterIP   172.30.229.16   <none>        27017/TCP   13s

NAME                                        HOST/PORT                                                               PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com          service-unsecure   http                 None
melvinjoseph@mjoseph-mac Downloads % curl service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com
Hello-OpenShift-1 http-8080
melvinjoseph@mjoseph-mac Downloads % oc idle service-unsecure
The service "test/service-unsecure" has been marked as idled 
The service will unidle ReplicationController "test/caddy-rc" to 2 replicas once it receives traffic 
ReplicationController "test/caddy-rc" has been idled 
melvinjoseph@mjoseph-mac Downloads % oc get svc service-unsecure  -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    idling.alpha.openshift.io/idled-at: "2022-07-15T12:10:03Z"
    idling.alpha.openshift.io/unidle-targets: '[{"kind":"ReplicationController","name":"caddy-rc","replicas":2}]'
  creationTimestamp: "2022-07-15T12:09:35Z"
  labels:
    name: service-unsecure
  name: service-unsecure
  namespace: test
  resourceVersion: "44813"
  selfLink: /api/v1/namespaces/test/services/service-unsecure
  uid: 574481cb-e9de-4c97-8d1c-80a2a64009b8
spec:
  clusterIP: 172.30.229.16
  ports:
  - name: http
    port: 27017
    protocol: TCP
    targetPort: 8080
  selector:
    name: caddy-pods
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
melvinjoseph@mjoseph-mac Downloads % 
melvinjoseph@mjoseph-mac Downloads % oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:3864bf2f74ce66cb596753d2ddd3cb7b8d8977e4e3e70ae2bd9660c92328378d --allow-explicit-upgrade=true --force
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Updating to release image registry.ci.openshift.org/ocp/release@sha256:3864bf2f74ce66cb596753d2ddd3cb7b8d8977e4e3e70ae2bd9660c92328378d

melvinjoseph@mjoseph-mac Downloads % oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2022-07-13-184746   True        False         False      30m
cloud-credential                           4.6.0-0.nightly-2022-07-13-184746   True        False         False      176m
cluster-autoscaler                         4.6.0-0.nightly-2022-07-13-184746   True        False         False      172m
config-operator                            4.6.0-0.nightly-2022-07-13-184746   True        False         False      174m
console                                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      50m
csi-snapshot-controller                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      50m
dns                                        4.6.0-0.nightly-2022-07-13-184746   True        False         False      173m
etcd                                       4.6.0-0.nightly-2022-07-13-184746   True        False         False      172m
image-registry                             4.6.0-0.nightly-2022-07-13-184746   True        False         False      61m
ingress                                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      166m
insights                                   4.6.0-0.nightly-2022-07-13-184746   True        False         False      174m
kube-apiserver                             4.6.0-0.nightly-2022-07-13-184746   True        False         False      172m
kube-controller-manager                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      172m
kube-scheduler                             4.6.0-0.nightly-2022-07-13-184746   True        False         False      171m
kube-storage-version-migrator              4.6.0-0.nightly-2022-07-13-184746   True        False         False      61m
machine-api                                4.6.0-0.nightly-2022-07-13-184746   True        False         False      168m
machine-approver                           4.6.0-0.nightly-2022-07-13-184746   True        False         False      173m
machine-config                             4.6.0-0.nightly-2022-07-13-184746   True        False         False      30m
marketplace                                4.6.0-0.nightly-2022-07-13-184746   True        False         False      50m
monitoring                                 4.6.0-0.nightly-2022-07-13-184746   True        False         False      29m
network                                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      174m
node-tuning                                4.6.0-0.nightly-2022-07-13-184746   True        False         False      94m
openshift-apiserver                        4.6.0-0.nightly-2022-07-13-184746   True        False         False      30m
openshift-controller-manager               4.6.0-0.nightly-2022-07-13-184746   True        False         False      93m
openshift-samples                          4.6.0-0.nightly-2022-07-13-184746   True        False         False      84m
operator-lifecycle-manager                 4.6.0-0.nightly-2022-07-13-184746   True        False         False      173m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2022-07-13-184746   True        False         False      173m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2022-07-13-184746   True        False         False      46m
service-ca                                 4.6.0-0.nightly-2022-07-13-184746   True        False         False      174m
storage                                    4.6.0-0.nightly-2022-07-13-184746   True        False         False      174m
melvinjoseph@mjoseph-mac Downloads % 
melvinjoseph@mjoseph-mac Downloads % 
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2022-07-13-184746   True        False         29m     Cluster version is 4.6.0-0.nightly-2022-07-13-184746
melvinjoseph@mjoseph-mac Downloads % oc get svc service-unsecure  -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    idling.alpha.openshift.io/idled-at: "2022-07-15T12:10:03Z"
    idling.alpha.openshift.io/unidle-targets: '[{"kind":"ReplicationController","name":"caddy-rc","replicas":2}]'
  creationTimestamp: "2022-07-15T12:09:35Z"
  labels:
    name: service-unsecure
  name: service-unsecure
  namespace: test
  resourceVersion: "44813"
  selfLink: /api/v1/namespaces/test/services/service-unsecure
  uid: 574481cb-e9de-4c97-8d1c-80a2a64009b8
spec:
  clusterIP: 172.30.229.16
  ports:
  - name: http
    port: 27017
    protocol: TCP
    targetPort: 8080
  selector:
    name: caddy-pods
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
melvinjoseph@mjoseph-mac Downloads % oc get po
No resources found in test namespace.
melvinjoseph@mjoseph-mac Downloads % oc get route
NAME               HOST/PORT                                                               PATH   SERVICES           PORT   TERMINATION   WILDCARD
service-unsecure   service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com          service-unsecure   http                 None
melvinjoseph@mjoseph-mac Downloads % curl -Ik http://service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com

HTTP/1.1 503 Service Unavailable
Pragma: no-cache
Cache-Control: private, max-age=0, no-cache, no-store
Content-Type: text/html
Date: Fri, 15 Jul 2022 14:19:32 GMT
X-Cache: MISS from f4a5b3556007
X-Cache-Lookup: MISS from f4a5b3556007:3128
Via: 1.1 f4a5b3556007 (squid/4.13)
Connection: keep-alive

melvinjoseph@mjoseph-mac Downloads % oc get po                                                                  
No resources found in test namespace.

melvinjoseph@mjoseph-mac Downloads % oc get all
NAME                             DESIRED   CURRENT   READY   AGE
replicationcontroller/caddy-rc   0         0         0       131m

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/service-secure     ClusterIP   172.30.228.11   <none>        27443/TCP   131m
service/service-unsecure   ClusterIP   172.30.229.16   <none>        27017/TCP   131m

NAME                                        HOST/PORT                                                               PATH   SERVICES           PORT   TERMINATION   WILDCARD
route.route.openshift.io/service-unsecure   service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com          service-unsecure   http                 None
melvinjoseph@mjoseph-mac Downloads % oc get route
NAME               HOST/PORT                                                               PATH   SERVICES           PORT   TERMINATION   WILDCARD
service-unsecure   service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com          service-unsecure   http                 None
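The post-upgrade check in the transcript above (one request to the idled route, then looking for pods) can be sketched as a small helper. This is a minimal sketch, not part of the original report: the function name `wait_for_unidle`, the 5-second poll interval, and the 120-second default timeout are assumptions chosen for illustration. It reads `.status.readyReplicas` from the ReplicationController, which is empty while the controller is scaled to zero.

```shell
# Hypothetical helper: send one request to an idled route, then poll the
# ReplicationController until it reports ready replicas or the timeout expires.
# Arguments: namespace, RC name, route host, timeout in seconds (default 120).
wait_for_unidle() {
  ns="$1"; rc="$2"; host="$3"; timeout="${4:-120}"
  # A single request is expected to trigger unidling; its own status is
  # ignored because the first response may fail while pods are starting.
  curl -s -o /dev/null "http://$host" || true
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # readyReplicas is omitted (empty output) while the RC has no ready pods.
    ready=$(oc get rc "$rc" -n "$ns" -o jsonpath='{.status.readyReplicas}')
    if [ "${ready:-0}" -gt 0 ]; then
      echo "unidled: $ready ready replica(s) after ${elapsed}s"
      return 0
    fi
    sleep 5
    elapsed=$((elapsed + 5))
  done
  echo "still idled after ${timeout}s" >&2
  return 1
}
```

For the cluster in this report, the call would look like `wait_for_unidle test caddy-rc service-unsecure-test.apps.mjoseph-459551.qe.devcluster.openshift.com`. A non-zero exit corresponds to the failure shown above, where the ReplicationController stays at 0 replicas.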


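Actual results:
After the upgrade, curl against the route returns 503 Service Unavailable, no pods are started (the ReplicationController stays at 0 replicas), and the idling annotations remain on the service.
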

Expected results:
curl service-unsecure-test.apps.mjoseph-rout14.qe.devcluster.openshift.com   
Hello-OpenShift-1 http-8080

oc get svc service-unsecure  -o yaml                       
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-11-15T08:40:38Z"
  labels:
    name: service-unsecure
  name: service-unsecure
  namespace: test
  resourceVersion: "157534"
  uid: 15468a60-b2e5-4972-a7d3-a10f44e87ecf

Impact of the problem:


Additional info:



** Please do not disregard the report template; filling the template out as much as possible will allow us to help you. Please consider attaching a must-gather archive (via `oc adm must-gather`). Please review must-gather contents for sensitive information before attaching any must-gathers to a bugzilla report.  You may also mark the bug private if you wish.

Comment 1 Melvin Joseph 2022-07-15 15:31:36 UTC
Profile: upi-on-vsphere/versioned-installer-vmc7-ovn-static_network-hw14-ci

Comment 2 Miciah Dashiel Butler Masters 2022-07-18 05:21:45 UTC
This is potentially a blocker.  I'll raise this with my team to investigate it as soon as we can.  

What version of oc are you using to idle the route?  

Do the endpoints and endpointslice objects have the idling annotations set before/after upgrade?
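One way to answer the annotation question is to dump each object and filter for the idling annotation prefix that appears in the service YAML in the description. This is a minimal sketch, not something posted in the original thread: the function name `show_idling_annotations` is hypothetical, and it assumes the EndpointSlices carry the standard `kubernetes.io/service-name` label.

```shell
# Hypothetical helper: list idling annotations on the objects behind a service.
# Arguments: namespace, service name.
show_idling_annotations() {
  ns="$1"; svc="$2"
  # The Service and Endpoints objects share the service's name.
  for kind in service endpoints; do
    echo "== $kind/$svc =="
    oc get "$kind" "$svc" -n "$ns" -o yaml \
      | grep 'idling.alpha.openshift.io/' || echo "(none)"
  done
  # EndpointSlices have generated names, so select them by label instead.
  echo "== endpointslices for $svc =="
  oc get endpointslices -n "$ns" -l "kubernetes.io/service-name=$svc" -o yaml \
    | grep 'idling.alpha.openshift.io/' || echo "(none)"
}
```

Running `show_idling_annotations test service-unsecure` once before and once after the upgrade would give a side-by-side view of which objects keep the `idled-at` and `unidle-targets` annotations.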

Comment 3 Miciah Dashiel Butler Masters 2022-07-18 05:24:20 UTC
https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.nightly/release/4.6.0-0.nightly-2022-07-13-184746?from=4.6.59 shows only one change, in ironic-machine-os-downloader; no changes in oc, router, kube-proxy, ovn-kubernetes, or any other component that could conceivably cause this issue.

Comment 5 Melvin Joseph 2022-07-18 11:19:31 UTC
(In reply to Miciah Dashiel Butler Masters from comment #2)
> This is potentially a blocker.  I'll raise this with my team to investigate
> it as soon as we can.  
> 
> What version of oc are you using to idle the route?  

Initially I tested with the 4.10 oc client, but today I tested the same with the 4.6.59 oc client.

> Do the endpoints and endpointslice objects have the idling annotations set
> before/after upgrade?

The idling annotations are set before the upgrade.

Comment 6 Melvin Joseph 2022-07-18 11:20:47 UTC
@Miciah, could this issue be linked to https://access.redhat.com/solutions/6671241?

Comment 7 Miciah Dashiel Butler Masters 2022-07-18 14:08:19 UTC
(In reply to Melvin Joseph from comment #6)
> @Miciah, could this issue be linked to
> https://access.redhat.com/solutions/6671241?

This Access article does seem to describe the same issue as this BZ.  If I understand the article correctly, this is a long-standing regression in OVN-Kubernetes, not a new regression in 4.6.z.  I'll set blocker-.

Comment 8 Melvin Joseph 2022-07-19 02:02:15 UTC
Team,
I was trying to find out whether the regression hits all profiles, and want to share one finding: the issue does not appear on all profiles of 4.6.z.
Today I tested on IPI GCP and the bug did not reproduce there.

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.59    True        False         12m     Cluster version is 4.6.59
melvinjoseph@mjoseph-mac Downloads % oc version
Client Version: 4.6.59
Server Version: 4.6.59
Kubernetes Version: v1.19.16+8203b20

<----snip--->
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2022-07-13-184746   True        False         23m     Cluster version is 4.6.0-0.nightly-2022-07-13-184746
melvinjoseph@mjoseph-mac Downloads % curl service-unsecure-test.apps.mjoseph-bug1.qe.gcp.devcluster.openshift.com 
Hello-OpenShift-1 http-8080
melvinjoseph@mjoseph-mac Downloads % oc get infrastructure cluster -o=jsonpath={.spec.platformSpec.type}   
GCP

The idling annotation is also removed after the service wakes up.

However, the bug reproduced twice on the `upi-on-vsphere/versioned-installer-vmc7-ovn-static_network-hw14-ci` profile.

Comment 9 Miciah Dashiel Butler Masters 2022-07-19 16:30:33 UTC
Based on <https://access.redhat.com/solutions/6671241> and bug 2041307, comment 12, idling is known not to work on OpenShift 4.7 and earlier when using OVN-Kubernetes.  Users who are affected by this issue should upgrade.

*** This bug has been marked as a duplicate of bug 2041307 ***
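
The condition stated in this comment (OVN-Kubernetes on OpenShift 4.7 or earlier) can be checked per cluster with a short helper. This is a minimal sketch under stated assumptions: the function name `idling_known_broken` is hypothetical, the version cutoff is taken directly from the comment text rather than from any authoritative support matrix, and the network type string `OVNKubernetes` is the value reported in `network.config/cluster`.

```shell
# Hypothetical helper: report whether a cluster falls into the known-affected
# range described above (OVN-Kubernetes on OpenShift 4.7 or earlier).
# Arguments: network type, cluster version (for example "4.6.59").
idling_known_broken() {
  network_type="$1"; version="$2"
  major=${version%%.*}      # text before the first dot
  rest=${version#*.}        # text after the first dot
  minor=${rest%%.*}         # second version component
  [ "$network_type" = "OVNKubernetes" ] && [ "$major" -eq 4 ] && [ "$minor" -le 7 ]
}

# Example wiring against a live cluster (not run here):
#   net=$(oc get network.config/cluster -o jsonpath='{.status.networkType}')
#   ver=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   idling_known_broken "$net" "$ver" && echo "idling is known not to work here"
```

Comparing the network type on the vSphere profile (whose name contains `ovn`) with the one on the GCP cluster from comment 8 would show whether the difference between the two runs lines up with this condition.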

