Bug 1745509 - The pods of the customized "catalogsource" failed to run after upgrade to 4.1.13 from 4.1.11
Summary: The pods of the customized "catalogsource" failed to run after upgrade to 4.1...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Evan Cordell
QA Contact: Fan Jia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-26 09:50 UTC by Jian Zhang
Modified: 2019-08-27 14:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-27 14:55:23 UTC
Target Upstream Version:
Embargoed:



Description Jian Zhang 2019-08-26 09:50:00 UTC
Description of problem:
After upgrading from 4.1.11 to 4.1.13, the pods of the customized `catalogsource` failed to run.
mac:~ jianzhang$ oc get deployment -n openshift-marketplace
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
certified-operators        0/0     0            0           163m
cluster-logging-operator   0/0     0            0           113m
community-operators        1/1     1            1           163m
elasticsearch              0/0     0            0           113m
marketplace-operator       1/1     1            1           169m
redhat-operators           1/1     1            1           163m
simplecsc                  0/0     0            0           121m


Version-Release number of selected component (if applicable):
4.1.11 > 4.1.13

How reproducible:
Not sure

Steps to Reproduce:
1. Install OCP 4.1.11.
2. Create a CatalogSourceConfig (csc) object, for example:
$ oc create -f  https://raw.githubusercontent.com/emmajiafan/openshift-testfiles/master/v4.0/marketplace/csc/csc.yaml 
$ oc get packagemanifest -n openshift-marketplace | wc -l
     108

3. Upgrade the cluster to 4.1.13
4. Check the packagemanifest count.
mac:~ jianzhang$ oc get packagemanifest -n openshift-marketplace | wc -l
      72
5. Check the corresponding pods of this "simplecsc".
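
The two `wc -l` readings in steps 2 and 4 can be compared directly. The sketch below is only an illustration: `lost_packagemanifests` is a hypothetical helper name, and it assumes both readings include the same single header line from `oc get packagemanifest`, so the header cancels out in the difference.

```shell
# Hypothetical helper: number of package manifests that disappeared between
# two `oc get packagemanifest | wc -l` readings.
# Both readings count the table header line, so it cancels out when subtracting.
lost_packagemanifests() {
  echo $(( $1 - $2 ))
}

# Readings taken from the report: 108 lines before the upgrade, 72 after.
lost_packagemanifests 108 72   # prints 36
```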

Actual results:
Pods failed to run:
mac:~ jianzhang$ oc get deployment -n openshift-marketplace
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
certified-operators        0/0     0            0           163m
cluster-logging-operator   0/0     0            0           113m
community-operators        1/1     1            1           163m
elasticsearch              0/0     0            0           113m
marketplace-operator       1/1     1            1           169m
redhat-operators           1/1     1            1           163m
simplecsc                  0/0     0            0           121m
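
The affected deployments are the ones whose READY column reads 0/0. A minimal sketch for picking them out of the table above follows; `zero_replica_deployments` is a hypothetical helper name, and the sample input is copied from this report rather than read from a live cluster.

```shell
# Hypothetical helper: list deployments whose READY column ends in "/0",
# meaning zero desired replicas. Reads `oc get deployment` table output on stdin.
zero_replica_deployments() {
  awk 'NR > 1 && $2 ~ /\/0$/ { print $1 }'
}

# Sample input copied from the report. On a live cluster, pipe in the output of
# `oc get deployment -n openshift-marketplace` instead.
sample='NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
certified-operators        0/0     0            0           163m
cluster-logging-operator   0/0     0            0           113m
community-operators        1/1     1            1           163m
elasticsearch              0/0     0            0           113m
marketplace-operator       1/1     1            1           169m
redhat-operators           1/1     1            1           163m
simplecsc                  0/0     0            0           121m'

printf '%s\n' "$sample" | zero_replica_deployments
```

With the sample above this lists certified-operators, cluster-logging-operator, elasticsearch and simplecsc, so the customized csc is not the only deployment affected.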


Expected results:
The customized csc should keep working after the cluster upgrade.

Additional info:
It seems the liveness/readiness probes failed to respond, as shown below:

mac:~ jianzhang$ oc describe deployment -n openshift-marketplace simplecsc 
Name:                   simplecsc
Namespace:              openshift-marketplace
CreationTimestamp:      Mon, 26 Aug 2019 15:25:14 +0800
Labels:                 csc-owner-name=simplecsc
                        csc-owner-namespace=openshift-marketplace
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               marketplace.catalogSourceConfig=simplecsc
Replicas:               0 desired | 0 updated | 0 total | 0 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       marketplace.catalogSourceConfig=simplecsc
  Annotations:  openshift-marketplace-update-hash: 15be6c6640cecc04
  Containers:
   simplecsc:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2a42776fc859fb0e20330ea8201a11f024e5de1388c07195b395f2ef5a9c21d
    Port:       50051/TCP
    Host Port:  0/TCP
    Command:
      appregistry-server
      -r
      https://quay.io/cnr|community-operators
      -o
      etcd
    Liveness:     exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=30
    Readiness:    exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=30
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   simplecsc-6d4d4dd667 (0/0 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  122m  deployment-controller  Scaled up replica set simplecsc-5d6d7f6987 to 1
  Normal  ScalingReplicaSet  67m   deployment-controller  Scaled down replica set simplecsc-5d6d7f6987 to 0

The CSC object reports success.
mac:~ jianzhang$ oc get csc -n openshift-marketplace simplecsc -o yaml
apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  creationTimestamp: "2019-08-26T07:25:15Z"
  finalizers:
  - finalizer.catalogsourceconfigs.operators.coreos.com
  generation: 16
  name: simplecsc
  namespace: openshift-marketplace
  resourceVersion: "107851"
  selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-marketplace/catalogsourceconfigs/simplecsc
  uid: a6132e8e-c7d2-11e9-b5ac-0270899799a4
spec:
  csDisplayName: Custom
  csPublisher: Custom
  packages: etcd,amq-streams
  targetNamespace: openshift-operators
status:
  currentPhase:
    lastTransitionTime: "2019-08-26T09:37:41Z"
    lastUpdateTime: "2019-08-26T09:37:41Z"
    phase:
      message: The object has been successfully reconciled
      name: Succeeded
  packageRepositioryVersions:
    amq-streams: 2.0.0
    etcd: 0.0.12

The catalogsource/svc objects still exist.
mac:~ jianzhang$ oc get catalogsource -n openshift-operators simplecsc -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2019-08-26T07:25:14Z"
  generation: 5
  labels:
    csc-owner-name: simplecsc
    csc-owner-namespace: openshift-marketplace
  name: simplecsc
  namespace: openshift-operators
  resourceVersion: "112271"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/catalogsources/simplecsc
  uid: a5b3ee0f-c7d2-11e9-a446-06a73435a7a2
spec:
  address: 172.30.152.184:50051
  displayName: Custom
  icon:
    base64data: ""
    mediatype: ""
  publisher: Custom
  sourceType: grpc
status:
  lastSync: "2019-08-26T09:47:11Z"
  registryService:
    createdAt: "2019-08-26T09:47:01Z"
    protocol: grpc

mac:~ jianzhang$ oc get svc -n openshift-marketplace 
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
certified-operators        ClusterIP   172.30.17.139    <none>        50051/TCP   12m
cluster-logging-operator   ClusterIP   172.30.225.115   <none>        50051/TCP   12m
community-operators        ClusterIP   172.30.135.147   <none>        50051/TCP   12m
elasticsearch              ClusterIP   172.30.20.243    <none>        50051/TCP   12m
redhat-operators           ClusterIP   172.30.231.114   <none>        50051/TCP   12m
simplecsc                  ClusterIP   172.30.152.184   <none>        50051/TCP   10m
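
The CatalogSource `spec.address` and the `simplecsc` service can be cross-checked mechanically. This is a sketch under stated assumptions: `service_endpoint` is a hypothetical helper name, the column order is the one shown in the `oc get svc` output above, and the values are copied from this report.

```shell
# Hypothetical helper: print "CLUSTER-IP:PORT" for one service.
# Reads `oc get svc` table output on stdin; $1 is the service name.
# Assumes the column order NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE.
service_endpoint() {
  awk -v name="$1" '$1 == name { split($5, port, "/"); print $3 ":" port[1] }'
}

# Values copied from the report.
svc_table='NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
redhat-operators           ClusterIP   172.30.231.114   <none>        50051/TCP   12m
simplecsc                  ClusterIP   172.30.152.184   <none>        50051/TCP   10m'
catalogsource_address='172.30.152.184:50051'

if [ "$(printf '%s\n' "$svc_table" | service_endpoint simplecsc)" = "$catalogsource_address" ]; then
  echo "CatalogSource address matches the simplecsc service"
else
  echo "CatalogSource address does not match the simplecsc service"
fi
```

Here the two agree, which suggests the CatalogSource still points at the right service and that the missing piece is the pod behind it.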

Comment 3 Anping Li 2019-08-27 00:16:20 UTC
The bug could not be reproduced on the other two upgraded clusters. It was reported on a cluster shared by the whole QE team.

