Description of problem:
After upgrading from 4.1.11 to 4.1.13, the pods of the `catalogsource` failed to run.

mac:~ jianzhang$ oc get deployment -n openshift-marketplace
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
certified-operators        0/0     0            0           163m
cluster-logging-operator   0/0     0            0           113m
community-operators        1/1     1            1           163m
elasticsearch              0/0     0            0           113m
marketplace-operator       1/1     1            1           169m
redhat-operators           1/1     1            1           163m
simplecsc                  0/0     0            0           121m

Version-Release number of selected component (if applicable):
4.1.11 > 4.1.13

How reproducible:
Not sure

Steps to Reproduce:
1. Install OCP 4.1.11.
2. Create a csc object, like below:
$ oc create -f https://raw.githubusercontent.com/emmajiafan/openshift-testfiles/master/v4.0/marketplace/csc/csc.yaml
$ oc get packagemanifest -n openshift-marketplace | wc -l
108
3. Upgrade the cluster to 4.1.13.
4. Check the packagemanifest.
mac:~ jianzhang$ oc get packagemanifest -n openshift-marketplace | wc -l
72
5. Check the corresponding pods of this "simplecsc".

Actual results:
Pods failed to run:
mac:~ jianzhang$ oc get deployment -n openshift-marketplace
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
certified-operators        0/0     0            0           163m
cluster-logging-operator   0/0     0            0           113m
community-operators        1/1     1            1           163m
elasticsearch              0/0     0            0           113m
marketplace-operator       1/1     1            1           169m
redhat-operators           1/1     1            1           163m
simplecsc                  0/0     0            0           121m

Expected results:
The customized csc should work well after the cluster upgrade.

Additional info:
It seems the Liveness/Readiness probes failed to respond, see below:

mac:~ jianzhang$ oc describe deployment -n openshift-marketplace simplecsc
Name:                   simplecsc
Namespace:              openshift-marketplace
CreationTimestamp:      Mon, 26 Aug 2019 15:25:14 +0800
Labels:                 csc-owner-name=simplecsc
                        csc-owner-namespace=openshift-marketplace
Annotations:            deployment.kubernetes.io/revision: 3
Selector:               marketplace.catalogSourceConfig=simplecsc
Replicas:               0 desired | 0 updated | 0 total | 0 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:       marketplace.catalogSourceConfig=simplecsc
  Annotations:  openshift-marketplace-update-hash: 15be6c6640cecc04
  Containers:
   simplecsc:
    Image:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a2a42776fc859fb0e20330ea8201a11f024e5de1388c07195b395f2ef5a9c21d
    Port:       50051/TCP
    Host Port:  0/TCP
    Command:
      appregistry-server
      -r
      https://quay.io/cnr|community-operators
      -o
      etcd
    Liveness:     exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=30
    Readiness:    exec [grpc_health_probe -addr=localhost:50051] delay=5s timeout=1s period=10s #success=1 #failure=30
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   simplecsc-6d4d4dd667 (0/0 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  122m  deployment-controller  Scaled up replica set simplecsc-5d6d7f6987 to 1
  Normal  ScalingReplicaSet  67m   deployment-controller  Scaled down replica set simplecsc-5d6d7f6987 to 0

The CSC object reports "success".
mac:~ jianzhang$ oc get csc -n openshift-marketplace simplecsc -o yaml
apiVersion: operators.coreos.com/v1
kind: CatalogSourceConfig
metadata:
  creationTimestamp: "2019-08-26T07:25:15Z"
  finalizers:
  - finalizer.catalogsourceconfigs.operators.coreos.com
  generation: 16
  name: simplecsc
  namespace: openshift-marketplace
  resourceVersion: "107851"
  selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-marketplace/catalogsourceconfigs/simplecsc
  uid: a6132e8e-c7d2-11e9-b5ac-0270899799a4
spec:
  csDisplayName: Custom
  csPublisher: Custom
  packages: etcd,amq-streams
  targetNamespace: openshift-operators
status:
  currentPhase:
    lastTransitionTime: "2019-08-26T09:37:41Z"
    lastUpdateTime: "2019-08-26T09:37:41Z"
    phase:
      message: The object has been successfully reconciled
      name: Succeeded
  packageRepositioryVersions:
    amq-streams: 2.0.0
    etcd: 0.0.12

The catalogsource/svc objects still exist.

mac:~ jianzhang$ oc get catalogsource -n openshift-operators simplecsc -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2019-08-26T07:25:14Z"
  generation: 5
  labels:
    csc-owner-name: simplecsc
    csc-owner-namespace: openshift-marketplace
  name: simplecsc
  namespace: openshift-operators
  resourceVersion: "112271"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operators/catalogsources/simplecsc
  uid: a5b3ee0f-c7d2-11e9-a446-06a73435a7a2
spec:
  address: 172.30.152.184:50051
  displayName: Custom
  icon:
    base64data: ""
    mediatype: ""
  publisher: Custom
  sourceType: grpc
status:
  lastSync: "2019-08-26T09:47:11Z"
  registryService:
    createdAt: "2019-08-26T09:47:01Z"
    protocol: grpc

mac:~ jianzhang$ oc get svc -n openshift-marketplace
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
certified-operators        ClusterIP   172.30.17.139    <none>        50051/TCP   12m
cluster-logging-operator   ClusterIP   172.30.225.115   <none>        50051/TCP   12m
community-operators        ClusterIP   172.30.135.147   <none>        50051/TCP   12m
elasticsearch              ClusterIP   172.30.20.243    <none>        50051/TCP   12m
redhat-operators           ClusterIP   172.30.231.114   <none>        50051/TCP   12m
simplecsc                  ClusterIP   172.30.152.184   <none>        50051/TCP   10m

The bug could not be reproduced on the other two upgraded clusters. It was reported on a cluster shared by the whole QE team.
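
For anyone re-checking this on another upgraded cluster, here is a minimal sketch for spotting the affected deployments quickly. It only filters the table that `oc get deployment` prints, picking the rows whose READY column shows zero desired replicas (the `0/0` rows above). The function name `list_scaled_to_zero` is hypothetical and not part of any tool; the script does not diagnose why a deployment was scaled down.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): reads the table printed
# by `oc get deployment` on stdin and prints the NAME of every deployment
# whose READY column ends in "/0", i.e. zero desired replicas.
list_scaled_to_zero() {
  # NR > 1 skips the header row; $2 is the READY column (ready/desired).
  awk 'NR > 1 && $2 ~ /\/0$/ { print $1 }'
}

# Usage against a live cluster (assumes `oc` is logged in):
#   oc get deployment -n openshift-marketplace | list_scaled_to_zero
```

Run against the listing in this report, it should print the five deployments shown at `0/0`, including `simplecsc`.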