Bug 1768820

Summary: Projects get stuck in Terminating status with "object *v1beta1.ServiceBindingList does not implement the protobuf marshalling interface and cannot be encoded to a protobuf message ..."
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: Service Catalog Assignee: Fabian von Feilitzsch <fabian>
Status: CLOSED ERRATA QA Contact: Fan Jia <jfan>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.3.0 CC: aos-bugs, bandrade, chuo, fabian, jesusr, jfan, lxia, mfojtik, piqin, tbuskey, xjiang
Target Milestone: --- Keywords: TestBlocker
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-01-23 11:10:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jian Zhang 2019-11-05 10:15:18 UTC
Description of problem:
Projects get stuck in Terminating status for a long time. For example:
openshift-operators-redhat                              Terminating   6h3m
...
qitang2                                                 Terminating   139m
qitang3                                                 Terminating   93m
test-operator                                           Terminating   66m
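
A small filter for picking the stuck namespaces out of a listing like the one above. This is only a sketch: the function name list_terminating is made up for illustration, and it assumes the usual NAME / STATUS / AGE column layout of `oc get ns`.

```shell
# Print the names of namespaces whose STATUS column is "Terminating".
# Reads `oc get ns`-style output (NAME STATUS AGE columns) on stdin.
# The header line is ignored because its second column is "STATUS".
list_terminating() {
  awk '$2 == "Terminating" { print $1 }'
}

# Usage against a live cluster (assumes a logged-in `oc` session):
#   oc get ns | list_terminating
```

On a live cluster, `oc get ns --field-selector status.phase=Terminating` should give a similar listing without any post-processing.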

Version-Release number of selected component (if applicable):
Cluster version is 4.3.0-0.nightly-2019-11-02-092336 

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.3.
2. Create a namespace.
# oc create ns test-operator 
3. Delete it.
# oc delete ns test-operator 

Actual results:
The test-operator project stays stuck in Terminating status indefinitely.

Expected results:
The namespace can be deleted successfully.
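
To check the expected result without watching by hand, the deletion can be polled with a timeout. The helper below is a sketch: wait_ns_gone is a hypothetical name, and the 5-second poll interval and 120-second default timeout are arbitrary choices. It assumes a logged-in `oc` session.

```shell
# Poll until a namespace is gone or a timeout (in seconds) expires.
# $1 = namespace name, $2 = timeout in seconds (default 120).
# Returns 0 if the namespace disappeared, 1 if it is still present.
wait_ns_gone() {
  ns=$1
  timeout=${2:-120}
  waited=0
  # `oc get ns NAME` exits non-zero once the namespace no longer exists.
  while oc get ns "$ns" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  return 0
}

# Usage:
#   oc create ns test-operator
#   oc delete ns test-operator --wait=false
#   wait_ns_gone test-operator 300 || echo "test-operator is stuck"
```

`--wait=false` is used because `oc delete` otherwise blocks until the object is gone, which never happens while the namespace is stuck.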

Additional info:
I also deleted all resources in it, but it is still in Terminating status.
clusterserviceversion.operators.coreos.com "elasticsearch-operator.4.3.0-201911041716" deleted
mac:~ jianzhang$ oc get csv -n test-operators
No resources found.
mac:~ jianzhang$ oc get sub -n test-operators
No resources found.
mac:~ jianzhang$ oc get catalogsource -n test-operators
No resources found.

The copied CSV file will be recreated once you delete it.
mac:~ jianzhang$ oc get csv -n test-operators
NAME                                        DISPLAY                  VERSION              REPLACES   PHASE
elasticsearch-operator.4.3.0-201911041716   Elasticsearch Operator   4.3.0-201911041716              Succeeded


Related logs:
mac:~ jianzhang$ oc logs kube-apiserver-preserve-huirwang-110-mjnxx-control-plane-0  -c kube-apiserver-10  -n openshift-kube-apiserver  |grep test-operator
...
I1105 08:59:50.053553       1 trace.go:116] Trace[52881584]: "Delete" url:/api/v1/namespaces/test-operator/podtemplates (started: 2019-11-05 08:59:48.825887763 +0000 UTC m=+4913.704412558) (total time: 1.227650927s):
I1105 08:59:56.065361       1 trace.go:116] Trace[1701676062]: "List etcd3" key:/roles/test-operator,resourceVersion:,limit:0,continue: (started: 2019-11-05 08:59:51.106018174 +0000 UTC m=+4915.984542978) (total time: 4.959294172s):
I1105 08:59:56.065489       1 trace.go:116] Trace[1633355098]: "Delete" url:/apis/rbac.authorization.k8s.io/v1/namespaces/test-operator/roles (started: 2019-11-05 08:59:51.105912766 +0000 UTC m=+4915.984437561) (total time: 4.959543554s):
I1105 09:00:03.229646       1 trace.go:116] Trace[968228445]: "List etcd3" key:/osb.openshift.io/automationbrokers/test-operator,resourceVersion:,limit:0,continue: (started: 2019-11-05 08:59:56.167930343 +0000 UTC m=+4921.046455146) (total time: 7.061667739s):
I1105 09:00:03.229962       1 trace.go:116] Trace[1304112249]: "Delete" url:/apis/osb.openshift.io/v1/namespaces/test-operator/automationbrokers (started: 2019-11-05 08:59:56.167774266 +0000 UTC m=+4921.046299050) (total time: 7.062175285s):
I1105 09:08:22.933986       1 trace.go:116] Trace[247287010]: "List etcd3" key:/operators.coreos.com/subscriptions/test-operators,resourceVersion:,limit:0,continue: (started: 2019-11-05 09:08:22.028700734 +0000 UTC m=+5426.907225535) (total time: 905.265981ms):
I1105 09:08:22.934157       1 trace.go:116] Trace[2126415962]: "List" url:/apis/operators.coreos.com/v1alpha1/namespaces/test-operators/subscriptions (started: 2019-11-05 09:08:22.028680978 +0000 UTC m=+5426.907205760) (total time: 905.46437ms):
I1105 09:08:26.054225       1 trace.go:116] Trace[604097737]: "List etcd3" key:/operators.coreos.com/subscriptions/test-operators,resourceVersion:,limit:0,continue: (started: 2019-11-05 09:08:25.028746226 +0000 UTC m=+5429.907271030) (total time: 1.025450501s):
I1105 09:08:26.054441       1 trace.go:116] Trace[975003958]: "List" url:/apis/operators.coreos.com/v1alpha1/namespaces/test-operators/subscriptions (started: 2019-11-05 09:08:25.028720338 +0000 UTC m=+5429.907245120) (total time: 1.025707508s):
I1105 09:08:26.054749       1 trace.go:116] Trace[616813751]: "List etcd3" key:/operators.coreos.com/subscriptions/test-operators,resourceVersion:,limit:0,continue: (started: 2019-11-05 09:08:24.82873049 +0000 UTC m=+5429.707255286) (total time: 1.226004633s):
I1105 09:08:26.055122       1 trace.go:116] Trace[1100523733]: "List" url:/apis/operators.coreos.com/v1alpha1/namespaces/test-operators/subscriptions (started: 2019-11-05 09:08:24.828696203 +0000 UTC m=+5429.707220989) (total time: 1.226413248s):
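
Trace lines like these are easier to scan once reduced to duration, operation and target. The sed filter below is a sketch written against the line format shown above; summarize_traces is a hypothetical name, and other kube-apiserver versions may format their traces differently.

```shell
# Reduce kube-apiserver trace lines to "<total time> <operation> <target>".
# Reads log lines on stdin; lines that do not look like trace entries are dropped.
summarize_traces() {
  sed -n 's/.*Trace\[[0-9]*]: "\([^"]*\)" \([^ ]*\) .*(total time: \([^)]*\)).*/\3 \1 \2/p'
}

# Usage (POD is a placeholder for the kube-apiserver pod name):
#   oc logs "$POD" -c kube-apiserver-10 -n openshift-kube-apiserver | summarize_traces
```

The durations mix units (ms and s), so the output is left in log order rather than sorted.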

Comment 1 Xingxing Xia 2019-11-05 10:56:02 UTC
Take the project "qitang3" for example:
oc get project qitang3 -o yaml # shows below ContentDeletionFailed message about service catalog objects
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: system:admin
    openshift.io/sa.scc.mcs: s0:c29,c9
    openshift.io/sa.scc.supplemental-groups: 1000830000/10000
    openshift.io/sa.scc.uid-range: 1000830000/10000
  creationTimestamp: "2019-11-05T07:32:23Z"
  deletionTimestamp: "2019-11-05T07:32:36Z"
  name: qitang3
  resourceVersion: "582633"
  selfLink: /apis/project.openshift.io/v1/projects/qitang3
  uid: 0896cf23-332c-43a6-8bd5-cfedc2615e68
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2019-11-05T07:38:12Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2019-11-05T07:32:43Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2019-11-05T07:38:12Z"
    message: 'Failed to delete all resource types, 5 remaining: object *v1beta1.ServiceBindingList
      does not implement the protobuf marshalling interface and cannot be encoded
      to a protobuf message, object *v1beta1.ServiceBrokerList does not implement
      the protobuf marshalling interface and cannot be encoded to a protobuf message,
      object *v1beta1.ServiceClassList does not implement the protobuf marshalling
      interface and cannot be encoded to a protobuf message, object *v1beta1.ServiceInstanceList
      does not implement the protobuf marshalling interface and cannot be encoded
      to a protobuf message, object *v1beta1.ServicePlanList does not implement the
      protobuf marshalling interface and cannot be encoded to a protobuf message'
    reason: ContentDeletionFailed
    status: "True"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2019-11-05T07:32:43Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2019-11-05T07:32:43Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
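
In the output above, only the condition with status "True" (NamespaceDeletionContentFailure) describes a problem. A rough way to pull out such conditions is sketched below. failing_conditions is a hypothetical helper; it assumes the field order that `oc get ... -o yaml` prints (reason, then status, then type) and is not a real YAML parser.

```shell
# Print "<type> (<reason>)" for every condition whose status is "True".
# Reads `oc get project NAME -o yaml` (or `oc get ns NAME -o yaml`) on stdin.
failing_conditions() {
  awk '
    # Remember the most recent condition status; the bare top-level "status:" line has no value and is skipped.
    $1 == "status:" && $2 ~ /^"?(True|False|Unknown)"?$/ { st = $2; gsub(/"/, "", st) }
    $1 == "reason:" { reason = $2 }
    # "type:" is the last field of each condition, so decide here and reset.
    $1 == "type:" { if (st == "True") print $2 " (" reason ")"; st = "" }
  '
}

# Usage:
#   oc get project qitang3 -o yaml | failing_conditions
```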

Comment 2 Jesus M. Rodriguez 2019-11-06 22:30:38 UTC
Can you please give me a bit more information? The original comment reproducer simply states to create a namespace then delete it. Comment #1 seems to indicate that the Service Catalog has been enabled explicitly because by default it is not enabled. How was the environment deployed? Are these service catalog deployed projects or not?

Comment 3 Jian Zhang 2019-11-07 09:25:51 UTC
Jesus,

Yes, Service Catalog was enabled manually, and the ASB was deployed.

Comment 4 Qin Ping 2019-11-14 02:44:48 UTC
I hit this issue too.

At the same time, I found that the service-catalog-apiserver clusteroperator is not Available (see the Available condition below).

$ oc describe co service-catalog-apiserver
Name:         service-catalog-apiserver
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-11-13T02:28:44Z
  Generation:          1
  Resource Version:    221379
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/service-catalog-apiserver
  UID:                 efb57472-70de-4e7f-bcc8-c10ddddf21e7
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-11-13T02:28:46Z
    Reason:                AsExpected
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-11-13T02:47:18Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-11-13T11:40:03Z
    Message:               Available: v1beta1.servicecatalog.k8s.io is not ready: 503
    Reason:                Available
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-11-13T02:46:42Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:               <nil>
  Related Objects:
    Group:     
    Name:      openshift-config
    Resource:  namespaces
    Group:     
    Name:      openshift-config-managed
    Resource:  namespaces
    Group:     
    Name:      openshift-service-catalog-apiserver-operator
    Resource:  namespaces
    Group:     
    Name:      openshift-service-catalog-apiserver
    Resource:  namespaces
    Group:     apiregistration.k8s.io
    Name:      v1beta1.servicecatalog.k8s.io
    Resource:  apiservices
  Versions:
    Name:     operator
    Version:  4.3.0-0.nightly-2019-11-12-185229
    Name:     service-catalog-apiserver
    Version:  
Events:       <none>
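
The condition of interest in this output is Available=False, with the 503 message for v1beta1.servicecatalog.k8s.io. A sketch for extracting a single condition from `oc describe co` output follows. condition_status is a hypothetical helper, and it relies on the Message / Status / Type field order shown above.

```shell
# Print the status and message of one condition from `oc describe co` output.
# $1 = condition type to look for, e.g. Available or Degraded.
condition_status() {
  awk -v want="$1" '
    # Capture the full message text, dropping the "Message:" label and padding.
    $1 == "Message:" { sub(/^[ \t]*Message:[ \t]*/, ""); msg = $0 }
    # NF == 2 skips the bare "Status:" section header.
    $1 == "Status:" && NF == 2 { st = $2 }
    # "Type:" closes each condition, so report here and reset for the next one.
    $1 == "Type:" { if ($2 == want) print "status=" st " message=" msg; msg = ""; st = "" }
  '
}

# Usage:
#   oc describe co service-catalog-apiserver | condition_status Available
```

`oc get apiservice v1beta1.servicecatalog.k8s.io` shows the state of the aggregated API itself.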

Comment 5 Jesus M. Rodriguez 2019-11-20 15:40:26 UTC
Fixed by PR https://github.com/openshift/service-catalog/pull/59

Comment 7 Fan Jia 2019-11-22 07:30:19 UTC
The latest nightly build doesn't include the fix PR; I will test once a nightly build that includes it is ready.

Comment 9 Fan Jia 2019-11-25 07:12:23 UTC
Test environment:
Cluster version: 4.3.0-0.nightly-2019-11-24-183610

Test result:
1. oc new-project kaka
2. Enable service-catalog-apiserver and service-catalog-controller-manager.
3. oc delete ns kaka
The namespace "kaka" is deleted successfully.

Comment 11 errata-xmlrpc 2020-01-23 11:10:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062