Bug 1637737
| Summary: | Service catalog controller segmentation fault | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Robert Bost <rbost> |
| Component: | Service Catalog | Assignee: | Jay Boyd <jaboyd> |
| Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.10.0 | CC: | chezhang, cshereme, dyan, jaboyd, jfan, jiazha, rbost, ssadhale, zitang |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: |
Previously, if a Service Instance failed provisioning for the maximum reconciliation period (7 days by default), the Service Catalog controller manager pod would crash while trying to finalize the state of the failed instance. This case is now handled properly, and the instance is set to a failed provisioning status.
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-11-20 03:10:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Robert Bost
2018-10-09 22:03:22 UTC
Is this blocking the customer? Is the Service Catalog controller manager pod constantly in a panic/restart/panic/restart state (i.e., does the "bad" instance need to be deleted)? You indicated it's always reproducible; what are the steps to reproduce?

Looks like this may be addressed by upstream https://github.com/kubernetes-incubator/service-catalog/pull/2259

This appears to happen only when the reconciliationRetryDuration (7 days) is exceeded. So presumably someone tried to provision an instance, the broker failed with a retryable error, and we kept retrying (with an exponential backoff) for 7 days?

Correction from comment #8: fixed in 3.11.z in atomic-enterprise-service-catalog-3.11.0-0.30.0.

The version info:

[root@ip-172-18-0-56 ~]# oc exec controller-manager-x8jfr -- service-catalog --version
v3.11.36;Upstream:v0.1.35

The Service Catalog works well; I did not find the crash after a day's running, and I recreated it. LGTM, verifying it.

[root@ip-172-18-0-56 ~]# oc get pods
NAME                       READY   STATUS    RESTARTS   AGE
apiserver-bkhst            1/1     Running   0          1h
controller-manager-x8jfr   1/1     Running   0          1h

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537