Bug 1774212

Summary: kube-apiserver-operator panic during gcp install
Product: OpenShift Container Platform Reporter: Phil Cameron <pcameron>
Component: openshift-apiserver Assignee: Michal Fojtik <mfojtik>
Status: CLOSED ERRATA QA Contact: Xingxing Xia <xxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.3.0 CC: aos-bugs, lmohanty, mfojtik, scuppett, sttts, wking
Target Milestone: --- Keywords: Reopened, Upgrades
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1781286 1809296 Environment:
Last Closed: 2020-03-02 19:29:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1803739, 1803742    
Bug Blocks: 1781286, 1809296    

Description Phil Cameron 2019-11-19 19:39:32 UTC
Description of problem:
A gcp cluster install with "openshift-install create cluster" hits an Observed a panic: "invalid memory address or nil pointer dereference" in the kube-apiserver-operator.


Version-Release number of selected component (if applicable):

openshift-install-linux-4.3.0-0.ci-2019-11-19-095016.tar.gz


How reproducible:
Not sure; caught it once.


Steps to Reproduce:
1. Run: openshift-install --dir workdir create cluster
2. Select gcp cluster when prompted.
3. In another window, near the end of bootup and before the bootstrap gets cleaned up, run: openshift-install --dir workdir gather bootstrap
4. Unpack the log bundle and run: grep -r "Observed a panic"
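
For anyone who wants to script the last step, the search over the unpacked log bundle can be sketched in Go roughly as follows. This is only an illustration of what grep -r "Observed a panic" does; the helper name panicLines and the default root "." are assumptions made for the sketch, not part of openshift-install.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// marker is the string the report greps for.
const marker = "Observed a panic"

// panicLines (hypothetical helper) returns every line of text that
// contains the marker.
func panicLines(text string) []string {
	var hits []string
	for _, line := range strings.Split(text, "\n") {
		if strings.Contains(line, marker) {
			hits = append(hits, line)
		}
	}
	return hits
}

func main() {
	// Assumption: run from the unpacked log bundle, or pass its path.
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, readErr := os.ReadFile(path)
		if readErr != nil {
			return nil // skip unreadable files and keep walking
		}
		for _, line := range panicLines(string(data)) {
			fmt.Printf("%s:%s\n", path, line)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Each hit is printed as path:line, which is close to the grep -r output format.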


Actual results:
log-bundle-20191119135942/control-plane/10.0.0.4/containers/kube-apiserver-operator-718834744bb9724162385a0e4ef7a87a14ff59e6d66ec0e549c11310922f3468.log
...
I1119 18:57:48.956179       1 backing_resource_controller.go:138] Starting BackingResourceController
E1119 18:57:48.956282       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 857 [running]:
github.com/openshift/cluster-kube-apiserver-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1ec3760, 0x3e227c0)
        /go/src/github.com/openshift/cluster-kube-apiserver-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
github.com/openshift/cluster-kube-apiserver-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/src/github.com/openshift/cluster-kube-apiserver-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1ec3760, 0x3e227c0)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/openshift/cluster-kube-apiserver-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controller/monitoring.(*MonitoringResourceController).Run(0x0, 0x1, 0xc000876120)
        /go/src/github.com/openshift/cluster-kube-apiserver-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controller/monitoring/monitoring_resource_controller.go:160 +0x57
created by github.com/openshift/cluster-kube-apiserver-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod.(*staticPodOperatorControllers).Run
        /go/src/github.com/openshift/cluster-kube-apiserver-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controllers.go:291 +0x1e8
I1119 18:57:48.956366       1 unsupportedconfigoverrides_controller.go:151] Starting UnsupportedConfigOverridesController
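
Note on reading the trace: the first argument printed in the (*MonitoringResourceController).Run(0x0, 0x1, ...) frame is 0x0, which suggests Run was called on a nil *MonitoringResourceController. A minimal Go sketch of that failure mode follows, assuming a hypothetical controller type and runSafely helper; it is not the library-go code.

```go
package main

import "fmt"

// controller is a hypothetical stand-in for MonitoringResourceController.
type controller struct {
	name string
}

// Run reads a field through the receiver, so calling it on a nil
// *controller panics with a nil pointer dereference.
func (c *controller) Run(workers int) {
	fmt.Printf("Starting %s with %d worker(s)\n", c.name, workers)
}

// runSafely (hypothetical helper) calls Run and converts a panic into
// an error so the caller can log it instead of crashing.
func runSafely(c *controller) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("observed a panic: %v", r)
		}
	}()
	c.Run(1)
	return nil
}

func main() {
	var missing *controller // declared but never constructed, so nil
	fmt.Println(runSafely(missing))
	fmt.Println(runSafely(&controller{name: "MonitoringResourceController"}))
}
```

Calling a method on a nil pointer is legal in Go; the panic only happens once the method touches a field, which would be consistent with the small +0x57 offset inside Run.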






Expected results:


Additional info:

Comment 3 Michal Fojtik 2019-12-11 09:55:45 UTC
*** Bug 1781286 has been marked as a duplicate of this bug. ***

Comment 4 Xingxing Xia 2019-12-12 09:57:22 UTC
Checked a 4.3.0-0.nightly-2019-12-10-235659 gcp install; the kube-apiserver-operator container log looks like:
...
I1212 03:37:00.835919       1 backing_resource_controller.go:138] Starting BackingResourceController
I1212 03:37:00.835948       1 unsupportedconfigoverrides_controller.go:151] Starting UnsupportedConfigOverridesController
...

The panic reported above no longer appears between these log lines.

Comment 6 errata-xmlrpc 2020-01-23 11:13:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

Comment 7 W. Trevor King 2020-02-25 17:58:14 UTC
A 4.3.2 -> 4.3.3 update job seems to have this same panic [1,2]:

E0219 23:51:36.357919       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 629 [running]:
github.com/openshift/cluster-kube-scheduler-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1d84e20, 0x3bf2bc0)
	/go/src/github.com/openshift/cluster-kube-scheduler-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
github.com/openshift/cluster-kube-scheduler-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/cluster-kube-scheduler-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1d84e20, 0x3bf2bc0)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
github.com/openshift/cluster-kube-scheduler-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controller/monitoring.(*MonitoringResourceController).Run(0x0, 0x1, 0xc0005fe2a0)
	/go/src/github.com/openshift/cluster-kube-scheduler-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controller/monitoring/monitoring_resource_controller.go:160 +0x57
created by github.com/openshift/cluster-kube-scheduler-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod.(*staticPodOperatorControllers).Run
	/go/src/github.com/openshift/cluster-kube-scheduler-operator/vendor/github.com/openshift/library-go/pkg/operator/staticpod/controllers.go:291 +0x1e8
I0219 23:51:36.358415       1 target_config_reconciler.go:112] Starting TargetConfigReconciler

Checking against the nominal fix:

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.2-x86_64 | grep scheduler
  cluster-kube-scheduler-operator               https://github.com/openshift/cluster-kube-scheduler-operator               da32b6a109a7729678c889935ff310809263077a
$ git cat-file -p da32b6a109a7729678c889935ff310809263077a:go.sum | grep library-go
github.com/openshift/library-go v0.0.0-20191112181215-0597a29991ca h1:na+aH2m/OfMUzlh3l757OOPruejkk+Q0dBhI63v6B1o=
github.com/openshift/library-go v0.0.0-20191112181215-0597a29991ca/go.mod h1:NBttNjZpWwup/nthuLbPAPSYC8Qyo+BBK5bCtFoyYjo=
github.com/openshift/library-go v0.0.0-20191118102510-4e2c7112d252 h1:GY3oBSyQIkjpn4UPzs2Fz78zgXlJT6tZUvcVpec+frg=
github.com/openshift/library-go v0.0.0-20191118102510-4e2c7112d252/go.mod h1:NBttNjZpWwup/nthuLbPAPSYC8Qyo+BBK5bCtFoyYjo=

Then in library-go:

$ git --no-pager log --first-parent --oneline origin/release-4.3 | grep '0597a299\|4e2c7112d\|cherry-pick-614'
998f403c4 Merge pull request #634 from openshift-cherrypick-robot/cherry-pick-614-to-release-4.3
4e2c7112d Merge pull request #593 from mfojtik/observer-log
0597a2999 Merge pull request #586 from mfojtik/observer-fixed-split

So this fix didn't make it into 4.3.2, let alone 4.3.0.
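
The go.sum step above can be sketched in Go: the last dash-separated part of a Go pseudo-version is the commit abbreviation, so the vendored library-go commits can be pulled straight out of go.sum. The helper names commitFromPseudoVersion and pinnedCommits are made up for this sketch, and comparing the result against the git log order is still a manual step.

```go
package main

import (
	"fmt"
	"strings"
)

// commitFromPseudoVersion (hypothetical helper) returns the trailing
// commit abbreviation of a Go pseudo-version such as
// v0.0.0-20191112181215-0597a29991ca, or "" if there is no "-" part.
func commitFromPseudoVersion(version string) string {
	i := strings.LastIndex(version, "-")
	if i < 0 {
		return ""
	}
	return version[i+1:]
}

// pinnedCommits (hypothetical helper) lists the distinct commit
// abbreviations that goSum records for module, in order of appearance.
func pinnedCommits(goSum, module string) []string {
	seen := map[string]bool{}
	var commits []string
	for _, line := range strings.Split(goSum, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 || fields[0] != module {
			continue
		}
		// go.sum has a second entry per version with a /go.mod suffix.
		version := strings.TrimSuffix(fields[1], "/go.mod")
		if c := commitFromPseudoVersion(version); c != "" && !seen[c] {
			seen[c] = true
			commits = append(commits, c)
		}
	}
	return commits
}

func main() {
	// The library-go lines quoted from go.sum in this comment.
	goSum := `github.com/openshift/library-go v0.0.0-20191112181215-0597a29991ca h1:na+aH2m/OfMUzlh3l757OOPruejkk+Q0dBhI63v6B1o=
github.com/openshift/library-go v0.0.0-20191112181215-0597a29991ca/go.mod h1:NBttNjZpWwup/nthuLbPAPSYC8Qyo+BBK5bCtFoyYjo=
github.com/openshift/library-go v0.0.0-20191118102510-4e2c7112d252 h1:GY3oBSyQIkjpn4UPzs2Fz78zgXlJT6tZUvcVpec+frg=
github.com/openshift/library-go v0.0.0-20191118102510-4e2c7112d252/go.mod h1:NBttNjZpWwup/nthuLbPAPSYC8Qyo+BBK5bCtFoyYjo=`
	fmt.Println(pinnedCommits(goSum, "github.com/openshift/library-go"))
}
```

Both extracted abbreviations sit below 998f403c4 in the first-parent log, which is what the conclusion above relies on.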

$ oc adm release info --commits quay.io/openshift-release-dev/ocp-release:4.3.3-x86_64 | grep scheduler
  cluster-kube-scheduler-operator               https://github.com/openshift/cluster-kube-scheduler-operator               da32b6a109a7729678c889935ff310809263077a

Same commit for 4.3.3, so the fix isn't there either.  Hopefully it makes it into 4.3.4...

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade/61
[2]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-azure-upgrade/61/artifacts/e2e-azure-upgrade/pods/openshift-kube-scheduler-operator_openshift-kube-scheduler-operator-5788dc494-pkvvw_kube-scheduler-operator-container.log

Comment 8 W. Trevor King 2020-02-25 18:00:43 UTC
Linking the library-go vendor bump which seems to have pulled in the fix.

Comment 9 W. Trevor King 2020-02-25 19:12:27 UTC
*** Bug 1807197 has been marked as a duplicate of this bug. ***

Comment 10 Lalatendu Mohanty 2020-02-28 13:25:42 UTC
Fixing the Bugzilla state as it was reopened.