Bug 1732614

Summary: machine-api-operator panic: runtime error: invalid memory address or nil pointer dereference
Product: OpenShift Container Platform Reporter: Seth Jennings <sjenning>
Component: Cloud Compute Assignee: Alberto <agarcial>
Status: CLOSED ERRATA QA Contact: sunzhaohua <zhsun>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.2.0 CC: agarcial, zhsun
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-10-16 06:30:48 UTC Type: Bug

Description Seth Jennings 2019-07-23 21:50:07 UTC
Hit this in CI

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/23461/pull-ci-openshift-origin-master-e2e-aws/11505

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/23461/pull-ci-openshift-origin-master-e2e-aws/11505/artifacts/e2e-aws/pods/openshift-machine-api_machine-api-operator-5669c4cdff-ssnf9_machine-api-operator_previous.log

I0723 20:36:37.879244       1 start.go:58] Version: 0.1.0-463-g0c0a0666-dirty
I0723 20:36:37.882340       1 leaderelection.go:217] attempting to acquire leader lease  openshift-machine-api/machine-api-operator...
I0723 20:38:33.536335       1 leaderelection.go:227] successfully acquired lease openshift-machine-api/machine-api-operator
I0723 20:38:33.543384       1 operator.go:111] Starting Machine API Operator
I0723 20:38:33.642964       1 start.go:104] Synced up machine api informer caches
I0723 20:38:33.643594       1 operator.go:120] Synced up caches
I0723 20:38:33.654707       1 status.go:58] Syncing status: re-syncing
I0723 20:38:34.682851       1 sync.go:40] Synced up all components
I0723 20:38:34.686491       1 status.go:89] Syncing status: available
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x118beef]

goroutine 124 [running]:
github.com/openshift/machine-api-operator/pkg/metrics.MachineCollector.collectMachineMetrics(0x174bf40, 0xc0002b0190, 0x174bf80, 0xc0002b01b0, 0xc000042074, 0x15, 0xc00007e780)
	/go/src/github.com/openshift/machine-api-operator/pkg/metrics/metrics.go:90 +0x33f
github.com/openshift/machine-api-operator/pkg/metrics.(*MachineCollector).Collect(0xc0004282d0, 0xc00007e780)
	/go/src/github.com/openshift/machine-api-operator/pkg/metrics/metrics.go:59 +0x86
github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/go/src/github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x19d
created by github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xdf8

Comment 2 Alberto 2019-07-24 09:13:49 UTC
This code issue was uncovered by a machine consistently failing to become a node, for whatever reason, so that underlying failure is worth investigating as well.
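
For illustration only, here is a minimal Go sketch of the kind of guard that avoids this class of panic. The stack trace does not show which pointer was nil at metrics.go:90, so this assumes it was the machine's node reference, which fits a machine that never became a node. All types and function names below (Machine, MachineStatus, ObjectReference, nodeNameOf, collectLabels) are hypothetical stand-ins and are not taken from the machine-api-operator source or from the actual fix.

```go
package main

import "fmt"

// NOTE: every type here is a hypothetical stand-in, not the real
// machine-api type. The shape is assumed for illustration only.

// ObjectReference mimics a reference from a Machine to its Node.
type ObjectReference struct {
	Name string
}

// MachineStatus holds an optional node reference. It stays nil until the
// machine has joined the cluster as a node.
type MachineStatus struct {
	NodeRef *ObjectReference
}

// Machine is a simplified machine object with a name and a status.
type Machine struct {
	Name   string
	Status MachineStatus
}

// nodeNameOf returns the node name for a machine, or "" when the machine
// never became a node. The nil check avoids dereferencing Status.NodeRef.
func nodeNameOf(m Machine) string {
	if m.Status.NodeRef == nil {
		return ""
	}
	return m.Status.NodeRef.Name
}

// collectLabels builds one label set per machine without panicking on
// machines that have no node.
func collectLabels(machines []Machine) []map[string]string {
	out := make([]map[string]string, 0, len(machines))
	for _, m := range machines {
		out = append(out, map[string]string{
			"name": m.Name,
			"node": nodeNameOf(m),
		})
	}
	return out
}

func main() {
	machines := []Machine{
		{Name: "worker-a", Status: MachineStatus{NodeRef: &ObjectReference{Name: "node-a"}}},
		{Name: "worker-failed"}, // never became a node: NodeRef is nil
	}
	for _, labels := range collectLabels(machines) {
		fmt.Printf("machine=%s node=%q\n", labels["name"], labels["node"])
	}
}
```

Emitting an empty node label for a failed machine is one possible design choice. Skipping the machine entirely would be another, and the sketch does not claim either is what the operator does.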

Comment 4 sunzhaohua 2019-07-25 08:38:51 UTC
Verified.

Created a failed machine with 4.2.0-0.nightly-2019-07-21-222447; the operator panics:

$ oc logs -f machine-api-operator-7ffdb69466-m447h
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x1107923]

goroutine 13867 [running]:
github.com/openshift/machine-api-operator/pkg/metrics.MachineCollector.collectMachineMetrics(0x1644560, 0xc0003a61e0, 0x16445a0, 0xc0003a6200, 0xc000040044, 0x15, 0xc000107da0)
	/go/src/github.com/openshift/machine-api-operator/pkg/metrics/metrics.go:90 +0x2b3
github.com/openshift/machine-api-operator/pkg/metrics.(*MachineCollector).Collect(0xc00039e420, 0xc000107da0)
	/go/src/github.com/openshift/machine-api-operator/pkg/metrics/metrics.go:59 +0x4a
github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
	/go/src/github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x193
created by github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/openshift/machine-api-operator/vendor/github.com/prometheus/client_golang/prometheus/registry.go:526 +0xe23

Created a failed machine with 4.2.0-0.nightly-2019-07-21-222447; this error does not occur.

Comment 5 Vikas Choudhary 2019-08-04 06:07:39 UTC
*** Bug 1733474 has been marked as a duplicate of this bug. ***

Comment 7 errata-xmlrpc 2019-10-16 06:30:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922