Bug 1911467 - machineset-controller container observing panic
Summary: machineset-controller container observing panic
Keywords:
Status: CLOSED DUPLICATE of bug 1890038
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Alberto
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-29 16:00 UTC by Apoorva Jagtap
Modified: 2024-03-25 17:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-04 10:48:07 UTC
Target Upstream Version:
Embargoed:



Description Apoorva Jagtap 2020-12-29 16:00:02 UTC
Description of problem:

While upgrading the cluster from v4.5.16 to v4.6.8, the machine-api-controllers pods got stuck in CrashLoopBackOff state, with the machineset-controller container panicking:
~~~
$ omg get po -n openshift-machine-api
NAME                                          READY  STATUS   RESTARTS  AGE
cluster-autoscaler-operator-7d798866b6-2qvwh  2/2    Running  0         1d
machine-api-controllers-6db879646d-g8lmv      6/7    Running  302       22h
machine-api-controllers-7c5d86bdf8-6rwtk      6/7    Running  272       20h
machine-api-operator-588bbcbcd6-m6vmj         2/2    Running  0         5d

~~~

~~~
$ omg logs machine-api-controllers-7c5d86bdf8-6rwtk -c machineset-controller

2020-12-29T06:48:35.523967946Z I1229 06:48:35.523856       1 request.go:581] Throttling request took 4.054290308s, request: GET:https://172.30.0.1:443/apis/security.istio.io/v1beta1?timeout=32s
2020-12-29T06:48:35.644538709Z panic: runtime error: invalid memory address or nil pointer dereference
2020-12-29T06:48:35.644538709Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x141b406]
2020-12-29T06:48:35.644538709Z 
2020-12-29T06:48:35.644538709Z goroutine 1 [running]:
2020-12-29T06:48:35.644538709Z github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1.getMachineDefaulterOperation(0x0, 0x0)
2020-12-29T06:48:35.644617819Z 	/go/src/github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1/machine_webhook.go:216 +0x26
2020-12-29T06:48:35.644617819Z github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1.createMachineDefaulter(...)
2020-12-29T06:48:35.644617819Z 	/go/src/github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1/machine_webhook.go:210
2020-12-29T06:48:35.644617819Z github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1.NewMachineDefaulter(0xc00051c900, 0xc0002b00e0, 0x17f39d8)
2020-12-29T06:48:35.644617819Z 	/go/src/github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1/machine_webhook.go:203 +0x65
2020-12-29T06:48:35.644637951Z main.main()
2020-12-29T06:48:35.644637951Z 	/go/src/github.com/openshift/machine-api-operator/cmd/machineset/main.go:123 +0x656

~~~

Actual results:
- The upgrade is stuck because the machine-api-controllers pods never reach the Ready state.

Expected results:
- The upgrade should complete as expected.

Comment 2 Joel Speed 2021-01-04 10:48:07 UTC
We were relying on the following comment [1]:

> // This value will be synced with to the `status.platform` and `status.platformStatus.type`.
> // Currently this value cannot be changed once set.

Having reviewed another BZ [2] assigned to a different team, their proposed fix will resolve the issue we are seeing here.

Marking this as a duplicate as there is nothing for us to do. If the issue isn't resolved when the other BZ is merged, please reopen this.
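
For illustration only, here is a minimal Go sketch of the failure mode suggested by the trace, where getMachineDefaulterOperation is called with nil arguments (0x0, 0x0). It assumes the panic comes from reading the type of an unset `status.platformStatus`. The types and the `platformTypeOf` helper are simplified, hypothetical stand-ins rather than the actual machine-api-operator or openshift/api code, and the fallback to `status.platform` is just one possible guard, not necessarily the fix proposed in the other BZ.

```go
package main

import "fmt"

// PlatformType, PlatformStatus and InfrastructureStatus are simplified,
// hypothetical stand-ins for the types in github.com/openshift/api/config/v1.
type PlatformType string

type PlatformStatus struct {
	Type PlatformType // status.platformStatus.type
}

type InfrastructureStatus struct {
	Platform       PlatformType    // status.platform
	PlatformStatus *PlatformStatus // status.platformStatus, nil if never populated
}

// platformTypeOf (hypothetical helper) returns the platform type without
// dereferencing a nil PlatformStatus. When platformStatus is unset or has an
// empty type, it falls back to the status.platform value.
func platformTypeOf(status InfrastructureStatus) PlatformType {
	if status.PlatformStatus != nil && status.PlatformStatus.Type != "" {
		return status.PlatformStatus.Type
	}
	return status.Platform
}

func main() {
	// Assumed shape of the problem: status.platform is set, but
	// status.platformStatus was never synced and is still nil.
	unsynced := InfrastructureStatus{Platform: "AWS"}
	// Reading unsynced.PlatformStatus.Type directly would panic here with
	// "invalid memory address or nil pointer dereference".
	fmt.Println(platformTypeOf(unsynced))

	// Once both fields are populated, platformStatus.type is used.
	synced := InfrastructureStatus{
		Platform:       "AWS",
		PlatformStatus: &PlatformStatus{Type: "AWS"},
	}
	fmt.Println(platformTypeOf(synced))
}
```

A guard like this only avoids the crash; it does not repair an Infrastructure object whose status fields are out of sync.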

[1]: https://github.com/openshift/api/blob/78a19e96f9ebe77b8e0e6019f3cd3b4ae53b0fda/config/v1/types_infrastructure.go#L198-L199
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1890038

*** This bug has been marked as a duplicate of bug 1890038 ***

