Bug 1775009
| Summary: | machine-config operators controller segfaults during 4.2.4 upgrade | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Antonio Murdaca <amurdaca> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.3.0 | CC: | ChetRHosey, clasohm, dahernan, hyagi, jkaur, kgarriso, mnguyen, mzali, rfairley, smoro |
| Target Milestone: | --- | ||
| Target Release: | 4.3.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 1772680 | Environment: | |
| Last Closed: | 2020-01-23 11:13:36 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1772680, 1775013 | ||
Description
Antonio Murdaca
2019-11-21 11:20:36 UTC
*** Bug 1767591 has been marked as a duplicate of this bug. ***
Currently upgrading to 4.2.9, and again facing issues with the machine-config-controller:
$ oc logs -n openshift-machine-config-operator machine-config-controller-69c5cf857-r5vrc -f
I1207 03:43:28.567141 1 start.go:50] Version: v4.2.9-201911261133-dirty (d780d197a9c5848ba786982c0c4aaa7487297046)
E1207 03:45:24.285369 1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"machine-config-controller", GenerateName:"", Namespace:"openshift-machine-config-operator", SelfLink:"/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-controller", UID:"b114d4a4-071e-11ea-a595-52540079f30f", ResourceVersion:"43565395", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63709360698, loc:(*time.Location)(0x28f62e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"machine-config-controller-69c5cf857-r5vrc_bb152387-18a3-11ea-90c2-0a580a810177\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2019-12-07T03:45:24Z\",\"renewTime\":\"2019-12-07T03:45:24Z\",\"leaderTransitions\":262}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "github.com/openshift/machine-config-operator/cmd/common/helpers.go:30"'. Will not report event: 'Normal' 'LeaderElection' 'machine-config-controller-69c5cf857-r5vrc_bb152387-18a3-11ea-90c2-0a580a810177 became leader'
E1207 03:45:24.372755 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:522
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/signal_unix.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe60bf5]
goroutine 241 [running]:
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x15aec80, 0x28dbcd0)
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513 +0x1b9
github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1.(*MachineConfigPool).GetNamespace(0x0, 0x0, 0x19e98a0)
<autogenerated>:1 +0x5
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc00082d260, 0x1528080, 0xc0009d83c0, 0x12a05f200)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84 +0x114
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc0009d83c0, 0x12a05f200, 0x0, 0x0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261 +0x6a
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0004fc0a0, 0x0, 0x12a05f200)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604 +0x41
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(0xc0004fc0a0, 0x0)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615 +0x44
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault-fm(0x0)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125 +0x34
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode(0xc0004fc0a0, 0x1794e00, 0xc000379b00)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397 +0x1a1
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode-fm(0x1794e00, 0xc000379b00)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115 +0x3e
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc000578990, 0xc0005789a0, 0xc0005789b0, 0x1794e00, 0xc000379b00)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x0, 0xc0008e2300, 0xc0003a1c20)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554 +0x21d
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000c2fe18, 0x429692, 0xc0003a1c50)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265 +0x51
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x79
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00093f768)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000c2ff68, 0xdf8475800, 0x0, 0x1582101, 0xc0007eb020)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xbe
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00093f768, 0xdf8475800, 0xc0007eb020)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000143200)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x8d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run-fm()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000614530, 0xc0009d8000)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
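The trace above shows `(*MachineConfigPool).GetNamespace` running with a nil receiver (first argument `0x0`), reached from `addNode` through `enqueueDefault` and the cache key function. The following is a minimal Go sketch of that failure mode and of a nil guard in front of the key function. It is a simplified illustration under stated assumptions, not the operator's code or the actual fix: `Pool`, `keyFunc`, `safeKey`, and `panics` are hypothetical names, and the real key function takes an `interface{}` rather than a typed pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a hypothetical stand-in for MachineConfigPool; only the fields
// needed to build a cache key are modelled here.
type Pool struct {
	Namespace string
	Name      string
}

// GetNamespace dereferences its receiver, so calling it on a nil *Pool
// panics with "invalid memory address or nil pointer dereference".
func (p *Pool) GetNamespace() string { return p.Namespace }

// GetName has the same nil-receiver hazard as GetNamespace.
func (p *Pool) GetName() string { return p.Name }

// keyFunc mimics the shape of a namespace/name key function: it calls
// accessor methods on whatever object it is handed, without a nil check.
func keyFunc(p *Pool) string {
	if ns := p.GetNamespace(); ns != "" {
		return ns + "/" + p.GetName()
	}
	return p.GetName()
}

// safeKey is the guarded variant: reject a nil pool before deriving a key.
func safeKey(p *Pool) (string, error) {
	if p == nil {
		return "", errors.New("nil pool: nothing to enqueue")
	}
	return keyFunc(p), nil
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	var missing *Pool // assumption: a lookup that produced no pool for a node

	fmt.Println("unguarded call panics:", panics(func() { keyFunc(missing) }))

	_, err := safeKey(missing)
	fmt.Println("guarded call error:", err)

	key, _ := safeKey(&Pool{Name: "worker"})
	fmt.Println("key:", key)
}
```

Returning an error from the guard lets the caller log the condition and skip the enqueue, so the controller process is not taken down by one unexpected nil.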
Still waiting for the new MachineConfigs to be generated; the rest of my operators finished upgrading about an hour ago. Will check tomorrow.
4.2.9 finished applying a couple of minutes ago. Again, the machine-config-controller segfaulted a lot. It calmed down about 9h ago, the rendered machine configs did show up, and a first batch of nodes was upgraded. That rollout could not finish because the controller segfaulted again and again. The rest of my nodes started rebooting about 30 minutes ago and are now all running v1.14.6+31a56cf75. Somehow the machine-config-controller is still running, for now. All in all, it took 16h30 for 4.2.9 to deploy, which is my new record.
Hi, any progress on verifying this fix? We need to backport to 4.2.z.
Verified on 4.3.0-0.nightly-2019-12-20-025144. Upgraded to 4.3.0-0.nightly-2019-12-20-152137. No segfaults detected.
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-130-225.ec2.internal Ready master 26m v1.16.2
ip-10-0-133-97.ec2.internal Ready worker 17m v1.16.2
ip-10-0-150-28.ec2.internal Ready master 26m v1.16.2
ip-10-0-156-216.ec2.internal Ready worker 17m v1.16.2
ip-10-0-160-237.ec2.internal Ready master 26m v1.16.2
ip-10-0-173-20.ec2.internal Ready worker 17m v1.16.2
$ oc label node/ip-10-0-133-97.ec2.internal node-role.kubernetes.io/infra=""
node/ip-10-0-133-97.ec2.internal labeled
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-130-225.ec2.internal Ready master 27m v1.16.2
ip-10-0-133-97.ec2.internal Ready infra,worker 18m v1.16.2
ip-10-0-150-28.ec2.internal Ready master 27m v1.16.2
ip-10-0-156-216.ec2.internal Ready worker 17m v1.16.2
ip-10-0-160-237.ec2.internal Ready master 27m v1.16.2
ip-10-0-173-20.ec2.internal Ready worker 18m v1.16.2
$ cat infra.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
$ oc create -f infra.mcp.yaml
machineconfigpool.machineconfiguration.openshift.io/infra created
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT
infra rendered-infra-3cf246f5dc359eeb44060c37d2d5982e True False False 0 0 0 0
master rendered-master-68866b3f5c6924f954fc93ed42e7bb70 True False False 3 3 3 0
worker rendered-worker-3cf246f5dc359eeb44060c37d2d5982e True False False 3 3 3 0
$ oc adm upgrade --force --to-image=registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-20-152137
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-20-152137
$ oc -n openshift-machine-config-operator logs machine-config-controller-76769b5476-ptv2w | grep -i 'nil pointer'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0062