Bug 1775009 - machine-config operator's controller segfaults during 4.2.4 upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.3.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
: 1767591
Depends On:
Blocks: 1772680 1775013
 
Reported: 2019-11-21 11:20 UTC by Antonio Murdaca
Modified: 2023-03-24 16:08 UTC
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1772680
Environment:
Last Closed: 2020-01-23 11:13:36 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub: openshift/machine-config-operator pull 1280 (closed) - Bug 1775009: pkg/controller: do not enqueue a nil MCP (last updated 2020-12-10 19:30:30 UTC)
- Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:13:54 UTC)

Description Antonio Murdaca 2019-11-21 11:20:36 UTC
+++ This bug was initially created as a clone of Bug #1772680 +++

Description of problem:

Upgrade from 4.2.2 to 4.2.4 is stuck while the machine-config operator is upgrading.

Version-Release number of selected component (if applicable):

4.2.2 to 4.2.4, on bare metal

How reproducible:

Unclear

Steps to Reproduce:
1. Proceed with upgrade

Actual results:

upgrade stuck

Expected results:

...

Additional info:

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.4     True        False         False      4d8h
cloud-credential                           4.2.4     True        False         False      26d
cluster-autoscaler                         4.2.4     True        False         False      26d
console                                    4.2.4     True        False         False      38m
dns                                        4.2.4     True        False         False      26d
image-registry                             4.2.4     True        False         False      46m
ingress                                    4.2.4     True        False         False      4d8h
insights                                   4.2.4     True        False         False      26d
kube-apiserver                             4.2.4     True        False         False      26d
kube-controller-manager                    4.2.4     True        False         False      26d
kube-scheduler                             4.2.4     True        False         False      26d
machine-api                                4.2.4     True        False         False      26d
machine-config                             4.2.2     False       True          True       31m
marketplace                                4.2.4     True        False         False      45m
monitoring                                 4.2.4     True        False         False      37m
network                                    4.2.4     True        False         False      26d
node-tuning                                4.2.4     True        False         False      48m
openshift-apiserver                        4.2.4     True        False         False      15d
openshift-controller-manager               4.2.4     True        False         False      26d
openshift-samples                          4.2.4     True        False         False      47m
operator-lifecycle-manager                 4.2.4     True        False         False      26d
operator-lifecycle-manager-catalog         4.2.4     True        False         False      26d
operator-lifecycle-manager-packageserver   4.2.4     True        False         False      38m
service-ca                                 4.2.4     True        False         False      26d
service-catalog-apiserver                  4.2.4     True        False         False      26d
service-catalog-controller-manager         4.2.4     True        False         False      26d
storage                                    4.2.4     True        False         False      48m

$ oc describe co machine-config
Name:         machine-config
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-10-19T13:02:11Z
  Generation:          1
  Resource Version:    19925870
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/machine-config
  UID:                 aa5b767a-f270-11e9-a34e-525400e1605b
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-11-14T20:19:16Z
    Message:               Cluster not available for 4.2.4
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-11-14T20:21:20Z
    Message:               Working towards 4.2.4
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-11-14T20:19:16Z
    Message:               Unable to apply 4.2.4: timed out waiting for the condition during waitForControllerConfigToBeCompleted: controllerconfig is not completed: status for ControllerConfig machine-config-controller is being reported for 2, expecting it for 3
    Reason:                MachineConfigControllerFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-10-19T13:03:26Z
    Reason:                AsExpected
    Status:                True
    Type:                  Upgradeable
  Extension:
  Related Objects:
    Group:     
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:      master
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      worker
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:      machine-config-controller
    Resource:  controllerconfigs
  Versions:
    Name:     operator
    Version:  4.2.2
Events:       <none>

$ oc get pods
NAME                                         READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-6899c458b4-57bh8           1/1     Running   1          15d
etcd-quorum-guard-6899c458b4-m5xzc           1/1     Running   1          15d
etcd-quorum-guard-6899c458b4-qtk6f           1/1     Running   1          15d
machine-config-controller-746ddfd848-vw5n6   1/1     Running   6          17m
machine-config-daemon-4gdfs                  1/1     Running   0          28m
machine-config-daemon-6g8c2                  1/1     Running   0          28m
machine-config-daemon-6j6sq                  1/1     Running   0          29m
machine-config-daemon-6tkpl                  1/1     Running   0          27m
machine-config-daemon-7k5fv                  1/1     Running   0          28m
machine-config-daemon-8xmqf                  1/1     Running   0          30m
machine-config-daemon-9n986                  1/1     Running   0          29m
machine-config-daemon-fcrrc                  1/1     Running   0          30m
machine-config-daemon-lkxkg                  1/1     Running   0          29m
machine-config-daemon-nx2qh                  1/1     Running   0          27m
machine-config-daemon-pbw9t                  1/1     Running   0          27m
machine-config-operator-65dbcdb7db-q2m4m     1/1     Running   0          32m
machine-config-server-7jwzh                  1/1     Running   2          15d
machine-config-server-9zqdq                  1/1     Running   2          15d
machine-config-server-blxx9                  1/1     Running   2          15d

$ oc logs -f machine-config-controller-746ddfd848-vw5n6 -p
I1114 20:47:04.029470       1 start.go:50] Version: v4.2.4-201911050122-dirty (55bb5fc17da0c3d76e4ee6a55732f0cba93e8520)
E1114 20:48:59.705026       1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"machine-config-controller", GenerateName:"", Namespace:"openshift-machine-config-operator", SelfLink:"/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-controller", UID:"b114d4a4-071e-11ea-a595-52540079f30f", ResourceVersion:"19925042", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63709360698, loc:(*time.Location)(0x28f62e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"machine-config-controller-746ddfd848-vw5n6_ea0cb2d6-071f-11ea-a14e-0a580a8100cd\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2019-11-14T20:48:59Z\",\"renewTime\":\"2019-11-14T20:48:59Z\",\"leaderTransitions\":4}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "github.com/openshift/machine-config-operator/cmd/common/helpers.go:30"'. Will not report event: 'Normal' 'LeaderElection' 'machine-config-controller-746ddfd848-vw5n6_ea0cb2d6-071f-11ea-a14e-0a580a8100cd became leader'
E1114 20:48:59.759629       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:522
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/signal_unix.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe60bf5]

goroutine 257 [running]:
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x15aec80, 0x28dbcd0)
	/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513 +0x1b9
github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1.(*MachineConfigPool).GetNamespace(0x0, 0x0, 0x19e98a0)
	<autogenerated>:1 +0x5
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc0005669c0, 0x1528080, 0xc000c75630, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84 +0x114
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc000c75630, 0x12a05f200, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261 +0x6a
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000562140, 0x0, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604 +0x41
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(0xc000562140, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615 +0x44
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault-fm(0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125 +0x34
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode(0xc000562140, 0x1794e00, 0xc0003672c0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397 +0x1a1
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode-fm(0x1794e00, 0xc0003672c0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115 +0x3e
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc0003380b0, 0xc0003380c0, 0xc0003380d0, 0x1794e00, 0xc0003672c0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x0, 0xc0009ae000, 0xc00095c190)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554 +0x21d
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000963e18, 0x429692, 0xc00095c1c0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265 +0x51
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x79
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00099b768)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000963f68, 0xdf8475800, 0x0, 0x1582101, 0xc000766180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xbe
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00099b768, 0xdf8475800, 0xc000766180)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc00035c500)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x8d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run-fm()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000452060, 0xc0009ac000)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
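The `GetNamespace(0x0, ...)` frame above shows the receiver is a nil `*MachineConfigPool`: `addNode` passed a nil pool to `enqueueDefault`, and `cache.MetaNamespaceKeyFunc` dereferenced it. Note that the key func takes an `interface{}`, so a nil typed pointer arrives as a *non-nil* interface value and a generic `obj == nil` check inside such helpers cannot catch it; the guard has to happen at the caller, which is what the fix PR's title ("do not enqueue a nil MCP") describes. A minimal, self-contained sketch of both the failure mode and the guard (type and function names here are illustrative, not the MCO's actual code):

```go
package main

import "fmt"

// pool stands in for MachineConfigPool (illustrative, not the real type).
type pool struct{ name string }

// getName dereferences the receiver, so calling it on a nil *pool panics with
// a nil pointer dereference, matching the (*MachineConfigPool).GetNamespace
// frame in the trace.
func (p *pool) getName() string { return p.name }

// metaKey mimics the shape of cache.MetaNamespaceKeyFunc: it takes an
// interface{}, so a nil *pool arrives wrapped in a NON-nil interface value
// and the obj == nil check below cannot catch it.
func metaKey(obj interface{}) (string, error) {
	if obj == nil {
		return "", fmt.Errorf("nil object")
	}
	return obj.(*pool).getName(), nil // panics when obj wraps a nil *pool
}

// enqueueDefault with the guard: refuse a nil pool before it reaches the
// key func, instead of panicking inside it.
func enqueueDefault(p *pool) (key string, enqueued bool) {
	if p == nil { // the fix direction: do not enqueue a nil pool
		return "", false
	}
	key, _ = metaKey(p)
	return key, true
}

func main() {
	var nilPool *pool
	fmt.Println(nilPool == nil) // true: the typed pointer is nil

	var obj interface{} = nilPool
	fmt.Println(obj == nil) // false: interface wrapping a nil pointer is not nil
	// metaKey(obj) would panic: invalid memory address or nil pointer dereference

	if key, ok := enqueueDefault(&pool{name: "worker"}); ok {
		fmt.Println("enqueued", key)
	}
	if _, ok := enqueueDefault(nilPool); !ok {
		fmt.Println("skipped nil pool")
	}
}
```

This also explains why the crash loops rather than being a one-off: every informer resync replays the add event, hands the same nil pool to the unguarded enqueue path, and segfaults again.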

Tried to delete the machine-config-controller configmap; it gets re-created. It's unclear what's going on.

Note I've added an 'infra' machineconfigpool:

$ oc get machineconfigpool
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
infra    rendered-infra-71fac7974e1cadf6d63f1de505849340    True      False      False
master   rendered-master-66ca81fa67f9bace6638464be9256570   True      False      False
worker   rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5   False     True       False

$ oc describe machineconfigpool worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2019-10-19T13:02:17Z
  Generation:          3
  Resource Version:    19898743
  Self Link:           /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:                 ad9f8790-f270-11e9-a34e-525400e1605b
Spec:
  Configuration:
    Name:  rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ad9f8790-f270-11e9-a34e-525400e1605b-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2019-10-19T13:02:57Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2019-10-19T13:02:57Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2019-10-19T13:03:02Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2019-11-01T19:54:21Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2019-11-01T19:54:21Z
    Message:               All nodes are updating to rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5
    Reason:                
    Status:                True
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ad9f8790-f270-11e9-a34e-525400e1605b-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     0
  Machine Count:              5
  Observed Generation:        3
  Ready Machine Count:        4
  Unavailable Machine Count:  1
  Updated Machine Count:      5
Events:                       <none>

$ oc get machineconfig
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-infra                                                                                               2.2.0             25d
00-master                                                   d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
00-worker                                                   d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
01-infra-container-runtime                                                                             2.2.0             25d
01-infra-kubelet                                                                                       2.2.0             25d
01-master-container-runtime                                 d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
01-master-kubelet                                           d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
01-worker-container-runtime                                 d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
01-worker-kubelet                                           d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
99-infra-ad9f8790-f270-11e9-a34e-525400e1605b-registries                                               2.2.0             25d
99-infra-ssh                                                                                           2.2.0             25d
99-master-ad9d318b-f270-11e9-a34e-525400e1605b-registries   d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
99-master-ssh                                                                                          2.2.0             26d
99-worker-ad9f8790-f270-11e9-a34e-525400e1605b-registries   d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             26d
99-worker-ssh                                                                                          2.2.0             26d
rendered-infra-0506920a222781a19fff88a4196deef4             62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             25d
rendered-infra-71fac7974e1cadf6d63f1de505849340             d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             15d
rendered-master-66ca81fa67f9bace6638464be9256570            d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             15d
rendered-master-747943425e64364488e51d15e5281265            62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             26d
rendered-worker-5e70256103cc4d0ce0162430de7233a1            62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             26d
rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5            d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             15d

Cluster was initially deployed with 4.2.0, successfully upgraded to 4.2.2.

--- Additional comment from Samuel on 2019-11-15 08:38:31 UTC ---

As a follow-up: within the last hour, for some reason, the upgrade completed. The controller has stopped segfaulting and is now running.


$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.4     True        False         40m     Cluster version is 4.2.4
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-infra                                                                                               2.2.0             26d
00-master                                                   55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
00-worker                                                   55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
01-infra-container-runtime                                                                             2.2.0             26d
01-infra-kubelet                                                                                       2.2.0             26d
01-master-container-runtime                                 55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
01-master-kubelet                                           55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
01-worker-container-runtime                                 55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
01-worker-kubelet                                           55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
99-infra-ad9f8790-f270-11e9-a34e-525400e1605b-registries                                               2.2.0             26d
99-infra-ssh                                                                                           2.2.0             26d
99-master-ad9d318b-f270-11e9-a34e-525400e1605b-registries   55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
99-master-ssh                                                                                          2.2.0             26d
99-worker-ad9f8790-f270-11e9-a34e-525400e1605b-registries   55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             26d
99-worker-ssh                                                                                          2.2.0             26d
rendered-infra-0506920a222781a19fff88a4196deef4             62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             26d
rendered-infra-71fac7974e1cadf6d63f1de505849340             d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             16d
rendered-infra-bed1dda9d08a9f80cd088e667c206fb4             55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             88m
rendered-master-66ca81fa67f9bace6638464be9256570            d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             16d
rendered-master-747943425e64364488e51d15e5281265            62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             26d
rendered-master-a54208d3fe789a6c2647471c1a6b2015            55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             88m
rendered-worker-5e70256103cc4d0ce0162430de7233a1            62b0b6d2a751a5f364f2e6d5c9cfe63419668777   2.2.0             26d
rendered-worker-68ddaaf29b41fed5fb4b9a3339fcb1a5            d73d5c6c95499f2820853c799366a157b0c0bd09   2.2.0             16d
rendered-worker-917c185a9e38f77f491fc863367e60fb            55bb5fc17da0c3d76e4ee6a55732f0cba93e8520   2.2.0             88m
$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.2.4     True        False         False      4d20h
cloud-credential                           4.2.4     True        False         False      26d
cluster-autoscaler                         4.2.4     True        False         False      26d
console                                    4.2.4     True        False         False      80m
dns                                        4.2.4     True        False         False      26d
image-registry                             4.2.4     True        False         False      10m
ingress                                    4.2.4     True        False         False      4d20h
insights                                   4.2.4     True        False         False      26d
kube-apiserver                             4.2.4     True        False         False      26d
kube-controller-manager                    4.2.4     True        False         False      26d
kube-scheduler                             4.2.4     True        False         False      26d
machine-api                                4.2.4     True        False         False      26d
machine-config                             4.2.4     True        False         False      41m
marketplace                                4.2.4     True        False         False      49m
monitoring                                 4.2.4     True        False         False      42m
network                                    4.2.4     True        False         False      26d
node-tuning                                4.2.4     True        False         False      35m
openshift-apiserver                        4.2.4     True        False         False      45m
openshift-controller-manager               4.2.4     True        False         False      26d
openshift-samples                          4.2.4     True        False         False      12h
operator-lifecycle-manager                 4.2.4     True        False         False      26d
operator-lifecycle-manager-catalog         4.2.4     True        False         False      26d
operator-lifecycle-manager-packageserver   4.2.4     True        False         False      49m
service-ca                                 4.2.4     True        False         False      26d
service-catalog-apiserver                  4.2.4     True        False         False      26d
service-catalog-controller-manager         4.2.4     True        False         False      26d
storage                                    4.2.4     True        False         False      12h

$ oc get pods
NAME                                         READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-d59cc8dd8-2qwtf            1/1     Running   0          41m
etcd-quorum-guard-d59cc8dd8-2ww5d            1/1     Running   0          41m
etcd-quorum-guard-d59cc8dd8-zqfv9            1/1     Running   0          42m
machine-config-controller-746ddfd848-fhqb2   1/1     Running   6          62m
...

$ oc logs machine-config-controller-746ddfd848-fhqb2 
I1115 07:52:09.691754       1 start.go:50] Version: v4.2.4-201911050122-dirty (55bb5fc17da0c3d76e4ee6a55732f0cba93e8520)
E1115 07:54:05.385361       1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"machine-config-controller", GenerateName:"", Namespace:"openshift-machine-config-operator", SelfLink:"/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-controller", UID:"b114d4a4-071e-11ea-a595-52540079f30f", ResourceVersion:"20335459", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63709360698, loc:(*time.Location)(0x28f62e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"machine-config-controller-746ddfd848-fhqb2_d3ad8ddc-077c-11ea-a0e0-0a580a8200a4\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2019-11-15T07:54:05Z\",\"renewTime\":\"2019-11-15T07:54:05Z\",\"leaderTransitions\":102}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "github.com/openshift/machine-config-operator/cmd/common/helpers.go:30"'. Will not report event: 'Normal' 'LeaderElection' 'machine-config-controller-746ddfd848-fhqb2_d3ad8ddc-077c-11ea-a0e0-0a580a8200a4 became leader'
E1115 07:54:05.439555       1 template_controller.go:120] couldn't get ControllerConfig on secret callback &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:""}, Status:"Failure", Message:"controllerconfig.machineconfiguration.openshift.io \"machine-config-controller\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc0000d0360), Code:404}}
I1115 07:54:05.613702       1 node_controller.go:147] Starting MachineConfigController-NodeController
I1115 07:54:05.614035       1 render_controller.go:123] Starting MachineConfigController-RenderController
I1115 07:54:05.614065       1 template_controller.go:182] Starting MachineConfigController-TemplateController
I1115 07:54:05.713924       1 container_runtime_config_controller.go:189] Starting MachineConfigController-ContainerRuntimeConfigController
I1115 07:54:05.713924       1 kubelet_config_controller.go:159] Starting MachineConfigController-KubeletConfigController
I1115 07:54:05.719257       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:05.731130       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:05.745190       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:05.753925       1 container_runtime_config_controller.go:713] Applied ImageConfig cluster on MachineConfigPool master
I1115 07:54:05.770186       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:05.778868       1 container_runtime_config_controller.go:713] Applied ImageConfig cluster on MachineConfigPool worker
I1115 07:54:05.815764       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:05.926071       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:06.094310       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:06.421131       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:07.090127       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:08.411033       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1115 07:54:10.450891       1 status.go:82] Pool master: All nodes are updated with rendered-master-a54208d3fe789a6c2647471c1a6b2015
I1115 07:54:10.450983       1 status.go:82] Pool infra: All nodes are updated with rendered-infra-bed1dda9d08a9f80cd088e667c206fb4

...

The upgrade did complete, in 12 hours.

Is 4.2.4 safe to apply to customer clusters?

Any explanation? What could be going on?

--- Additional comment from Samuel on 2019-11-21 10:50:32 UTC ---

Hi,

The machine-config-controller Pod is still crashing with segfaults. Is there anything else you need to further investigate?

$ oc get pods -n openshift-machine-config-operator
NAME                                         READY   STATUS    RESTARTS   AGE
etcd-quorum-guard-d59cc8dd8-2qwtf            1/1     Running   0          6d2h
etcd-quorum-guard-d59cc8dd8-2ww5d            1/1     Running   0          6d2h
etcd-quorum-guard-d59cc8dd8-zqfv9            1/1     Running   0          6d2h
machine-config-controller-746ddfd848-fhqb2   1/1     Running   62         6d3h
machine-config-daemon-4gdfs                  1/1     Running   1          6d14h
machine-config-daemon-6g8c2                  1/1     Running   1          6d14h
machine-config-daemon-6j6sq                  1/1     Running   1          6d14h
...
$ oc logs -n openshift-machine-config-operator -p machine-config-controller-746ddfd848-fhqb2
I1117 04:44:57.725629       1 start.go:50] Version: v4.2.4-201911050122-dirty (55bb5fc17da0c3d76e4ee6a55732f0cba93e8520)
E1117 04:46:53.468163       1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"machine-config-controller", GenerateName:"", Namespace:"openshift-machine-config-operator", SelfLink:"/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-controller", UID:"b114d4a4-071e-11ea-a595-52540079f30f", ResourceVersion:"22123789", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63709360698, loc:(*time.Location)(0x28f62e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"machine-config-controller-746ddfd848-fhqb2_01bafee3-08f5-11ea-bfb3-0a580a8200a4\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2019-11-17T04:46:53Z\",\"renewTime\":\"2019-11-17T04:46:53Z\",\"leaderTransitions\":157}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "github.com/openshift/machine-config-operator/cmd/common/helpers.go:30"'. Will not report event: 'Normal' 'LeaderElection' 'machine-config-controller-746ddfd848-fhqb2_01bafee3-08f5-11ea-bfb3-0a580a8200a4 became leader'
E1117 04:46:53.531543       1 template_controller.go:120] couldn't get ControllerConfig on secret callback &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:""}, Status:"Failure", Message:"controllerconfig.machineconfiguration.openshift.io \"machine-config-controller\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc000239260), Code:404}}
...
I1118 22:02:53.411142       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:03:13.896279       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:03:54.987446       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
E1118 22:05:16.933433       1 kubelet_config_controller.go:308] GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:05:16.933687       1 kubelet_config_controller.go:309] Dropping featureconfig "cluster" out of the queue: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:06:16.939797       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:06:16.949732       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
...
I1118 22:10:42.662049       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:11:23.627181       1 kubelet_config_controller.go:303] Error syncing kubeletconfig cluster: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
E1118 22:12:45.566035       1 kubelet_config_controller.go:308] GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
I1118 22:12:45.566260       1 kubelet_config_controller.go:309] Dropping featureconfig "cluster" out of the queue: GenerateMachineConfigsforRole failed with error failed to read dir "/etc/mcc/templates/infra": open /etc/mcc/templates/infra: no such file or directory
E1118 22:13:34.578758       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:522
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/signal_unix.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:475
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:116
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:202
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:552
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe60bf5]

goroutine 158 [running]:
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x15aec80, 0x28dbcd0)
	/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513 +0x1b9
github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1.(*MachineConfigPool).GetNamespace(0x0, 0x0, 0x19e98a0)
	<autogenerated>:1 +0x5
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc00057e900, 0x1528080, 0xc00084ece0, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84 +0x114
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc00084ece0, 0x12a05f200, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261 +0x6a
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0000e25a0, 0x0, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604 +0x41
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(0xc0000e25a0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615 +0x44
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault-fm(0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125 +0x34
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateNode(0xc0000e25a0, 0x1794e00, 0xc000864000, 0x1794e00, 0xc000ce8dc0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:475 +0x84c
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateNode-fm(0x1794e00, 0xc000864000, 0x1794e00, 0xc000ce8dc0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:116 +0x52
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(0xc0003beaf0, 0xc0003beb00, 0xc0003beb10, 0x1794e00, 0xc000864000, 0x1794e00, 0xc000ce8dc0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:202 +0x5d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc0000dc5c8, 0xc00095be00, 0xc0000d4a00)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:552 +0x18b
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000f7fe18, 0x429692, 0xc0000d4a30)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265 +0x51
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x79
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc0000dc768)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000f7ff68, 0xdf8475800, 0x0, 0x1582101, 0xc00009ab40)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xbe
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc0000dc768, 0xdf8475800, 0xc00009ab40)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000360380)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x8d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run-fm()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc00059e1b0, 0xc0004505f0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
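The trace shows `(*MachineConfigPool).GetNamespace` being invoked on a nil pointer: `enqueueDefault` received a nil pool and handed it to the cache key function, which dereferences the embedded ObjectMeta. Below is a minimal sketch of the failure mode and of a nil guard in the spirit of the linked fix (PR 1280, "do not enqueue a nil MCP"). The `ObjectMeta`/`MachineConfigPool` types here are simplified stand-ins, not the real MCO API, and the key format is simplified as well.

```go
package main

import "fmt"

// Simplified stand-ins for the real types (illustration only).
type ObjectMeta struct{ Namespace, Name string }

// GetNamespace is promoted through the embedded field; calling it on a
// nil *MachineConfigPool panics when &pool.ObjectMeta is taken.
func (m *ObjectMeta) GetNamespace() string { return m.Namespace }

type MachineConfigPool struct{ ObjectMeta }

// enqueueAfter mimics the crashing path: deriving a cache key from the
// pool dereferences the (possibly nil) pointer.
func enqueueAfter(pool *MachineConfigPool) (key string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	key = pool.GetNamespace() + "/" + pool.Name // panics when pool is nil
	return key, nil
}

// enqueueAfterFixed refuses a nil pool before touching it.
func enqueueAfterFixed(pool *MachineConfigPool) (string, error) {
	if pool == nil {
		return "", fmt.Errorf("refusing to enqueue a nil MachineConfigPool")
	}
	return pool.GetNamespace() + "/" + pool.Name, nil
}

func main() {
	if _, err := enqueueAfter(nil); err != nil {
		fmt.Println("unguarded:", err)
	}
	if _, err := enqueueAfterFixed(nil); err != nil {
		fmt.Println("guarded:", err)
	}
}
```

Guarding at the enqueue function covers every caller (`enqueueDefault`, the node event handlers) with one check, which matches where the stack trace pinpoints the dereference.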

Comment 2 Ryan Phillips 2019-12-05 15:23:13 UTC
*** Bug 1767591 has been marked as a duplicate of this bug. ***

Comment 3 Samuel 2019-12-07 03:50:55 UTC
Currently upgrading to 4.2.9, and again facing issues with the machine-config-controller:

$ oc logs -n openshift-machine-config-operator machine-config-controller-69c5cf857-r5vrc -f
I1207 03:43:28.567141       1 start.go:50] Version: v4.2.9-201911261133-dirty (d780d197a9c5848ba786982c0c4aaa7487297046)
E1207 03:45:24.285369       1 event.go:247] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"machine-config-controller", GenerateName:"", Namespace:"openshift-machine-config-operator", SelfLink:"/api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-controller", UID:"b114d4a4-071e-11ea-a595-52540079f30f", ResourceVersion:"43565395", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63709360698, loc:(*time.Location)(0x28f62e0)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"control-plane.alpha.kubernetes.io/leader":"{\"holderIdentity\":\"machine-config-controller-69c5cf857-r5vrc_bb152387-18a3-11ea-90c2-0a580a810177\",\"leaseDurationSeconds\":90,\"acquireTime\":\"2019-12-07T03:45:24Z\",\"renewTime\":\"2019-12-07T03:45:24Z\",\"leaderTransitions\":262}"}, OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'no kind is registered for the type v1.ConfigMap in scheme "github.com/openshift/machine-config-operator/cmd/common/helpers.go:30"'. Will not report event: 'Normal' 'LeaderElection' 'machine-config-controller-69c5cf857-r5vrc_bb152387-18a3-11ea-90c2-0a580a810177 became leader'
E1207 03:45:24.372755       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:522
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:82
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/signal_unix.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71
/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe60bf5]

goroutine 241 [running]:
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x15aec80, 0x28dbcd0)
	/opt/rh/go-toolset-1.11/root/usr/lib/go-toolset-1.11-golang/src/runtime/panic.go:513 +0x1b9
github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1.(*MachineConfigPool).GetNamespace(0x0, 0x0, 0x19e98a0)
	<autogenerated>:1 +0x5
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc00082d260, 0x1528080, 0xc0009d83c0, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:84 +0x114
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc(0x1778ea0, 0x0, 0xc0009d83c0, 0x12a05f200, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:261 +0x6a
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0004fc0a0, 0x0, 0x12a05f200)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:604 +0x41
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(0xc0004fc0a0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:615 +0x44
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault-fm(0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:125 +0x34
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode(0xc0004fc0a0, 0x1794e00, 0xc000379b00)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:397 +0x1a1
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addNode-fm(0x1794e00, 0xc000379b00)
	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:115 +0x3e
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(0xc000578990, 0xc0005789a0, 0xc0005789b0, 0x1794e00, 0xc000379b00)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:195 +0x49
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0x0, 0xc0008e2300, 0xc0003a1c20)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:554 +0x21d
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc000c2fe18, 0x429692, 0xc0003a1c50)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:265 +0x51
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:548 +0x79
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00093f768)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000c2ff68, 0xdf8475800, 0x0, 0x1582101, 0xc0007eb020)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xbe
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00093f768, 0xdf8475800, 0xc0007eb020)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000143200)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:546 +0x8d
github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache.(*processorListener).run-fm()
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:390 +0x2a
github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000614530, 0xc0009d8000)
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
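This second trace reaches the same `enqueueDefault(nil)` path from `addNode` rather than `updateNode`: the node-to-pool lookup returned nil (a node can legitimately match no pool, e.g. during the window when a custom pool such as "infra" is being created) and the nil pool was enqueued anyway. The complementary defensive option is to skip the enqueue in the event handler itself; a minimal sketch under hypothetical types (`Node`, `Pool`, `getPoolForNode` are stand-ins, not the real MCO API):

```go
package main

import "fmt"

// Hypothetical stand-ins; the real controller matches corev1.Node objects
// against each pool's nodeSelector.
type Node struct{ Labels map[string]string }
type Pool struct{ Name string }

// getPoolForNode can legitimately return nil when a node matches no pool.
func getPoolForNode(n *Node) *Pool {
	if _, ok := n.Labels["node-role.kubernetes.io/worker"]; ok {
		return &Pool{Name: "worker"}
	}
	return nil
}

// addNode skips the enqueue for a nil pool instead of passing nil down
// into enqueueDefault, as the pre-fix controller did.
func addNode(n *Node, enqueue func(*Pool)) {
	pool := getPoolForNode(n)
	if pool == nil {
		fmt.Println("node matches no MachineConfigPool; skipping enqueue")
		return
	}
	enqueue(pool)
}

func main() {
	enqueue := func(p *Pool) { fmt.Println("enqueued pool:", p.Name) }
	addNode(&Node{Labels: map[string]string{"node-role.kubernetes.io/worker": ""}}, enqueue)
	addNode(&Node{Labels: map[string]string{}}, enqueue)
}
```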

Still waiting for the new MachineConfigs to be generated; the rest of my operators finished upgrading about an hour ago. Will check tomorrow.

Comment 4 Samuel 2019-12-07 20:03:58 UTC
4.2.9 finished applying a couple of minutes ago.

Again, the machine-config-controller has been segfaulting a lot.

It calmed down about 9 hours ago; the rendered MachineConfigs did show up and a first batch of nodes was upgraded.

The rollout could not finish, as the controller segfaulted again, and again.

The rest of my nodes started rebooting about 30 minutes ago, and are now all running v1.14.6+31a56cf75. Somehow the machine-config-controller is still running. For now.

All in all, it took 16h30 for 4.2.9 to deploy, which is a new record for me.

Comment 5 Kirsten Garrison 2019-12-18 18:45:53 UTC
Hi, any progress on verifying this fix? We need to backport it to 4.2.z.

Comment 6 Michael Nguyen 2019-12-20 19:59:46 UTC
Verified on 4.3.0-0.nightly-2019-12-20-025144.  Upgraded to 4.3.0-0.nightly-2019-12-20-152137.  No segfaults detected.

$ oc get nodes
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-0-130-225.ec2.internal   Ready    master   26m   v1.16.2
ip-10-0-133-97.ec2.internal    Ready    worker   17m   v1.16.2
ip-10-0-150-28.ec2.internal    Ready    master   26m   v1.16.2
ip-10-0-156-216.ec2.internal   Ready    worker   17m   v1.16.2
ip-10-0-160-237.ec2.internal   Ready    master   26m   v1.16.2
ip-10-0-173-20.ec2.internal    Ready    worker   17m   v1.16.2
$ oc label node/ip-10-0-133-97.ec2.internal node-role.kubernetes.io/infra=""
node/ip-10-0-133-97.ec2.internal labeled
$ oc get nodes
NAME                           STATUS   ROLES          AGE   VERSION
ip-10-0-130-225.ec2.internal   Ready    master         27m   v1.16.2
ip-10-0-133-97.ec2.internal    Ready    infra,worker   18m   v1.16.2
ip-10-0-150-28.ec2.internal    Ready    master         27m   v1.16.2
ip-10-0-156-216.ec2.internal   Ready    worker         17m   v1.16.2
ip-10-0-160-237.ec2.internal   Ready    master         27m   v1.16.2
ip-10-0-173-20.ec2.internal    Ready    worker         18m   v1.16.2


$ cat infra.mcp.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""

$ oc create -f infra.mcp.yaml
machineconfigpool.machineconfiguration.openshift.io/infra created

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT
infra    rendered-infra-3cf246f5dc359eeb44060c37d2d5982e    True      False      False      0              0                   0                     0
master   rendered-master-68866b3f5c6924f954fc93ed42e7bb70   True      False      False      3              3                   3                     0
worker   rendered-worker-3cf246f5dc359eeb44060c37d2d5982e   True      False      False      3              3                   3                     0

$ oc adm upgrade --force --to-image=registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-20-152137
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2019-12-20-152137

$ oc -n openshift-machine-config-operator logs machine-config-controller-76769b5476-ptv2w | grep -i 'nil pointer'

Comment 8 errata-xmlrpc 2020-01-23 11:13:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

