Bug 1918415 - MCD nil pointer on dropins
Summary: MCD nil pointer on dropins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Ben Howard
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-20 16:37 UTC by Ben Howard
Modified: 2021-02-24 15:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:55:11 UTC
Target Upstream Version:


Links
System ID                                           Private  Priority  Status  Summary                                      Last Updated
Github openshift machine-config-operator pull 2351  0        None      closed  Bug 1918415: check dropins for nil-pointers  2021-02-09 16:07:41 UTC
Red Hat Product Errata RHSA-2020:5633               0        None      None    None                                         2021-02-24 15:55:35 UTC

Description Ben Howard 2021-01-20 16:37:43 UTC
The MCD does not check the content of a dropin before writing it. A dropin with no content causes a nil pointer dereference in writeUnits:

        "I0120 13:49:07.868566       1 update.go:1470] Writing systemd unit dropin \"mco-disabled.conf\"", 
        "I0120 13:49:07.875918       1 update.go:1542] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory",")", 
        "I0120 13:49:07.875934       1 update.go:1470] Writing systemd unit dropin \"10-mco-default-env.conf\"", 
        "panic: runtime error: invalid memory address or nil pointer dereference", 
        "[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x18a2549]", 
        "", 
        "goroutine 1 [running]:", 
        "github.com/openshift/machine-config-operator/pkg/daemon.(*Daemon).writeUnits(0xc000386000, 0xc0001c5c00, 0xd, 0xd, 0x0, 0x0)", 
        "\t/go/src/github.com/openshift/machine-config-operator/pkg/daemon/update.go:1478 +0x309", 
        ... (11 lines skipped)
        "\t/go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:895", 
        "main.main()", 
        "\t/go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/main.go:27 +0x31"
    ]
}

Comment 1 Ben Howard 2021-01-20 16:39:05 UTC
This was seen in:
https://github.com/openshift/machine-config-operator/pull/2342#issuecomment-763633956

Fixing the nil pointer won't fix the root cause, but it is good hygiene.
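
For illustration only, a minimal sketch of the kind of guard meant here. It is not the code from the pull request: the Dropin type and the writableDropins helper are hypothetical stand-ins, and it assumes the dropin content is an optional pointer field, which is what the nil pointer panic suggests. The skip message matches the one the daemon logs after the fix.

```go
package main

import "fmt"

// Dropin is a hypothetical local stand-in for a systemd dropin entry.
// Contents is a pointer because the field is optional (assumption).
type Dropin struct {
	Name     string
	Contents *string
}

// writableDropins splits dropins into those that can be written and the
// names of those skipped because they carry no content.
// Hypothetical helper, not part of the MCO code base.
func writableDropins(dropins []Dropin) (keep []Dropin, skipped []string) {
	for _, d := range dropins {
		if d.Contents == nil {
			// Dereferencing d.Contents here would panic, so skip instead.
			skipped = append(skipped, d.Name)
			continue
		}
		keep = append(keep, d)
	}
	return keep, skipped
}

func main() {
	body := "[Service]\nEnvironment=FOO=bar\n"
	dropins := []Dropin{
		{Name: "10-test.conf"},                 // no content
		{Name: "20-env.conf", Contents: &body}, // has content
	}

	keep, skipped := writableDropins(dropins)
	for _, name := range skipped {
		fmt.Printf("Dropin for %s has no content, skipping write\n", name)
	}
	for _, d := range keep {
		fmt.Printf("would write %s (%d bytes)\n", d.Name, len(*d.Contents))
	}
}
```

Skipping rather than failing keeps the daemon running when a MachineConfig ships an empty dropin; whether an empty dropin should instead be rejected earlier is a separate question.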

Comment 3 Michael Nguyen 2021-02-09 22:05:13 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-02-08-191932   True        False         27m     Cluster version is 4.7.0-0.nightly-2021-02-08-191932

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-143-229.us-west-2.compute.internal   Ready    worker   21m   v1.20.0+ba45583
ip-10-0-154-80.us-west-2.compute.internal    Ready    master   32m   v1.20.0+ba45583
ip-10-0-167-217.us-west-2.compute.internal   Ready    worker   21m   v1.20.0+ba45583
ip-10-0-171-115.us-west-2.compute.internal   Ready    master   31m   v1.20.0+ba45583
ip-10-0-203-220.us-west-2.compute.internal   Ready    worker   25m   v1.20.0+ba45583
ip-10-0-207-111.us-west-2.compute.internal   Ready    master   32m   v1.20.0+ba45583

$ cat nil-content.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-nil-dropin
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: crio.service
          enabled: true
          dropins:
            - name: 10-test.conf
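
(Note: the dropin above has a name but no contents key. A small sketch of why that yields a nil pointer, assuming the field is decoded into an optional *string; the dropin struct and decodeDropin helper below are hypothetical and only mimic that shape using the standard library JSON decoder.)

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropin is a hypothetical struct mimicking an optional contents field.
type dropin struct {
	Name     string  `json:"name"`
	Contents *string `json:"contents,omitempty"`
}

// decodeDropin parses one dropin from JSON. Hypothetical helper.
func decodeDropin(raw string) (dropin, error) {
	var d dropin
	err := json.Unmarshal([]byte(raw), &d)
	return d, err
}

func main() {
	// Same shape as the MachineConfig above: a name and nothing else.
	d, err := decodeDropin(`{"name": "10-test.conf"}`)
	if err != nil {
		panic(err)
	}
	// An absent key leaves the pointer at its zero value, nil.
	fmt.Println(d.Name, "contents is nil:", d.Contents == nil)
}
```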

$ oc create -f nil-content.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-nil-dropin created

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
00-worker                                          0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-master-container-runtime                        0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-master-kubelet                                  0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-worker-container-runtime                        0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-worker-kubelet                                  0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-master-generated-registries                     0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-master-ssh                                                                                 3.1.0             39m
99-worker-generated-registries                     0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-worker-nil-dropin                                                                          3.2.0             7s
99-worker-ssh                                                                                 3.1.0             39m
rendered-master-a001c8f955f52214f9e7ac86669ccb86   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
rendered-worker-b3ad3955776d2d468e2bf0c9f3750a9a   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             2s
rendered-worker-c975a1a31bded1669c4e55f408c0911b   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
00-worker                                          0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-master-container-runtime                        0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-master-kubelet                                  0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-worker-container-runtime                        0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
01-worker-kubelet                                  0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-master-generated-registries                     0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-master-ssh                                                                                 3.1.0             39m
99-worker-generated-registries                     0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
99-worker-nil-dropin                                                                          3.2.0             14s
99-worker-ssh                                                                                 3.1.0             39m
rendered-master-a001c8f955f52214f9e7ac86669ccb86   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m
rendered-worker-b3ad3955776d2d468e2bf0c9f3750a9a   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             9s
rendered-worker-c975a1a31bded1669c4e55f408c0911b   0023e696058bbdf6e14504117bfc31f208125c47   3.2.0             30m

$ oc get mcp/worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-c975a1a31bded1669c4e55f408c0911b   False     True       False      3              0                   0                     0                      31m

$ watch oc get mcp/worker

$ oc debug node/ip-10-0-143-229.us-west-2.compute.internal
Starting pod/ip-10-0-143-229us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`

If you don't see a command prompt, try pressing enter.
sh-4.2# 
sh-4.2# chroot /host
sh-4.4# cd /etc/systemd/system/crio.service.d/
sh-4.4# ls
10-mco-default-madv.conf  10-mco-profile-unix-socket.conf
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc get pods -A --field-selector spec.nodeName=ip-10-0-143-229.us-west-2.compute.internal
NAMESPACE                                NAME                            READY   STATUS    RESTARTS   AGE
openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-vmptb   3/3     Running   0          34m
openshift-cluster-node-tuning-operator   tuned-zkqwf                     1/1     Running   0          34m
openshift-dns                            dns-default-8t6c7               3/3     Running   0          34m
openshift-image-registry                 node-ca-x6m94                   1/1     Running   0          34m
openshift-ingress-canary                 ingress-canary-mtsxn            1/1     Running   0          32m
openshift-machine-config-operator        machine-config-daemon-bp2xj     2/2     Running   0          34m
openshift-monitoring                     node-exporter-smxdp             2/2     Running   0          34m
openshift-multus                         multus-g75j8                    1/1     Running   0          34m
openshift-multus                         network-metrics-daemon-d67wn    2/2     Running   0          34m
openshift-network-diagnostics            network-check-target-p2p48      1/1     Running   0          34m
openshift-sdn                            ovs-t9trj                       1/1     Running   0          34m
openshift-sdn                            sdn-zxqrh                       2/2     Running   0          34m

$ oc -n openshift-machine-config-operator logs machine-config-daemon-bp2xj -c machine-config-daemon | grep Dropin
I0209 21:45:00.499874    1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write
I0209 21:45:00.504508    1736 update.go:1507] Dropin for 10-test.conf has no content, skipping write
I0209 21:45:00.518622    1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write
I0209 21:45:00.815343    1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write

Comment 6 errata-xmlrpc 2021-02-24 15:55:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

