Bug 1918415
| Summary: | MCD nil pointer on dropins | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Howard <behoward> |
| Component: | Machine Config Operator | Assignee: | Ben Howard <behoward> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:55:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This was seen in: https://github.com/openshift/machine-config-operator/pull/2342#issuecomment-763633956
Fixing the nil pointer won't fix the root cause, but it is good hygiene.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2021-02-08-191932 True False 27m Cluster version is 4.7.0-0.nightly-2021-02-08-191932
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-143-229.us-west-2.compute.internal Ready worker 21m v1.20.0+ba45583
ip-10-0-154-80.us-west-2.compute.internal Ready master 32m v1.20.0+ba45583
ip-10-0-167-217.us-west-2.compute.internal Ready worker 21m v1.20.0+ba45583
ip-10-0-171-115.us-west-2.compute.internal Ready master 31m v1.20.0+ba45583
ip-10-0-203-220.us-west-2.compute.internal Ready worker 25m v1.20.0+ba45583
ip-10-0-207-111.us-west-2.compute.internal Ready master 32m v1.20.0+ba45583
$ cat nil-content.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-nil-dropin
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - name: crio.service
        enabled: true
        dropins:
        - name: 10-test.conf
$ oc create -f nil-content.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-nil-dropin created
$ oc get mc
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
00-worker 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-master-container-runtime 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-master-kubelet 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-worker-container-runtime 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-worker-kubelet 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-master-generated-registries 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-master-ssh 3.1.0 39m
99-worker-generated-registries 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-worker-nil-dropin 3.2.0 7s
99-worker-ssh 3.1.0 39m
rendered-master-a001c8f955f52214f9e7ac86669ccb86 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
rendered-worker-b3ad3955776d2d468e2bf0c9f3750a9a 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 2s
rendered-worker-c975a1a31bded1669c4e55f408c0911b 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
$ oc get mc
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
00-master 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
00-worker 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-master-container-runtime 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-master-kubelet 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-worker-container-runtime 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
01-worker-kubelet 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-master-generated-registries 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-master-ssh 3.1.0 39m
99-worker-generated-registries 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
99-worker-nil-dropin 3.2.0 14s
99-worker-ssh 3.1.0 39m
rendered-master-a001c8f955f52214f9e7ac86669ccb86 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
rendered-worker-b3ad3955776d2d468e2bf0c9f3750a9a 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 9s
rendered-worker-c975a1a31bded1669c4e55f408c0911b 0023e696058bbdf6e14504117bfc31f208125c47 3.2.0 30m
$ oc get mcp/worker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
worker rendered-worker-c975a1a31bded1669c4e55f408c0911b False True False 3 0 0 0 31m
$ watch oc get mcp/worker
$ oc debug node/ip-10-0-143-229.us-west-2.compute.internal
Starting pod/ip-10-0-143-229us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2#
sh-4.2# chroot /host
sh-4.4# cd /etc/systemd/system/crio.service.d/
sh-4.4# ls
10-mco-default-madv.conf 10-mco-profile-unix-socket.conf
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
$ oc get pods -A --field-selector spec.nodeName=ip-10-0-143-229.us-west-2.compute.internal
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-cluster-csi-drivers aws-ebs-csi-driver-node-vmptb 3/3 Running 0 34m
openshift-cluster-node-tuning-operator tuned-zkqwf 1/1 Running 0 34m
openshift-dns dns-default-8t6c7 3/3 Running 0 34m
openshift-image-registry node-ca-x6m94 1/1 Running 0 34m
openshift-ingress-canary ingress-canary-mtsxn 1/1 Running 0 32m
openshift-machine-config-operator machine-config-daemon-bp2xj 2/2 Running 0 34m
openshift-monitoring node-exporter-smxdp 2/2 Running 0 34m
openshift-multus multus-g75j8 1/1 Running 0 34m
openshift-multus network-metrics-daemon-d67wn 2/2 Running 0 34m
openshift-network-diagnostics network-check-target-p2p48 1/1 Running 0 34m
openshift-sdn ovs-t9trj 1/1 Running 0 34m
openshift-sdn sdn-zxqrh 2/2 Running 0 34m
$ oc -n openshift-machine-config-operator logs machine-config-daemon-bp2xj -c machine-config-daemon | grep Dropin
I0209 21:45:00.499874 1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write
I0209 21:45:00.504508 1736 update.go:1507] Dropin for 10-test.conf has no content, skipping write
I0209 21:45:00.518622 1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write
I0209 21:45:00.815343 1736 update.go:1507] Dropin for 10-mco-default-env.conf has no content, skipping write
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633
The content of a dropin unit is not checked.
I0120 13:49:07.868566 1 update.go:1470] Writing systemd unit dropin "mco-disabled.conf"
I0120 13:49:07.875918 1 update.go:1542] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory
)
I0120 13:49:07.875934 1 update.go:1470] Writing systemd unit dropin "10-mco-default-env.conf"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x18a2549]

goroutine 1 [running]:
github.com/openshift/machine-config-operator/pkg/daemon.(*Daemon).writeUnits(0xc000386000, 0xc0001c5c00, 0xd, 0xd, 0x0, 0x0)
	/go/src/github.com/openshift/machine-config-operator/pkg/daemon/update.go:1478 +0x309
[11 lines skipped]
	/go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:895
main.main()
	/go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/main.go:27 +0x31
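For illustration only, here is a minimal Go sketch of the kind of guard that avoids this panic. It is not the actual MCO patch: the `Dropin` struct is a simplified, hypothetical stand-in for the Ignition type (where the drop-in contents field is a pointer), and `dropinsToWrite` is an invented helper name. It also assumes an empty string should be treated like a missing field. A drop-in declared with only a `name`, as in the reproducer for this bug, decodes with a nil `Contents`, so the pointer has to be checked before it is dereferenced.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dropin is a simplified, hypothetical stand-in for an Ignition systemd
// drop-in entry. Contents is a pointer, so an omitted field decodes to nil.
type Dropin struct {
	Name     string  `json:"name"`
	Contents *string `json:"contents,omitempty"`
}

// dropinsToWrite returns the names of the drop-ins that carry content.
// Drop-ins with nil (or, by assumption, empty) contents are logged and
// skipped instead of being dereferenced.
func dropinsToWrite(dropins []Dropin) []string {
	var names []string
	for _, d := range dropins {
		if d.Contents == nil || *d.Contents == "" {
			fmt.Printf("Dropin for %s has no content, skipping write\n", d.Name)
			continue
		}
		names = append(names, d.Name)
	}
	return names
}

func main() {
	// The first entry mirrors the reproducer: a name with no contents.
	// The second entry (20-env.conf) is made up for contrast.
	raw := `[{"name":"10-test.conf"},{"name":"20-env.conf","contents":"[Service]\nEnvironment=FOO=bar\n"}]`

	var dropins []Dropin
	if err := json.Unmarshal([]byte(raw), &dropins); err != nil {
		panic(err)
	}
	fmt.Println(dropinsToWrite(dropins))
}
```

Skipping with a log message, rather than returning an error, matches the "has no content, skipping write" lines in the verification log. Whether the real code also treats an empty string as "no content" is an assumption of this sketch.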