Description of problem:
When a RHEL worker is being added to the cluster while the cluster is applying a newly rendered machineconfig, the machine-config daemon goes Degraded with the error "Marking Degraded due to: failed to log pending config: exit status 1".

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-09-182710   True        False         4h50m   Cluster version is 4.1.0-0.nightly-2019-05-09-182710

How reproducible:
Always

Steps to Reproduce:
1. Create a new kubelet machineconfig for workers
2. Add a new RHEL worker to the cluster while the cluster is applying the new machineconfig
3.

Actual results:
# oc get nodes -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}'
...
===> node:> dell-r730-068.dsal.lab.eng.rdu2.redhat.com
machineconfiguration.openshift.io/currentConfig : rendered-worker-9425bc18523ebb54a929a7db9813a343
machineconfiguration.openshift.io/desiredConfig : rendered-worker-02875877c08cddc36cabb21b7788801c
machineconfiguration.openshift.io/reason : failed to log pending config: exit status 1
machineconfiguration.openshift.io/ssh : accessed
machineconfiguration.openshift.io/state : Degraded
volumes.kubernetes.io/controller-managed-attach-detach : true

# oc -n openshift-machine-config-operator logs -f machine-config-daemon-vbs9h
I0510 06:52:22.794488   23426 update.go:170] Checking reconcilable for config rendered-worker-9425bc18523ebb54a929a7db9813a343 to rendered-worker-02875877c08cddc36cabb21b7788801c
I0510 06:52:22.795668   23426 update.go:743] Starting update from rendered-worker-9425bc18523ebb54a929a7db9813a343 to rendered-worker-02875877c08cddc36cabb21b7788801c
I0510 06:52:22.796636   23426 update.go:372] Updating files
I0510 06:52:22.796648   23426 update.go:574] Writing file "/etc/tmpfiles.d/cleanup-cni.conf"
I0510 06:52:23.334889   23426 update.go:574] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf"
I0510 06:52:23.935607   23426 update.go:574] Writing file "/var/lib/kubelet/config.json"
I0510 06:52:24.569906   23426 update.go:574] Writing file "/etc/kubernetes/ca.crt"
I0510 06:52:24.978786   23426 update.go:574] Writing file "/etc/sysctl.d/forward.conf"
I0510 06:52:25.377527   23426 update.go:574] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy"
I0510 06:52:25.785718   23426 update.go:574] Writing file "/etc/containers/registries.conf"
I0510 06:52:26.421934   23426 update.go:574] Writing file "/etc/containers/storage.conf"
I0510 06:52:26.827820   23426 update.go:574] Writing file "/etc/crio/crio.conf"
I0510 06:52:27.397023   23426 update.go:574] Writing file "/etc/kubernetes/cloud.conf"
I0510 06:52:27.866888   23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf"
I0510 06:52:28.446934   23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf"
I0510 06:52:28.832434   23426 update.go:529] Writing systemd unit "kubelet.service"
I0510 06:52:29.278484   23426 update.go:546] Enabling systemd unit "kubelet.service"
I0510 06:52:29.278529   23426 update.go:466] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink
I0510 06:52:29.278541   23426 update.go:391] Deleting stale data
I0510 06:52:29.278558   23426 update.go:636] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
I0510 06:52:29.689316   23426 update.go:636] Writing SSHKeys at "/home/core/.ssh/authorized_keys"
I0510 06:52:30.295779   23426 update.go:372] Updating files
I0510 06:52:30.295804   23426 update.go:574] Writing file "/etc/tmpfiles.d/cleanup-cni.conf"
I0510 06:52:30.702501   23426 update.go:574] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf"
I0510 06:52:31.253447   23426 update.go:574] Writing file "/var/lib/kubelet/config.json"
I0510 06:52:31.847678   23426 update.go:574] Writing file "/etc/kubernetes/ca.crt"
I0510 06:52:32.275698   23426 update.go:574] Writing file "/etc/sysctl.d/forward.conf"
I0510 06:52:32.885524   23426 update.go:574] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy"
I0510 06:52:33.289385   23426 update.go:574] Writing file "/etc/containers/registries.conf"
I0510 06:52:33.930205   23426 update.go:574] Writing file "/etc/containers/storage.conf"
I0510 06:52:34.496840   23426 update.go:574] Writing file "/etc/crio/crio.conf"
I0510 06:52:34.913196   23426 update.go:574] Writing file "/etc/kubernetes/cloud.conf"
I0510 06:52:35.118717   23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf"
I0510 06:52:35.611156   23426 update.go:529] Writing systemd unit "kubelet.service"
I0510 06:52:36.004440   23426 update.go:546] Enabling systemd unit "kubelet.service"
I0510 06:52:36.004469   23426 update.go:466] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink
I0510 06:52:36.004475   23426 update.go:391] Deleting stale data
E0510 06:52:36.004497   23426 writer.go:132] Marking Degraded due to: failed to log pending config: exit status 1

Expected results:
The RHEL worker should join the cluster and apply the new machineconfig without the machine-config daemon marking the node Degraded.

Additional info:
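For triage, the per-node annotation dump above can be reduced to just the Degraded nodes and their recorded reasons. A minimal sketch (a hypothetical helper, not part of any OpenShift tooling) run against a sample mirroring the dump in this report:

```shell
#!/bin/sh
# Hypothetical triage helper (not part of OpenShift tooling): filter a
# per-node annotation dump down to Degraded nodes plus the recorded reason.
# `dump` mirrors the output shown above; in practice you would pipe in the
# output of the `oc get nodes -o template ...` command from this report.
dump='===> node:> dell-r730-068.dsal.lab.eng.rdu2.redhat.com
machineconfiguration.openshift.io/currentConfig : rendered-worker-9425bc18523ebb54a929a7db9813a343
machineconfiguration.openshift.io/desiredConfig : rendered-worker-02875877c08cddc36cabb21b7788801c
machineconfiguration.openshift.io/reason : failed to log pending config: exit status 1
machineconfiguration.openshift.io/ssh : accessed
machineconfiguration.openshift.io/state : Degraded'

printf '%s\n' "$dump" | awk '
  /^===> node:>/                                          { node = $3 }
  /machineconfiguration\.openshift\.io\/reason/           { sub(/^[^:]*: /, ""); reason = $0 }
  /machineconfiguration\.openshift\.io\/state : Degraded/ { print node ": " reason }
'
```

This prints one line per degraded node, e.g. `dell-r730-068.dsal.lab.eng.rdu2.redhat.com: failed to log pending config: exit status 1`.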
Can you provide full steps to reproduce?
I've opened https://github.com/openshift/machine-config-operator/pull/733 to provide further debug information once merged.
[root@dell-r730-068 ~]# logger --journald <<EOF
> MESSAGE_ID=machine-config-daemon-pending-state
> MESSAGE=rendered-worker-02875877c08cddc36cabb21b7788801c
> BOOT_ID=983da81ad27042e29688465f2201b1f2
> PENDING=1
> EOF
logger: unrecognized option '--journald'

The logger version shipped with RHEL 7.6 doesn't have the --journald option, so the command fails.
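A caller that has to run on both RHEL 7 and newer hosts could probe for --journald support before relying on it. A minimal side-effect-free sketch (hypothetical; this is not the actual MCO fix, and the fallback tag is an assumption for illustration):

```shell
#!/bin/sh
# Hypothetical sketch, not the actual MCO fix: choose a logging command
# based on whether the installed `logger` understands --journald.
# RHEL 7.6 ships an older util-linux whose logger lacks the option.
pending="rendered-worker-02875877c08cddc36cabb21b7788801c"

if logger --help 2>&1 | grep -q -- '--journald'; then
    # Structured journal entry, later queryable with
    #   journalctl MESSAGE_ID=machine-config-daemon-pending-state
    cmd="logger --journald"
else
    # Fallback for old util-linux: a tagged plain-text syslog line
    cmd="logger -t machine-config-daemon"
fi
echo "would log pending config $pending via: $cmd"
```

The sketch only prints the command it would run, so it can be tried safely on any host to see which branch that host's util-linux would take.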
https://github.com/openshift/machine-config-operator/pull/734
Does this one need to be marked as a blocker?
(In reply to Colin Walters from comment #5)
> Does this one need to be marked as a blocker?

uhm, not sure; same goes for https://bugzilla.redhat.com/show_bug.cgi?id=1707162, which isn't marked blocker either, but both target 4.1.0.
Need to wait for a new build to try this.
Checked with 4.1.0-0.nightly-2019-05-14-202907; the RHEL 7.6 worker still does not work.

# oc get nodes -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}'
...
===> node:> dell-r730-068.dsal.lab.eng.rdu2.redhat.com
machineconfiguration.openshift.io/currentConfig : rendered-worker-dd652a6e0f249230ef0b9fe1f5f241e9
machineconfiguration.openshift.io/desiredConfig : rendered-worker-41eed7b1596539a2ea0028a651704126
machineconfiguration.openshift.io/reason : failed to log pending config: logger: unrecognized option '--journald'

Usage:
 logger [options] [message]

Options:
 -T, --tcp              use TCP only
 -d, --udp              use UDP only
 -i, --id               log the process ID too
 -f, --file <file>      log the contents of this file
 -h, --help             display this help text and exit
 -S, --size <num>       maximum size for a single message (default 1024)
 -n, --server <name>    write to this remote syslog server
 -P, --port <port>      use this port for UDP or TCP connection
 -p, --priority <prio>  mark given message with this priority
 -s, --stderr           output message to standard error as well
 -t, --tag <tag>        mark every line with this tag
 -u, --socket <socket>  write to this Unix socket
 -V, --version          output version information and exit
: exit status 1
machineconfiguration.openshift.io/ssh : accessed
machineconfiguration.openshift.io/state : Degraded
volumes.kubernetes.io/controller-managed-attach-detach : true
The 4.1 branch was cut before it picked up https://github.com/openshift/machine-config-operator/pull/734. Backported: https://github.com/openshift/machine-config-operator/pull/754
Backport merged. Next nightly should have that.
This nightly payload now contains the backport to be tested by QE https://openshift-release.svc.ci.openshift.org/releasestream/4.1.0-0.nightly/release/4.1.0-0.nightly-2019-05-15-151517
Checked with 4.1.0-0.nightly-2019-05-15-151517; the RHEL worker now applies machineconfig updates correctly, so moving to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758