Bug 1708602
| Summary: | Got error from machine-config-daemon when the rhel worker is adding to the cluster but cluster is applying new rendered machineconfig | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | weiwei jiang <wjiang> |
| Component: | Machine Config Operator | Assignee: | Antonio Murdaca <amurdaca> |
| Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.1.0 | CC: | amurdaca, walters, xtian |
| Target Milestone: | --- | ||
| Target Release: | 4.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-06-04 10:48:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Can you provide full steps to reproduce? I've opened https://github.com/openshift/machine-config-operator/pull/733 to provide further debug when merged. [root@dell-r730-068 ~]# logger --journald <<EOF
> MESSAGE_ID=machine-config-daemon-pending-state
> MESSAGE=rendered-worker-02875877c08cddc36cabb21b7788801c
> BOOT_ID=983da81ad27042e29688465f2201b1f2
> PENDING=1
> EOF
logger: unrecognized option '--journald'
The logger version shipped with rhel 7.6 hasn't that --journald options, and is thus failing.
Does this one need to be marked as a blocker? (In reply to Colin Walters from comment #5) > Does this one need to be marked as a blocker? uhm, not sure, same goes for https://bugzilla.redhat.com/show_bug.cgi?id=1707162 which doesn't have blocker but both have the 4.1.0 target Need wait new build to have a try. Checked with 4.1.0-0.nightly-2019-05-14-202907, and rhel 7.6 worker still can not work well now.
# oc get nodes -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}'
...
===> node:> dell-r730-068.dsal.lab.eng.rdu2.redhat.com
machineconfiguration.openshift.io/currentConfig : rendered-worker-dd652a6e0f249230ef0b9fe1f5f241e9
machineconfiguration.openshift.io/desiredConfig : rendered-worker-41eed7b1596539a2ea0028a651704126
machineconfiguration.openshift.io/reason : failed to log pending config: logger: unrecognized option '--journald'
Usage:
logger [options] [message]
Options:
-T, --tcp use TCP only
-d, --udp use UDP only
-i, --id log the process ID too
-f, --file <file> log the contents of this file
-h, --help display this help text and exit
-S, --size <num> maximum size for a single message (default 1024)
-n, --server <name> write to this remote syslog server
-P, --port <port> use this port for UDP or TCP connection
-p, --priority <prio> mark given message with this priority
-s, --stderr output message to standard error as well
-t, --tag <tag> mark every line with this tag
-u, --socket <socket> write to this Unix socket
-V, --version output version information and exit
: exit status 1
machineconfiguration.openshift.io/ssh : accessed
machineconfiguration.openshift.io/state : Degraded
volumes.kubernetes.io/controller-managed-attach-detach : true
The branching to 4.1 happened and it didn't pick https://github.com/openshift/machine-config-operator/pull/734 Backported. https://github.com/openshift/machine-config-operator/pull/754 Backport merged. Next nightly should have that. This nightly payload now contains the backport to be tested by QE https://openshift-release.svc.ci.openshift.org/releasestream/4.1.0-0.nightly/release/4.1.0-0.nightly-2019-05-15-151517 Checked with 4.1.0-0.nightly-2019-05-15-151517 and rhel worker work well for machineconfig updating, so move to verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758 |
Description of problem: When the rhel worker is adding to the cluster and the cluster is applying new rendered machineconfig at the same time, machineconfig daemon got error "Marking Degraded due to: failed to log pending config: exit status 1". Version-Release number of selected component (if applicable): oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.1.0-0.nightly-2019-05-09-182710 True False 4h50m Cluster version is 4.1.0-0.nightly-2019-05-09-182710 How reproducible: Always Steps to Reproduce: 1. Make new kubelet machineconfig for workers 2. Add new rhel worker to the cluster during the cluster is applying new machineconfig 3. Actual results: # oc get nodes -o template --template='{{range .items}}{{"===> node:> "}}{{.metadata.name}}{{"\n"}}{{range $k, $v := .metadata.annotations}}{{println $k ":" $v}}{{end}}{{"\n"}}{{end}}' ... ===> node:> dell-r730-068.dsal.lab.eng.rdu2.redhat.com machineconfiguration.openshift.io/currentConfig : rendered-worker-9425bc18523ebb54a929a7db9813a343 machineconfiguration.openshift.io/desiredConfig : rendered-worker-02875877c08cddc36cabb21b7788801c machineconfiguration.openshift.io/reason : failed to log pending config: exit status 1 machineconfiguration.openshift.io/ssh : accessed machineconfiguration.openshift.io/state : Degraded volumes.kubernetes.io/controller-managed-attach-detach : true # oc -n openshift-machine-config-operator logs -f machine-config-daemon-vbs9h I0510 06:52:22.794488 23426 update.go:170] Checking reconcilable for config rendered-worker-9425bc18523ebb54a929a7db9813a343 to rendered-worker-02875877c08cddc36cabb21b7788801c I0510 06:52:22.795668 23426 update.go:743] Starting update from rendered-worker-9425bc18523ebb54a929a7db9813a343 to rendered-worker-02875877c08cddc36cabb21b7788801c I0510 06:52:22.796636 23426 update.go:372] Updating files I0510 06:52:22.796648 23426 update.go:574] Writing file "/etc/tmpfiles.d/cleanup-cni.conf" I0510 06:52:23.334889 23426 update.go:574] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf" I0510 06:52:23.935607 23426 update.go:574] Writing file "/var/lib/kubelet/config.json" I0510 06:52:24.569906 23426 update.go:574] Writing file "/etc/kubernetes/ca.crt" I0510 06:52:24.978786 23426 update.go:574] Writing file "/etc/sysctl.d/forward.conf" I0510 06:52:25.377527 23426 update.go:574] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy" I0510 06:52:25.785718 23426 update.go:574] Writing file "/etc/containers/registries.conf" I0510 06:52:26.421934 23426 update.go:574] Writing file "/etc/containers/storage.conf" I0510 06:52:26.827820 23426 update.go:574] Writing file "/etc/crio/crio.conf" I0510 06:52:27.397023 23426 update.go:574] Writing file "/etc/kubernetes/cloud.conf" I0510 06:52:27.866888 23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf" I0510 06:52:28.446934 23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf" I0510 06:52:28.832434 23426 update.go:529] Writing systemd unit "kubelet.service" I0510 06:52:29.278484 23426 update.go:546] Enabling systemd unit "kubelet.service" I0510 06:52:29.278529 23426 update.go:466] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink I0510 06:52:29.278541 23426 update.go:391] Deleting stale data I0510 06:52:29.278558 23426 update.go:636] Writing SSHKeys at "/home/core/.ssh/authorized_keys" I0510 06:52:29.689316 23426 update.go:636] Writing SSHKeys at "/home/core/.ssh/authorized_keys" I0510 06:52:30.295779 23426 update.go:372] Updating files I0510 06:52:30.295804 23426 update.go:574] Writing file "/etc/tmpfiles.d/cleanup-cni.conf" I0510 06:52:30.702501 23426 update.go:574] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf" I0510 06:52:31.253447 23426 update.go:574] Writing file "/var/lib/kubelet/config.json" I0510 06:52:31.847678 23426 update.go:574] Writing file "/etc/kubernetes/ca.crt" I0510 06:52:32.275698 23426 update.go:574] Writing file "/etc/sysctl.d/forward.conf" I0510 06:52:32.885524 23426 update.go:574] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy" I0510 06:52:33.289385 23426 update.go:574] Writing file "/etc/containers/registries.conf" I0510 06:52:33.930205 23426 update.go:574] Writing file "/etc/containers/storage.conf" I0510 06:52:34.496840 23426 update.go:574] Writing file "/etc/crio/crio.conf" I0510 06:52:34.913196 23426 update.go:574] Writing file "/etc/kubernetes/cloud.conf" I0510 06:52:35.118717 23426 update.go:574] Writing file "/etc/kubernetes/kubelet.conf" I0510 06:52:35.611156 23426 update.go:529] Writing systemd unit "kubelet.service" I0510 06:52:36.004440 23426 update.go:546] Enabling systemd unit "kubelet.service" I0510 06:52:36.004469 23426 update.go:466] /etc/systemd/system/multi-user.target.wants/kubelet.service already exists. Not making a new symlink I0510 06:52:36.004475 23426 update.go:391] Deleting stale data E0510 06:52:36.004497 23426 writer.go:132] Marking Degraded due to: failed to log pending config: exit status 1 Expected results: Should work well Additional info: