Description of problem: SDN to OVN migration stucks on MCO for rhel worker Version-Release number of selected component (if applicable): 4.7.0-0.nightly-2021-01-06-133533 Steps to Reproduce: 1. Setup IPI vsphere env with SDN network type, and scale up 2 rhel workers. 2. Follow below procedure to do migration. 1) oc annotate Network.operator.openshift.io cluster \ 'networkoperator.openshift.io/network-migration'="" 2) oc patch MachineConfigPool master --type='merge' --patch \ '{ "spec": { "paused": true } }' oc patch MachineConfigPool worker --type='merge' --patch \ '{ "spec":{ "paused" :true } }' 3) oc patch Network.config.openshift.io cluster \ --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }' 4) Wait for all the multus pods recreated 5) Manually restart all the nodes. 6) Enable MCO 3. Actual results: MCO stucks on rhel worker. oc get nodes NAME STATUS ROLES AGE VERSION huirwang-vs0107-6dp7q-master-0 Ready master 4h22m v1.20.0+b1e9f0d huirwang-vs0107-6dp7q-master-1 Ready master 4h22m v1.20.0+b1e9f0d huirwang-vs0107-6dp7q-master-2 Ready master 4h22m v1.20.0+b1e9f0d huirwang-vs0107-6dp7q-rhel-0 Ready,SchedulingDisabled worker 3h9m v1.20.0+bdb9571 huirwang-vs0107-6dp7q-rhel-1 Ready worker 3h9m v1.20.0+bdb9571 huirwang-vs0107-6dp7q-worker-9zp8j Ready worker 4h11m v1.20.0+b1e9f0d huirwang-vs0107-6dp7q-worker-mvkmx Ready worker 4h12m v1.20.0+b1e9f0d oc logs machine-config-daemon-fbfxf -n openshift-machine-config-operator -c machine-config-daemon .......... I0107 06:58:41.963900 3031 update.go:588] Checking Reconcilable for config rendered-worker-64a9749c22336a21aa5a446e4e8a7623 to rendered-worker-22482dd0ebf8110d0be0c26d7dd9d295 I0107 06:58:41.998392 3031 update.go:1844] Starting update from rendered-worker-64a9749c22336a21aa5a446e4e8a7623 to rendered-worker-22482dd0ebf8110d0be0c26d7dd9d295: &{osUpdate:false kargs:false fips:false passwd:false files:false units:true kernelType:false extensions:false} I0107 06:58:42.015026 3031 update.go:1844] Update prepared; beginning drain E0107 06:58:42.257578 3031 daemon.go:342] WARNING: ignoring DaemonSet-managed Pods: openshift-cluster-node-tuning-operator/tuned-5lv8r, openshift-dns/dns-default-z2qsp, openshift-image-registry/node-ca-tq7xg, openshift-ingress-canary/ingress-canary-c9tdt, openshift-machine-config-operator/machine-config-daemon-fbfxf, openshift-monitoring/node-exporter-9m6zj, openshift-multus/multus-xv4wx, openshift-multus/network-metrics-daemon-zdnpc, openshift-network-diagnostics/network-check-target-b9rqr, openshift-ovn-kubernetes/ovnkube-node-72rvj, openshift-ovn-kubernetes/ovs-node-hb5nm I0107 06:58:42.257603 3031 update.go:1844] drain complete I0107 06:58:42.258614 3031 update.go:237] Successful drain took 0.242665656 seconds I0107 06:58:42.258633 3031 update.go:1207] Updating files I0107 06:58:42.272297 3031 update.go:1556] Writing file "/etc/NetworkManager/dispatcher.d/40-mdns-hostname" I0107 06:58:42.274270 3031 update.go:1556] Writing file "/etc/NetworkManager/conf.d/99-vsphere.conf" I0107 06:58:42.275623 3031 update.go:1556] Writing file "/etc/NetworkManager/dispatcher.d/30-resolv-prepender" I0107 06:58:42.280679 3031 update.go:1556] Writing file "/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt" I0107 06:58:42.281739 3031 update.go:1556] Writing file "/etc/tmpfiles.d/cleanup-cni.conf" I0107 06:58:42.286332 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem" I0107 06:58:42.291080 3031 update.go:1556] Writing file "/usr/local/bin/configure-ovs.sh" I0107 06:58:42.295256 3031 update.go:1556] Writing file "/etc/containers/storage.conf" I0107 06:58:42.298481 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/coredns/Corefile.tmpl" I0107 06:58:42.303043 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/coredns.yaml" I0107 06:58:42.307687 3031 update.go:1556] Writing file "/etc/systemd/system/crio.service.d/20-stream-address.conf" I0107 06:58:42.312552 3031 update.go:1556] Writing file "/etc/systemd/system.conf.d/10-default-env-godebug.conf" I0107 06:58:42.318389 3031 update.go:1556] Writing file "/etc/modules-load.d/iptables.conf" I0107 06:58:42.324349 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl" I0107 06:58:42.329728 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/keepalived.yaml" I0107 06:58:42.335052 3031 update.go:1556] Writing file "/etc/tmpfiles.d/kni.conf" I0107 06:58:42.339619 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet-ca.crt" I0107 06:58:42.344284 3031 update.go:1556] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf" I0107 06:58:42.348759 3031 update.go:1556] Writing file "/etc/systemd/system/kubelet.service.d/20-logging.conf" I0107 06:58:42.353795 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/mdns/config.hcl.tmpl" I0107 06:58:42.359312 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/mdns-publisher.yaml" I0107 06:58:42.364061 3031 update.go:1556] Writing file "/etc/NetworkManager/conf.d/sdn.conf" I0107 06:58:42.368670 3031 update.go:1556] Writing file "/var/lib/kubelet/config.json" I0107 06:58:42.373570 3031 update.go:1556] Writing file "/etc/kubernetes/ca.crt" I0107 06:58:42.378364 3031 update.go:1556] Writing file "/etc/ssh/sshd_config.d/10-disable-ssh-key-dir.conf" I0107 06:58:42.383725 3031 update.go:1556] Writing file "/etc/sysctl.d/forward.conf" I0107 06:58:42.388857 3031 update.go:1556] Writing file "/etc/sysctl.d/inotify.conf" I0107 06:58:42.393315 3031 update.go:1556] Writing file "/usr/local/sbin/set-valid-hostname.sh" I0107 06:58:42.395075 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy" I0107 06:58:42.399352 3031 update.go:1556] Writing file "/usr/local/bin/vsphere-hostname.sh" I0107 06:58:42.404033 3031 update.go:1556] Writing file "/etc/containers/registries.conf" I0107 06:58:42.405581 3031 update.go:1556] Writing file "/etc/crio/crio.conf.d/00-default" I0107 06:58:42.410622 3031 update.go:1556] Writing file "/etc/containers/policy.json" I0107 06:58:42.412186 3031 update.go:1556] Writing file "/etc/kubernetes/cloud.conf" I0107 06:58:42.418568 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet.conf" I0107 06:58:42.424214 3031 update.go:1497] Writing systemd unit "etc-NetworkManager-system\\x2dconnections\\x2dmerged.mount" I0107 06:58:42.429481 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:42.524281 3031 update.go:1451] Preset systemd unit crio.service I0107 06:58:42.524302 3031 update.go:1462] Writing systemd unit dropin "mco-disabled.conf" I0107 06:58:42.531538 3031 update.go:1534] Could not reset unit preset for docker.socket, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:42.531558 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:42.536308 3031 update.go:1497] Writing systemd unit "kubelet.service" I0107 06:58:42.540880 3031 update.go:1497] Writing systemd unit "machine-config-daemon-firstboot.service" I0107 06:58:42.545540 3031 update.go:1497] Writing systemd unit "machine-config-daemon-pull.service" I0107 06:58:42.550628 3031 update.go:1497] Writing systemd unit "node-valid-hostname.service" I0107 06:58:42.556061 3031 update.go:1497] Writing systemd unit "nodeip-configuration.service" I0107 06:58:42.561882 3031 update.go:1497] Writing systemd unit "ovs-configuration.service" I0107 06:58:42.566399 3031 update.go:1462] Writing systemd unit dropin "10-ovs-vswitchd-restart.conf" I0107 06:58:42.662668 3031 update.go:1451] Preset systemd unit ovs-vswitchd.service I0107 06:58:42.662688 3031 update.go:1462] Writing systemd unit dropin "10-ovsdb-restart.conf" I0107 06:58:42.667465 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:42.675951 3031 update.go:1534] Could not reset unit preset for pivot.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:42.675969 3031 update.go:1497] Writing systemd unit "vsphere-hostname.service" I0107 06:58:42.681272 3031 update.go:1462] Writing systemd unit dropin "mco-disabled.conf" I0107 06:58:42.689236 3031 update.go:1534] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:42.772740 3031 update.go:1429] Enabled systemd units: [etc-NetworkManager-system\x2dconnections\x2dmerged.mount kubelet.service machine-config-daemon-firstboot.service machine-config-daemon-pull.service node-valid-hostname.service nodeip-configuration.service openvswitch.service ovs-configuration.service ovsdb-server.service vsphere-hostname.service] I0107 06:58:42.772766 3031 update.go:1280] Deleting stale data I0107 06:58:42.786883 3031 update.go:1675] Writing SSHKeys at "/home/core/.ssh/authorized_keys" I0107 06:58:42.829983 3031 update.go:1675] Writing SSHKeys at "/home/core/.ssh/authorized_keys" I0107 06:58:42.834252 3031 update.go:1207] Updating files I0107 06:58:42.848137 3031 update.go:1556] Writing file "/etc/NetworkManager/dispatcher.d/40-mdns-hostname" I0107 06:58:42.854039 3031 update.go:1556] Writing file "/etc/NetworkManager/conf.d/99-vsphere.conf" I0107 06:58:42.858559 3031 update.go:1556] Writing file "/etc/NetworkManager/dispatcher.d/30-resolv-prepender" I0107 06:58:42.863409 3031 update.go:1556] Writing file "/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt" I0107 06:58:42.867458 3031 update.go:1556] Writing file "/etc/tmpfiles.d/cleanup-cni.conf" I0107 06:58:42.872077 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem" I0107 06:58:42.876675 3031 update.go:1556] Writing file "/usr/local/bin/configure-ovs.sh" I0107 06:58:42.882405 3031 update.go:1556] Writing file "/etc/containers/storage.conf" I0107 06:58:42.887765 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/coredns/Corefile.tmpl" I0107 06:58:42.892995 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/coredns.yaml" I0107 06:58:42.897549 3031 update.go:1556] Writing file "/etc/systemd/system/crio.service.d/20-stream-address.conf" I0107 06:58:42.901967 3031 update.go:1556] Writing file "/etc/systemd/system.conf.d/10-default-env-godebug.conf" I0107 06:58:42.906497 3031 update.go:1556] Writing file "/etc/modules-load.d/iptables.conf" I0107 06:58:42.911101 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl" I0107 06:58:42.912756 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/keepalived.yaml" I0107 06:58:42.917556 3031 update.go:1556] Writing file "/etc/tmpfiles.d/kni.conf" I0107 06:58:42.919097 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet-ca.crt" I0107 06:58:42.924322 3031 update.go:1556] Writing file "/etc/systemd/system.conf.d/kubelet-cgroups.conf" I0107 06:58:42.929051 3031 update.go:1556] Writing file "/etc/systemd/system/kubelet.service.d/20-logging.conf" I0107 06:58:42.930678 3031 update.go:1556] Writing file "/etc/kubernetes/static-pod-resources/mdns/config.hcl.tmpl" I0107 06:58:42.936360 3031 update.go:1556] Writing file "/etc/kubernetes/manifests/mdns-publisher.yaml" I0107 06:58:42.942025 3031 update.go:1556] Writing file "/etc/NetworkManager/conf.d/sdn.conf" I0107 06:58:42.947964 3031 update.go:1556] Writing file "/var/lib/kubelet/config.json" I0107 06:58:42.953785 3031 update.go:1556] Writing file "/etc/kubernetes/ca.crt" I0107 06:58:42.955570 3031 update.go:1556] Writing file "/etc/ssh/sshd_config.d/10-disable-ssh-key-dir.conf" I0107 06:58:42.960360 3031 update.go:1556] Writing file "/etc/sysctl.d/forward.conf" I0107 06:58:42.965069 3031 update.go:1556] Writing file "/etc/sysctl.d/inotify.conf" I0107 06:58:42.969643 3031 update.go:1556] Writing file "/usr/local/sbin/set-valid-hostname.sh" I0107 06:58:42.975443 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet-plugins/volume/exec/.dummy" I0107 06:58:42.980194 3031 update.go:1556] Writing file "/usr/local/bin/vsphere-hostname.sh" I0107 06:58:42.985379 3031 update.go:1556] Writing file "/etc/containers/registries.conf" I0107 06:58:42.990322 3031 update.go:1556] Writing file "/etc/crio/crio.conf.d/00-default" I0107 06:58:42.994921 3031 update.go:1556] Writing file "/etc/containers/policy.json" I0107 06:58:42.999591 3031 update.go:1556] Writing file "/etc/kubernetes/cloud.conf" I0107 06:58:43.004299 3031 update.go:1556] Writing file "/etc/kubernetes/kubelet.conf" I0107 06:58:43.009986 3031 update.go:1497] Writing systemd unit "etc-NetworkManager-system\\x2dconnections\\x2dmerged.mount" I0107 06:58:43.015281 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:43.095593 3031 update.go:1451] Preset systemd unit crio.service I0107 06:58:43.095615 3031 update.go:1462] Writing systemd unit dropin "mco-disabled.conf" I0107 06:58:43.100454 3031 update.go:1534] Could not reset unit preset for docker.socket, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:43.100471 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:43.105018 3031 update.go:1497] Writing systemd unit "kubelet.service" I0107 06:58:43.106440 3031 update.go:1497] Writing systemd unit "machine-config-daemon-firstboot.service" I0107 06:58:43.108059 3031 update.go:1497] Writing systemd unit "machine-config-daemon-pull.service" I0107 06:58:43.113291 3031 update.go:1497] Writing systemd unit "node-valid-hostname.service" I0107 06:58:43.118779 3031 update.go:1497] Writing systemd unit "nodeip-configuration.service" I0107 06:58:43.124258 3031 update.go:1497] Writing systemd unit "ovs-configuration.service" I0107 06:58:43.129414 3031 update.go:1462] Writing systemd unit dropin "10-ovs-vswitchd-restart.conf" I0107 06:58:43.221929 3031 update.go:1451] Preset systemd unit ovs-vswitchd.service I0107 06:58:43.221957 3031 update.go:1462] Writing systemd unit dropin "10-ovsdb-restart.conf" I0107 06:58:43.223851 3031 update.go:1462] Writing systemd unit dropin "10-mco-default-env.conf" I0107 06:58:43.231717 3031 update.go:1534] Could not reset unit preset for pivot.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:43.231737 3031 update.go:1497] Writing systemd unit "vsphere-hostname.service" I0107 06:58:43.236564 3031 update.go:1462] Writing systemd unit dropin "mco-disabled.conf" I0107 06:58:43.243636 3031 update.go:1534] Could not reset unit preset for zincati.service, skipping. (Error msg: error running preset on unit: Failed to execute operation: No such file or directory ) I0107 06:58:43.334973 3031 update.go:1429] Enabled systemd units: [etc-NetworkManager-system\x2dconnections\x2dmerged.mount kubelet.service machine-config-daemon-firstboot.service machine-config-daemon-pull.service node-valid-hostname.service nodeip-configuration.service openvswitch.service ovs-configuration.service ovsdb-server.service vsphere-hostname.service] I0107 06:58:43.334998 3031 update.go:1280] Deleting stale data E0107 06:58:43.335049 3031 writer.go:135] Marking Degraded due to: failed to log pending config: logger: unrecognized option '--journald' Usage: logger [options] [message] Options: -T, --tcp use TCP only -d, --udp use UDP only -i, --id log the process ID too -f, --file <file> log the contents of this file -h, --help display this help text and exit -S, --size <num> maximum size for a single message (default 1024) -n, --server <name> write to this remote syslog server -P, --port <port> use this port for UDP or TCP connection -p, --priority <prio> mark given message with this priority -s, --stderr output message to standard error as well -t, --tag <tag> mark every line with this tag -u, --socket <socket> write to this Unix socket -V, --version output version information and exit : exit status 1 Expected results: Shoud migrate to OVN successfully Additional info:
Flag `--journald` is not supported in RHEL 7, so it looks like an MCO use this flag without checking the running operating system. https://github.com/openshift/machine-config-operator/blob/0e40c961cdb200f886c06e88a7673c8dcd33b5be/pkg/daemon/update.go#L1829 Assign to MCO team.
Good catch Peng Liu! There is some sensing code in the package at https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/daemon.go#L243-L252 which might make sense to be turned into a function and used.
This doesn't quite make sense as it _would_ be using storePendingStateLegacyLogger() (see: https://github.com/openshift/machine-config-operator/blob/e70033b04d31f63661d59f92535ae460b6119bbe/pkg/daemon/update.go#L1816) which is for when journald not being present... which is 2 lines before what Peng Liu linked in the if statement: https://github.com/openshift/machine-config-operator/blob/0e40c961cdb200f886c06e88a7673c8dcd33b5be/pkg/daemon/update.go#L1826 To be clear: ``` if !dn.loggerSupportsJournal { return dn.storePendingStateLegacyLogger(pending, isPending) <--- this is what would be used if logger doesn't support journal } logger := exec.Command("logger", "--journald") <- this wouldn't be used if logger doesn't support journal ```
Will look a little further..
So the problem actually lies elsewhere in the VersionId check: https://github.com/openshift/machine-config-operator/blob/4a005cb1e426a4d844fad89bd88fef95e0ce0246/pkg/daemon/osrelease.go#L39 we are checking for "7" but $ rpmdev-extract redhat-release-7.5-0.14.el7.x86_64.rpm redhat-release-7.5-0.14.el7.x86_64/etc/issue redhat-release-7.5-0.14.el7.x86_64/etc/issue.net redhat-release-7.5-0.14.el7.x86_64/etc/pki/product-default <snip> $ egrep "VERSION_ID=" redhat-release-7.5-0.14.el7.x86_64/usr/lib/os-release VERSION_ID="7.5" will fix
Just to summarize the BZ: the VERSION_ID was 7.5 but our check IsLikeTraditionalRhel7() did not account for minor versions (only for "7" not for 7.x) and so returned false. This resulted in dn.loggerSupportsJournal being set to true (https://github.com/openshift/machine-config-operator/blob/8c7dab8a8e613e6a3a0533c6607c459d7fa480dd/pkg/daemon/daemon.go#L245) which is the relevant value when selecting which logger to use. So instead of using storePendingStateLegacyLogger() (for cases where journald isn't avail) we used normal logger resulting in the --journald flag erroneously being used for a rhel7 node.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633