Description of problem:
Adding a RHEL node to the cluster fails.

Version-Release number of selected component (if applicable):
OCP 4.8.2
VMware baremetal

How reproducible:
Followed https://docs.openshift.com/container-platform/4.8/post_installation_configuration/node-tasks.html#post-install-config-adding-rhel-compute

The scaleup playbook fails with:

TASK [openshift_node : Restart the CRI-O service] ***************************************************************************************************************************************************
Monday 16 August 2021 18:14:56 -0600 (0:00:01.224) 0:12:12.914 *********
fatal: [rhel-worker.ocp.home]: FAILED! => {"changed": false, "msg": "Unable to start service crio: Job for crio.service failed because the control process exited with error code. See \"systemctl status crio.service\" and \"journalctl -xe\" for details.\n"}

systemctl -l status crio.service
crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-08-16 18:14:57 MDT; 1min 16s ago
     Docs: https://github.com/cri-o/cri-o
  Process: 8517 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 8517 (code=exited, status=1/FAILURE)

Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.056223929-06:00" level=info msg="Node configuration value for memoryswap cgroup is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.065678261-06:00" level=info msg="Node configuration value for systemd CollectMode is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070473962-06:00" level=error msg="Node configuration validation for systemd AllowedCPUs failed: check systemd AllowedCPUs: exit status 1"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070514950-06:00" level=info msg="Node configuration value for systemd AllowedCPUs is false"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.072761208-06:00" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.073307678-06:00" level=fatal msg="Validating runtime config: conmon validation: invalid conmon path: stat /usr/libexec/crio/conmon: no such file or directory"
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service: main process exited, code=exited, status=1/FAILURE
Aug 16 18:14:57 rhel-worker systemd[1]: Failed to start Open Container Initiative Daemon.
Aug 16 18:14:57 rhel-worker systemd[1]: Unit crio.service entered failed state.
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service failed.

The /usr/libexec/crio directory does not exist.

Additional info:
When running the playbook from openshift/openshift-ansible on GitHub, the node joins the cluster without any problems.
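
The fatal log line reports that CRI-O cannot stat its configured conmon path. A minimal sketch of a check that could be run on the RHEL node to confirm this is shown here; check_conmon is a hypothetical helper written for this report, not part of openshift-ansible or CRI-O, and its default path is simply the one taken from the log above.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-ansible or CRI-O):
# report whether an executable conmon binary exists at the given path.
# The default is the path from the fatal log line in this report.
check_conmon() {
    conmon_path="${1:-/usr/libexec/crio/conmon}"
    if [ -x "$conmon_path" ]; then
        echo "conmon found: $conmon_path"
        return 0
    fi
    echo "conmon missing: $conmon_path"
    return 1
}
```

On a node in the failing state described above, calling check_conmon with no argument would be expected to report the path as missing. Passing a different path as the first argument checks that location instead.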
Checked on 4.9.0-0.nightly-2021-09-05-040736. Created a UPI cluster on vSphere and added a RHEL node.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-05-040736   True        False         63m     Cluster version is 4.9.0-0.nightly-2021-09-05-040736

$ oc get nodes -o wide
NAME              STATUS   ROLES    AGE   VERSION                INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0         Ready    worker   73m   v1.22.0-rc.0+75ee307   172.31.248.32    172.31.248.32    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
compute-1         Ready    worker   73m   v1.22.0-rc.0+75ee307   172.31.248.89    172.31.248.89    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-0   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.29    172.31.248.29    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-1   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.83    172.31.248.83    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-2   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.100   172.31.248.100   Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8

$ oc get nodes -o wide
NAME                         STATUS   ROLES    AGE    VERSION                INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0                    Ready    worker   89m    v1.22.0-rc.0+75ee307   172.31.248.32    172.31.248.32    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
compute-1                    Ready    worker   89m    v1.22.0-rc.0+75ee307   172.31.248.89    172.31.248.89    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-0              Ready    master   101m   v1.22.0-rc.0+75ee307   172.31.248.29    172.31.248.29    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-1              Ready    master   102m   v1.22.0-rc.0+75ee307   172.31.248.83    172.31.248.83    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-2              Ready    master   102m   v1.22.0-rc.0+75ee307   172.31.248.100   172.31.248.100   Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
sunilc0509491-pvvrl-rhel-0   Ready    worker   5m6s   v1.22.0-rc.0+75ee307   172.31.249.18    172.31.249.18    Red Hat Enterprise Linux 8.4 (Ootpa)                           4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
sunilc0509491-pvvrl-rhel-1   Ready    worker   5m6s   v1.22.0-rc.0+75ee307   172.31.249.154   172.31.249.154   Red Hat Enterprise Linux 8.4 (Ootpa)                           4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759