Bug 1994172 - rhel node does not join cluster conmon validation: invalid conmon path
Summary: rhel node does not join cluster conmon validation: invalid conmon path
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-17 00:43 UTC by Dan Seals
Modified: 2021-10-18 17:47 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:46:44 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift openshift-ansible pull 12342 0 None None None 2021-08-17 13:43:53 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:47:00 UTC

Description Dan Seals 2021-08-17 00:43:46 UTC
Description of problem:
Adding a RHEL node to the cluster fails.


Version-Release number of selected component (if applicable):
OCP 4.8.2
VMware baremetal



How reproducible:
Followed https://docs.openshift.com/container-platform/4.8/post_installation_configuration/node-tasks.html#post-install-config-adding-rhel-compute

The scaleup playbook will fail with:
TASK [openshift_node : Restart the CRI-O service] ***************************************************************************************************************************************************
Monday 16 August 2021  18:14:56 -0600 (0:00:01.224)       0:12:12.914 ********* 
fatal: [rhel-worker.ocp.home]: FAILED! => {"changed": false, "msg": "Unable to start service crio: Job for crio.service failed because the control process exited with error code. See \"systemctl status crio.service\" and \"journalctl -xe\" for details.\n"}



systemctl -l status crio.service
 crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-08-16 18:14:57 MDT; 1min 16s ago
     Docs: https://github.com/cri-o/cri-o
  Process: 8517 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 8517 (code=exited, status=1/FAILURE)

Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.056223929-06:00" level=info msg="Node configuration value for memoryswap cgroup is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.065678261-06:00" level=info msg="Node configuration value for systemd CollectMode is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070473962-06:00" level=error msg="Node configuration validation for systemd AllowedCPUs failed: check systemd AllowedCPUs: exit status 1"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070514950-06:00" level=info msg="Node configuration value for systemd AllowedCPUs is false"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.072761208-06:00" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.073307678-06:00" level=fatal msg="Validating runtime config: conmon validation: invalid conmon path: stat /usr/libexec/crio/conmon: no such file or directory"
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service: main process exited, code=exited, status=1/FAILURE
Aug 16 18:14:57 rhel-worker systemd[1]: Failed to start Open Container Initiative Daemon.
Aug 16 18:14:57 rhel-worker systemd[1]: Unit crio.service entered failed state.
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service failed.


The /usr/libexec/crio directory does not exist.
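
A quick way to confirm the missing path on the affected node could look like the sketch below. The helper name `check_conmon` is hypothetical and not part of CRI-O or openshift-ansible; the default path is simply the one quoted in the fatal log line above.

```shell
# Hypothetical helper: report whether an executable exists at the conmon path.
# Defaults to the path named in the crio fatal log message.
check_conmon() {
    conmon_path="${1:-/usr/libexec/crio/conmon}"
    if [ -x "$conmon_path" ]; then
        echo "ok: $conmon_path"
    else
        echo "missing: $conmon_path"
        return 1
    fi
}
```

Running `check_conmon || command -v conmon` on the node would then show whether conmon is installed somewhere else on PATH, which may help tell a packaging problem apart from a configuration one.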





Additional info:
When running the playbook from the openshift/openshift-ansible repository on GitHub, the node joins the cluster without any problems.

Comment 8 Sunil Choudhary 2021-09-05 13:49:35 UTC
Checked on 4.9.0-0.nightly-2021-09-05-040736. Created a UPI cluster on vSphere and added a RHEL node.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-05-040736   True        False         63m     Cluster version is 4.9.0-0.nightly-2021-09-05-040736


$ oc get nodes -o wide
NAME              STATUS   ROLES    AGE   VERSION                INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0         Ready    worker   73m   v1.22.0-rc.0+75ee307   172.31.248.32    172.31.248.32    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
compute-1         Ready    worker   73m   v1.22.0-rc.0+75ee307   172.31.248.89    172.31.248.89    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-0   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.29    172.31.248.29    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-1   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.83    172.31.248.83    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-2   Ready    master   85m   v1.22.0-rc.0+75ee307   172.31.248.100   172.31.248.100   Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8

$ oc get nodes -o wide
NAME                         STATUS   ROLES    AGE    VERSION                INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0                    Ready    worker   89m    v1.22.0-rc.0+75ee307   172.31.248.32    172.31.248.32    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
compute-1                    Ready    worker   89m    v1.22.0-rc.0+75ee307   172.31.248.89    172.31.248.89    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-0              Ready    master   101m   v1.22.0-rc.0+75ee307   172.31.248.29    172.31.248.29    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-1              Ready    master   102m   v1.22.0-rc.0+75ee307   172.31.248.83    172.31.248.83    Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
control-plane-2              Ready    master   102m   v1.22.0-rc.0+75ee307   172.31.248.100   172.31.248.100   Red Hat Enterprise Linux CoreOS 49.84.202109041651-0 (Ootpa)   4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
sunilc0509491-pvvrl-rhel-0   Ready    worker   5m6s   v1.22.0-rc.0+75ee307   172.31.249.18    172.31.249.18    Red Hat Enterprise Linux 8.4 (Ootpa)                           4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
sunilc0509491-pvvrl-rhel-1   Ready    worker   5m6s   v1.22.0-rc.0+75ee307   172.31.249.154   172.31.249.154   Red Hat Enterprise Linux 8.4 (Ootpa)                           4.18.0-305.12.1.el8_4.x86_64   cri-o://1.22.0-68.rhaos4.9.git011c10a.el8
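
The same verification can be scripted rather than read off the table. The following is a minimal sketch, assuming a logged-in `oc` client; the function name `not_ready_nodes` is hypothetical. It reads `oc get nodes --no-headers` output on stdin and prints the name of every node whose STATUS column is not exactly `Ready`.

```shell
# Hypothetical filter: print node names whose STATUS (column 2) is not "Ready".
# Expects the output of `oc get nodes --no-headers` on stdin.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage (assumes cluster access):
#   oc get nodes --no-headers | not_ready_nodes
```

Empty output means every node, including newly added RHEL workers, reports Ready. Nodes with a compound status such as `Ready,SchedulingDisabled` are also printed, which seems the safer default for a scale-up check.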

Comment 11 errata-xmlrpc 2021-10-18 17:46:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

