Bug 2021256 - Backport to 4.8: rhel node does not join cluster conmon validation: invalid conmon path
Keywords:
Status: CLOSED DUPLICATE of bug 1993385
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-11-08 16:52 UTC by Dan Seals
Modified: 2021-11-08 17:05 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-08 17:05:41 UTC
Target Upstream Version:
Embargoed:



Description Dan Seals 2021-11-08 16:52:35 UTC
Description of problem:
Adding a RHEL node to the cluster fails.


Version-Release number of selected component (if applicable):
OCP 4.8.2
VMware baremetal



How reproducible:
Followed https://docs.openshift.com/container-platform/4.8/post_installation_configuration/node-tasks.html#post-install-config-adding-rhel-compute
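
For reference, the scaleup was run the way the linked documentation describes, i.e. from the openshift-ansible package on the provisioning host, roughly as follows (the inventory path is an assumption and will differ per environment):

  $ cd /usr/share/ansible/openshift-ansible
  $ ansible-playbook -i /path/to/inventory/hosts playbooks/scaleup.yml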

The scaleup playbook will fail with:
TASK [openshift_node : Restart the CRI-O service] ***************************************************************************************************************************************************
Monday 16 August 2021  18:14:56 -0600 (0:00:01.224)       0:12:12.914 ********* 
fatal: [rhel-worker.ocp.home]: FAILED! => {"changed": false, "msg": "Unable to start service crio: Job for crio.service failed because the control process exited with error code. See \"systemctl status crio.service\" and \"journalctl -xe\" for details.\n"}



systemctl -l status crio.service
 crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-08-16 18:14:57 MDT; 1min 16s ago
     Docs: https://github.com/cri-o/cri-o
  Process: 8517 ExecStart=/usr/bin/crio $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 8517 (code=exited, status=1/FAILURE)

Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.056223929-06:00" level=info msg="Node configuration value for memoryswap cgroup is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.065678261-06:00" level=info msg="Node configuration value for systemd CollectMode is true"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070473962-06:00" level=error msg="Node configuration validation for systemd AllowedCPUs failed: check systemd AllowedCPUs: exit status 1"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.070514950-06:00" level=info msg="Node configuration value for systemd AllowedCPUs is false"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.072761208-06:00" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
Aug 16 18:14:57 rhel-worker crio[8517]: time="2021-08-16 18:14:57.073307678-06:00" level=fatal msg="Validating runtime config: conmon validation: invalid conmon path: stat /usr/libexec/crio/conmon: no such file or directory"
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service: main process exited, code=exited, status=1/FAILURE
Aug 16 18:14:57 rhel-worker systemd[1]: Failed to start Open Container Initiative Daemon.
Aug 16 18:14:57 rhel-worker systemd[1]: Unit crio.service entered failed state.
Aug 16 18:14:57 rhel-worker systemd[1]: crio.service failed.


The /usr/libexec/crio directory does not exist.
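
A suggested check (not part of the original report) to confirm the mismatch between the conmon path CRI-O is configured to use and the path the installed conmon package actually provides:

  $ crio config 2>/dev/null | grep -i conmon   # path CRI-O expects
  $ rpm -ql conmon                             # where the package puts the binary
  $ ls -l /usr/libexec/crio/conmon /usr/bin/conmon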





Additional info:
When running the playbook using openshift/openshift-ansible from GitHub, the node joins the cluster without any problems.
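
As a stopgap on an affected worker, pointing CRI-O at the conmon binary that is actually installed should let the service start. A minimal sketch, assuming conmon lives at /usr/bin/conmon (verify with the rpm -ql check above; the drop-in file name is arbitrary):

  # /etc/crio/crio.conf.d/99-conmon-path.conf
  [crio.runtime]
  conmon = "/usr/bin/conmon"

then systemctl restart crio. This is only a workaround sketch; the actual fix is the packaging change tracked in the referenced bugs.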


BZ 1994172 was closed via the 4.9 errata https://access.redhat.com/errata/RHSA-2021:3759.
Opened this BZ to request a backport to 4.8.

Comment 1 Peter Hunt 2021-11-08 17:05:41 UTC
This has already been fixed in 4.8; it appears to be fixed in 4.8.9. Can we use that instead?

*** This bug has been marked as a duplicate of bug 1993385 ***

