Bug 1794495
| Summary: | Applying "ctrcfg" causes cri-o to fail to start on node reboot | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Urvashi Mohnani <umohnani> |
| Component: | Node | Assignee: | Urvashi Mohnani <umohnani> |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.3.z | CC: | aos-bugs, jokerman, pruan, schoudha, weinliu, wking |
| Target Milestone: | --- | ||
| Target Release: | 4.3.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1794493 | Environment: | |
| Last Closed: | 2020-03-10 23:53:17 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1794493 | ||
| Bug Blocks: | |||
Description
Urvashi Mohnani
2020-01-23 17:38:07 UTC
References on how to create a 'Ctrcfg':
https://github.com/openshift/machine-config-operator/blob/master/docs/ContainerRuntimeConfigDesign.md
https://github.com/openshift/machine-config-operator/blob/master/examples/containerruntimeconfig.crd.yaml

Verified with 4.3.0-0.nightly-2020-02-24-071304:
$ oc version
Client Version: openshift-clients-4.3.0-201910250623-79-g5d15fd52
Server Version: 4.3.0-0.nightly-2020-02-24-071304
Kubernetes Version: v1.16.2
1. oc edit machineconfigpool worker
  labels:
    custom-crio: high-pid-limit   <-- add this line
    machineconfiguration.openshift.io/mco-built-in: ""
2. create a config yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-log-and-pid
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-crio: high-pid-limit   ### <-- this must match the label created in step #1
  containerRuntimeConfig:
    pidsLimit: 2048
    logLevel: debug
3. oc create -f config.yaml
4. wait until all worker nodes come up
$ oc get no
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-133-169.us-east-2.compute.internal   Ready    worker   102m   v1.16.2
ip-10-0-136-180.us-east-2.compute.internal   Ready    master   111m   v1.16.2
ip-10-0-152-112.us-east-2.compute.internal   Ready    master   111m   v1.16.2
ip-10-0-156-233.us-east-2.compute.internal   Ready    worker   102m   v1.16.2
ip-10-0-172-75.us-east-2.compute.internal    Ready    master   111m   v1.16.2
5. verify that the limits defined in config.yaml are applied to the nodes.
$ oc debug node/ip-10-0-156-233.us-east-2.compute.internal
Starting pod/ip-10-0-156-233us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
Pod IP: 10.0.156.233
If you don't see a command prompt, try pressing enter.
sh-4.2#
sh-4.2# chroot /host
sh-4.4# cat /etc/crio/crio.conf | grep limit
pids_limit = 2048
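
The manual check in step 5 could also be scripted. The following is only a minimal sketch, assuming the setting appears in crio.conf as an uncommented `pids_limit = N` line as in the output above; `check_pids_limit` is a hypothetical helper name and is not part of oc or CRI-O.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or CRI-O).
# Usage: check_pids_limit <path-to-crio.conf> <expected-limit>
check_pids_limit() {
    conf="$1"
    expected="$2"
    # Take the first uncommented "pids_limit = N" line and keep only N.
    actual=$(sed -n 's/^[[:space:]]*pids_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: pids_limit = $actual"
        return 0
    fi
    echo "MISMATCH: expected $expected, found ${actual:-nothing}"
    return 1
}
```

On a node, after `chroot /host`, it would be called as `check_pids_limit /etc/crio/crio.conf 2048`, where 2048 is the `pidsLimit` value from the config.yaml in step 2.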
Recovery steps in case this issue is hit:
1) Delete the ctrcfg
2) Manually replace the `/etc/crio/crio.conf` on the node with a copy from a working node
3) Restart crio --> `systemctl restart crio`
4) Reboot node
5) Upgrade the cluster to pick up the newer version with the fix

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0676