Bug 1794495
Summary: | Applying "ctrcfg" causes cri-o to fail to start on node reboot | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Urvashi Mohnani <umohnani> |
Component: | Node | Assignee: | Urvashi Mohnani <umohnani> |
Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.3.z | CC: | aos-bugs, jokerman, pruan, schoudha, weinliu, wking |
Target Milestone: | --- | ||
Target Release: | 4.3.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | 1794493 | Environment: | |
Last Closed: | 2020-03-10 23:53:17 UTC | Type: | --- |
Bug Depends On: | 1794493 | ||
Bug Blocks: |
Description
Urvashi Mohnani
2020-01-23 17:38:07 UTC
References on how to create a 'Ctrcfg':
https://github.com/openshift/machine-config-operator/blob/master/docs/ContainerRuntimeConfigDesign.md
https://github.com/openshift/machine-config-operator/blob/master/examples/containerruntimeconfig.crd.yaml

Verified with 4.3.0-0.nightly-2020-02-24-071304

    $ oc version
    Client Version: openshift-clients-4.3.0-201910250623-79-g5d15fd52
    Server Version: 4.3.0-0.nightly-2020-02-24-071304
    Kubernetes Version: v1.16.2

1. oc edit machineconfigpool worker

    labels:
      custom-crio: high-pid-limit   <-- add this line
      machineconfiguration.openshift.io/mco-built-in: ""

2. Create a config yaml

    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
      name: set-log-and-pid
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-crio: high-pid-limit   ### <-- this must match the label created in step #1
      containerRuntimeConfig:
        pidsLimit: 2048
        logLevel: debug

3. oc create -f config.yaml

4. Wait until all worker nodes come up

    $ oc get no
    NAME                                         STATUS   ROLES    AGE    VERSION
    ip-10-0-133-169.us-east-2.compute.internal   Ready    worker   102m   v1.16.2
    ip-10-0-136-180.us-east-2.compute.internal   Ready    master   111m   v1.16.2
    ip-10-0-152-112.us-east-2.compute.internal   Ready    master   111m   v1.16.2
    ip-10-0-156-233.us-east-2.compute.internal   Ready    worker   102m   v1.16.2
    ip-10-0-172-75.us-east-2.compute.internal    Ready    master   111m   v1.16.2

5. Verify that the limits defined in config.yaml are applied to the nodes.

    $ oc debug node/ip-10-0-156-233.us-east-2.compute.internal
    Starting pod/ip-10-0-156-233us-east-2computeinternal-debug ...
    To use host binaries, run `chroot /host`
    chroot /host
    Pod IP: 10.0.156.233
    If you don't see a command prompt, try pressing enter.
    sh-4.2#
    sh-4.2# chroot /host
    sh-4.4# cat /etc/crio/crio.conf | grep limit
    pids_limit = 2048

Recovery steps in case this issue is hit:
1) Delete the ctrcfg
2) Manually replace the `/etc/crio/crio.conf` on the node with a copy from a working node
3) Restart crio --> `systemctl restart crio`
4) Reboot node
5) Upgrade the cluster to pick up the newer version with the fix

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:0676
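The manual check in step 5 above (grepping `/etc/crio/crio.conf` for the limit) could be scripted roughly as follows. This is only a sketch: `check_pids_limit` is a hypothetical helper name, not part of `oc` or cri-o, and the sample file contents are an assumption modelled on the `pids_limit = 2048` line shown in the grep output.

```shell
#!/bin/sh
# Sketch: confirm that a crio.conf-style file carries the expected pids_limit.
# check_pids_limit is a hypothetical helper, not an oc or cri-o command.

check_pids_limit() {
    conf_file=$1
    expected=$2
    # Pick the first uncommented "pids_limit = N" line and keep only the number.
    actual=$(sed -n 's/^[[:space:]]*pids_limit[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf_file" | head -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: pids_limit = $actual"
        return 0
    fi
    echo "MISMATCH: expected pids_limit = $expected, found '${actual:-none}'"
    return 1
}

# Demo against a throwaway sample file (assumed layout, not copied from a real node).
sample=$(mktemp)
printf '[crio.runtime]\npids_limit = 2048\nlog_level = "debug"\n' > "$sample"
check_pids_limit "$sample" 2048
rm -f "$sample"
```

On a node, after `oc debug node/<node-name>` and `chroot /host`, the same function would be called as `check_pids_limit /etc/crio/crio.conf 2048`, where 2048 is the `pidsLimit` value from the ContainerRuntimeConfig in step 2.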