Bug 1794493 - Applying "ctrcfg" causes cri-o to fail to start on node reboot
Summary: Applying "ctrcfg" causes cri-o to fail to start on node reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Urvashi Mohnani
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 1794495
Reported: 2020-01-23 17:36 UTC by Urvashi Mohnani
Modified: 2020-05-04 11:27 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Applying a MachineConfig to a Machine Config Pool would sometimes cause cri-o to fail to start after the node rebooted.
Consequence: Pods would be non-functional.
Fix:
Result: Applying a MachineConfig will work properly on node reboot.
Clone Of:
: 1794495 (view as bug list)
Environment:
Last Closed: 2020-05-04 11:26:35 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1405 None closed Bug 1794493: Update crio.yaml templates with correct values 2020-05-18 14:39:40 UTC
Github openshift machine-config-operator pull 1414 None closed Bug 1794493: add ctrcfg e2e test 2020-05-18 14:39:40 UTC
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-04 11:27:08 UTC

Description Urvashi Mohnani 2020-01-23 17:36:15 UTC
Description of problem:

When a ctrcfg is applied to an mcp, the node goes into "NotReady" state. This happens because the crio.conf generated after the CRD is applied sets the config fields to their empty values (0 for int, "" for string, etc.).
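
The failure mode described above can be illustrated with a minimal sketch. It is written in Python for illustration only; it is not the MCO's code, and the default values and the `conmon` field shown here are assumptions chosen for the example. It contrasts rendering a config where every unset field falls back to its type's empty value with overlaying only the user-set fields on top of defaults.

```python
# Hypothetical defaults for illustration; real crio.conf defaults may differ.
DEFAULTS = {
    "pids_limit": 1024,
    "log_level": "error",
    "conmon": "/usr/libexec/crio/conmon",
}

# Empty value per type, mimicking how an unset field looks in a typed struct.
ZERO = {int: 0, str: ""}


def render_naive(overrides):
    """Buggy pattern: any field the user did not set is emitted as its empty value."""
    return {key: overrides.get(key, ZERO[type(default)])
            for key, default in DEFAULTS.items()}


def render_merged(overrides):
    """Safer pattern: start from the defaults and overlay only user-set fields."""
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged


# Fields a ContainerRuntimeConfig might set (names are assumptions).
user = {"pids_limit": 2048, "log_level": "debug"}
naive = render_naive(user)
merged = render_merged(user)
print("naive: ", naive)
print("merged:", merged)
```

In the naive rendering the `conmon` path comes out empty, which is the kind of value that could keep a runtime from starting; the merged rendering keeps the default.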


Version-Release number of selected component (if applicable): 
OCP 4.4
CRI-O 1.17 and 1.16


How reproducible: 100% of the time


Steps to Reproduce:

1. Create a "Ctrcfg"
2. Wait for it to roll out onto the nodes

Actual results:
Node goes into "NotReady" state

Expected results:
Roll out should be successful and node should be in "Ready" state.


Additional info: 
This regression was introduced by https://github.com/openshift/machine-config-operator/commit/69025e8e8c82ed6d188eb0e409e8148da09ac3b2. We are working on reverting it and adding e2e tests for it.

Comment 3 Peter Ruan 2020-02-11 22:45:38 UTC
Verified with 4.4.0-0.nightly-2020-02-10-165717

1. oc edit machineconfigpool worker 
      labels:                                                                       
        custom-crio: high-pid-limit       <-- add this line                                          
        machineconfiguration.openshift.io/mco-built-in: ""        
2. Create a config.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: set-log-and-pid
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-crio: high-pid-limit      ### <-- this must match the label created in step #1
  containerRuntimeConfig:
    pidsLimit: 2048
    logLevel: debug

3. oc create -f config.yaml
4. Wait until all worker nodes come up:
pruan@fedora-vm ~/workspace/testcases/1794493 $ oc get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-137-163.us-west-1.compute.internal   Ready                      worker   25h   v1.17.1
ip-10-0-138-58.us-west-1.compute.internal    Ready                      master   25h   v1.17.1
ip-10-0-139-15.us-west-1.compute.internal    Ready                      worker   25h   v1.17.1
ip-10-0-143-23.us-west-1.compute.internal    Ready                      master   25h   v1.17.1
ip-10-0-150-105.us-west-1.compute.internal   Ready,SchedulingDisabled   worker   25h   v1.17.1
ip-10-0-159-232.us-west-1.compute.internal   Ready                      master   25h   v1.17.1
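
The readiness check in step 4 could be automated along these lines. This is a rough sketch that assumes the tabular text output of `oc get nodes` in the column layout shown above; the helper name `not_ready_nodes` is hypothetical.

```python
def not_ready_nodes(table):
    """Return names of nodes whose STATUS column does not include 'Ready'.

    STATUS may be a comma-separated list such as 'Ready,SchedulingDisabled'.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    result = []
    for line in lines[1:]:  # skip the header row
        cols = line.split()
        name, status = cols[0], cols[1]
        if "Ready" not in status.split(","):
            result.append(name)
    return result


# A few rows copied from the output above.
sample = """\
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-0-137-163.us-west-1.compute.internal   Ready                      worker   25h   v1.17.1
ip-10-0-150-105.us-west-1.compute.internal   Ready,SchedulingDisabled   worker   25h   v1.17.1
ip-10-0-159-232.us-west-1.compute.internal   Ready                      master   25h   v1.17.1
"""
print(not_ready_nodes(sample))
```

A node reporting `NotReady` would be returned by the helper, because the status is compared per comma-separated token rather than by substring.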

5. Verify that the limits defined in config.yaml are applied to the nodes:
pruan@fedora-vm ~/workspace/testcases/1794493 $ oc debug node/ip-10-0-137-163.us-west-1.compute.internal
Starting pod/ip-10-0-137-163us-west-1computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.137.163
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# cat /etc/crio/crio.conf | grep limit
    pids_limit = 2048
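
The same check can be scripted instead of grepping by eye. The following is a minimal sketch that assumes crio.conf uses simple `key = value` lines as in the output above; it is not a full TOML parser, the helper name `read_setting` is hypothetical, and the sample text (including the `[crio.runtime]` header and the `log_level` line) is an assumption for illustration.

```python
import re


def read_setting(conf_text, key):
    """Return the value of the first `key = value` line, or None if absent.

    Surrounding double quotes are stripped from string values.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(key)}\s*=\s*(.+?)\s*$", re.MULTILINE
    )
    match = pattern.search(conf_text)
    return match.group(1).strip('"') if match else None


# Assumed excerpt of /etc/crio/crio.conf after the rollout.
sample_conf = """\
[crio.runtime]
    pids_limit = 2048
    log_level = "debug"
"""
print(read_setting(sample_conf, "pids_limit"))
print(read_setting(sample_conf, "log_level"))
```

Comparing the returned strings against the values in the ContainerRuntimeConfig (here `2048` and `debug`) confirms both settings rather than only the one matched by `grep limit`.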

Comment 5 errata-xmlrpc 2020-05-04 11:26:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

