Bug 2100894 - Possible to cause misconfiguration of container runtime soon after cluster creation
Summary: Possible to cause misconfiguration of container runtime soon after cluster creation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: Qi Wang
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 2104160
Depends On: 2076355
Blocks:
Reported: 2022-06-24 15:21 UTC by Naveen Malik
Modified: 2022-08-01 11:36 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-01 11:35:39 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 3224 0 None open [release-4.10] Bug 2100894: fix MCNameSuffix with kcfg ctrrcfg 2022-07-05 17:53:27 UTC
Red Hat Product Errata RHSA-2022:5730 0 None None None 2022-08-01 11:36:02 UTC

Description Naveen Malik 2022-06-24 15:21:05 UTC
Description of problem:
It is possible to trigger duplication of a generated containerruntime MachineConfig when multiple ContainerRuntimeConfigs exist for a given set of nodes. The duplicate is a copy of the first config in the list, and because both configs manage the same setting, the duplicate effectively overrides what was the second config.
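One way to see which generated config appeared when, before and after triggering the problem (just a sketch; it sorts the same listing used throughout this bug by creation time):

# List the generated containerruntime MachineConfigs, oldest first.
oc get machineconfig --sort-by=.metadata.creationTimestamp | grep containerruntime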


Version-Release number of selected component (if applicable):
Tested and reproduced on 4.10.18 OSD clusters.
Observed on production customer OSD cluster version 4.10.6.


How reproducible:
About 25%.

Steps to Reproduce:
1. Create an OSD cluster. Set up an IDP. Note: SRE used backplane for access and did not set up an IDP.
2. Log in to the cluster as soon as possible.
3. Wait for at least one worker to have pids_limit = 4096, applied by the custom-crio ContainerRuntimeConfig

oc -n default debug node/$(oc get nodes | grep worker | grep -v infra | awk '{print $1}' | head -n1) -- chroot /host sh -c 'crio config | grep pids_limit'

4. Apply new ContainerRuntimeConfiguration to bump pids_limit to 65000

cat << EOF | oc create -f-
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: new-large-pidlimit
spec:
  containerRuntimeConfig:
    pidsLimit: 65000
  machineConfigPoolSelector:
    matchExpressions:
    - key: pools.operator.machineconfiguration.openshift.io/worker
      operator: Exists
EOF

5. Wait for at least one worker to have pids_limit = 65000, applied by new-large-pidlimit ContainerRuntimeConfiguration

oc -n default debug node/$(oc get nodes | grep worker | grep -v infra | awk '{print $1}' | head -n1) -- chroot /host sh -c 'crio config | grep pids_limit'

6. Verify there are only 2 MachineConfigs for containerruntime

oc get machineconfig | grep containerruntime

7. Force CVO to reconcile things

oc -n openshift-cluster-version scale deployment cluster-version-operator --replicas=0
sleep 5
oc -n openshift-cluster-version scale deployment cluster-version-operator --replicas=1

8. Check the machineconfigs for containerruntime again. If the problem is triggered (a 25% chance was observed in testing) you will now see a third one. This third MachineConfig (with the -2 suffix) will be a duplicate of the original MachineConfig created for "custom-crio"; a diff sketch to confirm the duplication follows the command below.

oc get machineconfig | grep containerruntime
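If the extra object appears, one way to confirm it is a duplicate of the original is to diff the specs of the two generated MachineConfigs. This is only a sketch; the -2 suffix matches the naming seen later in this bug, so adjust it to whatever suffix shows up on your cluster.

# Hypothetical check (bash): compare the spec of the original generated
# MachineConfig with the suspected duplicate; no output means they are identical.
diff <(oc get machineconfig 99-worker-generated-containerruntime -o jsonpath='{.spec}') \
     <(oc get machineconfig 99-worker-generated-containerruntime-2 -o jsonpath='{.spec}')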

Actual results:
3 MachineConfig for containerruntime exist in this order:
1. custom-crio
2. new-large-pidlimit
3. custom-crio (duplicate)

Expected results:
2 MachineConfig for containerruntime exist in this order:
1. custom-crio
2. new-large-pidlimit

Additional info:
OSD creates a ContainerRuntimeConfig called "custom-crio" that sets pids_limit for workers to 4096. We support customers creating a second ContainerRuntimeConfig to adjust that limit and other settings. Therefore the second, customer-created ContainerRuntimeConfig is expected to (and usually does) get rendered into a MachineConfig.
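On a non-OSD cluster you would need to create this first config yourself before step 4 of the reproduction steps. A minimal sketch, assuming the OSD-managed object has the same shape as the new-large-pidlimit example above (the exact manifest OSD applies may differ):

cat << EOF | oc create -f-
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: custom-crio
spec:
  containerRuntimeConfig:
    pidsLimit: 4096
  machineConfigPoolSelector:
    matchExpressions:
    - key: pools.operator.machineconfiguration.openshift.io/worker
      operator: Exists
EOF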

Given this reproduces while the cluster is new, while Nodes are still being updated and ClusterOperators are progressing, it is likely some timing issue. While this is happening the "master" nodes are also being updated. To reproduce more consistently, the CVO was scaled down and then back up to trigger a reconcile, which creates the third rogue generated MachineConfig (a loop to repeat that bounce is sketched below).
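Because the hit rate was only around 25%, it may help to repeat the CVO bounce during the window described above. A sketch, assuming bash and the two-object baseline from the reproduction steps; the loop bound and sleep are arbitrary:

# Bounce the CVO repeatedly and stop as soon as a third generated
# containerruntime MachineConfig shows up.
for i in $(seq 1 10); do
  oc -n openshift-cluster-version scale deployment cluster-version-operator --replicas=0
  sleep 5
  oc -n openshift-cluster-version scale deployment cluster-version-operator --replicas=1
  oc -n openshift-cluster-version rollout status deployment/cluster-version-operator
  if [ "$(oc get machineconfig | grep -c containerruntime)" -gt 2 ]; then
    echo "duplicate appeared after bounce $i"
    break
  fi
done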

Must-gathers will be provided in a private comment.

Comment 2 Naveen Malik 2022-06-24 15:34:15 UTC
Note: I tested my theory of a race condition at startup on 11 clusters (user error on the 12th!). I did NOT reproduce the issue if all nodes were done progressing, all ClusterOperators were done progressing, and none were degraded. The test was the same apart from the conditions waited for.

Changes:
* after login, wait for all nodes to finish progressing and all ClusterOperators to be done progressing with none degraded (a wait sketch follows below)
* after creating the second ContainerRuntimeConfig, wait for pids_limit to be updated on all nodes before scaling the CVO
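A sketch of those wait conditions, assuming bash, that worker node names contain "worker", and arbitrary timeouts:

# Wait for the pools and operators to settle before continuing.
oc wait machineconfigpools --all --for=condition=Updated --timeout=30m
oc wait clusteroperators --all --for=condition=Progressing=False --timeout=30m
oc wait clusteroperators --all --for=condition=Degraded=False --timeout=30m

# Check pids_limit on every worker (not just the first one) before scaling the CVO.
for node in $(oc get nodes | grep worker | grep -v infra | awk '{print $1}'); do
  echo "== $node =="
  oc -n default debug node/$node -- chroot /host sh -c 'crio config | grep pids_limit'
done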

Comment 3 Naveen Malik 2022-06-24 18:50:17 UTC
The timeline on the customer cluster that shows this is hard to be 100% certain about. What I do see is the creation timestamps on resources in the cluster. Further complicating this, additional changes were made on the cluster since this triggered, so the -1 machineconfig has been deleted. What is of interest, though, is the age of 99-worker-generated-containerruntime-2, which is a duplicate of 99-worker-generated-containerruntime. It was created 44 days later!

$ oc get machineconfig | grep containerruntime
99-worker-generated-containerruntime               e6ba00b885558712d660a3704c071490d999de6f   3.2.0             79d
99-worker-generated-containerruntime-2             e6ba00b885558712d660a3704c071490d999de6f   3.2.0             35d
99-worker-generated-containerruntime-3             e6ba00b885558712d660a3704c071490d999de6f   3.2.0             17d

Comment 6 Qi Wang 2022-07-05 17:07:04 UTC
*** Bug 2104160 has been marked as a duplicate of this bug. ***

Comment 9 Sunil Choudhary 2022-07-27 13:23:39 UTC
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-07-26-232654   True        False         79m     Cluster version is 4.10.0-0.nightly-2022-07-26-232654

% oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-142-35.us-east-2.compute.internal    Ready    worker   88m   v1.23.5+012e945
ip-10-0-149-126.us-east-2.compute.internal   Ready    master   93m   v1.23.5+012e945
ip-10-0-168-61.us-east-2.compute.internal    Ready    master   93m   v1.23.5+012e945
ip-10-0-179-76.us-east-2.compute.internal    Ready    worker   88m   v1.23.5+012e945
ip-10-0-218-35.us-east-2.compute.internal    Ready    master   94m   v1.23.5+012e945
ip-10-0-219-184.us-east-2.compute.internal   Ready    worker   88m   v1.23.5+012e945

% oc debug node/ip-10-0-142-35.us-east-2.compute.internal                                                                                             
Starting pod/ip-10-0-142-35us-east-2computeinternal-debug ...
…

sh-4.4# crio config | grep pids_limit
INFO[2022-07-27 13:09:33.787028081Z] Starting CRI-O, version: 1.23.3-11.rhaos4.10.gitddf4b1a.1.el8, git: () 
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL 
pids_limit = 4096


% oc get containerruntimeconfig
NAME               AGE
new-max-pidlimit   6m35s
pidlimit           23m

% oc get mc                    
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
00-worker                                          dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
01-master-container-runtime                        dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
01-master-kubelet                                  dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
01-worker-container-runtime                        dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
01-worker-kubelet                                  dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
99-master-generated-crio-seccomp-use-default                                                  3.2.0             88m
99-master-generated-registries                     dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
99-master-ssh                                                                                 3.2.0             90m
99-worker-generated-containerruntime               dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             23m
99-worker-generated-containerruntime-1             dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             6m40s
99-worker-generated-crio-seccomp-use-default                                                  3.2.0             88m
99-worker-generated-registries                     dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
99-worker-ssh                                                                                 3.2.0             90m
rendered-master-1f5449d03a8fb49f0ff3d741eb363a4c   dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
rendered-worker-d229647baf68ce03bce6557c7890110d   dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             23m
rendered-worker-d92fd0744b797e11843570f0b681e971   dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             88m
rendered-worker-efaf76f5ebf797d15ef5c6014919afed   dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             6m35s

% oc debug node/ip-10-0-142-35.us-east-2.compute.internal                                                                                             
Starting pod/ip-10-0-142-35us-east-2computeinternal-debug ...
…

sh-4.4# crio config | grep pids_limit
INFO[2022-07-27 13:17:32.805457991Z] Starting CRI-O, version: 1.23.3-11.rhaos4.10.gitddf4b1a.1.el8, git: () 
INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL 
pids_limit = 65000


% oc get mc | grep -i containerruntime      
99-worker-generated-containerruntime               dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             29m
99-worker-generated-containerruntime-1             dc29945da95a65f460ad50ad1bbc10e1918a9c61   3.2.0             12m

Comment 11 errata-xmlrpc 2022-08-01 11:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.25 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5730

