Bug 1920368
Summary: | Fix containers creation issue resulting in runc running on Guaranteed Pod CPUs | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Marcel Apfelbaum <mapfelba>
Component: | Node | Assignee: | Artyom <alukiano>
Node sub component: | CPU manager | QA Contact: | Walid A. <wabouham>
Status: | CLOSED ERRATA | Severity: | high
Version: | 4.7 | Target Release: | 4.7.0
Priority: | unspecified | CC: | alukiano, aos-bugs, ddharwar, mifiedle, nagrawal, rpattath
Hardware: | Unspecified | OS: | Unspecified
Last Closed: | 2021-02-24 15:56:03 UTC | Type: | Bug
Description
Marcel Apfelbaum
2021-01-26 08:10:24 UTC
Reproduction steps:

1. Use a kubelet config with:

   ```yaml
   cpuManagerPolicy: static
   [...]
   reservedSystemCPUs: 0,1,...
   ```

2. Create a Guaranteed pod (so some of the CPUs will be used).
3. Create another pod (Guaranteed or not).
4. Inspect the config.json of the new pod at create time (but before the container is started): it can be seen that the cpuset is not set.

In order to "catch" the config.json at the time the container is created, one can use a cri-o wrapper:

- Change the runtime path in the crio config:

  ```toml
  [crio.runtime.runtimes.runc]
  runtime_path = "/usr/local/bin/runc-wrapper.sh"
  ```

- Use a wrapper like:

  ```bash
  if [ -n "$3" ] && [ "$3" == "create" ] && [ -f "$5/config.json" ]; then
    conf="$5/config.json"
    ...
  fi
  /bin/runc "$@"
  ```

Tested and verified on OCP 4.7.0-0.nightly-2021-02-05-105159, on an AWS cluster with m5.4xlarge instances (16 vCPUs). Followed the reproduction steps in Comment 1.

kubeletconfig:

```yaml
.
.
spec:
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    reservedSystemCPUs: 0,1,2,3,4
```

Edited the runtime path in the crio config and restarted cri-o:

```toml
[crio.runtime.runtimes.runc]
runtime_path = "/usr/local/bin/runc-wrapper.sh"
```

```
# systemctl restart crio
# cat /usr/local/bin/runc-wrapper.sh
#!/bin/bash
# Append each container's OCI config.json to /root/create at create
# time, then hand off to the real runtime.
if [ -n "$3" ] && [ "$3" == "create" ] && [ -f "$5/config.json" ]; then
  conf="$5/config.json"
  cat $conf >> /root/create
fi
exec /bin/runc "$@"
```

Cordoned 2 of the 3 worker nodes to force the pods onto the CPU-manager-enabled worker node ip-10-0-128-206.us-east-2.compute.internal:

```
# oc get nodes
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-128-206.us-east-2.compute.internal   Ready                      worker   6h25m   v1.20.0+68292b2
ip-10-0-138-89.us-east-2.compute.internal    Ready                      master   6h31m   v1.20.0+68292b2
ip-10-0-187-126.us-east-2.compute.internal   Ready                      master   6h30m   v1.20.0+68292b2
ip-10-0-188-63.us-east-2.compute.internal    Ready,SchedulingDisabled   worker   6h24m   v1.20.0+68292b2
ip-10-0-192-141.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   6h24m   v1.20.0+68292b2
ip-10-0-211-8.us-east-2.compute.internal     Ready                      master   6h30m   v1.20.0+68292b2
```

Deployed pod1 with 5 guaranteed CPUs: oc create -f pod1_5cpu.yaml

```yaml
# cat pod1_5cpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod1cpu5
  annotations:
spec:
  nodeSelector:
  containers:
  - name: appcntr1
    image: zenghui/centos-dpdk
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        cpu: 5
        memory: 100Mi
      limits:
        cpu: 5
        memory: 100Mi
```

After pod1cpu5 was running:

```
# oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
ip-10-0-128-206us-east-2computeinternal-debug   1/1     Running   0          3m26s
pod1cpu5                                        1/1     Running   0          20s
```

Then deployed pod2cpu3:

```yaml
# cat pod2_3cpu.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod2cpu3
  annotations:
spec:
  nodeSelector:
  containers:
  - name: appcntr1
    image: zenghui/centos-dpdk
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        cpu: 3
        memory: 100Mi
      limits:
        cpu: 3
        memory: 100Mi
```

```
# oc create -f pod2_3cpu.yaml
# oc get pods
NAME                                            READY   STATUS    RESTARTS   AGE
ip-10-0-128-206us-east-2computeinternal-debug   1/1     Running   0          16m
pod1cpu5                                        1/1     Running   0          13m
pod2cpu3                                        1/1     Running   0          13m
```

Checked the CPU manager state on the node:

```
# oc debug node/ip-10-0-128-206.us-east-2.compute.internal
# chroot /host
sh-4.4# cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-4,10-12","entries":{"50f732a3-956f-404f-9b94-c49fece9bd5e":{"appcntr1":"7,9,15"},"7e5ce205-8fc5-4b59-9eea-80cd020044bc":{"appcntr1":"5-6,8,13-14"}},"checksum":320863805}
```

When pod2 was deployed after pod1, only the CPUs still available (neither reserved nor already assigned to pod1) were allocated to it, verifying the fix. The sketches below recap how the state file can be decoded, how the kubelet settings are applied on OpenShift, and how the cpusets can be pulled out of the wrapper's capture.
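The cpu_manager_state file is single-line JSON and awkward to read by eye. A minimal decoding sketch, assuming jq is available on the node (it is not guaranteed on RHCOS, so it may need to run from a toolbox or debug container):

```bash
# Print the shared default cpuset, then one line per pod-UID/container
# pair with its exclusively assigned cpuset.
jq -r '"defaultCpuSet: \(.defaultCpuSet)",
       (.entries | to_entries[]
        | .key as $pod
        | .value | to_entries[]
        | "\($pod)/\(.key): \(.value)")' /var/lib/kubelet/cpu_manager_state
```

Against the dump above this would print `defaultCpuSet: 0-4,10-12` plus the two appcntr1 assignments (`7,9,15` and `5-6,8,13-14`).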
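For reference, the kubeletConfig snippet shown earlier is delivered on OpenShift through a KubeletConfig custom resource tied to a MachineConfigPool. A minimal sketch; the pool label `custom-kubelet: cpumanager-enabled` is an assumption and has to match a label actually set on the target worker pool:

```bash
# Sketch: enable the static CPU manager policy via a KubeletConfig CR.
# The machineConfigPoolSelector label below is hypothetical; adjust it
# to match the MachineConfigPool that holds the test worker node.
oc create -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: cpumanager-enabled
  kubeletConfig:
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
    reservedSystemCPUs: "0,1,2,3,4"
EOF
```

The Machine Config Operator then rolls the rendered kubelet configuration out to the selected nodes.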
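The grep in the next step works on the raw capture; an alternative is to have the wrapper log only the cpuset field of the OCI runtime spec (`.linux.resources.cpu.cpus`). A sketch of that variant, again assuming jq is installed on the host; with the bug present, creates of guaranteed pods would log `unset` here:

```bash
#!/bin/bash
# Variant of runc-wrapper.sh: at create time, log just the container's
# cpuset (or "unset" when the field is missing, i.e. the buggy case),
# then hand off to the real runtime.
if [ -n "$3" ] && [ "$3" == "create" ] && [ -f "$5/config.json" ]; then
  jq -r '.linux.resources.cpu.cpus // "unset"' "$5/config.json" >> /root/create
fi
exec /bin/runc "$@"
```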
The cpusets captured by the wrapper confirm this:

```
# cat /root/create | grep cpus:
.
.
.
"cpus": "0-4,10-12"
"cpus": "0-4,10-12"
"cpus": "5-6,8,13-14"   <=== Pod1 has 5 guaranteed CPUs
"cpus": "7,9,15"        <=== Pod2 has 3 guaranteed CPUs
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633