Bug 2091524

Summary: Pod failed to start with message "set CPU load balancing: readdirent /proc/sys/kernel/sched_domain/cpu66/domain0: no such file or directory"
Product: OpenShift Container Platform
Reporter: Shereen Haj Makhoul <shajmakh>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Docs Contact:
Severity: urgent
Priority: urgent
CC: akamra, akurikal, aos-bugs, cgaynor, dmoessne, dosman, eparis, fromani, harpatil, imiller, keyoung, kir, minmli, mzheng, nagrawal, openshift-bugs-escalate, pehunt, schoudha, shaising, shajmakh, tsweeney
Version: 4.8
Keywords: Reopened
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1977100
Environment:
Last Closed: 2022-06-07 13:24:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1977100, 2102803    
Bug Blocks:    

Comment 5 Sunil Choudhary 2022-06-06 11:17:16 UTC
Verified on 4.10.17

% oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.17   True        False         162m    Cluster version is 4.10.17

Comment 6 Sunil Choudhary 2022-06-06 15:03:33 UTC
sh-4.4# cat /etc/kubernetes/openshift-workload-pinning 
{
  "management": {
    "cpuset": "0,1"
  }
}

sh-4.4# cat /etc/crio/crio.conf.d/01-workload-partitioning 
[crio.runtime.workloads.management]
activation_annotation = "target.workload.openshift.io/management"
annotation_prefix = "resources.workload.openshift.io"
resources = { "cpushares" = 0, "cpuset" = "0-1,10-12" }

% oc describe pod cyclictest
Name:         cyclictest
Namespace:    default
Priority:     0
Node:         ip-10-0-77-100.us-east-2.compute.internal/10.0.77.100
Start Time:   Mon, 06 Jun 2022 20:29:06 +0530
Labels:       <none>
Annotations:  cpu-load-balancing.crio.io: disable
              k8s.ovn.org/pod-networks:
                {"default":{"ip_addresses":["10.128.0.54/23"],"mac_address":"0a:58:0a:80:00:36","gateway_ips":["10.128.0.1"],"ip_address":"10.128.0.54/23"...
              k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "ovn-kubernetes",
                    "interface": "eth0",
                    "ips": [
                        "10.128.0.54"
                    ],
                    "mac": "0a:58:0a:80:00:36",
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "ovn-kubernetes",
                    "interface": "eth0",
                    "ips": [
                        "10.128.0.54"
                    ],
                    "mac": "0a:58:0a:80:00:36",
                    "default": true,
                    "dns": {}
                }]
Status:       Running
IP:           10.128.0.54
IPs:
  IP:  10.128.0.54
Containers:
  cyclic:
    Container ID:   cri-o://870f47fa9936f6d7df0db78ed9472d5ebda236f0189a6cf6649e3e0fe600d19a
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0c134d4266596cde4ff501098aab2238bb7da8452a00bb4116eaf00f7bb479c
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0c134d4266596cde4ff501098aab2238bb7da8452a00bb4116eaf00f7bb479c
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 06 Jun 2022 20:29:09 +0530
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5r6kq (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-5r6kq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Warning  ErrorAddingLogicalPort  14s   controlplane       failed to ensurePod default/cyclictest since it is not yet scheduled
  Normal   Scheduled               14s   default-scheduler  Successfully assigned default/cyclictest to ip-10-0-77-100.us-east-2.compute.internal
  Normal   AddedInterface          12s   multus             Add eth0 [10.128.0.54/23] from ovn-kubernetes
  Normal   Pulled                  12s   kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e0c134d4266596cde4ff501098aab2238bb7da8452a00bb4116eaf00f7bb479c" already present on machine
  Normal   Created                 11s   kubelet            Created container cyclic
  Normal   Started                 11s   kubelet            Started container cyclic
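
Editor's note: the cpuset values in the two config files above ("0,1" and "0-1,10-12") appear to use the Linux CPU list notation (comma-separated ids and inclusive ranges). The following is a minimal sketch, not part of the original verification, for expanding such strings so the two settings can be compared. The helper name `parse_cpuset` is hypothetical and is not a CRI-O or OpenShift API.

```python
def parse_cpuset(spec: str) -> list[int]:
    """Expand a CPU list string such as "0-1,10-12" into sorted CPU ids.

    Hypothetical helper for illustration only; assumes comma-separated
    single ids and inclusive "start-end" ranges.
    """
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            cpus.update(range(start, end + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    pinning = parse_cpuset("0,1")        # from openshift-workload-pinning
    crio = parse_cpuset("0-1,10-12")     # from 01-workload-partitioning
    print(pinning)
    print(crio)
    # CPUs listed in the CRI-O config but not in the pinning file
    print(sorted(set(crio) - set(pinning)))
```

With the values shown in this comment, the last line would list the CPUs that appear only in the CRI-O workload configuration.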

Comment 8 errata-xmlrpc 2022-06-07 13:24:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.17 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4882

Comment 9 Peter Hunt 2022-06-30 17:24:51 UTC
*** Bug 2102803 has been marked as a duplicate of this bug. ***