Bug 1985739 - Tuned affining containers to house keeping cpus
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Yanir Quinn
QA Contact: Niranjan Mallapadi Raghavender
URL:
Whiteboard:
Depends On:
Blocks: 1998120
Reported: 2021-07-25 11:50 UTC by Yanir Quinn
Modified: 2022-08-26 14:51 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Tuned scheduler plugin did not recognize container processes as exceptions managed by OCP's CPU manager.
Consequence: Tuned modified the pinning of containers on restart and placed them all on the housekeeping (reserved) CPUs. This broke pinning guarantees and overloaded the reserved CPUs.
Fix: A new Tuned feature (#1980715) allows defining exceptions by cgroup name, and PAO configures Tuned to ignore container pinning.
Result: Container pinning is now managed only by OCP's CPU manager.
Clone Of:
Clones: 1998120
Environment:
Last Closed: 2022-08-26 14:51:04 UTC
Target Upstream Version:
Embargoed:
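
The Doc Text above describes exempting processes from re-pinning based on their cgroup name. The following is a minimal sketch of that idea only, not Tuned's actual implementation: the function name, the default pattern "/kubepods", and the sample container path are all assumptions made for illustration.

```python
import re


def matches_cgroup_exception(cgroup_text, pattern=r"/kubepods"):
    """Return True if any cgroup path in the given /proc/<pid>/cgroup
    content matches the exception pattern.

    The default pattern is a hypothetical example, not a value taken
    from Tuned or PAO.
    """
    regex = re.compile(pattern)
    for line in cgroup_text.splitlines():
        # Each line has the form "<id>:<controllers>:<path>".
        parts = line.split(":", 2)
        if len(parts) == 3 and regex.search(parts[2]):
            return True
    return False


# A system service (paths as seen for chronyd in this report).
system_cgroup = "12:cpuset:/\n8:cpu,cpuacct:/system.slice/chronyd.service\n"

# A hypothetical container path, invented for this example.
container_cgroup = "12:cpuset:/kubepods.slice/kubepods-pod1234.slice/crio-abcd.scope\n"

print(matches_cgroup_exception(system_cgroup))     # system service: not exempt
print(matches_cgroup_exception(container_cgroup))  # container: exempt
```

A process that matches would be left alone by the scheduler plugin, so its affinity stays under the control of the CPU manager.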


Links
System: Github
ID: openshift cluster-node-tuning-operator pull 258
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2021-08-04 10:04:50 UTC

Comment 5 Niranjan Mallapadi Raghavender 2021-08-26 12:50:10 UTC
[root@bkr-hv03 performance]# /root/img-nvr.sh 
NVR=v4.9.0-19
[root@bkr-hv03 performance]# oc version
Client Version: 4.7.21
Server Version: 4.9.0-0.nightly-2021-08-23-224104
Kubernetes Version: v1.22.0-rc.0+5c2f7cd

Deployed PAO with below profile:
spec:
  cpu:
    isolated: 5-19,45-59,20-39,60-79
    reserved: 0-4,40-44
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - count: 20
      size: 2M
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true



[mniranja@mniranja test1 (master *%)]$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE     IP            NODE                                      NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-ffbc7b65-gv9lm   1/1     Running   0          5h23m   10.129.0.23   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-2rxkq                                   1/1     Running   1          2d      10.46.56.67   hlxcl6-master-2.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-9fvrh                                   1/1     Running   5          2d      10.46.56.2    helix02.lab.eng.tlv2.redhat.com           <none>           <none>
tuned-rxnnq                                   1/1     Running   1          2d      10.46.56.65   hlxcl6-master-0.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-t7bdb                                   1/1     Running   1          2d      10.46.56.66   hlxcl6-master-1.lab.eng.tlv2.redhat.com   <none>           <none>
tuned-xdzpj                                   1/1     Running   4          2d      10.46.56.3    helix03.lab.eng.tlv2.redhat.com           <none>           <none>
[mniranja@mniranja test1 (master *%)]$ oc rsh tuned-9fvrh


sh-4.4# grep Cpus_allowed_list /proc/`pidof chronyd`/status
Cpus_allowed_list:      0-4,40-44
sh-4.4# grep ^Cpus_allowed_list /proc/`pgrep openshift-tuned`/status
Cpus_allowed_list:      0-79
sh-4.4#   grep  /proc/`pidof chronyd`/cgroup
^C
sh-4.4#  grep . /proc/`pidof chronyd`/cgroup 
12:cpuset:/
11:rdma:/
10:blkio:/system.slice/chronyd.service
9:net_cls,net_prio:/
8:cpu,cpuacct:/system.slice/chronyd.service
7:hugetlb:/
6:freezer:/
5:perf_event:/
4:memory:/system.slice/chronyd.service
3:devices:/system.slice/chronyd.service
2:pids:/system.slice/chronyd.service
1:name=systemd:/system.slice/chronyd.service
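
The check above compares each process's Cpus_allowed_list against the reserved set from the profile (0-4,40-44). A minimal sketch of the same comparison in Python follows; the helper names parse_cpu_list and confined_to are hypothetical and exist only for this example.

```python
def parse_cpu_list(spec):
    """Expand a kernel-style CPU list such as "0-4,40-44" into a set of ints."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def confined_to(allowed, reserved):
    """Return True if every allowed CPU is within the reserved set."""
    return parse_cpu_list(allowed) <= parse_cpu_list(reserved)


reserved = "0-4,40-44"  # spec.cpu.reserved from the profile in this comment

# Values copied from the Cpus_allowed_list output in this comment.
print(confined_to("0-4,40-44", reserved))  # chronyd
print(confined_to("0-79", reserved))       # openshift-tuned
```

With the values in this comment, chronyd is confined to the reserved CPUs while openshift-tuned keeps the full 0-79 range, meaning the containerized process was not moved onto the housekeeping CPUs.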

