Bug 2087159 - PPC does not generate new "offlined" parameter in Performance Profile
Summary: PPC does not generate new "offlined" parameter in Performance Profile
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Performance Addon Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jose Luis
QA Contact: Gowrishankar Rajaiyan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-17 13:42 UTC by Jose Luis
Modified: 2022-08-10 12:17 UTC
CC: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 12:16:31 UTC
Target Upstream Version:
Embargoed:




Links
- GitHub: openshift/cluster-node-tuning-operator pull 354 (open): Bug 2087159: Modify PPC to generate new "offlined" parameter in Performance Profile (last updated 2022-05-17 13:49:26 UTC)
- Red Hat Product Errata: RHBA-2022:5869 (last updated 2022-08-10 12:17:00 UTC)

Description Jose Luis 2022-05-17 13:42:11 UTC
Description of problem:

A new "offlined" field has been added to Performance Profile
Performance Profile Creator should be able to generate Performance Profiles with this new field.
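
For reference, the new field lands under spec.cpu alongside "isolated" and "reserved"; this fragment is taken from the verification output in comment 3 below:

  cpu:
    isolated: 2-39,48-79
    offlined: 42-47
    reserved: 0-1,40-41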


Version-Release number of selected component (if applicable): 4.11


How reproducible: every time


Steps to Reproduce:
1. Execute the Performance Profile Creator.

Actual results:
There is no way to generate a Performance Profile with the "offlined" field.


Expected results:
If the input parameters are correct, the Performance Profile Creator will create a Performance Profile with the "offlined" field in it.

Additional info:

To calculate offlined CPUs, the following input parameters are taken into account:

- offlined-cpu-count: the number of CPUs the user wants to set offline.
- disable-ht: indicates whether the user wants to take sibling logical processors offline to disable hyperthreading.
- power-consumption-mode: one of "default", "low-latency", or "ultra-low-latency".

If disable-ht is true, sibling logical processors are not considered in the calculation of any of the CPU sets.

Unless power-consumption-mode is equal to ultra-low-latency (which is considered a high-power-consumption scenario), we first look for a complete socket to set offline, that is, a socket where none of the logical processors is in any set yet.

If that still has not set offline enough logical processors to fulfill the offlined-cpu-count, we try to set offline as many sibling logical processors as possible.

If there are still not enough, we set offline any logical processor that is not already in a set.
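
A simplified, self-contained Go sketch of this selection order, for illustration only: the types, helper names, and topology model here are invented, the disable-ht case is omitted, and the actual implementation lives in the cluster-node-tuning-operator repository (PR 354 linked above).

package main

import (
	"fmt"
	"sort"
)

// offlinedCPUs picks `count` CPUs to set offline, given the socket layout,
// a primary-thread -> sibling-thread map, and the CPUs already placed in
// another set (e.g. reserved). All names are hypothetical.
func offlinedCPUs(sockets [][]int, sibling map[int]int, allocated map[int]bool,
	count int, ultraLowLatency bool) []int {
	offlined := map[int]bool{}
	free := func(c int) bool { return !allocated[c] && !offlined[c] }

	// 1. Unless power-consumption-mode is ultra-low-latency, prefer complete
	// sockets where no logical processor is in any set yet.
	if !ultraLowLatency {
		for _, sock := range sockets {
			ok := len(offlined)+len(sock) <= count
			for _, c := range sock {
				ok = ok && free(c)
			}
			if ok {
				for _, c := range sock {
					offlined[c] = true
				}
			}
		}
	}

	// Sort primary threads for a deterministic iteration order.
	primaries := make([]int, 0, len(sibling))
	for p := range sibling {
		primaries = append(primaries, p)
	}
	sort.Ints(primaries)

	// 2. Offline the sibling thread of cores whose threads are both free,
	// leaving the primary thread available for the isolated set.
	for _, p := range primaries {
		if len(offlined) >= count {
			break
		}
		if s := sibling[p]; free(p) && free(s) {
			offlined[s] = true
		}
	}

	// 3. Fall back to any logical processor not already in a set.
	for _, p := range primaries {
		for _, c := range []int{p, sibling[p]} {
			if len(offlined) < count && free(c) {
				offlined[c] = true
			}
		}
	}

	out := []int{}
	for c := range offlined {
		out = append(out, c)
	}
	sort.Ints(out)
	return out
}

func main() {
	// Toy topology: one socket, 8 logical CPUs, sibling offset 4
	// (0<->4, 1<->5, ...), with CPU 0 and its sibling already reserved.
	sockets := [][]int{{0, 1, 2, 3, 4, 5, 6, 7}}
	sibling := map[int]int{0: 4, 1: 5, 2: 6, 3: 7}
	reserved := map[int]bool{0: true, 4: true}
	fmt.Println(offlinedCPUs(sockets, sibling, reserved, 3, false))
	// Prints [5 6 7]: sibling threads go offline while primaries 1-3 stay
	// available, the same pattern as comment 3 below (offlined 42-47 are
	// the siblings of isolated 2-7).
}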

Comment 3 Niranjan Mallapadi Raghavender 2022-08-01 06:53:21 UTC
Version:

[root@registry kni]# oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-rc.6   True        False         106m    Cluster version is 4.11.0-rc.6
[root@registry kni]# 

Steps:
1. Label the worker nodes:
oc label --overwrite node/worker-0 node-role.kubernetes.io/worker-cnf=""
oc label --overwrite node/worker-1 node-role.kubernetes.io/worker-cnf=""

[root@registry kni]# oc get nodes
NAME       STATUS   ROLES               AGE    VERSION
master-0   Ready    master              148m   v1.24.0+9546431
master-1   Ready    master              148m   v1.24.0+9546431
master-2   Ready    master              148m   v1.24.0+9546431
worker-0   Ready    worker,worker-cnf   109m   v1.24.0+9546431
worker-1   Ready    worker,worker-cnf   116m   v1.24.0+9546431


2. Create mcp for worker-cnf nodes

[root@registry kni]# cat /root/worker_cnf_machine_config_pool.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-cnf
  labels:
    machineconfiguration.openshift.io/role: worker-cnf
spec:
  machineConfigSelector:
    matchExpressions:
      - {
          key: machineconfiguration.openshift.io/role,
          operator: In,
          values: [worker-cnf, worker],
        }
  paused: false
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-cnf: ''

oc apply -f  /root/worker_cnf_machine_config_pool.yaml 

[root@registry kni]# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-b5328a678da9b8cae073f5a87080fe16       True      False      False      3              3                   3                     0                      147m
worker       rendered-worker-153bef82d12fd727c823bea6722ccbe6       True      False      False      0              0                   0                     0                      147m
worker-cnf   rendered-worker-cnf-5bb1710b379050d77b67931de29a2f89   True      False      False      2              2                   2                     0                      87m

3. oc project openshift-cluster-node-tuning-operator

4. check the tuned pods running on worker nodes

[root@registry kni]# oc get pods -o wide 
NAME                                            READY   STATUS    RESTARTS   AGE    IP            NODE       NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-587c56fc77-2vsk2   1/1     Running   0          161m   192.168.0.18   master-0   <none>           <none>
tuned-2w6ns                                     1/1     Running   1          112m   192.168.80.2    worker-0   <none>           <none>
tuned-cswnn                                     1/1     Running   2          119m   192.168.80.3    worker-1   <none>           <none>
tuned-wbb2b                                     1/1     Running   0          147m   192.168.80.21   master-1   <none>           <none>
tuned-wvwjc                                     1/1     Running   0          147m   192.168.80.20   master-0   <none>           <none>
tuned-xjrl9                                     1/1     Running   0          147m   192.168.80.22   master-2   <none>           <none>

5. Check the tuned pods for the NTO image

oc describe pods/tuned-2w6ns | less
Name:                 tuned-2w6ns
Namespace:            openshift-cluster-node-tuning-operator
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 worker-0/10.46.80.2
Start Time:           Mon, 01 Aug 2022 00:56:05 -0400
Labels:               controller-revision-hash=6b564cbb8d
                      openshift-app=tuned
                      pod-template-generation=1
Annotations:          openshift.io/scc: privileged
Status:               Running
IP:                   10.46.80.2
IPs:
  IP:           10.46.80.2
Controlled By:  DaemonSet/tuned
Containers:
  tuned:
    Container ID:  cri-o://6736846211a4126f1d983734c5aed90ea0a6bdae328c47c9dff89f9d10bbed22
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:14f6372a5297bafe582fc8e28e473585a3b2ab52893f36e1f4a412ceadddc506
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:14f6372a5297bafe582fc8e28e473585a3b2ab52893f36e1f4a412ceadddc506
    Port:          <none>
    Host Port:     <none>
    Command:


6. Create a must-gather using the Performance Addon Operator image

$ oc adm must-gather --dest-dir=/tmp/must-gather-tmp --image=registry.hlxcl6.lab.eng.tlv2.redhat.com:5000/openshift4-performance-addon-operator-must-gather-rhel8:v4.11.0-116

7. Run the podman command with an appropriate pull secret that has access to the quay.io image, passing the offlined parameter with the number of CPUs to offline.


podman run --authfile /root/pull_secret.json --entrypoint performance-profile-creator -v /tmp/must-gather-tmp:/tmp/must-gather-tmp:z quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:14f6372a5297bafe582fc8e28e473585a3b2ab52893f36e1f4a412ceadddc506 --must-gather-dir-path /tmp/must-gather-tmp --rt-kernel=true --mcp-name worker-cnf --reserved-cpu-count=4 --power-consumption-mode low-latency  --offlined-cpu-count 6

level=info msg="Nodes targetted by worker-cnf MCP are: [worker-0 worker-1]"
level=info msg="NUMA cell(s): 2"
level=info msg="NUMA cell 0 : [0 40 1 41 2 42 3 43 4 44 5 45 6 46 7 47 8 48 9 49 10 50 11 51 12 52 13 53 14 54 15 55 16 56 17 57 18 58 19 59]"
level=info msg="NUMA cell 1 : [20 60 21 61 22 62 23 63 24 64 25 65 26 66 27 67 28 68 29 69 30 70 31 71 32 72 33 73 34 74 35 75 36 76 37 77 38 78 39 79]"
level=info msg="CPU(s): 80"
level=info msg="4 reserved CPUs allocated: 0-1,40-41 "
level=info msg="70 isolated CPUs allocated: 2-39,48-79"
level=info msg="Additional Kernel Args based on configuration: []"
---
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: performance
spec:
  cpu:
    isolated: 2-39,48-79
    offlined: 42-47
    reserved: 0-1,40-41
  machineConfigPoolSelector:
    machineconfiguration.openshift.io/role: worker-cnf
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numa:
    topologyPolicy: restricted
  realTimeKernel:
    enabled: true
  workloadHints:
    highPowerConsumption: false
    realTime: true
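
Note that the offlined CPUs 42-47 are the hyperthread siblings of isolated CPUs 2-7 (NUMA cell 0 pairs them as 0/40, 1/41, 2/42, and so on), which matches the sibling-first strategy described above. If the generated profile is subsequently applied to the worker-cnf nodes, the result can be confirmed through the kernel's standard offline mask (a generic sysfs interface, not specific to PPC), for example:

oc debug node/worker-0 -- chroot /host cat /sys/devices/system/cpu/offline

For the profile above this should print 42-47.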

Comment 5 errata-xmlrpc 2022-08-10 12:16:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.11 low-latency extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5869

