Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2225163

Summary: [TestOnly] Power savings - tuned profile and ovs improvements
Product: Red Hat OpenStack
Reporter: Christophe Fontaine <cfontain>
Component: openvswitch
Assignee: RHOSP:NFV_Eng <rhosp-nfv-int>
Status: CLOSED COMPLETED
QA Contact: Sanjay Upadhyay <supadhya>
Severity: high
Priority: high
Version: 17.1 (Wallaby)
CC: apevec, cfontain, chrisw, ekuris, eshulman, gregraka, gurpsing, mariel, rhosp-nfv-int, rjarry, supadhya, vcandapp
Target Milestone: z3
Keywords: TestOnly, Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Enhancement
Doc Text:
A power save profile, cpu-partitioning-powersave, has been introduced in Red Hat Enterprise Linux 9 (RHEL 9), and is now available in Red Hat OpenStack Platform (RHOSP) 17.1.3. This TuneD profile is the base building block to save power in RHOSP 17.1 NFV environments. For more information, see link:{defaultURL}/configuring_network_functions_virtualization/plan-ovs-dpdk-deploy_rhosp-nfv#save-power-ovsdpdk-deploy_plndpdk-nfv[Saving power in OVS-DPDK deployments] in _Configuring network functions virtualization_.
Last Closed: 2024-12-15 10:35:47 UTC
Type: Bug
Bug Depends On: 2225164    
Bug Blocks: 2250831    

Description Christophe Fontaine 2023-07-24 12:54:30 UTC
A new TuneD profile (cpu-partitioning-powersave) has been introduced in RHEL 9 and is the base building block for saving power in NFV deployments.

We need to test the deployment and validate that the profile does not introduce any tangible performance regression for NFV use cases.

Comment 1 Gurpreet Singh 2023-07-30 18:31:49 UTC
*** Bug 2187562 has been marked as a duplicate of this bug. ***

Comment 8 Bracha Frenkel 2023-11-13 18:26:58 UTC Comment hidden (spam)
Comment 9 Bracha Frenkel 2023-11-15 07:52:04 UTC
Moving to ON_DEV status: waiting for the OVS patch that fixes the kernel argument update issue in UEFI mode.

Comment 11 Sanjay Upadhyay 2024-04-08 11:04:02 UTC
As discussed, I need more details on how to confirm that all parameters for cpu-partitioning-powersave are in place.
I have this patch (https://code.engineering.redhat.com/gerrit/c/nfv-qe/+/451366) for the moment, and I have already run a deployment with it. However, the TuneD profile in use is still cpu-partitioning.
tagging @vcandapp
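The following Python sketch shows one minimal way to check which TuneD profile is actually applied on a compute node. It assumes that TuneD records the active profile name in /etc/tuned/active_profile, which is the same information that `tuned-adm active` prints. The helper names are hypothetical. The sketch also treats the file content as a single profile name, so a merged list of several profiles would need different handling.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: TuneD writes the name of the applied profile to this file.
ACTIVE_PROFILE_FILE = Path("/etc/tuned/active_profile")
EXPECTED_PROFILE = "cpu-partitioning-powersave"


def active_tuned_profile(path: Path = ACTIVE_PROFILE_FILE) -> str:
    """Return the active profile name recorded by TuneD (hypothetical helper)."""
    return path.read_text().strip()


def check_profile(path: Path = ACTIVE_PROFILE_FILE,
                  expected: str = EXPECTED_PROFILE) -> None:
    """Raise if the node is not running the expected profile."""
    actual = active_tuned_profile(path)
    if actual != expected:
        raise RuntimeError(f"active TuneD profile is {actual!r}, expected {expected!r}")
```

On the deployment described above, `check_profile()` would raise and report 'cpu-partitioning'. This is a quick signal that the profile setting never reached the node.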

Comment 12 Vijayalakshmi Candappa 2024-04-08 11:31:58 UTC
supadhya, from the deployment perspective, the following files need to be updated to set the new profile:

1. baremetal_deployment.yaml: use the new TuneD profile
Example:

- name: ComputeOvsDpdkSriov
  count: 2
  instances:
    - hostname: compute-0
      name: compute-0
    - hostname: compute-1
      name: compute-1
  defaults:
    networks:
      - network: internal_api
        subnet: internal_api_subnet
      - network: tenant
        subnet: tenant_subnet
      - network: storage
        subnet: storage_subnet
    network_config:
      template: /home/stack/ospd-17.1-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/nic-configs/computeovsdpdksriov.yaml

  ansible_playbooks:
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
      extra_vars:
        kernel_args: "default_hugepagesz=1GB hugepagesz=1G hugepages=64 iommu=pt intel_iommu=on tsx=off isolcpus=2-19,22-39"
        reboot_wait_timeout: 900
        tuned_profile: "cpu-partitioning-powersave"   # <== here
        tuned_isolated_cores: "2-19,22-39"

2. roles_data.yaml
- name: ComputeOvsDpdkSriov
  description: |
    Compute role with OvS-DPDK and SR-IOV services
  CountDefault: 1
  tags:
    - compute
    - dpdk
  networks:
    - InternalApi
    - Tenant
    - Storage
  RoleParametersDefault:
    FsAioMaxNumber: 1048576
    VhostuserSocketGroup: "hugetlbfs"
    TunedProfileName: "cpu-partitioning-powersave"    # <== here
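To confirm that the kernel arguments and the TuneD isolated cores in the baremetal_deployment.yaml example agree, you can compare isolcpus= on the node's /proc/cmdline with tuned_isolated_cores. The helpers below are a hypothetical Python sketch of that check. They assume isolcpus= holds a plain CPU list, so flag prefixes such as managed_irq, are not handled. They also treat the last occurrence of a repeated argument as the effective one, which is a simplification.

```python
from __future__ import annotations


def expand_cpu_list(spec: str) -> set[int]:
    """Expand a kernel-style CPU list such as "2-19,22-39" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ''."""
    args: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        args[key] = value  # simplification: a later duplicate overrides an earlier one
    return args


def isolcpus_matches(cmdline: str, tuned_isolated_cores: str) -> bool:
    """True if isolcpus= covers exactly the same CPUs as tuned_isolated_cores."""
    isolcpus = parse_cmdline(cmdline).get("isolcpus", "")
    return expand_cpu_list(isolcpus) == expand_cpu_list(tuned_isolated_cores)
```

On a node, the first argument would be `Path("/proc/cmdline").read_text()`. The second argument would be the tuned_isolated_cores value from the playbook extra_vars.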

Tagging rjarry to provide more detailed testing steps.

Comment 13 Robin Jarry 2024-04-08 11:57:54 UTC
Hi @supadhya,

By default, cpu-partitioning-powersave comes with max_power_state="cstate.name:C1|10", which is the same value as the original cpu-partitioning profile. This means that **there will not be any power saving with the default settings**.

In order to achieve any power saving, you need to change this setting to allow higher C-states on the CPUs. The setting is described in tuned-profiles-cpu-partitioning(7):

> max_power_state=<MAX_CSTATE>
>       Maximum  c-state  the cores are allowed to enter. Can be expressed as it's name (C1E) or minimum wake-up latency, in micro-seconds.  This parameter is provided as-is to `force_latency`.  Default is set to "cstate.name:C1|10" to behave as cpu-partitioning profile.

More details about force_latency are available here: https://github.com/redhat-performance/tuned/blob/v2.22.1/tuned/plugins/plugin_cpu.py#L133-L152
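The Python sketch below models how a value like "cstate.name:C1|10" turns into a latency limit, which shows why the default gives no power saving. Candidates separated by "|" are tried from left to right, and the first one that resolves wins. Here that means C1's exit latency, with a fallback to 10 us if no state is named C1.

This is a simplified model loosely based on the plugin_cpu.py code linked above, not TuneD's implementation. It only understands plain integers and the cstate.name: form. The state latencies are taken from the cpu2 sysfs listing in this comment. `allowed_states` assumes that the idle governor skips any state whose exit latency exceeds the limit.

```python
from __future__ import annotations

# Exit latencies (us) from the cpu2 sysfs listing in this comment.
CPU2_STATES = {"POLL": 0, "C1": 1, "C1E": 4, "C6": 170}


def resolve_latency(spec: str, cstate_latency: dict[str, int]) -> int | None:
    """Resolve a spec such as "cstate.name:C1|10" to a latency in microseconds.

    Simplified model: '|'-separated candidates are tried left to right and the
    first one that resolves wins. Only integers and "cstate.name:<NAME>" are
    understood here.
    """
    prefix = "cstate.name:"
    for candidate in spec.split("|"):
        candidate = candidate.strip()
        if candidate.startswith(prefix):
            latency = cstate_latency.get(candidate[len(prefix):])
        else:
            try:
                latency = int(candidate)
            except ValueError:
                latency = None  # form not supported by this sketch
        if latency is not None:
            return latency
    return None


def allowed_states(limit_us: int, cstate_latency: dict[str, int]) -> list[str]:
    """States whose exit latency does not exceed the limit (assumed governor rule)."""
    return [name for name, latency in cstate_latency.items() if latency <= limit_us]


if __name__ == "__main__":
    limit = resolve_latency("cstate.name:C1|10", CPU2_STATES)
    print(limit, allowed_states(limit, CPU2_STATES))  # 1 ['POLL', 'C1']
```

With the default value, only POLL and C1 stay eligible, which matches the warning that the defaults save no power.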

IMPORTANT FOR DOCS: do not blindly set max_power_state=C6, as it can introduce latency side effects for all workloads. To find the latency (in microseconds) associated with each power state, check the cpuidle states in sysfs:

~# for s in /sys/devices/system/cpu/cpu2/cpuidle/*; do grep . $s/{name,latency}; done
/sys/devices/system/cpu/cpu2/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu2/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu2/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu2/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu2/cpuidle/state2/name:C1E
/sys/devices/system/cpu/cpu2/cpuidle/state2/latency:4
/sys/devices/system/cpu/cpu2/cpuidle/state3/name:C6
/sys/devices/system/cpu/cpu2/cpuidle/state3/latency:170
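The same sysfs walk can be done in Python, which makes it easier to pick a state for a given latency budget. In this sketch, `read_cpuidle_states` and `deepest_state_within` are hypothetical helpers, and the sysfs layout is the one shown above. The latency budget is a value you choose for your workload, not a recommendation.

```python
from __future__ import annotations

from pathlib import Path


def read_cpuidle_states(cpu: int,
                        sysfs_root: Path = Path("/sys/devices/system/cpu")) -> dict[str, int]:
    """Return {state name: exit latency in us} for one CPU, in stateN order."""
    cpuidle = sysfs_root / f"cpu{cpu}" / "cpuidle"
    states: dict[str, int] = {}
    # Sort numerically so that state10 would come after state9.
    for state_dir in sorted(cpuidle.glob("state*"), key=lambda p: int(p.name[len("state"):])):
        name = (state_dir / "name").read_text().strip()
        states[name] = int((state_dir / "latency").read_text().strip())
    return states


def deepest_state_within(budget_us: int, states: dict[str, int]) -> str | None:
    """Name of the highest-latency state that still fits in budget_us."""
    fitting = [(latency, name) for name, latency in states.items() if latency <= budget_us]
    return max(fitting)[1] if fitting else None
```

For example, on the platform above a 10 us budget would allow C1E but not C6.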

On this platform, a CPU entering C6 will need 170us to "wake up" before performing any operation. During these 170us, 6324 x 64 bytes packets can be received, which is more than the default rxq sizes. In any case, this is beyond the scope of testing for this particular BZ.

TESTING PROCEDURE
=================

1) Set TunedProfileName: "cpu-partitioning-powersave" as shown by Viji in the previous comment.

2) Change the value of max_power_state to "C6" (using a custom playbook with crudini or some other command).

3) After deployment, verify that CPUs that are **NOT** doing anything are reaching C6.

Example:

~# cpupower -c 7,27 monitor
...
    | Nehalem                   || Mperf              || RAPL           || Idle_Stats                
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || pack   | dram  || POLL | C1   | C1E  | C6    
   7|  0.00| 98.79|  0.00|  0.00||  0.00|100.00|  2939||69039558|3598453||  0.00|  0.00|  0.00|100.00
  27|  0.00| 98.79|  0.00|  0.00||  0.00|100.00|  2794||69039558|3598453||  0.00|  0.00|  0.00|100.00
            ^^^^^^                                                                             ^^^^^^
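To check step 3 automatically, you can parse the `cpupower monitor` table. The Python sketch below assumes the layout shown above: a group line, a column-header line, then one row per CPU. Monitor groups are separated by "||" and columns by "|".

As far as I understand, the Nehalem columns come from hardware residency counters and the Idle_Stats columns from the kernel's cpuidle statistics. Both C6 columns (the ones marked with ^^^^^^) should therefore be high on an idle isolated CPU. The 90% threshold is an arbitrary assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_cpupower_monitor(text: str) -> dict[int, dict[tuple[str, str], float]]:
    """Map CPU id -> {(monitor group, column): value} from `cpupower monitor` text."""
    lines = [ln for ln in text.splitlines() if "|" in ln]  # drops "..." and ^^^ lines
    group_line, header_line, *rows = lines
    groups = [g.strip(" |") for g in group_line.split("||")]
    headers = [[c.strip() for c in seg.split("|")] for seg in header_line.split("||")]

    stats: dict[int, dict[tuple[str, str], float]] = {}
    for row in rows:
        segments = [[c.strip() for c in seg.split("|")] for seg in row.split("||")]
        cpu = int(segments[0][0])  # first column of the first group is the CPU id
        values: dict[tuple[str, str], float] = {}
        for group, columns, cells in zip(groups, headers, segments):
            for column, cell in zip(columns, cells):
                if column != "CPU":
                    values[(group, column)] = float(cell)
        stats[cpu] = values
    return stats


def cpus_reaching(stats: dict[int, dict[tuple[str, str], float]], state: str = "C6",
                  group: str = "Idle_Stats", threshold_pct: float = 90.0) -> list[int]:
    """CPUs whose residency in `state`, as reported by `group`, is >= threshold_pct."""
    return sorted(cpu for cpu, values in stats.items()
                  if values.get((group, state), 0.0) >= threshold_pct)
```

On a node, the input text could come from `subprocess.run(["cpupower", "-c", "7,27", "monitor"], capture_output=True, text=True, check=True).stdout`.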

Maybe @cfontain can provide more details.
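For step 2 of the procedure above, a playbook could run a small script instead of crudini to edit the profile variables. This Python sketch assumes the variables live in /etc/tuned/cpu-partitioning-powersave-variables.conf as plain key=value lines. Check the include= line of the profile's tuned.conf on your release before relying on that path. `set_variable` is a hypothetical helper.

TuneD typically has to re-apply the profile before a new value takes effect, for example with `tuned-adm profile cpu-partitioning-powersave`.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: location of the profile's variables file; verify on the target release.
VARIABLES_FILE = Path("/etc/tuned/cpu-partitioning-powersave-variables.conf")


def set_variable(path: Path, key: str, value: str) -> None:
    """Set key=value in a TuneD variables file.

    An existing active assignment is replaced; otherwise one is appended.
    Commented-out lines (starting with '#') are left untouched.
    """
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
    lines = path.read_text().splitlines() if path.exists() else []
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n")
```

Usage, with the value from step 2 (keep in mind the latency warning above): `set_variable(VARIABLES_FILE, "max_power_state", "C6")`.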

Comment 14 Robin Jarry 2024-04-08 12:04:26 UTC
Quoting myself, as I forgot to mention an important detail:

(In reply to Robin Jarry from comment #13)
> On this platform, a CPU entering C6 will need 170us to "wake up" before
> performing any operation. During these 170us, 6324 x 64 bytes packets can be
> received, which is more than the default rxq sizes.

6324 x 64 bytes packets **ON A 25G LINK** was missing :)
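The 6324 figure can be reproduced with a short Python calculation, assuming line rate on a 25 Gbit/s link. Each 64-byte frame is taken to occupy 20 extra bytes on the wire: 7 bytes of preamble, a 1-byte start-of-frame delimiter, and a 12-byte inter-frame gap. The comparison value of 2048 is an assumption for the OVS-DPDK default rx descriptor count (n_rxq_desc); check it on your OVS release.

```python
# Per-frame overhead on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Assumption: default OVS-DPDK rx descriptor count (n_rxq_desc); check your release.
DEFAULT_RXQ_SIZE = 2048


def packets_during(wakeup_us: float, link_gbps: float, frame_bytes: int = 64) -> int:
    """Whole frames that can arrive at line rate during `wakeup_us` microseconds."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    frames_per_second = link_gbps * 1e9 / bits_per_frame
    return int(frames_per_second * wakeup_us * 1e-6)


if __name__ == "__main__":
    burst = packets_during(wakeup_us=170, link_gbps=25)
    print(burst, burst > DEFAULT_RXQ_SIZE)  # 6324 True
```

With the same function, C1E's 4 us wake-up on this link gives 148 packets, which helps explain why shallow states are much less risky for packet loss.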

Comment 28 Eran Kuris 2024-08-19 07:01:22 UTC
*** Bug 2250768 has been marked as a duplicate of this bug. ***