Bug 2225163
| Summary: | [TestOnly] Power savings - tuned profile and ovs improvements | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Christophe Fontaine <cfontain> |
| Component: | openvswitch | Assignee: | RHOSP:NFV_Eng <rhosp-nfv-int> |
| Status: | CLOSED COMPLETED | QA Contact: | Sanjay Upadhyay <supadhya> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.1 (Wallaby) | CC: | apevec, cfontain, chrisw, ekuris, eshulman, gregraka, gurpsing, mariel, rhosp-nfv-int, rjarry, supadhya, vcandapp |
| Target Milestone: | z3 | Keywords: | TestOnly, Triaged |
| Target Release: | 17.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: |
A power save profile, cpu-partitioning-powersave, has been introduced in Red Hat Enterprise Linux 9 (RHEL 9), and is now available in Red Hat OpenStack Platform (RHOSP) 17.1.3. This TuneD profile is the base building block to save power in RHOSP 17.1 NFV environments. For more information, see link:{defaultURL}/configuring_network_functions_virtualization/plan-ovs-dpdk-deploy_rhosp-nfv#save-power-ovsdpdk-deploy_plndpdk-nfv[Saving power in OVS-DPDK deployments] in _Configuring network functions virtualization_.
|
Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-12-15 10:35:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2225164 | ||
| Bug Blocks: | 2250831 | ||
|
Description
Christophe Fontaine
2023-07-24 12:54:30 UTC
*** Bug 2187562 has been marked as a duplicate of this bug. ***

Verified on RHOS-17.1-RHEL-9-20231102.n.1:

1. Configured the BIOS for OS-driven power management on the compute nodes.
2. Established a baseline:
   * ran the performance test with the `cpu-partitioning` tuned profile [1]
   * checked the power level:
     ```
     # ipmi-dcmi --get-system-power-statistics
     Current Power: 144 Watts
     ```
3. Applied the `cpu-partitioning-powersave` profile:
   * ran the performance test to make sure performance remains stable [2]
   * checked the power level:
     ```
     # ipmi-dcmi --get-system-power-statistics
     Current Power: 132 Watts
     ```

While idling, the system saves ~12 W (144 W vs. 132 W).

P.S.: there is a potential issue with the kernel argument update in UEFI mode. Power savings are OK when it is fixed manually. Waiting for the OVS patch to flip the BZ to VERIFIED.

Moving to ON_DEV status, waiting for the OVS patch that fixes the kernel argument update issue in UEFI mode.

As discussed, I need some more details on how to confirm that all parameters for cpu-partitioning-powersave are in place. I have this patch (https://code.engineering.redhat.com/gerrit/c/nfv-qe/+/451366) for the moment, with which I have already run a deployment, although the tuned profile being used is still cpu-partitioning. Tagging @vcandapp.

supadhya, from the deployment perspective, the following files need to be updated to set the new profile:
1. `baremetal_deployment.yaml`: use the new tuned profile. For example:

```yaml
- name: ComputeOvsDpdkSriov
  count: 2
  instances:
  - hostname: compute-0
    name: compute-0
  - hostname: compute-1
    name: compute-1
  defaults:
    networks:
    - network: internal_api
      subnet: internal_api_subnet
    - network: tenant
      subnet: tenant_subnet
    - network: storage
      subnet: storage_subnet
    network_config:
      template: /home/stack/ospd-17.1-geneve-ovn-dpdk-sriov-ctlplane-dataplane-bonding-hybrid/nic-configs/computeovsdpdksriov.yaml
    ansible_playbooks:
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-kernelargs.yaml
      extra_vars:
        kernel_args: "default_hugepagesz=1GB hugepagesz=1G hugepages=64 iommu=pt intel_iommu=on tsx=off isolcpus=2-19,22-39"
        reboot_wait_timeout: 900
        tuned_profile: "cpu-partitioning-powersave"   # <== set the new profile here
        tuned_isolated_cores: "2-19,22-39"
```
2. `roles_data.yaml`:

```yaml
- name: ComputeOvsDpdkSriov
  description: |
    Compute role with OvS-DPDK and SR-IOV services
  CountDefault: 1
  tags:
  - compute
  - dpdk
  networks:
  - InternalApi
  - Tenant
  - Storage
  RoleParametersDefault:
    FsAioMaxNumber: 1048576
    VhostuserSocketGroup: "hugetlbfs"
    TunedProfileName: "cpu-partitioning-powersave"   # <== set the new profile here
```
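Both files above hinge on the isolated core list staying consistent: the `isolcpus=` value in `kernel_args` and `tuned_isolated_cores` in step 1 must describe the same CPU set. As a minimal sketch (the helper below is hypothetical, not part of TripleO), such CPU list strings can be expanded and compared like this:

```python
def parse_cpus(spec: str) -> set:
    """Expand a CPU list such as '2-19,22-39' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# The isolcpus= value in kernel_args and tuned_isolated_cores should match.
isolated = parse_cpus("2-19,22-39")
assert isolated == parse_cpus("2-19,22-39")
print(len(isolated))  # → 36 isolated cores
```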
Tagging rjarry to provide more detailed steps for testing.
Hi @supadhya,

By default, cpu-partitioning-powersave comes with max_power_state="cstate.name:C1|10", which is the same value as the original cpu-partitioning profile. This means that **there will not be any power saving with the default settings**. In order to achieve any power saving, you need to change this setting to allow higher C-states on the CPUs. The setting is described in tuned-profiles-cpu-partitioning(7):

> max_power_state=<MAX_CSTATE>
> Maximum c-state the cores are allowed to enter. Can be expressed as its name (C1E) or minimum wake-up latency, in micro-seconds. This parameter is provided as-is to `force_latency`. Default is set to "cstate.name:C1|10" to behave as the cpu-partitioning profile.

More details about force_latency here: https://github.com/redhat-performance/tuned/blob/v2.22.1/tuned/plugins/plugin_cpu.py#L133-L152

IMPORTANT FOR DOCS: do not blindly set max_power_state=C6, as it can introduce latency side effects for all workloads. To know the latency (in microseconds) associated with each power state, check the cpuidle states in sysfs:

```
~# for s in /sys/devices/system/cpu/cpu2/cpuidle/*; do grep . $s/{name,latency}; done
/sys/devices/system/cpu/cpu2/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu2/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu2/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu2/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu2/cpuidle/state2/name:C1E
/sys/devices/system/cpu/cpu2/cpuidle/state2/latency:4
/sys/devices/system/cpu/cpu2/cpuidle/state3/name:C6
/sys/devices/system/cpu/cpu2/cpuidle/state3/latency:170
```

On this platform, a CPU entering C6 needs 170 us to "wake up" before performing any operation. During these 170 us, 6324 x 64-byte packets can be received, which is more than the default rxq sizes. In any case, this is beyond the testing of this particular BZ.

TESTING PROCEDURE
=================

1) Set TunedProfileName: "cpu-partitioning-powersave" as shown by Viji in the previous comment.

2) Change the value of max_power_state to "C6" (using a custom playbook with crudini or some other command).

3) After deployment, verify that CPUs that are **NOT** doing anything are reaching C6. Example:

```
~# cpupower -c 7,27 monitor
    | Nehalem      || Mperf        || RAPL          || Idle_Stats
CPU | C3  | C6    | PC3 | PC6 || C0   | Cx    | Freq || pack    | dram   || POLL | C1   | C1E  | C6
  7 | 0.00| 98.79 | 0.00| 0.00|| 0.00 |100.00 | 2939 ||69039558 |3598453 || 0.00 | 0.00 | 0.00 |100.00
 27 | 0.00| 98.79 | 0.00| 0.00|| 0.00 |100.00 | 2794 ||69039558 |3598453 || 0.00 | 0.00 | 0.00 |100.00
        ^^^^^^                                                                             ^^^^^^
```

Maybe @cfontain can provide more details.

Quoting myself, as I forgot to mention an important detail:

(In reply to Robin Jarry from comment #13)
> On this platform, a CPU entering C6 will need 170us to "wake up" before performing any operation. During these 170us, 6324 x 64 bytes packets can be received, which is more than the default rxq sizes.

What was missing: 6324 x 64-byte packets **ON A 25G LINK** :)

*** Bug 2250768 has been marked as a duplicate of this bug. ***
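The 6324-packets figure above can be reproduced with a few lines of arithmetic, counting the full wire footprint of a minimum-size Ethernet frame (the 8-byte preamble and 12-byte inter-frame gap are standard Ethernet overhead, an assumption not spelled out in the comment):

```python
LINK_BPS = 25e9              # 25 Gbit/s link
C6_EXIT_LATENCY_S = 170e-6   # C6 wake-up latency from sysfs, in seconds
# A 64-byte frame occupies 64 + 8 (preamble) + 12 (inter-frame gap)
# = 84 bytes = 672 bits on the wire.
WIRE_BITS_PER_PKT = (64 + 8 + 12) * 8

pkts = LINK_BPS * C6_EXIT_LATENCY_S / WIRE_BITS_PER_PKT
print(int(pkts))  # → 6324, matching the figure quoted in the comment
```

This illustrates why the docs warning matters: a deeper C-state trades idle power for wake-up latency that can overflow an rxq at line rate.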