Bug 1642209
| Summary: | Applying tuned profile cpu-partitioning to compute nodes through heat template fails | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Sai Sindhur Malleni <smalleni> |
| Component: | tuned | Assignee: | Christophe Fontaine <cfontain> |
| Status: | CLOSED WORKSFORME | QA Contact: | Yariv <yrachman> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 13.0 (Queens) | CC: | atelang, atheurer, bmichalo, cfontain, dbecker, dshaks, jianzzha, krister, mburns, morazi, skramaja, smalleni, vchundur |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 10.0 (Newton) | Flags: | vchundur: needinfo+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-02-20 10:32:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Just to be clear, I'm not targeting an NFV use case; I just want to use isolated CPUs and vCPU pinning for a normal use case.

If we manually try to apply the cpu-partitioning profile with the same list specified in the templates:

    [root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
    # Examples:
    # isolated_cores=2,4-7
    isolated_cores=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
    #
    # To disable the kernel load balancing in certain isolated CPUs:
    # no_rebalance_cores=5-10
    [root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
    Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.

Now if we shrink the list:

    [root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
    # Examples:
    # isolated_cores=2,4-7
    isolated_cores=1,3,5,7
    #
    # To disable the kernel load balancing in certain isolated CPUs:
    # no_rebalance_cores=5-10
    [root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
    [root@overcloud-compute-0 tuned]#

So clearly tuned has an issue with this particular CPU list. Now extend the list to 1,3,5,7,9 and reapply the profile:

    [root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
    # Examples:
    # isolated_cores=2,4-7
    isolated_cores=1,3,5,7,9
    #
    # To disable the kernel load balancing in certain isolated CPUs:
    # no_rebalance_cores=5-10
    [root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
    Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.

    [root@overcloud-compute-0 tuned]# rpm -qa | grep tuned
    tuned-2.9.0-1.el7_5.2.noarch
    tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch
    [root@overcloud-compute-0 tuned]#

This looks awfully similar to: https://bugzilla.redhat.com/show_bug.cgi?id=1432240

    [root@overcloud-compute-0 tuned]# lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 40
    On-line CPU(s) list: 0-39
    Thread(s) per core: 2
    Core(s) per socket: 10
    Socket(s): 2
    NUMA node(s): 2
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 62
    Model name: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
    Stepping: 4
    CPU MHz: 1199.890
    CPU max MHz: 3600.0000
    CPU min MHz: 1200.0000
    BogoMIPS: 6000.07
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 25600K
    NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
    NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

So it only accepts certain lists: 1,3 works but 1,3,5,7,9 doesn't.

    [heat-admin@overcloud-compute-0 ~]$ cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=c4f51144-921c-4f64-a3ef-5bb0cb17cef0 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet skew_tick=1 nohz=on nohz_full=1,3 rcu_nocbs=1,3 tuned.non_isolcpus=000000ff,fffffff5 intel_pstate=disable nosoftlockup
    [root@overcloud-compute-0 tuned]# uname -r
    3.10.0-862.11.6.el7.x86_64

OK, I think I see where the problem is, in https://github.com/redhat-performance/tuned/blob/master/tuned/profiles/functions/function_assertion.py. Changing the if statement on line 22 to:

    if ''.join(sorted(args[1].split(','))) != ''.join(sorted(args[1].split(','))):

should take care of this issue, assuming both args[1] and args[2] only take comma-separated lists.

Typo in the line above (it compares args[1] with itself); change line 22 to:

    if ''.join(sorted(args[1].split(','))) != ''.join(sorted(args[2].split(','))):

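The comparison proposed above treats both arguments as comma-separated CPU lists and ignores their order. A minimal Python sketch of that idea follows, comparing the lists as sets of integers instead of joined strings. The function names `parse_cpu_list` and `cpu_lists_equal` are hypothetical and are not part of tuned; the range handling assumes the `2,4-7` syntax shown in the example comments of cpu-partitioning-variables.conf.

```python
def parse_cpu_list(text):
    """Expand a CPU list string such as '2,4-7' into a set of integers.

    Hypothetical helper for illustration; not taken from tuned.
    """
    cpus = set()
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty fields such as a trailing comma
        if "-" in token:
            first, last = token.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(token))
    return cpus


def cpu_lists_equal(left, right):
    """Return True when both strings describe the same set of CPUs."""
    return parse_cpu_list(left) == parse_cpu_list(right)


if __name__ == "__main__":
    # Same CPUs in a different order compare equal.
    print(cpu_lists_equal("1,3,5,7,9", "9,7,5,3,1"))
    # Different CPUs stay different even though their digits match.
    print(cpu_lists_equal("1,23", "12,3"))
```

One reason to compare sets rather than joined strings: joining with an empty separator can make different lists look identical, since both `1,23` and `12,3` join to `123`.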
@Jianzhu, any idea why /etc/tuned/cpu-partitioning-variables.conf was not populated at all when the deploy failed?

I only examined the manual cpu-partitioning setup failure and overlooked the deployment failure. We might have two issues here:
1) the deployment failure; the best person to answer this is skramaja
2) a tuned failure similar to the existing one, https://bugzilla.redhat.com/show_bug.cgi?id=1432240

So shall we do this:
1) have this BZ fix the deployment failure
2) I will re-open https://bugzilla.redhat.com/show_bug.cgi?id=1432240

Saravanan, could you help us understand what is going on, please?

Sai, from a very high-level view I am guessing this BZ is related to https://bugzilla.redhat.com/show_bug.cgi?id=1645412. This is an issue we see when we are trying to update to RHEL 7.6 on the undercloud. If you are doing the same on your setup, it is likely this could be the issue. Vijay.

Thanks Vijay. Saravanan, we see:

    [heat-admin@overcloud-compute-0 ~]$ cat /etc/tuned/cpu-partitioning-variables.conf
    # Examples:
    # isolated_cores=2,4-7
    # isolated_cores=2-23
    #
    # To disable the kernel load balancing in certain isolated CPUs:
    # no_rebalance_cores=5-10

So this file was not even populated in spite of passing the required params:

    TunedProfileName: "cpu-partitioning"
    IsolCpusList: "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39"

That is also an issue here, right?

Please try these templates; they are passing CI: https://code.engineering.redhat.com/gerrit/gitweb?p=nfv-qe.git;a=tree;f=ospd-13-vxlan-dpdk-odl-ctlplane-dataplane-bonding-hybrid;h=e00ab0a72a45a8847503a9b699b6e3165d5c149c;hb=refs/heads/ci

It is working

Description of problem:
Trying to apply the cpu-partitioning tuned profile to compute nodes through TripleO heat templates fails, with the deploy failing in step 1 with the following message:

    Stack overcloud CREATE_FAILED

    overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.1:
      resource_type: OS::Heat::StructuredDeployment
      physical_resource_id: a3e3e1cb-a2c0-4fa9-972e-5a328e2c08c5
      status: CREATE_FAILED
      status_reason: |
        Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
      deploy_stdout: |
        ...
        " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ssh/manifests/server.pp\", 12]:[\"/var/lib/tripleo-config/puppet_step_config.pp\", 40]",
        "Error: tuned-adm profile cpu-partitioning returned 1 instead of one of [0]",
        "Error: /Stage[main]/Tripleo::Profile::Base::Tuned/Exec[tuned-adm]/returns: change from notrun to 0 failed: tuned-adm profile cpu-partitioning returned 1 instead of one of [0]"
        ]
        }

The heat template used is as follows:

    (undercloud) [stack@ducati ~]$ cat ~/templates/args.yaml
    parameter_defaults:
      #NeutronGlobalPhysnetMtu: 9000
      ComputeParameters:
        KernelArgs: "default_hugepagesz=1GB hugepagesz=1GB hugepages=4"
        TunedProfileName: "cpu-partitioning"
        IsolCpusList: "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39"
        NovaVcpuPinSet: ['1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39']

The CPUs on the compute node, as reported by lscpu, are as follows:

    NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
    NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

Also:

    [heat-admin@overcloud-compute-0 ~]$ cat /etc/tuned/cpu-partitioning-variables.conf
    # Examples:
    # isolated_cores=2,4-7
    # isolated_cores=2-23
    #
    # To disable the kernel load balancing in certain isolated CPUs:
    # no_rebalance_cores=5-10

The following errors were seen in the tuned logs:

    2018-10-22 17:35:04,921 INFO tuned.profiles.loader: loading profile: cpu-partitioning
    2018-10-22 17:35:04,942 ERROR tuned.profiles.functions.function_assertion_non_equal: assertion 'isolated_cores are set' failed: '${isolated_cores}' == '${isolated_cores}'
    2018-10-22 17:35:04,943 ERROR tuned.daemon.controller: Failed to apply profile 'cpu-partitioning'
    2018-10-22 17:35:04,943 INFO tuned.daemon.controller: Applying previously applied profile.

Version-Release number of selected component (if applicable):
OSP 13 Puddle: 2018-10-02.1
tuned-2.9.0-1.el7_5.2.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud, passing the additional heat template to set the tuned profile

Actual results:
Deploy fails in step 1

Expected results:
Deploy should succeed and the required tuned profile should be set

Additional info:
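
The 'isolated_cores are set' assertion in the log above compares the literal string '${isolated_cores}' with itself, which appears consistent with the variables file containing only commented-out examples. A minimal Python sketch for checking whether the text of such a file has an uncommented `isolated_cores` assignment is shown below. The function name `find_isolated_cores` is hypothetical, and the parsing (skip `#` lines, split on the first `=`) is an assumption based on the file contents quoted in this report, not on tuned's own parser.

```python
def find_isolated_cores(conf_text):
    """Return the value of an uncommented isolated_cores= line, or None.

    Hypothetical helper for illustration; assumes simple key=value lines
    with '#' comments, as in the file contents quoted in this report.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out examples
        key, sep, value = line.partition("=")
        if sep and key.strip() == "isolated_cores":
            return value.strip()
    return None


# File contents as seen after the failed deployment (examples only).
UNPOPULATED = """\
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10
"""

# File contents from the manual test with a short list.
POPULATED = """\
# Examples:
# isolated_cores=2,4-7
isolated_cores=1,3,5,7
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10
"""

if __name__ == "__main__":
    print(find_isolated_cores(UNPOPULATED))
    print(find_isolated_cores(POPULATED))
```

Run against the two file contents quoted in this report, the check separates the unpopulated file left by the failed deployment from the manually edited one.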