Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1642209

Summary: Applying tuned profile cpu-partitioning to compute nodes through heat template fails
Product: Red Hat OpenStack
Reporter: Sai Sindhur Malleni <smalleni>
Component: tuned
Assignee: Christophe Fontaine <cfontain>
Status: CLOSED WORKSFORME
QA Contact: Yariv <yrachman>
Severity: medium
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: atelang, atheurer, bmichalo, cfontain, dbecker, dshaks, jianzzha, krister, mburns, morazi, skramaja, smalleni, vchundur
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Flags: vchundur: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-20 10:32:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sai Sindhur Malleni 2018-10-24 02:16:14 UTC
Description of problem: Applying the cpu-partitioning tuned profile to compute nodes through TripleO heat templates fails. The deployment fails in step 1 with the following message:
 Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.ComputeDeployment_Step1.1:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: a3e3e1cb-a2c0-4fa9-972e-5a328e2c08c5
  status: CREATE_FAILED
  status_reason: |
    Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
            "                    with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ssh/manifests/server.pp\", 12]:[\"/var/lib/tripleo-config/puppet_step_config.pp\", 40]",
            "Error: tuned-adm profile cpu-partitioning returned 1 instead of one of [0]",
            "Error: /Stage[main]/Tripleo::Profile::Base::Tuned/Exec[tuned-adm]/returns: change from notrun to 0 failed: tuned-adm profile cpu-partitioning returned 1 instead of one of [0]"
        ]
    }

The heat template used is as follows:
(undercloud) [stack@ducati ~]$ cat ~/templates/args.yaml
parameter_defaults:
  #NeutronGlobalPhysnetMtu: 9000
  ComputeParameters:
    KernelArgs: "default_hugepagesz=1GB hugepagesz=1GB hugepages=4"
    TunedProfileName: "cpu-partitioning"
    IsolCpusList: "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39"
    NovaVcpuPinSet: ['1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39']


The CPUs on the compute node using lscpu are as follows:
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

Also,
[heat-admin@overcloud-compute-0 ~]$ cat /etc/tuned/cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10

The following errors were seen in the tuned logs:
2018-10-22 17:35:04,921 INFO     tuned.profiles.loader: loading profile: cpu-partitioning
2018-10-22 17:35:04,942 ERROR    tuned.profiles.functions.function_assertion_non_equal: assertion 'isolated_cores are set' failed: '${isolated_cores}' == '${isolated_cores}'
2018-10-22 17:35:04,943 ERROR    tuned.daemon.controller: Failed to apply profile 'cpu-partitioning'
2018-10-22 17:35:04,943 INFO     tuned.daemon.controller: Applying previously applied profile.
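The assertion message above compares the literal string '${isolated_cores}' with itself, which suggests the variable was never defined when the profile was loaded. A minimal sketch for checking that, assuming the variables file uses plain key=value lines with # comments; the helper name get_isolated_cores is hypothetical and not part of tuned:

```python
def get_isolated_cores(conf_text):
    """Return the value of the last uncommented isolated_cores= line, or None.

    Assumption: the file is plain "key=value" text where lines starting
    with "#" are comments. This is an illustration, not tuned's own parser.
    """
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "isolated_cores":
            value = val.strip()
    return value


# Content of the file as found on the compute node after the failed deploy.
unpopulated = """# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10
"""

if get_isolated_cores(unpopulated) is None:
    print("isolated_cores is not set")
```

With the file shown above, every isolated_cores line is commented out, so the helper returns None.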


Version-Release number of selected component (if applicable):
OSP 13
Puddle: 2018-10-02.1
tuned-2.9.0-1.el7_5.2.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud by passing the additional heat template to set tuned profile

Actual results:
Deploy fails in step 1

Expected results:
Deploy should succeed and the required tuned profile should be set

Additional info:

Comment 1 Sai Sindhur Malleni 2018-10-24 13:39:52 UTC
Just to be clear, I'm not targeting an NFV use case; I just want to use isolated CPUs and vCPU pinning for a normal use case.

Comment 2 jianzzha 2018-10-24 17:35:31 UTC
If we manually try to apply the cpu-partitioning profile with the same list specified in the templates:

[root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
isolated_cores=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10


[root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.

Now if we shrink the list:
[root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
isolated_cores=1,3,5,7
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10
[root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
[root@overcloud-compute-0 tuned]# 

So clearly tuned has an issue with this particular CPU list.

Now increase the list to 1,3,5,7,9 and reapply the profile:
[root@overcloud-compute-0 tuned]# cat cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
isolated_cores=1,3,5,7,9
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10

[root@overcloud-compute-0 tuned]# tuned-adm profile cpu-partitioning
Cannot load profile(s) 'cpu-partitioning': Assertion 'isolated_cores contains online CPU(s)' failed.

[root@overcloud-compute-0 tuned]# rpm -qa | grep tuned
tuned-2.9.0-1.el7_5.2.noarch
tuned-profiles-cpu-partitioning-2.9.0-1.el7_5.2.noarch
[root@overcloud-compute-0 tuned]# 

This looks awfully similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=1432240
[root@overcloud-compute-0 tuned]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
Stepping:              4
CPU MHz:               1199.890
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              6000.07
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

Comment 3 jianzzha 2018-10-24 17:49:57 UTC
So it only accepts certain lists: for example, 1,3 works but 1,3,5,7,9 doesn't.

[heat-admin@overcloud-compute-0 ~]$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=c4f51144-921c-4f64-a3ef-5bb0cb17cef0 ro console=tty0 console=ttyS0,115200n8 crashkernel=auto rhgb quiet skew_tick=1 nohz=on nohz_full=1,3 rcu_nocbs=1,3 tuned.non_isolcpus=000000ff,fffffff5 intel_pstate=disable nosoftlockup

[root@overcloud-compute-0 tuned]# uname -r
3.10.0-862.11.6.el7.x86_64
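
The tuned.non_isolcpus value on the kernel command line above can be cross-checked against nohz_full=1,3. A minimal sketch, assuming the value is a hexadecimal CPU mask written as comma-separated 32-bit words with the most significant word first; the function name decode_cpumask is hypothetical:

```python
def decode_cpumask(mask):
    """Return the CPU numbers whose bits are set in a hex CPU mask.

    Assumption: the mask is a sequence of 8-digit hex words separated by
    commas, most significant word first, so joining the words gives one
    large hexadecimal number where bit N stands for CPU N.
    """
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


non_isolated = decode_cpumask("000000ff,fffffff5")
isolated = sorted(set(range(40)) - set(non_isolated))
print(isolated)
```

Under that assumption, the only CPUs in 0-39 missing from the mask are 1 and 3, which matches the nohz_full and rcu_nocbs values on the same command line.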

Comment 5 jianzzha 2018-10-25 03:32:22 UTC
OK, I think I see where the problem is.

in https://github.com/redhat-performance/tuned/blob/master/tuned/profiles/functions/function_assertion.py

change the if statement on line 22 to:
 if ''.join(sorted(args[1].split(','))) != ''.join(sorted(args[1].split(','))):

That should take care of this issue, assuming both args[1] and args[2] only ever hold comma-separated lists.

Comment 6 jianzzha 2018-10-25 03:55:22 UTC
There was a typo in the line above; change line 22 to:
  if ''.join(sorted(args[1].split(','))) != ''.join(sorted(args[2].split(','))):
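
The idea behind the suggested change, comparing the two lists without regard to ordering, can be sketched as a standalone function. This is an illustration only and not tuned's code: the name cpu_lists_match is hypothetical, and it assumes both inputs are comma-separated single CPU numbers with no ranges. It compares sets of integers rather than sorted strings, which is a variation on the one-line change above.

```python
def cpu_lists_match(left, right):
    """Return True if two comma-separated CPU lists name the same CPUs.

    Assumption: entries are single CPU numbers such as "1,3,5"; ranges
    like "4-7" are not handled in this sketch.
    """
    def parse(text):
        # Ignore empty tokens so a trailing comma does not raise an error.
        return {int(token) for token in text.split(",") if token.strip()}

    return parse(left) == parse(right)


print(cpu_lists_match("1,3,5,7,9", "1,5,9,3,7"))  # same CPUs, different order
print(cpu_lists_match("1,3", "1,3,5"))            # different CPUs
```

If ordering is indeed the cause, as the suggested change implies, a comparison of this kind would accept the 1,3,5,7,9 list regardless of how each side happens to be ordered.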

Comment 7 Sai Sindhur Malleni 2018-10-25 16:38:36 UTC
@Jianzhu, any idea why the file /etc/tuned/cpu-partitioning-variables.conf was not populated at all when the deploy failed?

Comment 8 jianzzha 2018-10-25 17:09:19 UTC
I only examined the manual cpu-partitioning setup failure and overlooked the deployment failure.

We might have two issues here:
1) deployment failure; the best person to answer this is skramaja
2) tuned failure similar to the existing one, https://bugzilla.redhat.com/show_bug.cgi?id=1432240

So shall we do this:
1) have this BZ fix the deployment failure
2) I will re-open https://bugzilla.redhat.com/show_bug.cgi?id=1432240

Comment 9 Sai Sindhur Malleni 2018-10-25 17:40:53 UTC
Saravanan,

Could you help us understand what is going on, please?

Comment 10 Vijay Chundury 2018-11-07 19:56:06 UTC
Sai,
From a very high-level view, I am guessing this BZ is related to https://bugzilla.redhat.com/show_bug.cgi?id=1645412.
This is an issue we see when we try to update to RHEL 7.6 on the undercloud.
If you are doing the same on your setup, it is likely that this is the issue.

Vijay.

Comment 11 Sai Sindhur Malleni 2018-11-08 15:36:30 UTC
Thanks Vijay.

Saravanan,
We see
[heat-admin@overcloud-compute-0 ~]$ cat /etc/tuned/cpu-partitioning-variables.conf
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# To disable the kernel load balancing in certain isolated CPUs:
# no_rebalance_cores=5-10

So this file was not even populated in spite of passing the required parameters:
 TunedProfileName: "cpu-partitioning"
 IsolCpusList: "1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39"


That is also an issue here, right?