Description of problem:

Creating a Tuned profile with kernel arguments should automatically create a MachineConfig resource with the specified kernel arguments, but this does not happen when master nodes are selected as the target by the Tuned profile.

# oc -n openshift-cluster-node-tuning-operator describe tuned openshift-node-performance-hp-performanceprofile
Name:         openshift-node-performance-hp-performanceprofile
Namespace:    openshift-cluster-node-tuning-operator
Labels:       <none>
Annotations:  <none>
API Version:  tuned.openshift.io/v1
Kind:         Tuned
Metadata:
  Creation Timestamp:  2020-09-22T07:05:32Z
  Generation:          1
  Manager:             performance-operator
  Operation:           Update
  Time:                2020-09-22T07:05:32Z
  Owner References:
    API Version:           performance.openshift.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  hp-performanceprofile
    UID:                   238c242a-006c-4bc5-a8b7-d35c92965f00
  Resource Version:        691796
  Self Link:               /apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds/openshift-node-performance-hp-performanceprofile
  UID:                     88b74131-4912-45a8-854c-d95dc4eaed53
Spec:
  Profile:
    Data:
[main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)

include=openshift-node,cpu-partitioning

# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf

# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.

[variables]
# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
isolated_cores=2-9

not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}

[cpu]
force_latency=cstate.id:1|3 # latency-performance (override)
governor=performance # latency-performance
energy_perf_bias=performance # latency-performance
min_perf_pct=100 # latency-performance

[vm]
transparent_hugepages=never # network-latency

[sysctl]
kernel.hung_task_timeout_secs = 600 # cpu-partitioning #realtime
kernel.nmi_watchdog = 0 # cpu-partitioning #realtime
kernel.sched_rt_runtime_us = -1 # realtime
kernel.timer_migration = 0 # cpu-partitioning (= 1) #realtime (= 0)
kernel.numa_balancing=0 # network-latency
net.core.busy_read=50 # network-latency
net.core.busy_poll=50 # network-latency
net.ipv4.tcp_fastopen=3 # network-latency
vm.stat_interval = 10 # cpu-partitioning #realtime

# ktune sysctl settings for rhel6 servers, maximizing i/o throughput
#
# Minimal preemption granularity for CPU-bound tasks:
# (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
kernel.sched_min_granularity_ns=10000000 # latency-performance

# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio=10 # latency-performance

# Start background writeback (via writeback threads) at this percentage (system
# default is 10%)
vm.dirty_background_ratio=3 # latency-performance

# The swappiness parameter controls the tendency of the kernel to move
# processes out of physical memory and onto the swap disk.
# 0 tells the kernel to avoid swapping processes out of physical memory
# for as long as possible
# 100 tells the kernel to aggressively swap processes out of physical memory
# and move them to swap cache
vm.swappiness=10 # latency-performance

# The total time the scheduler will consider a migrated process
# "cache hot" and thus less likely to be re-migrated
# (system default is 500000, i.e. 0.5 ms)
kernel.sched_migration_cost_ns=5000000 # latency-performance

[selinux]
avc_cache_threshold=8192 # Custom (atomic host)

[net]
nf_conntrack_hashsize=131072 # Custom (atomic host)

[bootloader]
# set empty values to disable RHEL initrd setting in cpu-partitioning
initrd_remove_dir=
initrd_dst_img=
initrd_add_dir=
# overrides cpu-partitioning cmdline
cmdline_cpu_part=+nohz=on rcu_nocbs=${isolated_cores} tuned.non_isolcpus=${not_isolated_cpumask} intel_pstate=disable nosoftlockup
cmdline_realtime=+tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,${isolated_cores} systemd.cpu_affinity=${not_isolated_cores_expanded}
cmdline_hugepages=+ default_hugepagesz=2M
cmdline_additionalArg=+

    Name:  openshift-node-performance-hp-performanceprofile
  Recommend:
    Machine Config Labels:
      machineconfiguration.openshift.io/role:  master
    Priority:                                  30
    Profile:                                   openshift-node-performance-hp-performanceprofile
Status:
Events:  <none>

# oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
00-master-chronyd-custom                                                                      2.2.0             18h
00-worker                                          5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
00-worker-chronyd-custom                                                                      2.2.0             18h
01-master-container-runtime                        5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-master-kubelet                                  5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-worker-container-runtime                        5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-worker-kubelet                                  5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-master-generated-registries                     5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-master-ssh                                                                                 3.1.0             18h
99-worker-generated-registries                     5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-worker-ssh                                                                                 3.1.0             18h
performance-hp-performanceprofile                                                             2.2.0             11m
rendered-master-078c03ce2e4157d56d586723c1670be3   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             4h21m
rendered-master-621540f2d0256aab87d64cfcba80d563   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             4h21m
rendered-master-6669b52b2613b697a99f7f95d0e453f8   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
rendered-worker-d91dc9edcc3b9fc16718c10df528f9b5   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h

# oc -n openshift-cluster-node-tuning-operator logs cluster-node-tuning-operator-66d8cc975d-v5jtp
I0922 11:17:53.829232       1 main.go:24] Go Version: go1.14.7
I0922 11:17:53.829758       1 main.go:25] Go OS/Arch: linux/amd64
I0922 11:17:53.829813       1 main.go:26] node-tuning Version: v4.6.0-202009152100.p0-0-gf1bc826-dirty
I0922 11:17:53.847235       1 controller.go:948] trying to become a leader
I0922 11:17:57.525216       1 controller.go:953] became a leader
I0922 11:17:57.542601       1 controller.go:960] starting Tuned controller
I0922 11:17:57.944837       1 controller.go:1011] started events processor/controller
I0922 11:19:55.826617       1 controller.go:562] updated profile master-0 [openshift-node-performance-hp-performanceprofile]
I0922 11:19:55.842358       1 controller.go:562] updated profile master-0 [openshift-control-plane]
I0922 11:23:17.549045       1 controller.go:562] updated profile master-0 [openshift-node-performance-hp-performanceprofile]
I0922 11:23:17.561194       1 controller.go:562] updated profile master-0 [openshift-control-plane]

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-16-080214

How reproducible:
Always

Steps to Reproduce:
1. Create the Tuned profile specified above
2. Wait for MC creation
3.

Actual results:
The MC wasn't created.

Expected results:
The MC with the relevant parameters should be created.

Additional info:
The cluster has 3 nodes, where each node is both a master and a worker.

# oc get nodes
NAME       STATUS   ROLES           AGE   VERSION
master-0   Ready    master,worker   17h   v1.19.0+35ab7c5
master-1   Ready    master,worker   17h   v1.19.0+35ab7c5
master-2   Ready    master,worker   17h   v1.19.0+35ab7c5
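
For additional diagnosis, the per-node Profile objects owned by the Node Tuning Operator record which TuneD profile the operator recommended for each node. The commands below are a sketch of such a check (exact output columns may vary between NTO versions):

# oc -n openshift-cluster-node-tuning-operator get profiles.tuned.openshift.io
# oc -n openshift-cluster-node-tuning-operator get profiles.tuned.openshift.io master-0 -o yaml

If the custom profile won the recommendation for master-0, its name should show up in that node's Profile object, and a MachineConfig generated from the profile's [bootloader] section would then be expected in the oc get mc listing above.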
You seem to be using openshift-node-performance-hp-performanceprofile with priority 30. In all likelihood, there is a default Tuned profile with the same priority (30); see: https://github.com/openshift/cluster-node-tuning-operator/blob/8b9aeeda1b13f2c6130a54d89be003fd18ee828f/assets/tuned/manifests/default-cr-tuned.yaml#L72

In this case, if both profiles match a node, the profile selection is random. I'd recommend always creating profiles with unique priorities to avoid issues like this. I'll see how I can improve things on the NTO side; the easiest and quickest thing to implement seems to be issuing an operator warning to the user.
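
To illustrate the unique-priority advice, here is a minimal sketch of a custom Tuned CR whose recommend entry uses a priority that cannot tie with the entries in the linked default-cr-tuned.yaml (in the NTO, a lower number means a higher priority). The profile name, priority value and kernel argument below are hypothetical, and in this particular bug the Tuned CR is generated by the performance operator, so its priority would need to be adjusted there rather than applied by hand:

# oc apply -f - <<'EOF'
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-node-custom-example    # hypothetical name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: openshift-node-custom-example
    data: |
      [main]
      summary=Illustrative custom profile
      include=openshift-node
      [bootloader]
      # illustrative kernel argument only
      cmdline_custom_example=+skew_tick=1
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    # a priority not used by the default Tuned CR entries, so the
    # selection for master nodes cannot be random
    priority: 20
    profile: openshift-node-custom-example
EOF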
Verified on 4.6.0-0.nightly-2020-10-02-043144
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
Adding Test Case: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-36881