Bug 1881422 - NTO does not create machine config for the master machine config pool
Summary: NTO does not create machine config for the master machine config pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks: 1882005
 
Reported: 2020-09-22 11:42 UTC by Artyom
Modified: 2023-12-15 19:28 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:29 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
- GitHub: openshift/cluster-node-tuning-operator pull 156 (closed). Summary: "Bug 1881422: Issue a warning when two or more profiles use the same priority." Last updated: 2021-02-01 17:07:18 UTC
- Red Hat Product Errata: RHBA-2020:4196. Last updated: 2020-10-27 16:44:32 UTC

Description Artyom 2020-09-22 11:42:49 UTC
Description of problem:
Creating a Tuned profile with kernel arguments should automatically create a MachineConfig resource with the specified kernel arguments, but this does not happen when the master nodes are selected as the target by the Tuned profile.
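
For reference, the relevant shape of such a Tuned CR is sketched below (a minimal illustration, not the full profile quoted later; the profile name and the cmdline variable are placeholders). When a recommend entry carries machineConfigLabels and the profile's [bootloader] section sets kernel command-line options, NTO is expected to render a MachineConfig for the matching machine config pool:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: openshift-node-performance-example    # illustrative name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: openshift-node-performance-example
    data: |
      [main]
      summary=Example profile carrying extra kernel arguments
      include=openshift-node
      [bootloader]
      cmdline_example=+nosoftlockup            # extra kernel argument
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 30
    profile: openshift-node-performance-example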

# oc -n openshift-cluster-node-tuning-operator describe tuned openshift-node-performance-hp-performanceprofile

Name:         openshift-node-performance-hp-performanceprofile
Namespace:    openshift-cluster-node-tuning-operator
Labels:       <none>
Annotations:  <none>
API Version:  tuned.openshift.io/v1
Kind:         Tuned
Metadata:
  Creation Timestamp:  2020-09-22T07:05:32Z
  Generation:          1
    Manager:    performance-operator
    Operation:  Update
    Time:       2020-09-22T07:05:32Z
  Owner References:
    API Version:           performance.openshift.io/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  PerformanceProfile
    Name:                  hp-performanceprofile
    UID:                   238c242a-006c-4bc5-a8b7-d35c92965f00
  Resource Version:        691796
  Self Link:               /apis/tuned.openshift.io/v1/namespaces/openshift-cluster-node-tuning-operator/tuneds/openshift-node-performance-hp-performanceprofile
  UID:                     88b74131-4912-45a8-854c-d95dc4eaed53
Spec:
  Profile:
    Data:  [main]
summary=Openshift node optimized for deterministic performance at the cost of increased power consumption, focused on low latency network performance. Based on Tuned 2.11 and Cluster node tuning (oc 4.5)
include=openshift-node,cpu-partitioning
 
# Inheritance of base profiles legend:
# cpu-partitioning -> network-latency -> latency-performance
# https://github.com/redhat-performance/tuned/blob/master/profiles/latency-performance/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/network-latency/tuned.conf
# https://github.com/redhat-performance/tuned/blob/master/profiles/cpu-partitioning/tuned.conf
 
# All values are mapped with a comment where a parent profile contains them.
# Different values will override the original values in parent profiles.
 
[variables]
# isolated_cores take a list of ranges; e.g. isolated_cores=2,4-7
 
isolated_cores=2-9
 
 
not_isolated_cores_expanded=${f:cpulist_invert:${isolated_cores_expanded}}
 
[cpu]
force_latency=cstate.id:1|3                   #  latency-performance  (override)
governor=performance                          #  latency-performance
energy_perf_bias=performance                  #  latency-performance
min_perf_pct=100                              #  latency-performance
 
[vm]
transparent_hugepages=never                   #  network-latency
 
[sysctl]
kernel.hung_task_timeout_secs = 600           # cpu-partitioning #realtime
kernel.nmi_watchdog = 0                       # cpu-partitioning #realtime
kernel.sched_rt_runtime_us = -1               # realtime
kernel.timer_migration = 0                    # cpu-partitioning (= 1) #realtime (= 0)
kernel.numa_balancing=0                       # network-latency
net.core.busy_read=50                         # network-latency
net.core.busy_poll=50                         # network-latency
net.ipv4.tcp_fastopen=3                       # network-latency
vm.stat_interval = 10                         # cpu-partitioning  #realtime
 
# ktune sysctl settings for rhel6 servers, maximizing i/o throughput
#
# Minimal preemption granularity for CPU-bound tasks:
# (default: 1 msec#  (1 + ilog(ncpus)), units: nanoseconds)
kernel.sched_min_granularity_ns=10000000      # latency-performance
 
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up.  Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio=10                             # latency-performance
 
# Start background writeback (via writeback threads) at this percentage (system
# default is 10%)
vm.dirty_background_ratio=3                   # latency-performance
 
# The swappiness parameter controls the tendency of the kernel to move
# processes out of physical memory and onto the swap disk.
# 0 tells the kernel to avoid swapping processes out of physical memory
# for as long as possible
# 100 tells the kernel to aggressively swap processes out of physical memory
# and move them to swap cache
vm.swappiness=10                              # latency-performance
 
# The total time the scheduler will consider a migrated process
# "cache hot" and thus less likely to be re-migrated
# (system default is 500000, i.e. 0.5 ms)
kernel.sched_migration_cost_ns=5000000        # latency-performance
 
[selinux]
avc_cache_threshold=8192                      # Custom (atomic host)
 
[net]
nf_conntrack_hashsize=131072                  # Custom (atomic host)
 
[bootloader]
# set empty values to disable RHEL initrd setting in cpu-partitioning
initrd_remove_dir=    
initrd_dst_img=
initrd_add_dir=
# overrides cpu-partitioning cmdline
cmdline_cpu_part=+nohz=on rcu_nocbs=${isolated_cores} tuned.non_isolcpus=${not_isolated_cpumask} intel_pstate=disable nosoftlockup
 
cmdline_realtime=+tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,${isolated_cores} systemd.cpu_affinity=${not_isolated_cores_expanded}
 
cmdline_hugepages=+ default_hugepagesz=2M  
cmdline_additionalArg=+
 
    Name:  openshift-node-performance-hp-performanceprofile
  Recommend:
    Machine Config Labels:
      machineconfiguration.openshift.io/role:  master
    Priority:                                  30
    Profile:                                   openshift-node-performance-hp-performanceprofile
Status:
Events:  <none>

# oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
00-master-chronyd-custom                                                                      2.2.0             18h
00-worker                                          5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
00-worker-chronyd-custom                                                                      2.2.0             18h
01-master-container-runtime                        5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-master-kubelet                                  5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-worker-container-runtime                        5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
01-worker-kubelet                                  5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-master-generated-registries                     5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-master-ssh                                                                                 3.1.0             18h
99-worker-generated-registries                     5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
99-worker-ssh                                                                                 3.1.0             18h
performance-hp-performanceprofile                                                             2.2.0             11m
rendered-master-078c03ce2e4157d56d586723c1670be3   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             4h21m
rendered-master-621540f2d0256aab87d64cfcba80d563   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             4h21m
rendered-master-6669b52b2613b697a99f7f95d0e453f8   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h
rendered-worker-d91dc9edcc3b9fc16718c10df528f9b5   5b5d26108b3e0502887e91e8a46524bd77689fad   3.1.0             18h

# oc -n openshift-cluster-node-tuning-operator logs cluster-node-tuning-operator-66d8cc975d-v5jtp
I0922 11:17:53.829232       1 main.go:24] Go Version: go1.14.7
I0922 11:17:53.829758       1 main.go:25] Go OS/Arch: linux/amd64
I0922 11:17:53.829813       1 main.go:26] node-tuning Version: v4.6.0-202009152100.p0-0-gf1bc826-dirty
I0922 11:17:53.847235       1 controller.go:948] trying to become a leader
I0922 11:17:57.525216       1 controller.go:953] became a leader
I0922 11:17:57.542601       1 controller.go:960] starting Tuned controller
I0922 11:17:57.944837       1 controller.go:1011] started events processor/controller
I0922 11:19:55.826617       1 controller.go:562] updated profile master-0 [openshift-node-performance-hp-performanceprofile]
I0922 11:19:55.842358       1 controller.go:562] updated profile master-0 [openshift-control-plane]
I0922 11:23:17.549045       1 controller.go:562] updated profile master-0 [openshift-node-performance-hp-performanceprofile]
I0922 11:23:17.561194       1 controller.go:562] updated profile master-0 [openshift-control-plane]
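
The last four log lines show master-0 being switched back and forth between the two profiles. To see which Tuned profile is currently assigned to each node, the per-node Profile CRs in the operator namespace can be listed (output columns vary by cluster version):

# oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator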

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-16-080214

How reproducible:
Always


Steps to Reproduce:
1. Create the Tuned profile specified above.
2. Wait for the MachineConfig creation (a minimal check is sketched below).
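
For step 2, a minimal way to watch for new MachineConfigs:

# oc get mc -w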

Actual results:
The MachineConfig was not created.

Expected results:
A MachineConfig with the relevant kernel arguments should be created.

Additional info:
The cluster has 3 nodes, where each node is both a master and a worker.
# oc get nodes
NAME       STATUS   ROLES           AGE   VERSION
master-0   Ready    master,worker   17h   v1.19.0+35ab7c5
master-1   Ready    master,worker   17h   v1.19.0+35ab7c5
master-2   Ready    master,worker   17h   v1.19.0+35ab7c5

Comment 3 Jiří Mencák 2020-09-22 17:17:29 UTC
You seem to be using openshift-node-performance-hp-performanceprofile with priority 30.  In all likelihood, there is a default Tuned profile with the same priority (30) that also matches the master nodes; judging by the operator logs above, this is "openshift-control-plane".  See:
https://github.com/openshift/cluster-node-tuning-operator/blob/8b9aeeda1b13f2c6130a54d89be003fd18ee828f/assets/tuned/manifests/default-cr-tuned.yaml#L72

In this case, if both profiles match a node, the profile selection is random.  I'd recommend always creating profiles with unique priorities to avoid issues like these.
I'll see how I can improve things from the NTO side.  The easiest and quickest improvement to implement seems to be issuing an operator warning to the user.
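
As a workaround until such a warning lands, the recommend entry can be given a priority no default profile uses; a sketch (the value 20 is only an example of an unused priority):

  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 20        # unique, does not collide with the default priority 30
    profile: openshift-node-performance-hp-performanceprofile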

Comment 9 Simon 2020-10-02 17:52:14 UTC
Verified on 4.6.0-0.nightly-2020-10-02-043144

Comment 12 errata-xmlrpc 2020-10-27 16:44:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

