Description of problem:
In 3.9, node labels were set correctly in /etc/origin/node/node-config.yaml, so that tools could find out the type of the node and set relevant tuned profiles. In 3.10, there is not enough information in node-config.yaml or /etc/origin/node/bootstrap-node-config.yaml to establish the type of the node (control plane/computes). This is a regression with respect to 3.9.

Version-Release number of the following components:
$ git describe
openshift-ansible-3.10.0-0.25.0

How reproducible:
Always.

Steps to Reproduce:
1. Install a small 4-node OCP cluster with 1 master, 1 infra node, 1 lb and 1 worker node with node labels as follows:

[nodes]
# masters
b1.lan openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
# infras
b2.lan openshift_schedulable=true openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
# worker/application nodes
b3.lan openshift_schedulable=true openshift_node_labels="{'region': 'primary', 'zone': 'default'}"

Actual results:
On 3.9, you'll get:

root@b1: # grep -C1 infra /etc/origin/node/node-config.yaml
node-labels:
- region=infra
- zone=default
root@b2: ~ # grep -C1 infra /etc/origin/node/node-config.yaml
node-labels:
- region=infra
- zone=default

On 3.10, you'll get:

root@b1: # grep -C1 infra /etc/origin/node/node-config.yaml
node-labels:
- region=infra
- zone=default
root@b1: ~ # grep -C1 infra /etc/origin/node/node-config.yaml
root@b1: ~ # grep -C1 infra /etc/origin/node/bootstrap-node-config.yaml
root@b1: ~ # grep -A2 node-labels /etc/origin/node/node-config.yaml
node-labels:
- "node-role.kubernetes.io/master=true"
enable-controller-attach-detach:
root@b1: ~ # grep -A2 node-labels /etc/origin/node/bootstrap-node-config.yaml
node-labels:
- ""
enable-controller-attach-detach:
root@b2: ~ # grep -A2 node-labels /etc/origin/node/bootstrap-node-config.yaml
node-labels:
- ""
enable-controller-attach-detach:
root@b2: ~ # grep -A2 node-labels /etc/origin/node/node-config.yaml
node-labels:
- "node-role.kubernetes.io/compute=true"
enable-controller-attach-detach:

Expected results:
Ideally /etc/origin/node/bootstrap-node-config.yaml for 3.10, or any other file in /etc/origin, needs to contain enough information about the type of the OCP node.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1504475
This is going to be problematic as this is now driven by configmaps rather than the configuration files. I'm going to mark this as 3.10.z.
/etc/origin/node/node-config.yaml should have these labels today in 3.10 and 3.11, but it did not when this BZ was first created. They are going to be of the form 'node-role.kubernetes.io/infra=true' etc., though, and that file will not exist until the host has been bootstrapped.

Does tuned have the ability to watch files for changes?

I'm not sure how well this will translate into our efforts to move to upstream dynamic config. https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
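For illustration, once the host has been bootstrapped, the role labels described above could be read back out of the file roughly as follows. This is only a sketch: the function name node_roles_from_config is hypothetical, and it assumes the labels appear as plain node-role.kubernetes.io/<role>=true strings, as in the grep output in this report.

```shell
#!/bin/bash
# Hypothetical helper (not part of openshift-ansible or tuned): print every
# node-role.kubernetes.io/<role> label set to "true" in a node config file,
# one role per line.
# Returns 1 if the file does not exist yet (host not bootstrapped).
node_roles_from_config() {
    local config="${1:-/etc/origin/node/node-config.yaml}"
    [ -f "$config" ] || return 1
    # Matches e.g.:  - "node-role.kubernetes.io/master=true"
    grep -o 'node-role\.kubernetes\.io/[A-Za-z0-9_.-]*=true' "$config" \
        | sed -e 's|.*/||' -e 's|=true$||'
}
```

A caller would treat a non-zero return as "not bootstrapped yet" and retry later, since the file is only written at runtime.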
(In reply to Scott Dodson from comment #3)
> Does tuned have the ability to watch files for changes?

I do not believe the current tuned has this functionality. Adding Jaroslav Skarvada. For 4.0 there are plans to have tuned containerized and have inotifywatch check changes in files that need to be checked. But a tuned functionality to have certain files watched and then reloaded based on changes in them would definitely help in a more dynamic environment.
(In reply to jmencak from comment #4)
> (In reply to Scott Dodson from comment #3)
> > Does tuned have the ability to watch files for changes?
>
> I do not believe the current tuned has this functionality. Adding Jaroslav
> Skarvada. For 4.0 there are plans to have tuned containerized and have
> inotifywatch check changes in files that need to be checked. But a tuned
> functionality to have certain files watched and then reloaded based on
> changes in them would definitely help in a more dynamic environment.

Unfortunately there is no such functionality at the moment. Feel free to open a Tuned bugzilla.
We could have the sync pod, which deploys the new node-config.yaml and restarts the kubelet, trigger a tuned command after updating the file. If we do that, what would that command be?
(In reply to Scott Dodson from comment #6)
> We could have the sync pod, which deploys the new node-config.yaml and
> restarts the kubelet, trigger a tuned command after updating the file. If we
> do that, what would that command be?

Basically the same thing the tuned role in openshift-ansible does, i.e.:

- rm /etc/tuned/{active_profile,profile_mode}  # Make tuned use the recommended tuned profile on restart
- systemctl restart tuned
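Wrapped into a function, the two steps above might look like the sketch below. The function name and the TUNED_DIR and TUNED_RESTART_CMD overrides are hypothetical, added only so the steps can be exercised without touching a live system. It uses rm -f rather than plain rm so that already-missing files are not an error.

```shell
#!/bin/bash
# Sketch of the two steps listed above. TUNED_DIR and TUNED_RESTART_CMD are
# hypothetical overrides for testing; by default the function operates on
# /etc/tuned and runs "systemctl restart tuned".
reset_tuned_profile() {
    local tuned_dir="${TUNED_DIR:-/etc/tuned}"
    # Drop the recorded profile and mode so tuned falls back to the
    # recommended profile on its next start.
    rm -f "$tuned_dir/active_profile" "$tuned_dir/profile_mode"
    # Restart the daemon (left unquoted on purpose so the command and its
    # arguments are split into separate words).
    ${TUNED_RESTART_CMD:-systemctl restart tuned}
}
```

On a real node this would need to run as root, and the restart briefly rolls back and reapplies tuning.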
Summarizing and transferring to the pod team.

Prior to 3.10 the installer configured node-config.yaml with labels which triggered tuned profile selection. Due to the refactoring of the install process this is no longer possible, because node-config.yaml is retrieved via the API server at runtime rather than at install time. As such I think we need to make the sync pod responsible for notifying tuned whenever node-config.yaml is updated, so that tuned selects the proper profile based on the node's current role. See comment 7 for details of the steps necessary to make that happen.
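The "notify tuned whenever node-config.yaml is updated" part could be approximated with a simple checksum comparison. The sketch below is an assumption about how such a check might be written, not a description of what the sync pod actually does; the function name config_changed and the state-file argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical change detector: compares the checksum of FILE against the one
# recorded in STATE.
#   returns 0 if FILE changed (or was never recorded) and records the new sum
#   returns 1 if FILE is unchanged
#   returns 2 if FILE cannot be read
config_changed() {
    local file="$1" state="$2" new
    new=$(cksum < "$file") || return 2
    if [ -f "$state" ] && [ "$(cat "$state")" = "$new" ]; then
        return 1
    fi
    printf '%s\n' "$new" > "$state"
    return 0
}
```

A caller would run the tuned reset steps only when this returns 0, so that tuned is not restarted on every sync loop iteration.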
There is a file laid down by the installer that has the role information, in 3.10 and 3.11 at least:

$ grep BOOTSTRAP_CONFIG_NAME /etc/sysconfig/atomic-openshift-node
BOOTSTRAP_CONFIG_NAME=node-config-compute

where "compute" is the role, and I assume tuned is configuring based on this information. Can this be used?
Thanks, Seth, I believe we can use /etc/sysconfig/atomic-openshift-node for that purpose now.
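As a rough sketch of that approach, the role can be derived from the BOOTSTRAP_CONFIG_NAME value shown in the previous comment. The function name node_role_from_sysconfig is hypothetical, and the code assumes the value always has the form node-config-<role>; custom node group names may not follow that pattern.

```shell
#!/bin/bash
# Hypothetical helper: derive the node role from BOOTSTRAP_CONFIG_NAME.
# Assumes the value has the form node-config-<role>, e.g. node-config-compute.
# Returns 1 if the variable is missing or does not match that form.
node_role_from_sysconfig() {
    local sysconfig="${1:-/etc/sysconfig/atomic-openshift-node}"
    local name
    # Use the last assignment in the file and drop optional quotes.
    name=$(sed -n 's/^BOOTSTRAP_CONFIG_NAME=//p' "$sysconfig" | tail -n 1 | tr -d "\"'")
    case "$name" in
        node-config-?*) printf '%s\n' "${name#node-config-}" ;;
        *) return 1 ;;
    esac
}
```

Unlike node-config.yaml, this file is written at install time, so the lookup does not depend on the host having been bootstrapped.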
Proposed 3.10: https://github.com/openshift/openshift-ansible/pull/10730
Tried this with the latest release-3.10 branch:

$ git describe
openshift-ansible-3.10.79-1-16-g5621b67

After modifying one of the node_group configmaps, the tuned service was restarted on the corresponding nodes.

[root@ip-172-18-11-150 ~]# tailf /var/log/tuned/tuned.log
2018-11-26 02:02:53,311 INFO tuned.daemon.controller: terminating controller
2018-11-26 02:02:53,311 INFO tuned.daemon.daemon: stopping tuning
2018-11-26 02:02:53,511 INFO tuned.daemon.daemon: terminating Tuned, rolling back all changes
2018-11-26 02:02:53,712 INFO tuned.daemon.application: dynamic tuning is globally disabled
2018-11-26 02:02:53,720 INFO tuned.daemon.daemon: using sleep interval of 1 second(s)
2018-11-26 02:02:53,720 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2018-11-26 02:02:53,721 INFO tuned.daemon.daemon: Using 'openshift-node' profile
2018-11-26 02:02:53,722 INFO tuned.profiles.loader: loading profile: openshift-node
2018-11-26 02:02:53,767 INFO tuned.daemon.controller: starting controller
2018-11-26 02:02:53,768 INFO tuned.daemon.daemon: starting tuning
2018-11-26 02:02:53,777 INFO tuned.plugins.base: instance cpu: assigning devices cpu0, cpu1
2018-11-26 02:02:53,781 WARNING tuned.plugins.plugin_cpu: your CPU doesn't support MSR_IA32_ENERGY_PERF_BIAS, ignoring CPU energy performance bias
2018-11-26 02:02:53,781 INFO tuned.plugins.base: instance disk: assigning devices dm-0, xvda
2018-11-26 02:02:53,782 INFO tuned.plugins.base: instance net: assigning devices eth0
2018-11-26 02:02:54,113 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
2018-11-26 02:02:54,127 INFO tuned.daemon.daemon: static tuning from profile 'openshift-node' applied

The proposed PR https://github.com/openshift/openshift-ansible/pull/10730 is not yet merged into the latest openshift-ansible rpm package (openshift-ansible-3.10.79-1.git.0.10daf7d), so changing this to MODIFIED.
Fixed in openshift-ansible-3.10.80-1
openshift-ansible-3.10.83-1.git.0.12699eb.el7 now attached to https://errata.devel.redhat.com/advisory/38171, move this bug to verified per Comment 14.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3750