Bug 1824473 - Tuned profile is not updated after incorrect tuned CR is fixed
Summary: Tuned profile is not updated after incorrect tuned CR is fixed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On: 1823941
Blocks: 1825007
Reported: 2020-04-16 09:44 UTC by Jiří Mencák
Modified: 2020-05-04 11:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Node Tuning Operator does not ship with the fixes that address tuned daemon behaviour for 1774645 and 1702724.
Consequence: When the user specifies an invalid profile, the result is a denial of service (DoS) of the operand's (tuned daemon) functionality, and correcting that profile does not restore it.
Fix: Apply the fixes for 1774645 and 1702724 to the tuned daemon.
Result: The tuned daemon correctly processes and sets the new, corrected profile.
Clone Of: 1823941
Clones: 1825007
Environment:
Last Closed: 2020-05-04 11:49:21 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-node-tuning-operator pull 125 | 0 | None | closed | Bug 1824473: Fix tuned reload behaviour on SIGHUP for invalid tuned profiles. | 2020-10-06 12:41:19 UTC
Red Hat Product Errata | RHBA-2020:0581 | 0 | None | None | None | 2020-05-04 11:49:45 UTC

Description Jiří Mencák 2020-04-16 09:44:40 UTC
+++ This bug was initially created as a clone of Bug #1823941 +++

Description of problem:

When a Tuned CR is created containing an error, fixing the mistake in the CR does not update the tuned profile on the host. The tuned pod has to be restarted manually for the changes to take effect.


Version-Release number of selected component (if applicable):
4.3

How reproducible:
100%

Steps to Reproduce:
1. # oc label node NODE1 tuned=test

2. Create a Tuned CR containing an error: the value kernel.pid_max=1048575 is set twice.

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: ips
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=A custom OpenShift IPS host profile
      [sysctl]
      kernel.msgmni=4096
      kernel.pid_max=1048575
      kernel.shmmax=180000000
      kernel.sem="128 1048576 32 32768"
      net.core.rmem_default=>33554431
      net.core.rmem_max=>33554431
      fs.file-max=>240000
      vm.dirty_background_ratio=64
      vm.dirty_ratio=72
      kernel.pid_max=1048575
    name: ips-host
  recommend:
  - match:
    - label: tuned
      value: ips
    priority: 20
    profile: ips-host


3. Fix the Tuned CR in place, or delete it and create a corrected one with the duplicate kernel.pid_max removed.

# oc delete tuned ips
# oc create -f fixed-ips.yaml 


Actual results:
Nothing happens; the tuned profile on the host is not updated.

Expected results:
The tuned profile on the host is updated.


Additional info:

For the corrected CR to take effect, the tuned pod has to be restarted manually.
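
A duplicate key like the one in step 2 can be caught before the CR is created. The following is a minimal sketch, not the check the tuned daemon itself performs: it swaps in Python's standard-library configparser as a stand-in parser, and the function name find_duplicate_key is hypothetical. It assumes the profile body uses plain key=value lines as in the example above.

```python
import configparser


def find_duplicate_key(profile_text):
    """Return (section, key, line_number) for the first duplicate key, or None."""
    parser = configparser.ConfigParser(
        strict=True,         # raise on duplicate sections and options
        interpolation=None,  # do not treat '%' in values specially
        delimiters=("=",),   # the profiles in this report use key=value
    )
    parser.optionxform = str  # keep key names case-sensitive
    try:
        parser.read_string(profile_text)
    except configparser.DuplicateOptionError as err:
        return (err.section, err.option, err.lineno)
    return None


# Profile body from step 2 of the report (kernel.pid_max appears twice).
BROKEN = """\
[main]
summary=A custom OpenShift IPS host profile
[sysctl]
kernel.msgmni=4096
kernel.pid_max=1048575
kernel.shmmax=180000000
kernel.sem="128 1048576 32 32768"
net.core.rmem_default=>33554431
net.core.rmem_max=>33554431
fs.file-max=>240000
vm.dirty_background_ratio=64
vm.dirty_ratio=72
kernel.pid_max=1048575
"""

# Same profile with the trailing duplicate line dropped.
FIXED = "\n".join(BROKEN.splitlines()[:-1]) + "\n"

print(find_duplicate_key(BROKEN))
print(find_duplicate_key(FIXED))
```

For the broken profile the sketch reports the section, key and line of the duplicate; for the fixed one it returns None. Other parse problems (for example a duplicated section header) are deliberately left to raise.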

--- Additional comment from Ryan Howe on 2020-04-14 20:11:35 UTC ---

Typo in my steps 

# oc label node NODE1 tuned=ips


The issue is still present; I just typed up the steps wrong.

--- Additional comment from  on 2020-04-16 07:22:31 UTC ---

Upstream fix for 4.5: https://github.com/openshift/cluster-node-tuning-operator/pull/123

Comment 1 Jiří Mencák 2020-04-16 09:51:28 UTC
Upstream PR for 4.4: https://github.com/openshift/cluster-node-tuning-operator/pull/125

Comment 4 Jiří Mencák 2020-04-18 05:59:03 UTC
Fixed in 4.4.0-0.nightly-2020-04-17-202411 and later.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-17-202411   True        False         53m     Cluster version is 4.4.0-0.nightly-2020-04-17-202411

$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-133-220.eu-west-1.compute.internal   Ready    master   72m   v1.17.1
ip-10-0-138-5.eu-west-1.compute.internal     Ready    worker   63m   v1.17.1
ip-10-0-151-119.eu-west-1.compute.internal   Ready    worker   63m   v1.17.1
ip-10-0-153-38.eu-west-1.compute.internal    Ready    master   72m   v1.17.1
ip-10-0-167-9.eu-west-1.compute.internal     Ready    master   72m   v1.17.1

$ oc label node ip-10-0-138-5.eu-west-1.compute.internal tuned.openshift.io/invalid-duplicate-sysctl-key=

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF
tuned.tuned.openshift.io/invalid-duplicate-sysctl-key created

$ oc project openshift-cluster-node-tuning-operator

$ oc get pods -o wide
NAME                                            READY   STATUS    RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-58c96b5964-q4m7j   1/1     Running   0          78m   10.129.0.8     ip-10-0-153-38.eu-west-1.compute.internal    <none>           <none>
tuned-h8zzf                                     1/1     Running   0          71m   10.0.167.9     ip-10-0-167-9.eu-west-1.compute.internal     <none>           <none>
tuned-hg7fd                                     1/1     Running   0          71m   10.0.133.220   ip-10-0-133-220.eu-west-1.compute.internal   <none>           <none>
tuned-hs6fr                                     1/1     Running   0          65m   10.0.138.5     ip-10-0-138-5.eu-west-1.compute.internal     <none>           <none>
tuned-k2h8f                                     1/1     Running   0          65m   10.0.151.119   ip-10-0-151-119.eu-west-1.compute.internal   <none>           <none>
tuned-vq9tt                                     1/1     Running   0          71m   10.0.153.38    ip-10-0-153-38.eu-west-1.compute.internal    <none>           <none>

$ oc logs tuned-hs6fr | tail -n7
I0418 05:50:28.016751    1877 tuned.go:384] sending HUP to PID 2544
2020-04-18 05:50:28,016 INFO     tuned.daemon.daemon: stopping tuning
2020-04-18 05:50:28,050 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2020-04-18 05:50:28,056 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-18 05:50:28,057 INFO     tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-18 05:50:28,057 INFO     tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-18 05:50:28,057 ERROR    tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'invalid-duplicate-sysctl-key': ("Cannot parse '/etc/tuned/invalid-duplicate-sysctl-key/tuned.conf'.", DuplicateError('Duplicate keyword name at line 5.',))

$ oc apply -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      #kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF

$ oc logs tuned-hs6fr | tail -n7
I0418 05:53:38.920945    1877 tuned.go:384] sending HUP to PID 2544
2020-04-18 05:53:38,921 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-18 05:53:38,921 INFO     tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-18 05:53:38,922 INFO     tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-18 05:53:38,922 INFO     tuned.daemon.daemon: starting tuning
2020-04-18 05:53:38,923 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-04-18 05:53:38,924 INFO     tuned.daemon.daemon: static tuning from profile 'invalid-duplicate-sysctl-key' applied

$ oc rsh tuned-hs6fr
sh-4.2# sysctl kernel.pid_max
kernel.pid_max = 1048576
sh-4.2#
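
The two log tails above differ in how the reload ends: the first finishes with "Failed to reload Tuned", the second with "static tuning from profile ... applied". A rough sketch of checking this programmatically follows. The function name last_reload_status and the regular expressions are assumptions based only on the log lines quoted in this comment, not on any documented tuned log format.

```python
import re

# Patterns derived from the log lines quoted in this bug report.
FAILED = re.compile(r"ERROR\s+tuned\.daemon\.controller: Failed to reload Tuned")
APPLIED = re.compile(
    r"INFO\s+tuned\.daemon\.daemon: static tuning from profile '([^']+)' applied"
)


def last_reload_status(log_text):
    """Return ('applied', profile), ('failed', None) or ('unknown', None).

    Later log lines override earlier ones, so the result reflects the most
    recent reload outcome found in the text.
    """
    status = ("unknown", None)
    for line in log_text.splitlines():
        match = APPLIED.search(line)
        if match:
            status = ("applied", match.group(1))
        elif FAILED.search(line):
            status = ("failed", None)
    return status


sample = (
    "2020-04-18 05:50:28,057 ERROR    tuned.daemon.controller: "
    "Failed to reload Tuned: Cannot load profile(s) 'invalid-duplicate-sysctl-key'\n"
    "2020-04-18 05:53:38,924 INFO     tuned.daemon.daemon: "
    "static tuning from profile 'invalid-duplicate-sysctl-key' applied\n"
)
print(last_reload_status(sample))
```

The text passed in would typically be the output of oc logs for the tuned pod on the labelled node; before the fix, the status would be expected to stay at 'failed' even after the CR was corrected.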

Comment 5 Simon 2020-04-18 16:48:33 UTC
Verification positive!

oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-130-5.us-east-2.compute.internal     Ready    master   60m   v1.17.1
ip-10-0-140-247.us-east-2.compute.internal   Ready    worker   49m   v1.17.1
ip-10-0-151-195.us-east-2.compute.internal   Ready    master   60m   v1.17.1
ip-10-0-156-18.us-east-2.compute.internal    Ready    worker   49m   v1.17.1
ip-10-0-167-184.us-east-2.compute.internal   Ready    worker   49m   v1.17.1
ip-10-0-175-106.us-east-2.compute.internal   Ready    master   60m   v1.17.1

node=ip-10-0-140-247.us-east-2.compute.internal
oc label node $node tuned=ips
node/ip-10-0-140-247.us-east-2.compute.internal labeled

oc create -f ips.yaml
oc project openshift-cluster-node-tuning-operator

oc get tuned
NAME       AGE
default    59m
ips        11s
rendered   59m

oc get pods -o wide | grep $node
tuned-tgvgz                                     1/1     Running   0          54m   10.0.140.247   ip-10-0-140-247.us-east-2.compute.internal   <none>           <none>

oc logs tuned-tgvgz
2020-04-18 16:39:35,739 ERROR    tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'ips-host': ("Cannot parse '/etc/tuned/ips-host/tuned.conf'.", DuplicateError('Duplicate keyword name at line 13.',))

oc edit tuned ips # Remove doubled key
oc logs tuned-tgvgz
2020-04-18 16:46:07,196 INFO     tuned.daemon.daemon: static tuning from profile 'ips-host' applied

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-04-18-095545   True        False         54m     Cluster version is 4.4.0-0.nightly-2020-04-18-095545

Comment 7 errata-xmlrpc 2020-05-04 11:49:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

