Bug 1823941 - Tuned profile is not updated after incorrect tuned CR is fixed
Summary: Tuned profile is not updated after incorrect tuned CR is fixed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks: 1824473
Reported: 2020-04-14 20:10 UTC by Ryan Howe
Modified: 2024-03-25 15:49 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The Node Tuning Operator does not ship with the fixes for the tuned daemon behaviour described in bugs 1774645 and 1702724.
Consequence: When the user specifies an invalid profile, the operand (the tuned daemon) stops functioning (a denial of service), and correcting that profile does not restore the operand's functionality.
Fix: Apply the fixes for bugs 1774645 and 1702724 to the tuned daemon.
Result: The tuned daemon correctly processes and sets the new, corrected profile.
Clone Of:
Clones: 1824473
Environment:
Last Closed: 2020-07-13 17:27:56 UTC
Target Upstream Version:
Embargoed:



Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:2409  0        None      None    None     2020-07-13 17:28:20 UTC

Description Ryan Howe 2020-04-14 20:10:41 UTC
Description of problem:

When a Tuned CR containing an error is created, fixing the mistake in the CR does not update the tuned profile on the host. The tuned pod has to be restarted manually for the changes to take effect.


Version-Release number of selected component (if applicable):
4.3

How reproducible:
100%

Steps to Reproduce:
1. # oc label node NODE1 tuned=test

2. Create a Tuned CR containing an error: the value kernel.pid_max=1048575 is set twice.

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: ips
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=A custom OpenShift IPS host profile
      [sysctl]
      kernel.msgmni=4096
      kernel.pid_max=1048575
      kernel.shmmax=180000000
      kernel.sem="128 1048576 32 32768"
      net.core.rmem_default=>33554431
      net.core.rmem_max=>33554431
      fs.file-max=>240000
      vm.dirty_background_ratio=64
      vm.dirty_ratio=72
      kernel.pid_max=1048575
    name: ips-host
  recommend:
  - match:
    - label: tuned
      value: ips
    priority: 20
    profile: ips-host


3. Fix the Tuned CR by removing the duplicate kernel.pid_max line, or delete the CR and re-create it from a fixed copy:

# oc delete tuned ips
# oc create -f fixed-ips.yaml 
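
A duplicate key such as the repeated kernel.pid_max in step 2 can be caught before the CR is created. The sketch below is a hypothetical pre-flight helper, check_tuned_dupes, which is not part of tuned or the Node Tuning Operator. It assumes the profile data is simple INI text with one key=value per line and only approximates the stricter parser that tuned itself uses.

```shell
# check_tuned_dupes: hypothetical pre-flight helper (not part of tuned or the
# Node Tuning Operator). Reads tuned.conf-style text on stdin, prints each key
# that repeats within the same [section], and exits 1 if any repeat is found.
check_tuned_dupes() {
  awk '
    /^[ \t]*$/    { next }             # skip blank lines
    /^[ \t]*[#;]/ { next }             # skip comment lines
    /^[ \t]*\[/ {                      # section header such as [sysctl]
      section = $0
      gsub(/[ \t]/, "", section)       # drop whitespace
      section = substr(section, 2, length(section) - 2)   # strip [ and ]
      next
    }
    {
      key = $0
      sub(/=.*/, "", key)              # keep only the text before the first "="
      sub(/^[ \t]+/, "", key)          # trim leading whitespace
      sub(/[ \t]+$/, "", key)          # trim trailing whitespace
      id = section SUBSEP key
      if (id in first) {
        printf "duplicate key %s in [%s] at line %d (first at line %d)\n", key, section, NR, first[id]
        found = 1
      } else {
        first[id] = NR                 # remember where the key first appeared
      }
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Usage against the CR from step 2:

$ oc get tuned ips -n openshift-cluster-node-tuning-operator -o jsonpath='{.spec.profile[0].data}' | check_tuned_dupes

For the ips-host data above this should print "duplicate key kernel.pid_max in [sysctl] at line 13 (first at line 5)", the same line number the tuned log in Comment 4 reports.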


Actual results:
Nothing happens; the tuned profile on the node is not updated.

Expected results:
The tuned profile on the node is updated with the corrected configuration.


Additional info:

For the corrected profile to take effect, the tuned pod on the node has to be restarted manually.
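
Until the fix is in place, the manual workaround is to delete the tuned pod on the affected node; the tuned DaemonSet then recreates it, and the new pod should load the corrected profile. The helper below, tuned_pod_for_node, is hypothetical: it only picks the pod name out of "oc get pods -o wide" output, assuming the default column layout (NODE in column 7) and that operand pods are named tuned-*, as in the listings in Comment 3.

```shell
# tuned_pod_for_node: hypothetical helper. Reads "oc get pods -o wide" output
# on stdin and prints the name of the tuned-* pod scheduled on node $1.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE IP NODE ...
tuned_pod_for_node() {
  awk -v node="$1" 'NR > 1 && $1 ~ /^tuned-/ && $7 == node { print $1 }'
}
```

Usage, in the style of the steps above:

# oc get pods -n openshift-cluster-node-tuning-operator -o wide | tuned_pod_for_node NODE1
# oc delete pod -n openshift-cluster-node-tuning-operator <pod name printed above>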

Comment 1 Ryan Howe 2020-04-14 20:11:35 UTC
Typo in my steps; step 1 should be:

# oc label node NODE1 tuned=ips


The issue is still present; I just typed the steps up wrong.

Comment 2 Jiří Mencák 2020-04-16 07:22:31 UTC
Upstream fix for 4.5: https://github.com/openshift/cluster-node-tuning-operator/pull/123

Comment 3 Jiří Mencák 2020-04-17 06:59:23 UTC
Fixed in 4.5.0-0.nightly-2020-04-17-012157 and later.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-17-012157   True        False         31m     Cluster version is 4.5.0-0.nightly-2020-04-17-012157

$ oc get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
jmenca-gd69f-m-0.c.openshift-gce-devel.internal         Ready    master   69m   v1.18.0-rc.1
jmenca-gd69f-m-1.c.openshift-gce-devel.internal         Ready    master   69m   v1.18.0-rc.1
jmenca-gd69f-m-2.c.openshift-gce-devel.internal         Ready    master   69m   v1.18.0-rc.1
jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal   Ready    worker   47m   v1.18.0-rc.1
jmenca-gd69f-w-b-vk45l.c.openshift-gce-devel.internal   Ready    worker   47m   v1.18.0-rc.1

$ oc label node jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal tuned.openshift.io/invalid-duplicate-sysctl-key=

$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF
tuned.tuned.openshift.io/invalid-duplicate-sysctl-key created

$ oc project openshift-cluster-node-tuning-operator

$ oc get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP           NODE                                                    NOMINATED NODE   READINESS GATES
cluster-node-tuning-operator-b5b9d88f-ftfsw   1/1     Running   0          86m   10.129.0.6   jmenca-gd69f-m-0.c.openshift-gce-devel.internal         <none>           <none>
tuned-2x9mv                                   1/1     Running   0          63m   10.0.32.3    jmenca-gd69f-w-b-vk45l.c.openshift-gce-devel.internal   <none>           <none>
tuned-6gvnp                                   1/1     Running   0          81m   10.0.0.3     jmenca-gd69f-m-2.c.openshift-gce-devel.internal         <none>           <none>
tuned-krpdt                                   1/1     Running   0          81m   10.0.0.4     jmenca-gd69f-m-0.c.openshift-gce-devel.internal         <none>           <none>
tuned-t8zdr                                   1/1     Running   0          63m   10.0.32.2    jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal   <none>           <none>
tuned-vb4m7                                   1/1     Running   0          81m   10.0.0.5     jmenca-gd69f-m-1.c.openshift-gce-devel.internal         <none>           <none>

$ oc logs tuned-t8zdr | tail -n7
I0417 06:37:16.839362    2256 tuned.go:432] sending HUP to PID 2963
2020-04-17 06:37:16,839 INFO     tuned.daemon.daemon: stopping tuning
2020-04-17 06:37:17,031 INFO     tuned.daemon.daemon: terminating Tuned, rolling back all changes
2020-04-17 06:37:17,044 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-17 06:37:17,045 INFO     tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-17 06:37:17,045 INFO     tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-17 06:37:17,046 ERROR    tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'invalid-duplicate-sysctl-key': ("Cannot parse '/etc/tuned/invalid-duplicate-sysctl-key/tuned.conf'.", DuplicateError('Duplicate keyword name at line 5.',))

$ oc apply -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      #kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF

$ oc logs tuned-t8zdr | tail -n7
I0417 06:37:43.715550    2256 tuned.go:432] sending HUP to PID 2963
2020-04-17 06:37:43,715 INFO     tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-17 06:37:43,716 INFO     tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-17 06:37:43,717 INFO     tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-17 06:37:43,717 INFO     tuned.daemon.daemon: starting tuning
2020-04-17 06:37:43,719 INFO     tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-04-17 06:37:43,720 INFO     tuned.daemon.daemon: static tuning from profile 'invalid-duplicate-sysctl-key' applied

$ oc rsh tuned-t8zdr
sh-4.2# sysctl kernel.pid_max
kernel.pid_max = 1048576
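
Rather than reading the log tail by eye, the outcome of the most recent load of a given profile can be classified from the same log messages shown above. tuned_profile_status is a hypothetical helper; it matches the exact message texts seen in this transcript ("Cannot load profile(s) '<name>'" and "static tuning from profile '<name>' applied"), and assuming those texts stay the same in other tuned versions is an assumption.

```shell
# tuned_profile_status: hypothetical helper. Reads tuned pod logs on stdin and
# prints "applied", "failed" or "unknown" for profile $1, based on the last
# matching message. Message texts are copied from the logs in this report.
tuned_profile_status() {
  awk -v p="$1" -v q="'" '
    BEGIN { status = "unknown" }
    # Success message printed once the profile has been applied.
    index($0, "static tuning from profile " q p q " applied") { status = "applied" }
    # Failure message printed when the profile cannot be parsed or loaded.
    index($0, "Cannot load profile(s) " q p q) { status = "failed" }
    END { print status }                # the last matching line wins
  '
}
```

With the logs above, "oc logs tuned-t8zdr | tuned_profile_status invalid-duplicate-sysctl-key" should print "failed" before the oc apply and "applied" after it.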

Comment 4 Simon 2020-04-17 16:37:31 UTC
Verification positive!!

oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-134-50.us-east-2.compute.internal    Ready    worker   11m   v1.18.0-rc.1
ip-10-0-139-49.us-east-2.compute.internal    Ready    master   24m   v1.18.0-rc.1
ip-10-0-150-9.us-east-2.compute.internal     Ready    worker   11m   v1.18.0-rc.1
ip-10-0-159-147.us-east-2.compute.internal   Ready    master   23m   v1.18.0-rc.1
ip-10-0-160-111.us-east-2.compute.internal   Ready    master   24m   v1.18.0-rc.1
ip-10-0-167-220.us-east-2.compute.internal   Ready    worker   11m   v1.18.0-rc.1

NODE1=ip-10-0-134-50.us-east-2.compute.internal
oc label node $NODE1 tuned=ips
node/ip-10-0-134-50.us-east-2.compute.internal labeled

I used the example provided by Ryan:

oc create -f ips.yaml

oc get tuned
NAME       AGE
default    26m
ips        19s
rendered   26m

oc get pods -o wide | grep $NODE1
tuned-6mqzq                                     1/1     Running   0          18m   10.0.134.50    ip-10-0-134-50.us-east-2.compute.internal    <none>           <none>

oc logs tuned-6mqzq
2020-04-17 16:25:08,886 ERROR    tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'ips-host': ("Cannot parse '/etc/tuned/ips-host/tuned.conf'.", DuplicateError('Duplicate keyword name at line 13.',))

oc edit tuned ips # Remove doubled key

oc logs tuned-6mqzq
2020-04-17 16:33:57,534 INFO     tuned.daemon.daemon: static tuning from profile 'ips-host' applied

oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-04-17-083506   True        False         20m     Cluster version is 4.5.0-0.nightly-2020-04-17-083506

Comment 6 errata-xmlrpc 2020-07-13 17:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

