Bug 1823941
| Summary: | Tuned profile is not updated after incorrect tuned CR is fixed | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ryan Howe <rhowe> | |
| Component: | Node Tuning Operator | Assignee: | Jiří Mencák <jmencak> | |
| Status: | CLOSED ERRATA | QA Contact: | Simon <skordas> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.3.z | CC: | sejug | |
| Target Milestone: | --- | |||
| Target Release: | 4.5.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: |
Cause:
The Node Tuning Operator does not ship with the fixes to address tuned daemon behaviour for 1774645 and 1702724.
Consequence:
When an invalid profile is specified by the user, this results in a DoS of the operand's (tuned daemon) functionality and a correction of that profile does not restore the operand's functionality.
Fix:
Apply fixes for 1774645 and 1702724 to the tuned daemon.
Result:
Tuned daemon will correctly process and set the new, corrected profile.
|
Story Points: | --- | |
| Clone Of: | ||||
| : | 1824473 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-13 17:27:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1824473 | |||
Typo in my steps:
# oc label node NODE1 tuned=ips
The issue is still present; I just typed up the steps wrong.
Upstream fix for 4.5: https://github.com/openshift/cluster-node-tuning-operator/pull/123
Fixed in 4.5.0-0.nightly-2020-04-17-012157 and later.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.5.0-0.nightly-2020-04-17-012157 True False 31m Cluster version is 4.5.0-0.nightly-2020-04-17-012157
$ oc get nodes
NAME STATUS ROLES AGE VERSION
jmenca-gd69f-m-0.c.openshift-gce-devel.internal Ready master 69m v1.18.0-rc.1
jmenca-gd69f-m-1.c.openshift-gce-devel.internal Ready master 69m v1.18.0-rc.1
jmenca-gd69f-m-2.c.openshift-gce-devel.internal Ready master 69m v1.18.0-rc.1
jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal Ready worker 47m v1.18.0-rc.1
jmenca-gd69f-w-b-vk45l.c.openshift-gce-devel.internal Ready worker 47m v1.18.0-rc.1
$ oc label node jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal tuned.openshift.io/invalid-duplicate-sysctl-key=
$ oc create -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF
tuned.tuned.openshift.io/invalid-duplicate-sysctl-key created
$ oc project openshift-cluster-node-tuning-operator
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-node-tuning-operator-b5b9d88f-ftfsw 1/1 Running 0 86m 10.129.0.6 jmenca-gd69f-m-0.c.openshift-gce-devel.internal <none> <none>
tuned-2x9mv 1/1 Running 0 63m 10.0.32.3 jmenca-gd69f-w-b-vk45l.c.openshift-gce-devel.internal <none> <none>
tuned-6gvnp 1/1 Running 0 81m 10.0.0.3 jmenca-gd69f-m-2.c.openshift-gce-devel.internal <none> <none>
tuned-krpdt 1/1 Running 0 81m 10.0.0.4 jmenca-gd69f-m-0.c.openshift-gce-devel.internal <none> <none>
tuned-t8zdr 1/1 Running 0 63m 10.0.32.2 jmenca-gd69f-w-a-z6p45.c.openshift-gce-devel.internal <none> <none>
tuned-vb4m7 1/1 Running 0 81m 10.0.0.5 jmenca-gd69f-m-1.c.openshift-gce-devel.internal <none> <none>
$ oc logs tuned-t8zdr | tail -n7
I0417 06:37:16.839362 2256 tuned.go:432] sending HUP to PID 2963
2020-04-17 06:37:16,839 INFO tuned.daemon.daemon: stopping tuning
2020-04-17 06:37:17,031 INFO tuned.daemon.daemon: terminating Tuned, rolling back all changes
2020-04-17 06:37:17,044 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-17 06:37:17,045 INFO tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-17 06:37:17,045 INFO tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-17 06:37:17,046 ERROR tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'invalid-duplicate-sysctl-key': ("Cannot parse '/etc/tuned/invalid-duplicate-sysctl-key/tuned.conf'.", DuplicateError('Duplicate keyword name at line 5.',))
$ oc apply -f- <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: invalid-duplicate-sysctl-key
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Testing an invalid tuned profile, duplicate keys
      [sysctl]
      kernel.pid_max=1048576
      #kernel.pid_max=1048576
    name: invalid-duplicate-sysctl-key
  recommend:
  - match:
    - label: tuned.openshift.io/invalid-duplicate-sysctl-key
    priority: 20
    profile: invalid-duplicate-sysctl-key
EOF
$ oc logs tuned-t8zdr | tail -n7
I0417 06:37:43.715550 2256 tuned.go:432] sending HUP to PID 2963
2020-04-17 06:37:43,715 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration.
2020-04-17 06:37:43,716 INFO tuned.daemon.daemon: Using 'invalid-duplicate-sysctl-key' profile
2020-04-17 06:37:43,717 INFO tuned.profiles.loader: loading profile: invalid-duplicate-sysctl-key
2020-04-17 06:37:43,717 INFO tuned.daemon.daemon: starting tuning
2020-04-17 06:37:43,719 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
2020-04-17 06:37:43,720 INFO tuned.daemon.daemon: static tuning from profile 'invalid-duplicate-sysctl-key' applied
$ oc rsh tuned-t8zdr
sh-4.2# sysctl kernel.pid_max
kernel.pid_max = 1048576
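The two log excerpts above differ in their last line: the failed reload ends with "Failed to reload Tuned", the successful one with "static tuning from profile ... applied". The following is a minimal sketch that reports which of the two was logged most recently. It assumes only those two message strings as they appear in this report; the function name tuned_reload_status is hypothetical and is not part of oc or tuned.

```shell
# Hypothetical helper: read tuned pod log text on stdin and print the most
# recent reload outcome ("failed", "applied", or "unknown").
# Assumption: the two message strings below are taken from the logs in this
# report; other tuned versions may word them differently.
tuned_reload_status() {
  awk '
    /Failed to reload Tuned/                 { status = "failed" }
    /static tuning from profile .* applied/  { status = "applied" }
    END { print (status == "" ? "unknown" : status) }
  '
}
```

With the pod from this comment it could be used as: oc logs tuned-t8zdr | tuned_reload_status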
Verification positive!!
oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-134-50.us-east-2.compute.internal Ready worker 11m v1.18.0-rc.1
ip-10-0-139-49.us-east-2.compute.internal Ready master 24m v1.18.0-rc.1
ip-10-0-150-9.us-east-2.compute.internal Ready worker 11m v1.18.0-rc.1
ip-10-0-159-147.us-east-2.compute.internal Ready master 23m v1.18.0-rc.1
ip-10-0-160-111.us-east-2.compute.internal Ready master 24m v1.18.0-rc.1
ip-10-0-167-220.us-east-2.compute.internal Ready worker 11m v1.18.0-rc.1
NODE1=ip-10-0-134-50.us-east-2.compute.internal
oc label node $NODE1 tuned=ips
node/ip-10-0-134-50.us-east-2.compute.internal labeled
I used the example provided by Ryan:
oc create -f ips.yaml
oc get tuned
NAME AGE
default 26m
ips 19s
rendered 26m
oc get pods -o wide | grep $NODE1
tuned-6mqzq 1/1 Running 0 18m 10.0.134.50 ip-10-0-134-50.us-east-2.compute.internal <none> <none>
oc logs tuned-6mqzq
2020-04-17 16:25:08,886 ERROR tuned.daemon.controller: Failed to reload Tuned: Cannot load profile(s) 'ips-host': ("Cannot parse '/etc/tuned/ips-host/tuned.conf'.", DuplicateError('Duplicate keyword name at line 13.',))
oc edit tuned ips # Remove doubled key
oc logs tuned-6mqzq
2020-04-17 16:33:57,534 INFO tuned.daemon.daemon: static tuning from profile 'ips-host' applied
oc get clusterversions.config.openshift.io
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.5.0-0.nightly-2020-04-17-083506 True False 20m Cluster version is 4.5.0-0.nightly-2020-04-17-083506
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409
Description of problem:
When a Tuned CR is created containing an error, fixing the mistake in the CR does not update the tuned profile on the host. The tuned pod has to be manually restarted for the changes to take effect.
Version-Release number of selected component (if applicable):
4.3
How reproducible:
100%
Steps to Reproduce:
1. # oc label node NODE1 tuned=test
2. Create a Tuned CR with an error: the value kernel.pid_max=1048575 is set twice.
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: ips
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=A custom OpenShift IPS host profile
      [sysctl]
      kernel.msgmni=4096
      kernel.pid_max=1048575
      kernel.shmmax=180000000
      kernel.sem="128 1048576 32 32768"
      net.core.rmem_default=>33554431
      net.core.rmem_max=>33554431
      fs.file-max=>240000
      vm.dirty_background_ratio=64
      vm.dirty_ratio=72
      kernel.pid_max=1048575
    name: ips-host
  recommend:
  - match:
    - label: tuned
      value: ips
    priority: 20
    profile: ips-host
3. Fix the Tuned CR, or delete it and replace it with a fixed version that removes the duplicate kernel.pid_max:
# oc delete tuned ips
# oc create -f fixed-ips.yaml
Actual results:
Nothing
Expected results:
Tuned profile to get updated
Additional info:
In order for this to take effect, the tuned pod has to be restarted manually.
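The duplicate kernel.pid_max entry in the reproduction steps can be caught before the CR is created. The following is a minimal sketch, assuming the profile body (the text under "data:") has been saved to a local file. The function name find_duplicate_keys is hypothetical. It is not how the tuned daemon parses profiles; it only flags keys repeated within the same section.

```shell
# Hypothetical pre-flight check: print each key that appears more than once
# within the same [section] of a tuned.conf-style profile read from stdin.
# Assumption: lines are "key=value", "#" starts a comment, "[name]" starts a section.
find_duplicate_keys() {
  awk '
    /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
    /^[ \t]*\[/ {                              # section header: remember it
      section = $0
      gsub(/^[ \t]+|[ \t]+$/, "", section)
      next
    }
    {
      key = $0
      sub(/=.*/, "", key)                      # keep the text before the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim surrounding whitespace
      if (++seen[section, key] == 2) print key # report once, on the second occurrence
    }
  '
}
```

For example, with the profile body saved as tuned.conf (a hypothetical file name), running find_duplicate_keys < tuned.conf prints nothing for a clean profile and prints the repeated key name otherwise.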