Bug 1769812 - Long time to restore tuned defaults.
Summary: Long time to restore tuned defaults.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Tuning Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.4.0
Assignee: Jiří Mencák
QA Contact: Simon
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-07 14:40 UTC by Simon
Modified: 2020-05-13 21:52 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-13 21:52:23 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:0581 (last updated 2020-05-13 21:52:24 UTC)

Description Simon 2019-11-07 14:40:20 UTC
Description of problem:
Compared to the previous version (4.2), it takes a long time to restore the default tuned settings.

| ver.      | 4.2 | 4.3  |
| new tuned | 31s | 49s  |
| restore   | 8s  | 464s | it's ~60 times slower!

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-02-092336

How reproducible:
100%. The best restore time I observed was 86s.

Steps to Reproduce:
I was using two terminal windows.

1. Terminal 1 - create a project and a test pod (a sketch of the pod spec follows the commands):
oc new-project my-logging-project

oc create -f https://raw.githubusercontent.com/hongkailiu/svt-case-doc/master/files/pod_test.yaml
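The linked pod_test.yaml is not reproduced here; for this reproducer all that matters is a long-running pod named "web" in the project. A minimal stand-in (the image and command are hypothetical choices, not the verified contents of the linked file):

```bash
# Minimal stand-in for the linked pod_test.yaml (assumed, not verified):
# any long-running pod named "web" works, since only its name, its node,
# and its labels matter for the steps below.
oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: my-logging-project
spec:
  containers:
  - name: web
    image: registry.access.redhat.com/ubi8/ubi   # hypothetical image choice
    command: ["sleep", "infinity"]
EOF
```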

2. Terminal 2 - check which node will be tuned and debug this node:
oc get pod web -n my-logging-project -o wide   ## to find the node running the pod

oc debug node/<my_node>
# inside the debug pod:
chroot /host

i=0; while [[ "$(sysctl -n kernel.pid_max)" != "131074" ]]; do sysctl kernel.pid_max; sleep 1; i=$((i+1)); echo "time: $i"; done

## this loop polls once per second until the Node Tuning Operator applies the new value; a helper that reports the elapsed time directly is sketched below
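For repeated measurements it can be handier to have the shell report the elapsed time itself. A minimal sketch, assuming the same chroot session (the wait_for_sysctl helper is illustrative, not part of the original reproducer):

```bash
# Hypothetical helper: poll a sysctl once per second until it reaches the
# target value, then print how many seconds that took.
wait_for_sysctl() {
  local key="$1" target="$2"
  local start elapsed
  start=$(date +%s)
  until [[ "$(sysctl -n "$key")" == "$target" ]]; do
    sleep 1
  done
  elapsed=$(( $(date +%s) - start ))
  echo "${key} reached ${target} after ${elapsed}s"
}

# Example: wait for the value set by the custom tuned profile in step 3.
wait_for_sysctl kernel.pid_max 131074
```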

3. Terminal 1 - label pod and create new tuned.

oc label pod web -n my-logging-project tuned.openshift.io/elasticsearch=

oc create -f https://raw.githubusercontent.com/openshift/svt/master/openshift_tooling/node_tuning_operator/content/tuned-kernel-pid_max.yml

## after creating the Tuned CR, watch the elapsed time on Terminal 2; a sketch of the CR's likely contents follows
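The linked tuned-kernel-pid_max.yml is likewise not reproduced here. Judging from the values used elsewhere in this reproducer (CR name max-pid-test in step 7, target value 131074 in step 2, pod label tuned.openshift.io/elasticsearch above), an equivalent Tuned CR would look roughly like the following; the profile data and priority are assumptions, not the verified contents of the linked file:

```bash
# Sketch of a Tuned CR equivalent to the linked YAML (assumed contents).
oc create -f - <<EOF
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: max-pid-test
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: max-pid-test
    data: |
      [main]
      summary=Raise kernel.pid_max for nodes running labeled pods
      include=openshift-node
      [sysctl]
      kernel.pid_max=131074
  recommend:
  - match:
    - label: tuned.openshift.io/elasticsearch
      type: pod
    priority: 10
    profile: max-pid-test
EOF
```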

4. Note how long it takes before the new tuned value is applied on the node.

5. Measure the restore time in a similar way by deleting the Tuned CR.

6. Terminal 2 - check node for changes

i=0; while [[ "$(sysctl -n kernel.pid_max)" != "4194304" ]]; do sysctl kernel.pid_max; sleep 1; i=$((i+1)); echo "time: $i"; done
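If the wait_for_sysctl helper sketched after step 2 is still defined in the debug shell, the same check is a one-liner (4194304 is the default kernel.pid_max being restored):

```bash
# Wait until the default kernel.pid_max is restored and report the time.
wait_for_sysctl kernel.pid_max 4194304
```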

7. Terminal 1 - delete tuned

oc delete tuned max-pid-test -n openshift-cluster-node-tuning-operator

## after deleting the Tuned CR, watch the elapsed time on Terminal 2

8. Note how long it takes before the default value is restored on the node.


Actual results:
Restoring default values takes far longer than in previous versions (464s in 4.3 vs 8s in 4.2).

Expected results:
Restore times comparable to previous versions.

Comment 5 Simon 2019-12-13 20:34:31 UTC
RETEST POSITIVE:

```bash
Cluster version is 4.4.0-0.nightly-2019-12-13-082744

oc get pods -n openshift-cluster-node-tuning-operator 
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-node-tuning-operator-7666899684-vljzz   1/1     Running   0          73m
tuned-5lpp7                                     1/1     Running   0          73m
tuned-c9t9n                                     1/1     Running   0          69m
tuned-cvmbn                                     1/1     Running   0          69m
tuned-l4qk2                                     1/1     Running   0          73m
tuned-qvfd2                                     1/1     Running   0          73m
tuned-tvpzc                                     1/1     Running   0          68m

oc rsh -n openshift-cluster-node-tuning-operator cluster-node-tuning-operator-7666899684-vljzz
sh-4.2$ cluster-node-tuning-operator --version
I1213 18:55:48.964995      73 main.go:22] Go Version: go1.12.12
I1213 18:55:48.965124      73 main.go:23] Go OS/Arch: linux/amd64
I1213 18:55:48.965142      73 main.go:24] node-tuning Version: v4.4.0-201912110523-0-g4db2d1c-dirty
```

Applying a new tuned value now takes ~1-2 s!
Restoring takes ~3-5 s!
WOW!

Comment 7 errata-xmlrpc 2020-05-13 21:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

