Bug 1714484

Summary: Secrets accumulate in the openshift-cluster-node-tuning-operator namespace after the cluster has been running for several days
Product: OpenShift Container Platform Reporter: weiwei jiang <wjiang>
Component: Node Tuning Operator Assignee: Jiří Mencák <jmencak>
Status: CLOSED ERRATA QA Contact: Simon <skordas>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.1.0 CC: jmencak, mifiedle, rhowe, sejug, skordas, sponnaga, xtian
Target Milestone: --- Keywords: OSE41z_next
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: 4.1.2
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The reconciliation loop of the node-tuning operator unnecessarily updated the operand's service account.
Consequence: Secrets accumulated in the openshift-cluster-node-tuning-operator namespace.
Fix: Adjust the reconciliation loop so that the operand's service account is created only when it does not exist.
Result: The number of secrets in the openshift-cluster-node-tuning-operator namespace stays constant.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-19 06:45:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1718956    

Description weiwei jiang 2019-05-28 08:32:00 UTC
Description of problem:
The openshift-cluster-node-tuning-operator namespace contains a very large number of secrets (the cluster has been running for about 4 days).

# for i in `oc get project -o jsonpath='{.items..metadata.name}'`; do echo -n -e "Project: $i => Secret: "; oc get secret -n $i | wc -l; done | column -t
Project:  default                                                =>  Secret:  13
Project:  kube-public                                            =>  Secret:  10
Project:  kube-system                                            =>  Secret:  84
Project:  openshift                                              =>  Secret:  11
Project:  openshift-ansible-service-broker                       =>  Secret:  22
Project:  openshift-apiserver                                    =>  Secret:  15
Project:  openshift-apiserver-operator                           =>  Secret:  14
Project:  openshift-authentication                               =>  Secret:  18
Project:  openshift-authentication-operator                      =>  Secret:  14
Project:  openshift-cloud-credential-operator                    =>  Secret:  10
Project:  openshift-cluster-machine-approver                     =>  Secret:  13
Project:  openshift-cluster-node-tuning-operator                 =>  Secret:  2066
Project:  openshift-cluster-samples-operator                     =>  Secret:  13
Project:  openshift-cluster-storage-operator                     =>  Secret:  13
Project:  openshift-cluster-version                              =>  Secret:  10
Project:  openshift-config                                       =>  Secret:  17
Project:  openshift-config-managed                               =>  Secret:  13
Project:  openshift-console                                      =>  Secret:  12
Project:  openshift-console-operator                             =>  Secret:  13
Project:  openshift-controller-manager                           =>  Secret:  14
Project:  openshift-controller-manager-operator                  =>  Secret:  14
Project:  openshift-dns                                          =>  Secret:  13
Project:  openshift-dns-operator                                 =>  Secret:  13
Project:  openshift-etcd                                         =>  Secret:  10
Project:  openshift-image-registry                               =>  Secret:  21
Project:  openshift-infra                                        =>  Secret:  70
Project:  openshift-ingress                                      =>  Secret:  17
Project:  openshift-ingress-operator                             =>  Secret:  14
Project:  openshift-kube-apiserver                               =>  Secret:  51
Project:  openshift-kube-apiserver-operator                      =>  Secret:  20
Project:  openshift-kube-controller-manager                      =>  Secret:  48
Project:  openshift-kube-controller-manager-operator             =>  Secret:  17
Project:  openshift-kube-scheduler                               =>  Secret:  31
Project:  openshift-kube-scheduler-operator                      =>  Secret:  14
Project:  openshift-machine-api                                  =>  Secret:  21
Project:  openshift-machine-config-operator                      =>  Secret:  24
Project:  openshift-marketplace                                  =>  Secret:  26
Project:  openshift-monitoring                                   =>  Secret:  56
Project:  openshift-multus                                       =>  Secret:  13
Project:  openshift-network-operator                             =>  Secret:  10
Project:  openshift-node                                         =>  Secret:  10
Project:  openshift-operator-lifecycle-manager                   =>  Secret:  22
Project:  openshift-operators                                    =>  Secret:  10
Project:  openshift-sdn                                          =>  Secret:  16
Project:  openshift-service-ca                                   =>  Secret:  20
Project:  openshift-service-ca-operator                          =>  Secret:  13
Project:  openshift-service-catalog-apiserver                    =>  Secret:  15
Project:  openshift-service-catalog-apiserver-operator           =>  Secret:  14
Project:  openshift-service-catalog-controller-manager           =>  Secret:  14
Project:  openshift-service-catalog-controller-manager-operator  =>  Secret:  14
Project:  openshift-template-service-broker                      =>  Secret:  21
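
Note that `wc -l` in the loop above also counts the header line printed by `oc get secret`, so each figure is one higher than the real number of secrets. To see which secret names are piling up in a single namespace, a minimal sketch along the following lines may help. The function name `count_secrets_by_prefix` is hypothetical, and the sketch assumes the last hyphen-separated part of each name is a generated suffix, which is a heuristic and not a rule.

```shell
# Group secret names by prefix and count them, most frequent first.
# Reads `oc get secrets --no-headers` style output on stdin.
# Assumption: the last "-xxxxx" part of each name is a generated suffix.
count_secrets_by_prefix() {
    awk '{ name = $1; sub(/-[a-z0-9]+$/, "", name); count[name]++ }
         END { for (n in count) print count[n], n }' | sort -rn
}

# Usage (requires a logged-in oc session):
#   oc get secrets -n openshift-cluster-node-tuning-operator --no-headers | count_secrets_by_prefix
```

On an affected cluster the `tuned-token` and `tuned-dockercfg` prefixes would be expected to dominate the output.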


The cluster has been running for more than 3 days and has been upgraded several times.


Version-Release number of selected component (if applicable):
The cluster was installed with 4.1.0-0.nightly-2019-05-17-110425,
but it has been upgraded several times and is currently at
4.1.0-0.nightly-2019-05-24-040103.

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
After several days, the node-tuning-operator namespace has about 2000 secrets,
far more than any other namespace.

Expected results:
The namespace should not accumulate this many secrets.

Additional info:
# oc get nodes -o wide 
NAME                                         STATUS   ROLES    AGE     VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION               CONTAINER-RUNTIME
dell-r730-063.dsal.lab.eng.rdu2.redhat.com   Ready    master   3d19h   v1.13.4+cb455d664   10.1.8.73     <none>        Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)   4.18.0-80.1.2.el8_0.x86_64   cri-o://1.13.9-1.rhaos4.1.gitd70609a.el8
dell-r730-064.dsal.lab.eng.rdu2.redhat.com   Ready    master   3d19h   v1.13.4+cb455d664   10.1.8.74     <none>        Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)   4.18.0-80.1.2.el8_0.x86_64   cri-o://1.13.9-1.rhaos4.1.gitd70609a.el8
dell-r730-065.dsal.lab.eng.rdu2.redhat.com   Ready    master   3d19h   v1.13.4+cb455d664   10.1.8.75     <none>        Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)   4.18.0-80.1.2.el8_0.x86_64   cri-o://1.13.9-1.rhaos4.1.gitd70609a.el8
dell-r730-066.dsal.lab.eng.rdu2.redhat.com   Ready    worker   3d19h   v1.13.4+cb455d664   10.1.8.76     <none>        Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)   4.18.0-80.1.2.el8_0.x86_64   cri-o://1.13.9-1.rhaos4.1.gitd70609a.el8
dell-r730-067.dsal.lab.eng.rdu2.redhat.com   Ready    worker   3d19h   v1.13.4+cb455d664   10.1.8.77     <none>        Red Hat Enterprise Linux CoreOS 410.8.20190520.0 (Ootpa)   4.18.0-80.1.2.el8_0.x86_64   cri-o://1.13.9-1.rhaos4.1.gitd70609a.el8
dell-r730-068.dsal.lab.eng.rdu2.redhat.com   Ready    worker   3d17h   v1.13.4+54aa63688   10.1.8.78     <none>        Red Hat Enterprise Linux Server 7.6 (Maipo)                3.10.0-957.el7.x86_64        cri-o://1.13.6-1.dev.rhaos4.1.gitee2e748.el7-dev

Comment 3 Jiří Mencák 2019-05-28 15:10:26 UTC
Fix for release-4.1 branch: https://github.com/openshift/cluster-node-tuning-operator/pull/59

Comment 8 Ryan Howe 2019-06-04 02:17:49 UTC
*** Bug 1716600 has been marked as a duplicate of this bug. ***

Comment 11 Mike Fiedler 2019-06-05 16:00:29 UTC
@jmencak - To safely delete the extraneous secrets, can all but the most recent tuned-token and tuned-dockercfg secrets be deleted?

Comment 12 Jiří Mencák 2019-06-05 21:30:12 UTC
(In reply to Mike Fiedler from comment #11)
> @jmencak - To safely delete the extraneous secrets, can all but the most
> recent tuned-token and tuned-dockercfg secrets be deleted?

Unfortunately, that wouldn't work.  At this point, the suggested cleanup after an upgrade from a version affected by this bug to 4.1.1 (or a version that has the fix) is:

$ oc get secrets -n openshift-cluster-node-tuning-operator | awk '/^tuned-/ {print $1}' | xargs oc delete secrets -n openshift-cluster-node-tuning-operator
$ oc delete ds/tuned -n openshift-cluster-node-tuning-operator
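
The cleanup above deletes every secret whose name starts with `tuned-`. Before running it, it may be worth previewing exactly which names the filter selects. The following is a minimal sketch of such a dry run; the function name `list_tuned_secrets` is hypothetical and simply wraps the same awk filter.

```shell
# Print the names of secrets that start with "tuned-".
# Reads `oc get secrets` output on stdin; the header line never matches.
list_tuned_secrets() {
    awk '/^tuned-/ { print $1 }'
}

# Usage (requires a logged-in oc session):
#   ns=openshift-cluster-node-tuning-operator
#   oc get secrets -n "$ns" | list_tuned_secrets                                  # preview only
#   oc get secrets -n "$ns" | list_tuned_secrets | xargs oc delete secrets -n "$ns"
```

Passing `-n` to the delete as well keeps it in the intended namespace regardless of the current project.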

Comment 13 weiwei jiang 2019-06-05 23:57:40 UTC
Checked with 4.1.0-0.nightly-2019-06-04-235906: a freshly installed cluster no longer accumulates excess secrets.

As jmencak said, the thousands of existing secrets are not cleaned up.

Comment 14 Mike Fiedler 2019-06-06 02:00:10 UTC
@skordas - Please update the existing node tuning operator basic functionality automation to add a sanity check on the number of secrets in the project. Thanks a lot.
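
A sanity check of this kind could look roughly like the sketch below. It is an illustration only and not the check that was added to the automation: the function name `check_secret_count` and the default limit of 50 are hypothetical.

```shell
# Succeed (exit 0) when the number of secrets read on stdin is within the limit.
# Reads `oc get secrets --no-headers` style output; $1 is the limit.
# Assumption: the default limit of 50 is an arbitrary, hypothetical value.
check_secret_count() {
    limit="${1:-50}"
    count=$(awk 'NF { n++ } END { print n + 0 }')   # count non-empty lines
    if [ "$count" -gt "$limit" ]; then
        echo "FAIL: $count secrets (limit $limit)" >&2
        return 1
    fi
    echo "OK: $count secrets (limit $limit)"
}

# Usage (requires a logged-in oc session):
#   oc get secrets -n openshift-cluster-node-tuning-operator --no-headers | check_secret_count 50
```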

Comment 15 Simon 2019-06-06 12:46:04 UTC
# oc get clusterversions
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-06-04-235906   True        False         20h     Cluster version is 4.1.0-0.nightly-2019-06-04-235906

# oc get secrets -n openshift-cluster-node-tuning-operator
NAME                                           TYPE                                  DATA   AGE
builder-dockercfg-dp8dm                        kubernetes.io/dockercfg               1      20h
builder-token-987xv                            kubernetes.io/service-account-token   4      20h
builder-token-skzqw                            kubernetes.io/service-account-token   4      20h
cluster-node-tuning-operator-dockercfg-zgwq9   kubernetes.io/dockercfg               1      20h
cluster-node-tuning-operator-token-djrfq       kubernetes.io/service-account-token   4      20h
cluster-node-tuning-operator-token-jnkcw       kubernetes.io/service-account-token   4      20h
default-dockercfg-j999p                        kubernetes.io/dockercfg               1      20h
default-token-26nlt                            kubernetes.io/service-account-token   4      20h
default-token-wrf2s                            kubernetes.io/service-account-token   4      20h
deployer-dockercfg-c4wzz                       kubernetes.io/dockercfg               1      20h
deployer-token-54k7s                           kubernetes.io/service-account-token   4      20h
deployer-token-cnf7s                           kubernetes.io/service-account-token   4      20h
tuned-dockercfg-t6j85                          kubernetes.io/dockercfg               1      20h
tuned-token-2p74v                              kubernetes.io/service-account-token   4      20h
tuned-token-4ptrb                              kubernetes.io/service-account-token   4      20h


@mifiedle
Test case and automation are updated:
https://github.com/openshift/svt/pull/591

Comment 16 Jiří Mencák 2019-06-09 08:47:09 UTC
(In reply to weiwei jiang from comment #13)
> Checked with 4.1.0-0.nightly-2019-06-04-235906: a freshly installed cluster
> no longer accumulates excess secrets.
> 
> As jmencak said, the thousands of existing secrets are not cleaned up.

There is an upstream PR https://github.com/openshift/cluster-node-tuning-operator/pull/63 for automated removal of detached tuned secrets to perform the cleanup.  Should the removal of secrets be tracked as part of this BZ or a new one created?

Comment 17 Mike Fiedler 2019-06-10 11:33:39 UTC
Re: comment 16.  Opened https://bugzilla.redhat.com/show_bug.cgi?id=1718842 to track this and targeted it for 4.1.2

Comment 19 errata-xmlrpc 2019-06-19 06:45:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1382