Bug 2021067 - Extensive number of requests from storage version operator in cluster
Summary: Extensive number of requests from storage version operator in cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-storage-version-migrator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Luis Sanchez
QA Contact: Rahul Gangwar
URL:
Whiteboard:
Depends On:
Blocks: 2022528
Reported: 2021-11-08 09:22 UTC by Michal Fojtik
Modified: 2022-03-10 16:25 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2022528
Environment:
Last Closed: 2022-03-10 16:25:33 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-storage-version-migrator-operator pull 73 | 0 | None | Merged | cleanup kube-storage-version-migrator-operator | 2021-11-08 22:10:53 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:25:52 UTC

Description Michal Fojtik 2021-11-08 09:22:16 UTC
Description of problem:

Based on the investigation of https://bugzilla.redhat.com/show_bug.cgi?id=2015052, the following was found:

The kube-storage-version-migrator-operator ranks second in the top 5 requesters in a "busy" cluster:

system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount	418144
system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator	156061
system:serviceaccount:openshift-controller-manager-operator:openshift-controller-manager-operator	134877
system:apiserver	133597

Its "cluster" resource also ranks fourth among the most requested URIs:

/apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s	104472
/apis/operator.openshift.io/v1/openshiftcontrollermanagers/cluster	96652
/apis/authentication.k8s.io/v1/tokenreviews	68198
/apis/operator.openshift.io/v1/kubestorageversionmigrators/cluster	64526


In addition:

kubestorageversionmigrators.v1.operator.openshift.io.yaml 7432

This means there were 7432 requests in 60 minutes from this operator.
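
To put that figure in perspective, the hourly count can be converted into a rate. This is a minimal sketch using only awk; the function name `rate_per_second` is hypothetical, and the inputs are the figures quoted above:

```shell
# Convert "N requests in M minutes" into requests per second (two decimals).
rate_per_second() {
  awk -v r="$1" -v m="$2" 'BEGIN { printf "%.2f\n", r / (m * 60) }'
}

rate_per_second 7432 60   # prints 2.06
```

That is roughly two requests per second, sustained, against a single resource.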

All of this suggests a possible leak in the operator, since an idle (although relatively big) cluster should not see this many requests.

The must-gathers and other data can be found in the original bug; this one tracks only the kube storage migrator.

Version-Release number of selected component (if applicable):

4.8.z (but probably not limited to it)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

The storage migrator operator produces an excessive number of API requests.

Expected results:

The storage migrator operator is quiet in an idle cluster that is past the upgrade.

Additional info:

Comment 2 Rahul Gangwar 2021-11-11 11:02:27 UTC
The request counts for kube-storage-version-migrator-operator and kubestorageversionmigrators/cluster are low over 3 hours. @Luis Sanchez Is this count acceptable for verification?

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-09-181140   True        False         3h34m   Cluster version is 4.10.0-0.nightly-2021-11-09-181140

 for i in `oc get node|grep master|awk '{print $1}'`;do oc debug node/$i -- chroot /host bash -c "cat /var/log/kube-apiserver/audit*.log|jq -r '.user.username'|sort |uniq  -c|sort -nr|grep kube-storage-version-migrator-operator";done
W1111 16:25:13.067792   58618 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-0-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
W1111 16:25:22.763541   58635 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-1-debug ...
To use host binaries, run `chroot /host`
   2322 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator

Removing debug pod ...
W1111 16:25:41.543509   58669 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-2-debug ...
To use host binaries, run `chroot /host`
    110 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator

for i in `oc get node|grep master|awk '{print $1}'`;do oc debug node/$i -- chroot /host bash -c "cat /var/log/kube-apiserver/audit*.log|jq -r '.requestURI'|sort |uniq  -c|sort -nr|grep kubestorageversionmigrators/cluster";done
W1111 16:00:27.994058   56694 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-0-debug ...
To use host binaries, run `chroot /host`
     57 /apis/operator.openshift.io/v1/kubestorageversionmigrators/cluster

Removing debug pod ...
W1111 16:00:42.538018   56712 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
W1111 16:00:43.761608   56712 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-1-debug ...
To use host binaries, run `chroot /host`
      2 /apis/operator.openshift.io/v1/kubestorageversionmigrators/cluster

Removing debug pod ...
W1111 16:01:01.698012   56744 warnings.go:70] would violate "latest" version of "baseline" PodSecurity profile: host namespaces (hostNetwork=true, hostPID=true), hostPath volumes (volume "host"), privileged (container "container-00" must not set securityContext.privileged=true)
Starting pod/rgangwar-11de9-clq74-master-2-debug ...
To use host binaries, run `chroot /host`
      6 /apis/operator.openshift.io/v1/kubestorageversionmigrators/cluster/status
      1 /apis/operator.openshift.io/v1/kubestorageversionmigrators/cluster

Removing debug pod ...
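
The per-master counts above can be added up to get one cluster-wide figure per key. This is a minimal sketch, assuming the input is the mixed output of the loops above (`uniq -c` lines interleaved with debug-pod messages); the function name `sum_counts` is hypothetical:

```shell
# Sum `uniq -c` style lines ("<count> <key>") per key, ignoring any line
# whose first field is not a number (warnings, "Starting pod/..." etc.).
sum_counts() {
  awk '$1 ~ /^[0-9]+$/ {
         n = $1
         $1 = ""                 # drop the count field, keep the key
         sub(/^[ \t]+/, "")      # trim the separator left behind
         total[$0] += n
       }
       END { for (k in total) print total[k], k }' | sort -nr
}

# Usage with the two service-account counts reported above:
printf '%s\n' \
  'Starting pod/rgangwar-11de9-clq74-master-1-debug ...' \
  '   2322 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator' \
  '    110 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator' |
  sum_counts
# prints: 2432 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator
```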

Comment 3 Luis Sanchez 2021-11-11 15:28:24 UTC
@rgangwar: Looks good, but for more context, compare to all the requests. You can run this command remotely:

oc adm node-logs --role=master --path="kube-apiserver" | \
grep -v -E "(.terminating|.lock|termination.log)" | \
sed "s|^| kube-apiserver |" | \
xargs --max-args=3 bash -c 'oc adm node-logs $2 --path=$1/$3' bash | \
# grep 'namespaces/openshift-kube-storage-version-migrator' | \
jq -r '.user.username+" "+.useragent+" "+.verb+" "+.requestURI' | sort | uniq -c | sort -n |tail -n 10

> Note the commented-out part of the command, which can be used to limit the count to migrator-related logs.
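
The `sed` and `xargs` stages of the command above are easy to misread, so here is a dry-run sketch of just that fan-out. It assumes the file listing arrives as one "<node> <file>" pair per line; the node and file names are made up, `echo` stands in for `oc adm node-logs` so it runs without a cluster, and `-n 3` is the short form of GNU `--max-args=3`:

```shell
# Simulated listing: one "<node> <file>" pair per line (hypothetical names).
listing() {
  printf '%s\n' 'master-0 audit.log' 'master-1 audit.log'
}

# Prefix each line with the log directory so every line carries three tokens,
# then let xargs start one bash per triple: $1=directory, $2=node, $3=file.
fan_out() {
  listing |
    sed 's|^| kube-apiserver |' |
    xargs -n 3 bash -c 'echo "oc adm node-logs $2 --path=$1/$3"' bash
}

fan_out
# prints:
# oc adm node-logs master-0 --path=kube-apiserver/audit.log
# oc adm node-logs master-1 --path=kube-apiserver/audit.log
```

The three-argument limit matters: without it, xargs may pass every token to a single `bash -c` call, which then sees only the first node and file through `$1` to `$3`.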

Comment 4 Rahul Gangwar 2021-11-12 04:16:36 UTC
@Luis Sanchez Below is the output for your suggested query. I think the count is low. Can you please confirm?

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-09-181140   True        False         11h     Cluster version is 4.10.0-0.nightly-2021-11-09-181140
rahulgangwar@rgangwar-mac hosts % oc adm node-logs --role=master --path="kube-apiserver"|grep -v -E "(.terminating|.lock|termination.log)"|sed "s|^| kube-apiserver |"|xargs  bash -c 'oc adm node-logs $2 --path=$1/$3' bash|jq -r '.user.username+" "+.useragent+" "+.verb+" "+.requestURI' | sort | uniq -c | sort -n |tail -n 10
1089 system:serviceaccount:openshift-authentication:oauth-openshift  create /apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s
1173 system:anonymous  get /.well-known/oauth-authorization-server
1217 system:serviceaccount:openshift-apiserver:openshift-apiserver-sa  create /apis/authorization.k8s.io/v1/subjectaccessreviews?timeout=10s
1273 system:serviceaccount:openshift-apiserver:openshift-apiserver-sa  get /api/v1/namespaces/default/services/docker-registry
1398 system:serviceaccount:openshift-monitoring:prometheus-k8s  get /metrics
2119 system:anonymous  get /livez
2127 system:apiserver  get /api/v1/namespaces/default
2127 system:apiserver  get /api/v1/namespaces/default/services/kubernetes
2128 system:apiserver  get /api/v1/namespaces/default/endpoints/kubernetes
2128 system:apiserver  get /apis/discovery.k8s.io/v1/namespaces/default/endpointslices/kubernetes

rahulgangwar@rgangwar-mac hosts % oc adm node-logs --role=master --path="kube-apiserver"|grep -v -E "(.terminating|.lock|termination.log)"|sed "s|^| kube-apiserver |"|xargs  bash -c 'oc adm node-logs $2 --path=$1/$3' bash|grep 'namespaces/openshift-kube-storage-version-migrator'|jq -r '.user.username+" "+.useragent+" "+.verb+" "+.requestURI' | sort | uniq -c | sort -n |tail -n 10 
     3 system:serviceaccount:kube-system:generic-garbage-collector  get /apis/apps/v1/namespaces/openshift-kube-storage-version-migrator-operator/deployments/kube-storage-version-migrator-operator
   3 system:serviceaccount:openshift-insights:operator  list /api/v1/namespaces/openshift-kube-storage-version-migrator-operator/serviceaccounts?limit=1000
   3 system:serviceaccount:openshift-insights:operator  list /api/v1/namespaces/openshift-kube-storage-version-migrator/serviceaccounts?limit=1000
   3 system:serviceaccount:openshift-insights:operator  list /apis/operators.coreos.com/v1alpha1/namespaces/openshift-kube-storage-version-migrator-operator/installplans?limit=500
   3 system:serviceaccount:openshift-insights:operator  list /apis/operators.coreos.com/v1alpha1/namespaces/openshift-kube-storage-version-migrator/installplans?limit=500
  12 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator  get /api/v1/namespaces/openshift-kube-storage-version-migrator
  13 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator  get /api/v1/namespaces/openshift-kube-storage-version-migrator/serviceaccounts/kube-storage-version-migrator-sa
  19 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator  get /api/v1/namespaces/openshift-kube-storage-version-migrator-operator/configmaps/openshift-kube-storage-version-migrator-operator-lock?timeout=1m47s
  19 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator  update /api/v1/namespaces/openshift-kube-storage-version-migrator-operator/configmaps/openshift-kube-storage-version-migrator-operator-lock?timeout=1m47s
  28 system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator  get /apis/apps/v1/namespaces/openshift-kube-storage-version-migrator/deployments/migrator

Comment 5 Luis Sanchez 2021-11-12 13:18:49 UTC
The new numbers look good to me. Thanks.

Comment 8 errata-xmlrpc 2022-03-10 16:25:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

