Bug 2049671

Summary: system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator trying to GET and DELETE /api/v1/namespaces/openshift-cluster-csi-drivers/configmaps/kube-cloud-config which does not exist
Product: OpenShift Container Platform
Reporter: Simon Reber <sreber>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Storage
QA Contact: Penghao Wang <pewang>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, fbertina, jsafrane
Version: 4.9
Target Release: 4.11.0
Hardware: x86_64
OS: Linux
Doc Type: No Doc Update
Last Closed: 2022-08-10 10:46:22 UTC
Type: Bug

Description Simon Reber 2022-02-02 14:25:53 UTC
Description of problem:

On a freshly installed OpenShift Container Platform 4.9.15 cluster (using the IPI installation method on AWS), we are seeing 3898 failed requests from "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator" to `/api/v1/namespaces/openshift-cluster-csi-drivers/configmaps/kube-cloud-config` using the GET and DELETE methods.

$ oc dev_tool audit -f kube-apiserver/ -otop --by=resource --user="system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator"  --failed-only
had 133483 line read failures
count: 3899, first: 2022-02-01T04:35:29+01:00, last: 2022-02-02T14:06:41+01:00, duration: 33h31m12.448447s
3898x                v1/configmaps
1x                   storage.k8s.io/v1/csidrivers

$ oc dev_tool audit -f kube-apiserver/ -otop --by=verb --user="system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator"  --resource="configmaps" --failed-only
had 133483 line read failures
count: 3898, first: 2022-02-01T04:35:29+01:00, last: 2022-02-02T14:06:41+01:00, duration: 33h31m12.448447s

Top 10 "DELETE" (of 1949 total hits):
   1949x [  3.090904ms] [404-1948] /api/v1/namespaces/openshift-cluster-csi-drivers/configmaps/kube-cloud-config [system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator]

Top 10 "GET" (of 1949 total hits):
   1949x [  4.863701ms] [404-1948] /api/v1/namespaces/openshift-config-managed/configmaps/kube-cloud-config [system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator]

Since `openshift-config-managed/configmaps/kube-cloud-config` is only created and needed when custom service endpoints are used, we should provide a solution that prevents these GET and DELETE requests and only triggers them when the ConfigMap has actually been created.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.9.15

How reproducible:

 - Always

Steps to Reproduce:
1. openshift-install create cluster --dir ocpX --log-level debug (basically https://docs.openshift.com/container-platform/4.9/installing/installing_aws/installing-aws-default.html#installing-aws-default)
2. Added a custom PKI certificate as per https://docs.openshift.com/container-platform/4.9/networking/configuring-a-custom-pki.html. No proxy! Not sure whether that has an impact, but I doubt it.

Actual results:

Everything is working as expected, but we have a large number of failed API requests from "system:serviceaccount:openshift-cluster-csi-drivers:aws-ebs-csi-driver-operator" to `/api/v1/namespaces/openshift-cluster-csi-drivers/configmaps/kube-cloud-config` because the ConfigMap does not exist.

Expected results:

The "aws-ebs-csi-driver-operator" should only try to access `/api/v1/namespaces/openshift-cluster-csi-drivers/configmaps/kube-cloud-config` if custom service endpoints are being used and the ConfigMap has therefore been created. Otherwise it should not try to GET or DELETE that resource.

Additional Data:

If you wish, I can upload a `must-gather` containing the configuration as well as the Audit logs. But it's very easy to verify with a simple, fresh installation.

Comment 1 Fabio Bertinatto 2022-02-10 17:50:22 UTC
@Simon, thanks for reporting this.

It seems like there's some confusion between the ConfigMap in the "openshift-config-managed" namespace and the one in the "openshift-cluster-csi-drivers" namespace.

When the "openshift-config-managed/kube-cloud-config" ConfigMap exists, the operator will copy it to the "openshift-cluster-csi-drivers" namespace. On the other hand, when the ConfigMap doesn't exist in the "openshift-config-managed" namespace, the operator _needs_ to make sure that it's absent from the "openshift-cluster-csi-drivers" namespace as well.
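The behaviour described above can be sketched roughly as follows. This is an illustration only, not the operator's actual code: the `cluster` type and the `syncCloudConfig` function are hypothetical, and an in-memory map stands in for the API server. Only the namespace and ConfigMap names come from this report.

```go
package main

import "fmt"

// Namespace and ConfigMap names as they appear in this report.
const (
	sourceNS = "openshift-config-managed"
	targetNS = "openshift-cluster-csi-drivers"
	cmName   = "kube-cloud-config"
)

// cluster is a hypothetical in-memory stand-in for the API server:
// namespace -> ConfigMap name -> data.
type cluster map[string]map[string]string

// syncCloudConfig mirrors the source ConfigMap into the target namespace,
// or removes the target copy when the source does not exist.
// The returned string names the action taken, for illustration only.
func syncCloudConfig(c cluster) string {
	if data, ok := c[sourceNS][cmName]; ok {
		if c[targetNS] == nil {
			c[targetNS] = map[string]string{}
		}
		c[targetNS][cmName] = data
		return "copied"
	}
	if _, present := c[targetNS][cmName]; present {
		delete(c[targetNS], cmName)
		return "deleted"
	}
	return "already absent"
}

func main() {
	c := cluster{sourceNS: {cmName: "custom endpoints"}}
	fmt.Println(syncCloudConfig(c)) // source exists, so it is copied
	delete(c[sourceNS], cmName)
	fmt.Println(syncCloudConfig(c)) // source is gone, so the copy is removed
	fmt.Println(syncCloudConfig(c)) // nothing left to do
}
```

On a cluster without custom service endpoints, every sync ends up in the last branch: the operator has to ensure the ConfigMap is absent from the target namespace.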

There are 2 ways of doing that:

1. Perform a GET and, if the ConfigMap is present, perform a DELETE.
2. Directly perform a DELETE (saving one GET request when the ConfigMap is present).

Currently, the operator follows the second option.

If I understand correctly, you're suggesting the operator should go with the first option?
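To make the trade-off between the two options concrete, here is a minimal sketch that counts requests and 404 responses for each. Everything in it is hypothetical (the `fakeAPI` type, `ensureAbsentGetFirst`, `ensureAbsentDeleteOnly`); it simulates an API server in memory and is not taken from the operator's source.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for an HTTP 404 from the API server.
var errNotFound = errors.New("not found")

// fakeAPI is a hypothetical in-memory API server that counts
// every request and every failed (404) request.
type fakeAPI struct {
	objects map[string]bool
	calls   int
	failed  int
}

func (a *fakeAPI) Get(key string) error {
	a.calls++
	if !a.objects[key] {
		a.failed++
		return errNotFound
	}
	return nil
}

func (a *fakeAPI) Delete(key string) error {
	a.calls++
	if !a.objects[key] {
		a.failed++
		return errNotFound
	}
	delete(a.objects, key)
	return nil
}

// Option 1: GET first, DELETE only if the object is present.
func ensureAbsentGetFirst(a *fakeAPI, key string) error {
	if err := a.Get(key); err != nil {
		if errors.Is(err, errNotFound) {
			return nil // already absent
		}
		return err
	}
	return a.Delete(key)
}

// Option 2: DELETE directly and treat "not found" as success.
func ensureAbsentDeleteOnly(a *fakeAPI, key string) error {
	err := a.Delete(key)
	if errors.Is(err, errNotFound) {
		return nil // already absent
	}
	return err
}

func main() {
	const key = "openshift-cluster-csi-drivers/kube-cloud-config"
	for _, present := range []bool{false, true} {
		a := &fakeAPI{objects: map[string]bool{key: present}}
		_ = ensureAbsentGetFirst(a, key)
		b := &fakeAPI{objects: map[string]bool{key: present}}
		_ = ensureAbsentDeleteOnly(b, key)
		fmt.Printf("present=%v get-first: calls=%d failed=%d; delete-only: calls=%d failed=%d\n",
			present, a.calls, a.failed, b.calls, b.failed)
	}
}
```

In this simulation both options issue one failed request per sync when the ConfigMap is absent (a 404 on GET for option 1, a 404 on DELETE for option 2), so switching between them would only change which verb shows up in the audit log. Avoiding the failed request altogether would presumably require checking a local cache of the namespace rather than the live API.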

Comment 12 errata-xmlrpc 2022-08-10 10:46:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.