Bug 1949357 - manila-csi-controller pod not running due to secret lack(in another ns)
Summary: manila-csi-controller pod not running due to secret lack(in another ns)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jan Safranek
QA Contact: Wei Duan
URL:
Whiteboard:
Duplicates: 1952144 1954010
Depends On:
Blocks:
Reported: 2021-04-14 05:42 UTC by Wei Duan
Modified: 2021-07-27 23:00 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:00:24 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-storage-operator pull 162 | 0 | None | open | Bug 1949357: Allow Manila operator to create ServiceMonitor in the driver namespace | 2021-04-15 09:08:48 UTC
Github | openshift cluster-storage-operator pull 163 | 0 | None | open | Bug 1949357: Add missing RBAC rules to Manila operator | 2021-04-19 19:29:16 UTC
Github | openshift csi-driver-manila-operator pull 96 | 0 | None | open | Bug 1949357: Fix namespace in metrics collection objects | 2021-04-15 08:55:01 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:00:47 UTC

Description Wei Duan 2021-04-14 05:42:55 UTC
Description of problem:
After installing an OCP 4.8 cluster on OSP, the storage cluster operator (co) is not available and the manila-csi-controller pod is not running, with the following errors:
  Warning  FailedMount  20m (x5 over 51m)        kubelet            Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[metrics-serving-cert socket-dir cacert manila-csi-driver-controller-sa-token-wt67q]: timed out waiting for the condition
  Warning  FailedMount  5m43s (x35 over 60m)     kubelet            MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "manila-csi-driver-controller-metrics-serving-cert" not found

It looks like this secret exists in the openshift-cluster-csi-drivers namespace, not in openshift-manila-csi-driver, where the pod runs:
$ oc get secret -A | grep manila-csi-driver-controller-metrics-serving-cert
openshift-cluster-csi-drivers                      manila-csi-driver-controller-metrics-serving-cert             kubernetes.io/tls                     2      72m
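
The namespace mismatch shown in the listing above can be checked mechanically. The following is a minimal sketch, not part of the original report: `secret_namespaces` and `secret_in_namespace` are hypothetical helper names. They read `oc get secret -A` output from stdin and assume its usual column layout (namespace first, secret name second).

```shell
# Hypothetical helper (not from the report): print every namespace that holds
# a secret with exactly the given name. Expects `oc get secret -A` output on
# stdin, where column 1 is the namespace and column 2 is the secret name.
secret_namespaces() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Hypothetical helper: succeed only if the secret exists in the wanted namespace.
# $1 = secret name, $2 = namespace the pod runs in; the listing is read from stdin.
secret_in_namespace() {
  secret_namespaces "$1" | grep -qxF -- "$2"
}
```

A possible invocation would be `oc get secret -A | secret_in_namespace manila-csi-driver-controller-metrics-serving-cert openshift-manila-csi-driver`, which returns a non-zero status while the secret only exists in another namespace.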

Possibly introduced by https://github.com/openshift/csi-driver-manila-operator/pull/95.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-13-171608

How reproducible:
Always

Steps to Reproduce:
1. Install cluster on OSP
2. Observe that the storage co and the Manila CSI driver are not available
3. Check the openstack-manila-csi-controllerplugin pod
$ oc -n openshift-manila-csi-driver describe pod openstack-manila-csi-controllerplugin-68bcf6c576-zdlrh
Events:
  Type     Reason       Age                      From               Message
  ----     ------       ----                     ----               -------
  Normal   Scheduled    60m                      default-scheduler  Successfully assigned openshift-manila-csi-driver/openstack-manila-csi-controllerplugin-68bcf6c576-zdlrh to wduan-0414d-zmphs-master-1
  Warning  FailedMount  42m (x3 over 58m)        kubelet            Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[manila-csi-driver-controller-sa-token-wt67q metrics-serving-cert socket-dir cacert]: timed out waiting for the condition
  Warning  FailedMount  40m (x5 over 54m)        kubelet            Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[cacert manila-csi-driver-controller-sa-token-wt67q metrics-serving-cert socket-dir]: timed out waiting for the condition
  Warning  FailedMount  20m (x5 over 51m)        kubelet            Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[metrics-serving-cert socket-dir cacert manila-csi-driver-controller-sa-token-wt67q]: timed out waiting for the condition
  Warning  FailedMount  5m43s (x35 over 60m)     kubelet            MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "manila-csi-driver-controller-metrics-serving-cert" not found
  Warning  FailedMount  <invalid> (x5 over 24m)  kubelet            Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[socket-dir cacert manila-csi-driver-controller-sa-token-wt67q metrics-serving-cert]: timed out waiting for the condition
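
Most of the events above are generic timeouts; only the `MountVolume.SetUp` line names the root cause. As a rough sketch (the helper name `missing_secrets` is hypothetical, and the pattern assumes the kubelet message wording shown in this report), the missing secret names can be pulled out of `oc describe pod` output like this:

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print each
# secret name reported as missing by a MountVolume.SetUp failure, once each.
missing_secrets() {
  sed -n 's/.*MountVolume\.SetUp failed for volume "[^"]*" : secret "\([^"]*\)" not found.*/\1/p' | sort -u
}
```

It could be fed from `oc -n openshift-manila-csi-driver describe pod <pod-name> | missing_secrets`, where `<pod-name>` stands for the controller plugin pod being inspected.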

Actual results:
openstack-manila-csi-controllerplugin is not running

Expected results:
openstack-manila-csi-controllerplugin should be in Running status

Comment 1 Wei Duan 2021-04-14 08:08:26 UTC
@Jan
Maybe this needs your help rather than the stack team's?
It is blocking our CI and upgrade testing because of the CSO status.

Comment 2 Denis Ollier 2021-04-14 14:31:50 UTC
Hi,

I confirm this issue.

Deploying OCP-4.8.0-0.nightly-2021-04-13-171608 on RHOS-PSI is failing for me with the same error.

Regards.

Denis.

Comment 6 Denis Ollier 2021-04-17 12:52:00 UTC
It's still not working with OCP 4.8.0-0.nightly-2021-04-17-044339.

Same issue:

> MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "manila-csi-driver-controller-metrics-serving-cert" not found

Comment 8 Matthew Booth 2021-04-19 16:35:01 UTC
@mfedosin is reporting that this is not fixed by https://github.com/openshift/csi-driver-manila-operator/pull/96

Comment 9 Mike Fedosin 2021-04-19 17:23:04 UTC
Yes, this is what I have:

The fix #96 has been applied, and now I see a service monitor in the right namespace
❯ oc get servicemonitor -n openshift-manila-csi-driver
NAME                                   AGE
manila-csi-driver-controller-monitor   3h22m


But there is still no secret
❯ oc get secret -n openshift-manila-csi-driver manila-csi-driver-controller-metrics-serving-cert
Error from server (NotFound): secrets "manila-csi-driver-controller-metrics-serving-cert" not found


And Manila controller fails to start
Events:
  Type     Reason       Age                    From                              Message
  ----     ------       ----                   ----                              -------
  Normal   Scheduled    <unknown>                                                Successfully assigned openshift-manila-csi-driver/openstack-manila-csi-controllerplugin-bc4b9cbb9-dd6xf to mfedosin-pnqxg-master-0
  Warning  FailedMount  96m (x7 over 155m)     kubelet, mfedosin-pnqxg-master-0  Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[metrics-serving-cert socket-dir cacert manila-csi-driver-controller-sa-token-lgb7n]: timed out waiting for the condition
  Warning  FailedMount  35m (x19 over 151m)    kubelet, mfedosin-pnqxg-master-0  Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[socket-dir cacert manila-csi-driver-controller-sa-token-lgb7n metrics-serving-cert]: timed out waiting for the condition
  Warning  FailedMount  22m (x15 over 139m)    kubelet, mfedosin-pnqxg-master-0  Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[manila-csi-driver-controller-sa-token-lgb7n metrics-serving-cert socket-dir cacert]: timed out waiting for the condition
  Warning  FailedMount  7m10s (x82 over 157m)  kubelet, mfedosin-pnqxg-master-0  MountVolume.SetUp failed for volume "metrics-serving-cert" : secret "manila-csi-driver-controller-metrics-serving-cert" not found
  Warning  FailedMount  112s (x14 over 148m)   kubelet, mfedosin-pnqxg-master-0  Unable to attach or mount volumes: unmounted volumes=[metrics-serving-cert], unattached volumes=[cacert manila-csi-driver-controller-sa-token-lgb7n metrics-serving-cert socket-dir]: timed out waiting for the condition

Comment 10 Jan Safranek 2021-04-19 19:15:24 UTC
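One step further: the operator now fails with RBAC errors when it tries to create the metrics objects in the driver namespace: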

E0419 17:57:57.765388       1 base_controller.go:253] StaticResourceController reconciliation failed: ["service.yaml" (string): services "manila-csi-driver-controller-metrics" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:manila-csi-driver-operator" cannot get resource "services" in API group "" in the namespace "openshift-manila-csi-driver", "rbac/prometheus_role.yaml" (string): roles.rbac.authorization.k8s.io "manila-csi-driver-prometheus" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:manila-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
{APIGroups:[""], Resources:["endpoints"], Verbs:["get" "list" "watch"]}
{APIGroups:[""], Resources:["pods"], Verbs:["get" "list" "watch"]}
{APIGroups:[""], Resources:["services"], Verbs:["get" "list" "watch"]}, "rbac/prometheus_rolebinding.yaml" (string): roles.rbac.authorization.k8s.io "manila-csi-driver-prometheus" not found]
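
The "forbidden" message above names the service account, verb, resource, and namespace involved, which is enough to re-check the permission with `oc auth can-i` and impersonation (`--as`). The sketch below is illustrative only: `forbidden_to_can_i` is a hypothetical helper, it assumes the message wording shown in this log, and it handles only core-group resources (API group "").

```shell
# Hypothetical helper: read operator log lines on stdin and, for each
# 'User "X" cannot VERB resource "R" ... in the namespace "NS"' message,
# print the matching `oc auth can-i` command. It only prints the command;
# it does not run it.
forbidden_to_can_i() {
  sed -n 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "[^"]*" in the namespace "\([^"]*\)".*/oc auth can-i \2 \3 -n \4 --as=\1/p'
}
```

Running the printed command against the cluster should answer `no` while the RBAC rules are missing and `yes` once they are in place.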

Comment 12 Wei Duan 2021-04-21 02:59:02 UTC
Verified as passing on 4.8.0-0.nightly-2021-04-20-101404

Comment 13 Mike Fedosin 2021-04-21 15:49:16 UTC
*** Bug 1952144 has been marked as a duplicate of this bug. ***

Comment 14 Mike Fedosin 2021-04-27 21:09:53 UTC
*** Bug 1954010 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2021-07-27 23:00:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

