Bug 1779531 - [3.11] Dynamic provisioning with Ceph RBD failed with error "failed to get admin secret" while the secret exist
Summary: [3.11] Dynamic provisioning with Ceph RBD failed with error "failed to get ad...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 3.11.z
Assignee: Jan Safranek
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks: 1719686
TreeView+ depends on / blocked
 
Reported: 2019-12-04 07:03 UTC by Liang Xia
Modified: 2019-12-10 14:54 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-10 14:54:13 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Liang Xia 2019-12-04 07:03:15 UTC
Description of problem:
Dynamic provisioning with Ceph RBD failed with error,
Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["lxiap"/"cephrbd-secret"]: failed to get secret from ["lxiap"/"cephrbd-secret"]
but the secret exist in the namespace.


Version-Release number of selected component (if applicable):
openshift v3.11.157
kubernetes v1.11.0+d4cacc0


How reproducible:
Always

Steps to Reproduce:
1. Create the Ceph secret.
2. Create the storage class.
3. Create a PVC, and check the PVC.

Actual results:
Provisioning failed with error failed to get admin secret.

Expected results:
Provisioning succeed.



PV Dump:
No PV provisioned.

PVC Dump:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd
  creationTimestamp: "2019-12-04T04:03:22Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: mysql-persistent
    template: mysql-persistent-template
  name: mysql
  namespace: lxiap
  resourceVersion: "20180"
  selfLink: /api/v1/namespaces/lxiap/persistentvolumeclaims/mysql
  uid: 03893253-164b-11ea-8189-fa163ee74be8
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ceph-block
status:
  phase: Pending

StorageClass Dump (if StorageClass used by PV/PVC):
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2019-12-04T04:04:06Z"
  name: ceph-block
  resourceVersion: "20292"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ceph-block
  uid: 1dd48646-164b-11ea-8189-fa163ee74be8
parameters:
  adminId: admin
  adminSecretName: cephrbd-secret
  adminSecretNamespace: lxiap
  monitors: 10.0.149.138:6789
  pool: rbd
  userId: admin
  userSecretName: cephrbd-secret
provisioner: kubernetes.io/rbd
reclaimPolicy: Delete
volumeBindingMode: Immediate


Additional info:
# oc describe pvc mysql
Name:          mysql
Namespace:     lxiap
StorageClass:  ceph-block
Status:        Pending
Volume:        
Labels:        app=mysql-persistent
               template=mysql-persistent-template
Annotations:   openshift.io/generated-by=OpenShiftNewApp
               volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/rbd
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
Events:
  Type     Reason              Age                 From                         Message
  ----     ------              ----                ----                         -------
  Warning  ProvisioningFailed  46m (x3 over 46m)   persistentvolume-controller  Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]
  Warning  ProvisioningFailed  46m                 persistentvolume-controller  storageclass.storage.k8s.io "ceph-block" not found
  Warning  ProvisioningFailed  1m (x184 over 45m)  persistentvolume-controller  Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["lxiap"/"cephrbd-secret"]: failed to get secret from ["lxiap"/"cephrbd-secret"]

# oc get secret cephrbd-secret -n lxiap
NAME             TYPE      DATA      AGE
cephrbd-secret   Opaque    1         1h
# oc get secret cephrbd-secret -n default
NAME             TYPE      DATA      AGE
cephrbd-secret   Opaque    1         2h

Comment 1 Jan Safranek 2019-12-06 15:12:31 UTC
Liang, since you used "Regression" keyword, do you remember in what OCP release it was working? I don't recall any recent changes in this area. In addition, can you see any suspicious errors in kube-controller-manager logs? Especially any permission denied.

Comment 2 Liang Xia 2019-12-09 06:45:40 UTC
(In reply to Jan Safranek from comment #1)
> Liang, since you used "Regression" keyword, do you remember in what OCP
> release it was working? I don't recall any recent changes in this area. In
> addition, can you see any suspicious errors in kube-controller-manager logs?
> Especially any permission denied.

We haven't touched this for a long time, so no exact time/build info when this starts broken.

kube-system-master-controllers-lxia-bz1719686master-etcd-1.log
I1209 01:25:12.294770       1 graph_builder.go:586] GraphBuilder process object: v1/PersistentVolumeClaim, namespace lxiap, name mysql, uid a7c1d05d-1a4c-11ea-814d-fa163eb6601c, event type add
I1209 01:25:12.298274       1 pv_controller_base.go:171] enqueued "lxiap/mysql" for sync
I1209 01:25:12.301748       1 pv_controller_base.go:357] claimWorker[lxiap/mysql]
I1209 01:25:12.301803       1 pv_controller_base.go:507] storeObjectUpdate: adding claim "lxiap/mysql", version 30138
I1209 01:25:12.301889       1 pv_controller.go:238] synchronizing PersistentVolumeClaim[lxiap/mysql]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
I1209 01:25:12.301985       1 pv_controller.go:336] synchronizing unbound PersistentVolumeClaim[lxiap/mysql]: no volume found
I1209 01:25:12.303683       1 pvc_protection_controller.go:256] Got event on PVC lxiap/mysql
I1209 01:25:12.314764       1 factory.go:100] Adding deployment config lxiap/mysql
I1209 01:25:12.314920       1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql
I1209 01:25:12.315093       1 deploymentconfig_controller.go:156] Postponing rollout #0 for DeploymentConfig lxiap/mysql because of invalid or unresolved image for container #0 (name=mysql)
I1209 01:25:12.316137       1 image_trigger_controller.go:363] Started syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql"
I1209 01:25:12.317311       1 deploymentconfigs.go:247] Deployment config mysql has changed
I1209 01:25:12.324236       1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type add
I1209 01:25:12.337728       1 pv_controller.go:1352] provisionClaim[lxiap/mysql]: started
I1209 01:25:12.342238       1 pv_controller.go:1597] scheduleOperation[provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]]
I1209 01:25:12.342463       1 pv_controller.go:1365] provisionClaimOperation [lxiap/mysql] started, class: "cephrbdprovisioner"
I1209 01:25:12.388127       1 factory.go:108] Updating deployment config lxiap/mysql
I1209 01:25:12.389144       1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type update
I1209 01:25:12.389447       1 image_trigger_controller.go:365] Finished syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" (73.313645ms)
I1209 01:25:12.389600       1 image_trigger_controller.go:363] Started syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql"
I1209 01:25:12.389789       1 deploymentconfigs.go:244] Deployment config mysql has not changed
I1209 01:25:12.389905       1 image_trigger_controller.go:365] Finished syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" (345.038µs)
I1209 01:25:12.394777       1 deploymentconfig_controller.go:507] Error syncing deployment config lxiap/mysql: Operation cannot be fulfilled on deploymentconfigs.apps.openshift.io "mysql": the object has been modified; please apply your changes to the latest version and try again
I1209 01:25:12.394805       1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql
I1209 01:25:12.394955       1 deploymentconfig_controller.go:574] Rolling out initial deployment for lxiap/mysql as it now have images available
I1209 01:25:12.402807       1 graph_builder.go:586] GraphBuilder process object: v1/PersistentVolumeClaim, namespace lxiap, name mysql, uid a7c1d05d-1a4c-11ea-814d-fa163eb6601c, event type update
I1209 01:25:12.402845       1 pv_controller_base.go:171] enqueued "lxiap/mysql" for sync
I1209 01:25:12.402857       1 pv_controller_base.go:357] claimWorker[lxiap/mysql]
I1209 01:25:12.402867       1 pv_controller_base.go:535] storeObjectUpdate updating claim "lxiap/mysql" with version 30141
I1209 01:25:12.402877       1 pv_controller.go:238] synchronizing PersistentVolumeClaim[lxiap/mysql]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
I1209 01:25:12.402895       1 pv_controller_base.go:535] storeObjectUpdate updating claim "lxiap/mysql" with version 30141
I1209 01:25:12.402903       1 pv_controller.go:336] synchronizing unbound PersistentVolumeClaim[lxiap/mysql]: no volume found
I1209 01:25:12.402911       1 pv_controller.go:1352] provisionClaim[lxiap/mysql]: started
I1209 01:25:12.402916       1 pv_controller.go:1597] scheduleOperation[provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]]
I1209 01:25:12.402926       1 pv_controller.go:1608] operation "provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]" is already running, skipping
I1209 01:25:12.402941       1 pvc_protection_controller.go:256] Got event on PVC lxiap/mysql
I1209 01:25:12.407295       1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type update
I1209 01:25:12.407957       1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql
I1209 01:25:12.408005       1 deploymentconfig_controller.go:574] Rolling out initial deployment for lxiap/mysql as it now have images available
I1209 01:25:12.408237       1 factory.go:108] Updating deployment config lxiap/mysql
E1209 01:25:12.409213       1 rbd.go:1101] failed to get secret from ["default"/"cephrbd-secret"]
I1209 01:25:12.409426       1 pv_controller.go:1476] failed to provision volume for claim "lxiap/mysql" with StorageClass "cephrbdprovisioner": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]
I1209 01:25:12.409937       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"lxiap", Name:"mysql", UID:"a7c1d05d-1a4c-11ea-814d-fa163eb6601c", APIVersion:"v1", ResourceVersion:"30141", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "cephrbdprovisioner": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]

Comment 3 Jan Safranek 2019-12-09 11:24:48 UTC
I looked at the cluster at lxia-bz1719686master-etcd-1.int.1209-som.qe.rhcloud.com and cephrbd-secret had wrong type:

apiVersion: v1
data:
  key: QVFEUnJLTlZiRWV2Q2hBQUVtUkMrcFcvS0JWSHhhMHcvUE9JTEE9PQo=
kind: Secret
metadata:
  creationTimestamp: "2019-12-09T03:12:43Z"
  name: cephrbd-secret
  namespace: default
  resourceVersion: "4760"
  selfLink: /api/v1/namespaces/default/secrets/cephrbd-secret
  uid: c3fb2ec3-1a31-11ea-814d-fa163eb6601c
type: Opaque


Should be:

type: kubernetes.io/rbd


With this change, RBD provisioning works. I don't know who created the secret, was it installer? It needs to be fixed then.

We can improve the error message by backporting https://github.com/kubernetes/kubernetes/pull/72033:

> failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque

Comment 6 Liang Xia 2019-12-10 02:31:12 UTC
(In reply to Jan Safranek from comment #3)
> With this change, RBD provisioning works. I don't know who created the
> secret, was it installer? It needs to be fixed then.

QE have fixed the templates and scripts.

> 
> We can improve the error message by backporting
> https://github.com/kubernetes/kubernetes/pull/72033:
> 
> > failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque

That would be helpful.

Comment 7 Jan Safranek 2019-12-10 14:54:13 UTC
(In reply to Liang Xia from comment #6)
> > 
> > We can improve the error message by backporting
> > https://github.com/kubernetes/kubernetes/pull/72033:
> > 
> > > failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque
> 
> That would be helpful.

I got second thoughts, this is 3.11 but and we backport only important fixes there. Otherwise we're going to backport everything.

We fixed our documentation, that should be enough for now.


Note You need to log in before you can comment on or make changes to this bug.