Description of problem: Dynamic provisioning with Ceph RBD failed with error, Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["lxiap"/"cephrbd-secret"]: failed to get secret from ["lxiap"/"cephrbd-secret"] but the secret exist in the namespace. Version-Release number of selected component (if applicable): openshift v3.11.157 kubernetes v1.11.0+d4cacc0 How reproducible: Always Steps to Reproduce: 1. Create the Ceph secret. 2. Create the storage class. 3. Create a PVC, and check the PVC. Actual results: Provisioning failed with error failed to get admin secret. Expected results: Provisioning succeed. PV Dump: No PV provisioned. PVC Dump: apiVersion: v1 kind: PersistentVolumeClaim metadata: annotations: openshift.io/generated-by: OpenShiftNewApp volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/rbd creationTimestamp: "2019-12-04T04:03:22Z" finalizers: - kubernetes.io/pvc-protection labels: app: mysql-persistent template: mysql-persistent-template name: mysql namespace: lxiap resourceVersion: "20180" selfLink: /api/v1/namespaces/lxiap/persistentvolumeclaims/mysql uid: 03893253-164b-11ea-8189-fa163ee74be8 spec: accessModes: - ReadWriteOnce resources: requests: storage: 1Gi storageClassName: ceph-block status: phase: Pending StorageClass Dump (if StorageClass used by PV/PVC): allowVolumeExpansion: true apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: annotations: storageclass.kubernetes.io/is-default-class: "true" creationTimestamp: "2019-12-04T04:04:06Z" name: ceph-block resourceVersion: "20292" selfLink: /apis/storage.k8s.io/v1/storageclasses/ceph-block uid: 1dd48646-164b-11ea-8189-fa163ee74be8 parameters: adminId: admin adminSecretName: cephrbd-secret adminSecretNamespace: lxiap monitors: 10.0.149.138:6789 pool: rbd userId: admin userSecretName: cephrbd-secret provisioner: kubernetes.io/rbd reclaimPolicy: Delete volumeBindingMode: Immediate Additional info: # oc describe pvc mysql Name: mysql Namespace: lxiap StorageClass: ceph-block Status: Pending Volume: Labels: app=mysql-persistent template=mysql-persistent-template Annotations: openshift.io/generated-by=OpenShiftNewApp volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/rbd Finalizers: [kubernetes.io/pvc-protection] Capacity: Access Modes: Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning ProvisioningFailed 46m (x3 over 46m) persistentvolume-controller Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"] Warning ProvisioningFailed 46m persistentvolume-controller storageclass.storage.k8s.io "ceph-block" not found Warning ProvisioningFailed 1m (x184 over 45m) persistentvolume-controller Failed to provision volume with StorageClass "ceph-block": failed to get admin secret from ["lxiap"/"cephrbd-secret"]: failed to get secret from ["lxiap"/"cephrbd-secret"] # oc get secret cephrbd-secret -n lxiap NAME TYPE DATA AGE cephrbd-secret Opaque 1 1h # oc get secret cephrbd-secret -n default NAME TYPE DATA AGE cephrbd-secret Opaque 1 2h
Liang, since you used "Regression" keyword, do you remember in what OCP release it was working? I don't recall any recent changes in this area. In addition, can you see any suspicious errors in kube-controller-manager logs? Especially any permission denied.
(In reply to Jan Safranek from comment #1) > Liang, since you used "Regression" keyword, do you remember in what OCP > release it was working? I don't recall any recent changes in this area. In > addition, can you see any suspicious errors in kube-controller-manager logs? > Especially any permission denied. We haven't touched this for a long time, so no exact time/build info when this starts broken. kube-system-master-controllers-lxia-bz1719686master-etcd-1.log I1209 01:25:12.294770 1 graph_builder.go:586] GraphBuilder process object: v1/PersistentVolumeClaim, namespace lxiap, name mysql, uid a7c1d05d-1a4c-11ea-814d-fa163eb6601c, event type add I1209 01:25:12.298274 1 pv_controller_base.go:171] enqueued "lxiap/mysql" for sync I1209 01:25:12.301748 1 pv_controller_base.go:357] claimWorker[lxiap/mysql] I1209 01:25:12.301803 1 pv_controller_base.go:507] storeObjectUpdate: adding claim "lxiap/mysql", version 30138 I1209 01:25:12.301889 1 pv_controller.go:238] synchronizing PersistentVolumeClaim[lxiap/mysql]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false I1209 01:25:12.301985 1 pv_controller.go:336] synchronizing unbound PersistentVolumeClaim[lxiap/mysql]: no volume found I1209 01:25:12.303683 1 pvc_protection_controller.go:256] Got event on PVC lxiap/mysql I1209 01:25:12.314764 1 factory.go:100] Adding deployment config lxiap/mysql I1209 01:25:12.314920 1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql I1209 01:25:12.315093 1 deploymentconfig_controller.go:156] Postponing rollout #0 for DeploymentConfig lxiap/mysql because of invalid or unresolved image for container #0 (name=mysql) I1209 01:25:12.316137 1 image_trigger_controller.go:363] Started syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" I1209 01:25:12.317311 1 deploymentconfigs.go:247] Deployment config mysql has changed I1209 01:25:12.324236 1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type add I1209 01:25:12.337728 1 pv_controller.go:1352] provisionClaim[lxiap/mysql]: started I1209 01:25:12.342238 1 pv_controller.go:1597] scheduleOperation[provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]] I1209 01:25:12.342463 1 pv_controller.go:1365] provisionClaimOperation [lxiap/mysql] started, class: "cephrbdprovisioner" I1209 01:25:12.388127 1 factory.go:108] Updating deployment config lxiap/mysql I1209 01:25:12.389144 1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type update I1209 01:25:12.389447 1 image_trigger_controller.go:365] Finished syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" (73.313645ms) I1209 01:25:12.389600 1 image_trigger_controller.go:363] Started syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" I1209 01:25:12.389789 1 deploymentconfigs.go:244] Deployment config mysql has not changed I1209 01:25:12.389905 1 image_trigger_controller.go:365] Finished syncing resource "deploymentconfigs.apps.openshift.io/lxiap/mysql" (345.038µs) I1209 01:25:12.394777 1 deploymentconfig_controller.go:507] Error syncing deployment config lxiap/mysql: Operation cannot be fulfilled on deploymentconfigs.apps.openshift.io "mysql": the object has been modified; please apply your changes to the latest version and try again I1209 01:25:12.394805 1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql I1209 01:25:12.394955 1 deploymentconfig_controller.go:574] Rolling out initial deployment for lxiap/mysql as it now have images available I1209 01:25:12.402807 1 graph_builder.go:586] GraphBuilder process object: v1/PersistentVolumeClaim, namespace lxiap, name mysql, uid a7c1d05d-1a4c-11ea-814d-fa163eb6601c, event type update I1209 01:25:12.402845 1 pv_controller_base.go:171] enqueued "lxiap/mysql" for sync I1209 01:25:12.402857 1 pv_controller_base.go:357] claimWorker[lxiap/mysql] I1209 01:25:12.402867 1 pv_controller_base.go:535] storeObjectUpdate updating claim "lxiap/mysql" with version 30141 I1209 01:25:12.402877 1 pv_controller.go:238] synchronizing PersistentVolumeClaim[lxiap/mysql]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false I1209 01:25:12.402895 1 pv_controller_base.go:535] storeObjectUpdate updating claim "lxiap/mysql" with version 30141 I1209 01:25:12.402903 1 pv_controller.go:336] synchronizing unbound PersistentVolumeClaim[lxiap/mysql]: no volume found I1209 01:25:12.402911 1 pv_controller.go:1352] provisionClaim[lxiap/mysql]: started I1209 01:25:12.402916 1 pv_controller.go:1597] scheduleOperation[provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]] I1209 01:25:12.402926 1 pv_controller.go:1608] operation "provision-lxiap/mysql[a7c1d05d-1a4c-11ea-814d-fa163eb6601c]" is already running, skipping I1209 01:25:12.402941 1 pvc_protection_controller.go:256] Got event on PVC lxiap/mysql I1209 01:25:12.407295 1 graph_builder.go:586] GraphBuilder process object: apps.openshift.io/v1/DeploymentConfig, namespace lxiap, name mysql, uid a7c7263a-1a4c-11ea-814d-fa163eb6601c, event type update I1209 01:25:12.407957 1 deploymentconfig_controller.go:88] Reconciling lxiap/mysql I1209 01:25:12.408005 1 deploymentconfig_controller.go:574] Rolling out initial deployment for lxiap/mysql as it now have images available I1209 01:25:12.408237 1 factory.go:108] Updating deployment config lxiap/mysql E1209 01:25:12.409213 1 rbd.go:1101] failed to get secret from ["default"/"cephrbd-secret"] I1209 01:25:12.409426 1 pv_controller.go:1476] failed to provision volume for claim "lxiap/mysql" with StorageClass "cephrbdprovisioner": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"] I1209 01:25:12.409937 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"lxiap", Name:"mysql", UID:"a7c1d05d-1a4c-11ea-814d-fa163eb6601c", APIVersion:"v1", ResourceVersion:"30141", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "cephrbdprovisioner": failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]
I looked at the cluster at lxia-bz1719686master-etcd-1.int.1209-som.qe.rhcloud.com and cephrbd-secret had wrong type: apiVersion: v1 data: key: QVFEUnJLTlZiRWV2Q2hBQUVtUkMrcFcvS0JWSHhhMHcvUE9JTEE9PQo= kind: Secret metadata: creationTimestamp: "2019-12-09T03:12:43Z" name: cephrbd-secret namespace: default resourceVersion: "4760" selfLink: /api/v1/namespaces/default/secrets/cephrbd-secret uid: c3fb2ec3-1a31-11ea-814d-fa163eb6601c type: Opaque Should be: type: kubernetes.io/rbd With this change, RBD provisioning works. I don't know who created the secret, was it installer? It needs to be fixed then. We can improve the error message by backporting https://github.com/kubernetes/kubernetes/pull/72033: > failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque
(In reply to Jan Safranek from comment #3) > With this change, RBD provisioning works. I don't know who created the > secret, was it installer? It needs to be fixed then. QE have fixed the templates and scripts. > > We can improve the error message by backporting > https://github.com/kubernetes/kubernetes/pull/72033: > > > failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque That would be helpful.
(In reply to Liang Xia from comment #6) > > > > We can improve the error message by backporting > > https://github.com/kubernetes/kubernetes/pull/72033: > > > > > failed to get admin secret from ["default"/"cephrbd-secret"]: failed to get secret from ["default"/"cephrbd-secret"]: Cannot get secret of type Opaque > > That would be helpful. I got second thoughts, this is 3.11 but and we backport only important fixes there. Otherwise we're going to backport everything. We fixed our documentation, that should be enough for now.