Description of problem (please be as detailed as possible and provide log snippets):

A PVC from OCS on IBM Cloud does not mount into a pod.

Version of all relevant components (if applicable):
OCS installed on IBM Cloud as part of ROKS.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes.

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
Not tested / NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OCP/OCS on IBM Cloud - the so-called ROKS.
2. Create a PVC.
3. Create a pod that uses the PVC.

PVC created with:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-ext4
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd

Pod created with:
---
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc-ext4
        readOnly: false

Actual results:
The pod does not start; it stays in ContainerCreating.

Expected results:
The pod starts.

Additional info:
Using a different storage class, e.g. "ibmc-block-gold", the pod starts.

Storage class:
# oc get sc ocs-storagecluster-ceph-rbd -o yaml
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-07-13T16:01:55Z"
  name: ocs-storagecluster-ceph-rbd
  resourceVersion: "1050654"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ocs-storagecluster-ceph-rbd
  uid: 48d4be92-3368-4603-a83f-a2303af14a38
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  imageFeatures: layering
  imageFormat: "2"
  pool: ocs-storagecluster-cephblockpool
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
# oc get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
rbd-pvc-ext4   Bound    pvc-b589282e-b01b-4803-a261-90b110791d54   1Gi        RWO            ocs-storagecluster-ceph-rbd   13m

# oc describe pvc
Name:          rbd-pvc-ext4
Namespace:     elko
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Bound
Volume:        pvc-b589282e-b01b-4803-a261-90b110791d54
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type    Reason                 Age                From                         Message
  ----    ------                 ----               ----                         -------
  Normal  ExternalProvisioning   12m (x2 over 12m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal  Provisioning           12m                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d75cf8f45-xcgd5_22e2bd0a-cf38-4f18-8436-9f024e9c6e87  External provisioner is provisioning volume for claim "elko/rbd-pvc-ext4"
  Normal  ProvisioningSucceeded  12m                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d75cf8f45-xcgd5_22e2bd0a-cf38-4f18-8436-9f024e9c6e87  Successfully provisioned volume pvc-b589282e-b01b-4803-a261-90b110791d54
Mounted By:  csirbd-demo-pod

# oc get pod
NAME              READY   STATUS              RESTARTS   AGE
csirbd-demo-pod   0/1     ContainerCreating   0          13m

# oc describe pod
Name:               csirbd-demo-pod
Namespace:          elko
Priority:           0
PriorityClassName:  <none>
Node:               10.208.42.248/10.208.42.248
Start Time:         Wed, 15 Jul 2020 07:01:46 +0000
Labels:             <none>
Annotations:        openshift.io/scc: anyuid
Status:             Pending
IP:
Containers:
  web-server:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hvfp5 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc-ext4
    ReadOnly:   false
  default-token-hvfp5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hvfp5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                  From                    Message
  ----     ------       ----                 ----                    -------
  Normal   Scheduled    <unknown>            default-scheduler       Successfully assigned elko/csirbd-demo-pod to 10.208.42.248
  Warning  FailedMount  6m44s                kubelet, 10.208.42.248  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[default-token-hvfp5 mypvc]: timed out waiting for the condition
  Warning  FailedMount  2m13s (x4 over 11m)  kubelet, 10.208.42.248  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-hvfp5]: timed out waiting for the condition
  Warning  FailedMount  59s (x14 over 13m)   kubelet, 10.208.42.248  MountVolume.MountDevice failed for volume "pvc-b589282e-b01b-4803-a261-90b110791d54" : rpc error: code = InvalidArgument desc = staging path does not exists on node

---- different storage class works ----

# oc get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc-ext4   Bound    pvc-c046820f-2bac-432c-8664-70624e2c6ff2   20Gi       RWO            ibmc-block-gold   109s

# oc get pod
NAME              READY   STATUS    RESTARTS   AGE
csirbd-demo-pod   1/1     Running   0          40s

# oc describe pod
Name:               csirbd-demo-pod
Namespace:          teste
Priority:           0
PriorityClassName:  <none>
Node:               10.208.42.224/10.208.42.224
Start Time:         Wed, 15 Jul 2020 07:17:55 +0000
Labels:             <none>
Annotations:        cni.projectcalico.org/podIP: 172.30.49.98/32
                    cni.projectcalico.org/podIPs: 172.30.49.98/32
                    k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "k8s-pod-network",
                          "ips": [
                              "172.30.49.98"
                          ],
                          "dns": {}
                      }]
                    openshift.io/scc: anyuid
Status:             Running
IP:                 172.30.49.98
Containers:
  web-server:
    Container ID:   cri-o://7dfd66dff45d65eceafa86e5eeac07543c618972738571a989d1bd7bd1f577e5
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:8ff4598873f588ca9d2bf1be51bdb117ec8f56cdfd5a81b5bb0224a61565aa49
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 15 Jul 2020 07:18:23 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7g9wg (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc-ext4
    ReadOnly:   false
  default-token-7g9wg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7g9wg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                    Message
  ----    ------     ----       ----                    -------
  Normal  Scheduled  <unknown>  default-scheduler       Successfully assigned teste/csirbd-demo-pod to 10.208.42.224
  Normal  Pulling    19s        kubelet, 10.208.42.224  Pulling image "nginx"
  Normal  Pulled     15s        kubelet, 10.208.42.224  Successfully pulled image "nginx"
  Normal  Created    14s        kubelet, 10.208.42.224  Created container web-server
  Normal  Started    14s        kubelet, 10.208.42.224  Started container web-server

# oc describe pvc
Name:          rbd-pvc-ext4
Namespace:     teste
StorageClass:  ibmc-block-gold
Status:        Bound
Volume:        pvc-c046820f-2bac-432c-8664-70624e2c6ff2
Labels:        region=us-south
               zone=dal13
Annotations:   ibm.io/provisioning-status: complete
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ibm.io/ibmc-block
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type    Reason                 Age                 From                         Message
  ----    ------                 ----                ----                         -------
  Normal  Provisioning           116s                ibm.io/ibmc-block_ibmcloud-block-storage-plugin-7584f7b495-th546_d835cdbe-c2ed-11ea-b27e-466b8487c5a2  External provisioner is provisioning volume for claim "teste/rbd-pvc-ext4"
  Normal  ExternalProvisioning   61s (x8 over 116s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ibm.io/ibmc-block" or manually created by system administrator
  Normal  ProvisioningSucceeded  61s                 ibm.io/ibmc-block_ibmcloud-block-storage-plugin-7584f7b495-th546_d835cdbe-c2ed-11ea-b27e-466b8487c5a2  Successfully provisioned volume pvc-c046820f-2bac-432c-8664-70624e2c6ff2
Mounted By:  csirbd-demo-pod
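For anyone triaging the FailedMount above, a minimal sketch of how one might check the staging path the error message refers to (the directory layout is the usual kubelet CSI convention and the PV name is copied from the events above; treat the exact paths as assumptions for your environment):

# Run on the node the pod was scheduled to (10.208.42.248 above).
# Kubelet stages CSI block volumes under
#   <kubelet root dir>/plugins/kubernetes.io/csi/pv/<pv-name>/globalmount
# and the root dir defaults to /var/lib/kubelet unless kubelet was
# started with a custom --root-dir.
ls -ld /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b589282e-b01b-4803-a261-90b110791d54/globalmount

If the directory exists under a different root dir than the one the CSI driver expects, the "staging path does not exists" error above is consistent with a path mismatch rather than a provisioning failure.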
@Elvir please provide logs for ceph-csi plugins (rbd plugins) and also the definition of ibmc-block-gold storage class.
(In reply to Mudit Agarwal from comment #3)
> @Elvir please provide logs for ceph-csi plugins (rbd plugins) and also the
> definition of ibmc-block-gold storage class.

Q: Is this deployment using a custom KUBELET path for ceph-csi?
A: Not sure; the IBM team installed the cluster. The kubelet --root-dir is /var/data/kubelet, check [1] below.

# oc get sc ibmc-block-gold -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-07-10T10:13:07Z"
  labels:
    app: ibmcloud-block-storage-plugin
    chart: ibmcloud-block-storage-plugin-1.7.1
    heritage: Helm
    release: release-name
  name: ibmc-block-gold
  resourceVersion: "3754"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ibmc-block-gold
  uid: a13ff46d-192b-4848-a304-36b717679954
parameters:
  billingType: hourly
  classVersion: "2"
  fsType: ext4
  iopsPerGB: "10"
  sizeRange: '[20-4000]Gi'
  type: Endurance
provisioner: ibm.io/ibmc-block
reclaimPolicy: Delete
volumeBindingMode: Immediate
---
# oc logs csi-rbdplugin-l2cpx -c driver-registrar
I0713 22:18:02.263705   32109 main.go:110] Version: v4.3.27-202006211650.p0-0-g23f9061-dirty
I0713 22:18:02.264000   32109 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0713 22:18:02.264036   32109 connection.go:151] Connecting to unix:///csi/csi.sock
I0713 22:18:03.264721   32109 main.go:127] Calling CSI driver to discover driver name
I0713 22:18:03.264761   32109 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0713 22:18:03.264767   32109 connection.go:181] GRPC request: {}
I0713 22:18:03.268470   32109 connection.go:183] GRPC response: {"name":"openshift-storage.rbd.csi.ceph.com","vendor_version":"release-4.4"}
I0713 22:18:03.269055   32109 connection.go:184] GRPC error: <nil>
I0713 22:18:03.269080   32109 main.go:137] CSI driver name: "openshift-storage.rbd.csi.ceph.com"
I0713 22:18:03.269126   32109 node_register.go:58] Starting Registration Server at: /registration/openshift-storage.rbd.csi.ceph.com-reg.sock
I0713 22:18:03.269279   32109 node_register.go:67] Registration Server started at: /registration/openshift-storage.rbd.csi.ceph.com-reg.sock
I0713 22:18:03.402840   32109 main.go:77] Received GetInfo call: &InfoRequest{}
I0713 22:18:04.403358   32109 main.go:77] Received GetInfo call: &InfoRequest{}
I0713 22:18:04.533539   32109 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

Must-gather and csi pod logs: http://jmencak-pub.usersys.redhat.com/ekuric/rooks_bz/

[1] cat /usr/lib/systemd/system/kubelet.service
[Unit]
After=decrypt-docker.service
Requires=decrypt-docker.service
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=network.target auditd.service

[Service]
ExecStartPre=/sbin/swapoff -a
ExecStartPre=/bin/systemctl stop -f haproxy.service
ExecStartPre=-/usr/local/sbin/create-localproxy-netns.sh
ExecStart=/usr/bin/hyperkube kubelet \
  --root-dir=/var/data/kubelet \
  --enable-controller-attach-detach=false \
  --cgroup-driver=systemd \
  --provider-id=ibm://a068244bebefc19e34e92d445a8504f3///bs43q5qd0d2ev8f1d8bg/kube-bs43q5qd0d2ev8f1d8bg-perfocs-default-000003f4 \
  --cloud-provider=external \
  --cluster-dns=172.21.0.10 \
  --cluster-domain=cluster.local \
  --feature-gates=ExpandInUsePersistentVolumes=true,LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=false,ServiceNodeExclusion=false,SCTPSupport=false \
  --pod-manifest-path=/etc/kubernetes/manifests \
  --kubeconfig=/etc/kubernetes/kubelet-kubeconfig \
  --max-pods=250 \
  --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 \
  --v=2 \
  --file-check-frequency=5s \
  "--hostname-override=10.208.42.210" \
  "--anonymous-auth=false" \
  "--client-ca-file=/etc/kubernetes/cert/ca.pem" \
  "--read-only-port=0" \
  --network-plugin=cni --cni-conf-dir=/etc/kubernetes/cni/net.d/ --cni-bin-dir=/var/lib/cni/bin \
  --tls-cert-file=/etc/kubernetes/cert/kubelet.pem \
  --tls-private-key-file=/etc/kubernetes/cert/kubelet-key.pem \
  --authorization-mode=Webhook \
  --authentication-token-webhook \
  --container-runtime=remote \
  --runtime-request-timeout=15m \
  --container-runtime-endpoint=/var/run/crio/crio.sock \
  --feature-gates=CRIContainerLogRotation=true \
  --container-log-max-size=100Mi \
  --container-log-max-files=3 \
  --streaming-connection-idle-timeout=30m \
  --event-qps=0 \
  --kube-reserved-cgroup=/podruntime.slice \
  --system-reserved-cgroup=/system.slice \
  --pod-max-pids=228748 \
  --kube-reserved=memory=2197Mi,cpu=64m,pid=12708 \
  --system-reserved=memory=3295Mi,cpu=96m,pid=12708 \
  --kubelet-cgroups=/podruntime.slice \
  --runtime-cgroups=/podruntime.slice \
  --enforce-node-allocatable=pods \
  --kube-api-qps=20 \
  --kube-api-burst=40 \
  --eviction-soft=memory.available<100Mi,nodefs.available<10%,imagefs.available<10%,nodefs.inodesFree<10%,imagefs.inodesFree<10% \
  --eviction-soft-grace-period=memory.available=10m,nodefs.available=10m,imagefs.available=10m,nodefs.inodesFree=10m,imagefs.inodesFree=10m \
  --eviction-hard=memory.available<100Mi,nodefs.available<5%,imagefs.available<5%,nodefs.inodesFree<5%,imagefs.inodesFree<5%
Restart=always
RestartSec=5
TimeoutStartSec=15
SyslogIdentifier=kubelet.service

[Install]
WantedBy=multi-user.target
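As a side note, a quick way to confirm the mismatch on a ROKS node (standard commands; the two locations are taken from the unit file above and from ceph-csi's default, so verify both on your nodes):

# Show the root dir the running kubelet was started with
ps -ef | grep kubelet | grep -o -- '--root-dir=[^ ]*'
# ROKS kubelet stages volumes under its custom root dir...
ls -d /var/data/kubelet/plugins/kubernetes.io/csi/pv/*/globalmount 2>/dev/null
# ...while a CSI driver assuming the default root dir would look here
ls -d /var/lib/kubelet/plugins/kubernetes.io/csi/pv/*/globalmount 2>/dev/null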
Elvir, the csi plugin logs are not useful because they don't capture the time when the issue occurred. But it looks like the issue is with the kubelet path, as Madhu mentioned: by default ceph-csi uses "/var/lib/kubelet", while here the cluster uses "/var/data/kubelet". We can confirm this by looking at the complete plugin logs.

From OCS 4.5 onwards this path is configurable via the rook-cephcsi config file, see https://github.com/openshift/ocs-operator/issues/454. For now you have to define the complete path in the mountPath field of the pod.

@Madhu, please add/correct if I have missed anything.
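For reference, the OCS 4.5 setting mentioned above corresponds to upstream Rook's kubelet-dir override. A minimal sketch of what that configuration could look like (ROOK_CSI_KUBELET_DIR_PATH is the upstream Rook key; the ConfigMap name and namespace are assumptions for an OCS-style deployment, so check the linked issue for the exact mechanism):

apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed name/namespace; upstream Rook reads CSI overrides from the
  # operator's config ConfigMap.
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  # Point the ceph-csi driver pods at the kubelet root dir used by ROKS
  # instead of the default /var/lib/kubelet.
  ROOK_CSI_KUBELET_DIR_PATH: "/var/data/kubelet"

After a change like this, the operator would need to redeploy the csi-rbdplugin pods for the new path to take effect.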
So why isn't it closed as a dup?

*** This bug has been marked as a duplicate of bug 1801365 ***
(In reply to Yaniv Kaul from comment #13)
> So why isn't it closed as a dup?
>
> *** This bug has been marked as a duplicate of bug 1801365 ***

We waited for someone from ocs-operator to confirm.