Bug 1857114 - [ROKS] PVC from OCS on IBM cloud does not mount to pod
Summary: [ROKS] PVC from OCS on IBM cloud does not mount to pod
Keywords:
Status: CLOSED DUPLICATE of bug 1801365
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jose A. Rivera
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-15 07:19 UTC by Elvir Kuric
Modified: 2020-07-16 13:56 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-16 13:35:16 UTC
Embargoed:



Description Elvir Kuric 2020-07-15 07:19:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

PVC from OCS on IBM cloud does not mount to pod


Version of all relevant components (if applicable):

OCS installed on IBM Cloud as part of ROKS.

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
yes


Can this issue be reproduced from the UI?
Not tested / NA


If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OCP/OCS on IBM Cloud (ROKS)
2. Create a PVC
3. Create a pod that mounts the PVC

The PVC was created with:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc-ext4
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-ceph-rbd


The pod was created with:

---
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: rbd-pvc-ext4
       readOnly: false
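
Both manifests were applied in the usual way (a sketch; the file names are
illustrative):

 # oc apply -f rbd-pvc-ext4.yaml
 # oc apply -f csirbd-demo-pod.yaml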



Actual results:

The pod does not start; it remains in ContainerCreating.

Expected results:

The pod starts with the PVC mounted.

Additional info:

Using a different storage class, e.g. "ibmc-block-gold", the pod starts.

--- 
storage class:

# oc get sc ocs-storagecluster-ceph-rbd -o yaml 
allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2020-07-13T16:01:55Z"
  name: ocs-storagecluster-ceph-rbd
  resourceVersion: "1050654"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ocs-storagecluster-ceph-rbd
  uid: 48d4be92-3368-4603-a83f-a2303af14a38
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  imageFeatures: layering
  imageFormat: "2"
  pool: ocs-storagecluster-cephblockpool
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate


--- 
# oc get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
rbd-pvc-ext4   Bound    pvc-b589282e-b01b-4803-a261-90b110791d54   1Gi        RWO            ocs-storagecluster-ceph-rbd   13m


# oc describe pvc
Name:          rbd-pvc-ext4
Namespace:     elko
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Bound
Volume:        pvc-b589282e-b01b-4803-a261-90b110791d54
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type       Reason                 Age                From                                                                                                               Message
  ----       ------                 ----               ----                                                                                                               -------
  Normal     ExternalProvisioning   12m (x2 over 12m)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal     Provisioning           12m                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d75cf8f45-xcgd5_22e2bd0a-cf38-4f18-8436-9f024e9c6e87  External provisioner is provisioning volume for claim "elko/rbd-pvc-ext4"
  Normal     ProvisioningSucceeded  12m                openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-d75cf8f45-xcgd5_22e2bd0a-cf38-4f18-8436-9f024e9c6e87  Successfully provisioned volume pvc-b589282e-b01b-4803-a261-90b110791d54
Mounted By:  csirbd-demo-pod



# oc get pod
NAME              READY   STATUS              RESTARTS   AGE
csirbd-demo-pod   0/1     ContainerCreating   0          13m

root@ip-172-31-59-125: ~/pvc_attach_test # oc describe pod 
Name:               csirbd-demo-pod
Namespace:          elko
Priority:           0
PriorityClassName:  <none>
Node:               10.208.42.248/10.208.42.248
Start Time:         Wed, 15 Jul 2020 07:01:46 +0000
Labels:             <none>
Annotations:        openshift.io/scc: anyuid
Status:             Pending
IP:                 
Containers:
  web-server:
    Container ID:   
    Image:          nginx
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hvfp5 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc-ext4
    ReadOnly:   false
  default-token-hvfp5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hvfp5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason       Age                  From                    Message
  ----     ------       ----                 ----                    -------
  Normal   Scheduled    <unknown>            default-scheduler       Successfully assigned elko/csirbd-demo-pod to 10.208.42.248
  Warning  FailedMount  6m44s                kubelet, 10.208.42.248  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[default-token-hvfp5 mypvc]: timed out waiting for the condition
  Warning  FailedMount  2m13s (x4 over 11m)  kubelet, 10.208.42.248  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-hvfp5]: timed out waiting for the condition
  Warning  FailedMount  59s (x14 over 13m)   kubelet, 10.208.42.248  MountVolume.MountDevice failed for volume "pvc-b589282e-b01b-4803-a261-90b110791d54" : rpc error: code = InvalidArgument desc = staging path does not exists on node



---- different storage class works ------


 # oc get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc-ext4   Bound    pvc-c046820f-2bac-432c-8664-70624e2c6ff2   20Gi       RWO            ibmc-block-gold   109s
root@ip-172-31-59-125: ~/pvc_attach_test # oc get pod 
NAME              READY   STATUS    RESTARTS   AGE
csirbd-demo-pod   1/1     Running   0          40s
root@ip-172-31-59-125: ~/pvc_attach_test # oc describe pod 
Name:               csirbd-demo-pod
Namespace:          teste
Priority:           0
PriorityClassName:  <none>
Node:               10.208.42.224/10.208.42.224
Start Time:         Wed, 15 Jul 2020 07:17:55 +0000
Labels:             <none>
Annotations:        cni.projectcalico.org/podIP: 172.30.49.98/32
                    cni.projectcalico.org/podIPs: 172.30.49.98/32
                    k8s.v1.cni.cncf.io/networks-status:
                      [{
                          "name": "k8s-pod-network",
                          "ips": [
                              "172.30.49.98"
                          ],
                          "dns": {}
                      }]
                    openshift.io/scc: anyuid
Status:             Running
IP:                 172.30.49.98
Containers:
  web-server:
    Container ID:   cri-o://7dfd66dff45d65eceafa86e5eeac07543c618972738571a989d1bd7bd1f577e5
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:8ff4598873f588ca9d2bf1be51bdb117ec8f56cdfd5a81b5bb0224a61565aa49
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 15 Jul 2020 07:18:23 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7g9wg (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rbd-pvc-ext4
    ReadOnly:   false
  default-token-7g9wg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7g9wg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                    Message
  ----    ------     ----       ----                    -------
  Normal  Scheduled  <unknown>  default-scheduler       Successfully assigned teste/csirbd-demo-pod to 10.208.42.224
  Normal  Pulling    19s        kubelet, 10.208.42.224  Pulling image "nginx"
  Normal  Pulled     15s        kubelet, 10.208.42.224  Successfully pulled image "nginx"
  Normal  Created    14s        kubelet, 10.208.42.224  Created container web-server
  Normal  Started    14s        kubelet, 10.208.42.224  Started container web-server
root@ip-172-31-59-125: ~/pvc_attach_test # oc describe pvc
Name:          rbd-pvc-ext4
Namespace:     teste
StorageClass:  ibmc-block-gold
Status:        Bound
Volume:        pvc-c046820f-2bac-432c-8664-70624e2c6ff2
Labels:        region=us-south
               zone=dal13
Annotations:   ibm.io/provisioning-status: complete
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ibm.io/ibmc-block
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type       Reason                 Age                 From                                                                                                   Message
  ----       ------                 ----                ----                                                                                                   -------
  Normal     Provisioning           116s                ibm.io/ibmc-block_ibmcloud-block-storage-plugin-7584f7b495-th546_d835cdbe-c2ed-11ea-b27e-466b8487c5a2  External provisioner is provisioning volume for claim "teste/rbd-pvc-ext4"
  Normal     ExternalProvisioning   61s (x8 over 116s)  persistentvolume-controller                                                                            waiting for a volume to be created, either by external provisioner "ibm.io/ibmc-block" or manually created by system administrator
  Normal     ProvisioningSucceeded  61s                 ibm.io/ibmc-block_ibmcloud-block-storage-plugin-7584f7b495-th546_d835cdbe-c2ed-11ea-b27e-466b8487c5a2  Successfully provisioned volume pvc-c046820f-2bac-432c-8664-70624e2c6ff2
Mounted By:  csirbd-demo-pod

Comment 3 Mudit Agarwal 2020-07-15 08:02:38 UTC
@Elvir please provide logs for the ceph-csi plugins (rbd plugins) and also the definition of the ibmc-block-gold storage class.

Comment 5 Elvir Kuric 2020-07-15 09:50:16 UTC
(In reply to Mudit Agarwal from comment #3)
> @Elvir please provide logs for ceph-csi plugins (rbd plugins) and also the
> definition of ibmc-block-gold storage class.


q: Is this deployment using a custom KUBELET path for ceph-csi?
a: Not sure, the IBM team installed the cluster.
   The kubelet --root-dir is /var/data/kubelet; check [1] below.
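
A quick way to confirm the flag on a node without opening the unit file (a
sketch, assuming shell access to the worker):

 # systemctl cat kubelet.service | grep 'root-dir'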


 # oc get sc ibmc-block-gold -o yaml 
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-07-10T10:13:07Z"
  labels:
    app: ibmcloud-block-storage-plugin
    chart: ibmcloud-block-storage-plugin-1.7.1
    heritage: Helm
    release: release-name
  name: ibmc-block-gold
  resourceVersion: "3754"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/ibmc-block-gold
  uid: a13ff46d-192b-4848-a304-36b717679954
parameters:
  billingType: hourly
  classVersion: "2"
  fsType: ext4
  iopsPerGB: "10"
  sizeRange: '[20-4000]Gi'
  type: Endurance
provisioner: ibm.io/ibmc-block
reclaimPolicy: Delete
volumeBindingMode: Immediate


---

 # oc logs csi-rbdplugin-l2cpx -c driver-registrar
I0713 22:18:02.263705   32109 main.go:110] Version: v4.3.27-202006211650.p0-0-g23f9061-dirty
I0713 22:18:02.264000   32109 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0713 22:18:02.264036   32109 connection.go:151] Connecting to unix:///csi/csi.sock
I0713 22:18:03.264721   32109 main.go:127] Calling CSI driver to discover driver name
I0713 22:18:03.264761   32109 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0713 22:18:03.264767   32109 connection.go:181] GRPC request: {}
I0713 22:18:03.268470   32109 connection.go:183] GRPC response: {"name":"openshift-storage.rbd.csi.ceph.com","vendor_version":"release-4.4"}
I0713 22:18:03.269055   32109 connection.go:184] GRPC error: <nil>
I0713 22:18:03.269080   32109 main.go:137] CSI driver name: "openshift-storage.rbd.csi.ceph.com"
I0713 22:18:03.269126   32109 node_register.go:58] Starting Registration Server at: /registration/openshift-storage.rbd.csi.ceph.com-reg.sock
I0713 22:18:03.269279   32109 node_register.go:67] Registration Server started at: /registration/openshift-storage.rbd.csi.ceph.com-reg.sock
I0713 22:18:03.402840   32109 main.go:77] Received GetInfo call: &InfoRequest{}
I0713 22:18:04.403358   32109 main.go:77] Received GetInfo call: &InfoRequest{}
I0713 22:18:04.533539   32109 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}



Must-gather and CSI pod logs: http://jmencak-pub.usersys.redhat.com/ekuric/rooks_bz/


[1] 
cat /usr/lib/systemd/system/kubelet.service
[Unit]
After=decrypt-docker.service
Requires=decrypt-docker.service
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=network.target auditd.service


[Service]
ExecStartPre=/sbin/swapoff -a
ExecStartPre=/bin/systemctl stop -f haproxy.service
ExecStartPre=-/usr/local/sbin/create-localproxy-netns.sh
ExecStart=/usr/bin/hyperkube kubelet \
                    --root-dir=/var/data/kubelet \
                              --enable-controller-attach-detach=false \
                              --cgroup-driver=systemd \
          --provider-id=ibm://a068244bebefc19e34e92d445a8504f3///bs43q5qd0d2ev8f1d8bg/kube-bs43q5qd0d2ev8f1d8bg-perfocs-default-000003f4 \
          --cloud-provider=external \
                    --cluster-dns=172.21.0.10 \
                    --cluster-domain=cluster.local \
           \
          --feature-gates=ExpandInUsePersistentVolumes=true,LegacyNodeRoleBehavior=false,NodeDisruptionExclusion=false,ServiceNodeExclusion=false,SCTPSupport=false \
           \
          --pod-manifest-path=/etc/kubernetes/manifests \
          --kubeconfig=/etc/kubernetes/kubelet-kubeconfig \
           \
          --max-pods=250 \
           \
          --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 \
          --v=2 \
          --file-check-frequency=5s \
          "--hostname-override=10.208.42.210" \
          "--anonymous-auth=false" \
          "--client-ca-file=/etc/kubernetes/cert/ca.pem" \
          "--read-only-port=0" \
          --network-plugin=cni --cni-conf-dir=/etc/kubernetes/cni/net.d/ --cni-bin-dir=/var/lib/cni/bin  \
          --tls-cert-file=/etc/kubernetes/cert/kubelet.pem \
          --tls-private-key-file=/etc/kubernetes/cert/kubelet-key.pem \
          --authorization-mode=Webhook \
          --authentication-token-webhook \
          --container-runtime=remote \
          --runtime-request-timeout=15m \
          --container-runtime-endpoint=/var/run/crio/crio.sock \
          --feature-gates=CRIContainerLogRotation=true \
          --container-log-max-size=100Mi \
          --container-log-max-files=3 \
          --streaming-connection-idle-timeout=30m \
          --event-qps=0 \
          --kube-reserved-cgroup=/podruntime.slice \
          --system-reserved-cgroup=/system.slice \
          --pod-max-pids=228748 \
          --kube-reserved=memory=2197Mi,cpu=64m,pid=12708 \
          --system-reserved=memory=3295Mi,cpu=96m,pid=12708 \
          --kubelet-cgroups=/podruntime.slice \
          --runtime-cgroups=/podruntime.slice  \
          --enforce-node-allocatable=pods \
                    --kube-api-qps=20 \
                              --kube-api-burst=40 \
                    --eviction-soft=memory.available<100Mi,nodefs.available<10%,imagefs.available<10%,nodefs.inodesFree<10%,imagefs.inodesFree<10% \
          --eviction-soft-grace-period=memory.available=10m,nodefs.available=10m,imagefs.available=10m,nodefs.inodesFree=10m,imagefs.inodesFree=10m \
          --eviction-hard=memory.available<100Mi,nodefs.available<5%,imagefs.available<5%,nodefs.inodesFree<5%,imagefs.inodesFree<5%
Restart=always
RestartSec=5
TimeoutStartSec=15
SyslogIdentifier=kubelet.service

[Install]
WantedBy=multi-user.target

Comment 6 Mudit Agarwal 2020-07-15 10:50:14 UTC
Elvir, the csi plugin logs are not useful because they don't capture the time when the issue occurred.

But it looks like the issue is with the kubelet path, as Madhu mentioned: by default ceph-csi uses "/var/lib/kubelet", but here the cluster uses "/var/data/kubelet".
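
A quick way to see the mismatch is to compare that kubelet --root-dir with the
hostPath volumes the rbd node plugin mounts. A sketch, assuming the node plugin
DaemonSet keeps its default name csi-rbdplugin and that "oc debug node" works
on ROKS:

 # oc -n openshift-storage get ds csi-rbdplugin \
     -o jsonpath='{range .spec.template.spec.volumes[*]}{.name}{"\t"}{.hostPath.path}{"\n"}{end}'
 # oc debug node/10.208.42.248 -- chroot /host \
     sh -c "ps -ef | grep -o -- '--root-dir=[^ ]*' | head -1"

If the DaemonSet mounts paths under /var/lib/kubelet while the kubelet reports
--root-dir=/var/data/kubelet, the plugin cannot see the directory the kubelet
asks it to stage into, which matches the "staging path does not exists" error
above.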

We can confirm this by looking at the complete plugin logs. From OCS 4.5 onwards this path is configurable via the rook-ceph CSI config file, see https://github.com/openshift/ocs-operator/issues/454
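
For reference, a minimal sketch of that setting as it exists upstream in Rook
(the ConfigMap name and key below follow the upstream rook-ceph convention;
whether and when ocs-operator consumes them is what the linked issue tracks):

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: openshift-storage
data:
  # Must match the kubelet's --root-dir on every node
  # (/var/data/kubelet on this ROKS cluster).
  ROOK_CSI_KUBELET_DIR_PATH: "/var/data/kubelet"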

For now you have to define the complete path in the mountPath field of the pod.

@Madhu, please add/correct if I have missed anything.

Comment 13 Yaniv Kaul 2020-07-16 13:35:16 UTC
So why isn't it closed as dup?

*** This bug has been marked as a duplicate of bug 1801365 ***

Comment 14 Mudit Agarwal 2020-07-16 13:56:28 UTC
(In reply to Yaniv Kaul from comment #13)
> So why isn't it closed as dup?
> 
> *** This bug has been marked as a duplicate of bug 1801365 ***

I waited for someone from ocs-operator to confirm.

