Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------------------------------
While discussing Clone for CSI, it was seen that even though Kubernetes allows providing a size greater than the source PVC size, Ceph CSI doesn't support it yet for OCS 4.6.

Hence, if a user still provides a different size during Clone creation (from UI or CLI), the PVC stays in Pending state and retries keep happening.

To block this from the UI, Bug 1870331 was raised. But it would be good to block the creation from the CLI as well, so that the endless creation loop doesn't happen.

Though this is expected behavior given the current Clone design for Ceph CSI, it would be good to provide a way to block it.

Current Observation:
-----------------------
If we create a clone via CLI/UI with a size different from the source PVC, the Clone PVC stays in Pending state and retries keep happening in the logs:

  Warning  ProvisioningFailed    3m30s (x14 over 27m)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2  failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = size missmatch, requested volume size 21474836480 and source volume size 10737418240
  Normal   ExternalProvisioning  119s (x105 over 27m)   persistentvolume-controller

Version of all relevant components (if applicable):
----------------------------------------------------------------------
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-08-18-165040   True        False         4h5m    Cluster version is 4.6.0-0.nightly-2020-08-18-165040

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.6.0-533.ci   OpenShift Container Storage   4.6.0-533.ci              Succeeded

sh-4.4# ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
sh-4.4#

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
----------------------------------------------------------------------
Yes, the Clone PVC is in Pending state and endless retries are happening.

Is there any workaround available to the best of your knowledge?
----------------------------------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------------
2

Is this issue reproducible?
----------------------------------------------------------------------
Yes

Can this issue be reproduced from the UI?
----------------------------------------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------------
No

Steps to Reproduce:
----------------------------------------------------------------------
1. Create an OCS + OCP 4.6 cluster on AWS/VMware.
2. Create a PVC from the UI: Storage -> PersistentVolumeClaims -> Create Persistent Volume Claim.
3. Once it is Bound, create a Clone PVC from UI or CLI, providing a size different from the source volume.
4.
CLI yaml:

$ cat clone.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-clone-cli
  namespace: test-clone
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    name: test-pvc
    kind: PersistentVolumeClaim
    apiGroup: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi # NOTE this capacity must be specified and must be >= the capacity of the source volume

Actual results:
----------------------------------------------------------------------
UI/CLI allows providing a different size for the Clone PVC, and the resulting Clone PVC stays in Pending state.

$ oc get pvc
NAME             STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
test-clone-cli   Pending                                                                        ocs-storagecluster-ceph-rbd   9m21s
test-pvc         Bound     pvc-222051c3-68ed-4650-8142-856f4f5af361   10Gi       RWO            ocs-storagecluster-ceph-rbd   39m
test-pvc-clone   Pending                                                                        ocs-storagecluster-ceph-rbd   36m

Expected results:
----------------------------------------------------------------------
Both UI and CLI should handle this more gracefully and not allow providing a different size for the Clone PVC, or at least an error message should be displayed.

Additional info:
----------------------------------------------------------------------
$ oc get pvc
NAME             STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
test-clone-cli   Pending                                                                        ocs-storagecluster-ceph-rbd   9m21s
test-pvc         Bound     pvc-222051c3-68ed-4650-8142-856f4f5af361   10Gi       RWO            ocs-storagecluster-ceph-rbd   39m
test-pvc-clone   Pending                                                                        ocs-storagecluster-ceph-rbd   36m

$ oc describe pvc test-clone-cli
Name:          test-clone-cli
Namespace:     test-clone
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:
  Kind:      PersistentVolumeClaim
  Name:      test-pvc
Mounted By:  <none>
Events:
  Type     Reason                Age                   From                                                                                                                      Message
  ----     ------                ----                  ----                                                                                                                      -------
  Normal   Provisioning          2m13s (x11 over 10m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2       External provisioner is provisioning volume for claim "test-clone/test-clone-cli"
  Warning  ProvisioningFailed    2m13s (x11 over 10m)  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-58ffd4559d-pqdb6_e2222a36-5810-4ccd-b322-6c6c336444e2       failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = InvalidArgument desc = size missmatch, requested volume size 21474836480 and source volume size 10737418240
  Normal   ExternalProvisioning  40s (x43 over 10m)    persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
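For comparison, a clone request whose capacity matches the source PVC is the only form the RBD driver accepts in OCS 4.6. A minimal sketch of such a request, assuming the same source PVC and storage class as above (the claim name and file name are placeholders):

$ cat clone-same-size.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-clone-cli-same-size   # placeholder name
  namespace: test-clone
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  dataSource:
    name: test-pvc
    kind: PersistentVolumeClaim
    apiGroup: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi   # equal to the 10Gi source PVC, so the RBD provisioner does not reject it

$ oc apply -f clone-same-size.yaml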
Neha, thanks for opening this bug as a follow-up to our discussion.

I would like to share a couple of thoughts here:

First of all, this scenario has been communicated (we will be supporting same-size restore) since the EPIC readout. This will be documented as well. We had also given this requirement to the UI team from the start, and the only reason they have not enabled the restriction until now is that their code is generic for both OCP and OCS drivers.

We will check what else we can do here from the CSI driver side.

But I would like to request a couple of experiments from your end:

1) Can you try the same for CephFS now with an OCS 4.6 build? -> it should work
2) Can you also try editing or patching the PVC to match the size of the parent PVC while it is Pending and showing this error?
(In reply to Humble Chirammal from comment #4)
> Neha, thanks for opening this bug as a follow-up to our discussion.
>
> I would like to share a couple of thoughts here:
>
> First of all, this scenario has been communicated (we will be supporting
> same-size restore) since the EPIC readout. This will be documented as well.
> We had also given this requirement to the UI team from the start, and the
> only reason they have not enabled the restriction until now is that their
> code is generic for both OCP and OCS drivers.
>
> We will check what else we can do here from the CSI driver side.
>
> But I would like to request a couple of experiments from your end:
>
> 1) Can you try the same for CephFS now with an OCS 4.6 build? -> it should
> work
> 2) Can you also try editing or patching the PVC to match the size of the
> parent PVC while it is Pending and showing this error?

I can try it in a day or two. Thanks.
I tried cloning with a size bigger than the original for CephFS today; it worked without any issues.
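For reference, the shape of the CephFS request that worked is sketched below; the storage class is the default OCS CephFS class, and the PVC and file names are placeholders rather than the exact ones used in the test:

$ cat cephfs-clone.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-cephfs-clone        # placeholder name
  namespace: test-clone
spec:
  storageClassName: ocs-storagecluster-cephfs
  dataSource:
    name: test-cephfs-pvc        # placeholder source PVC
    kind: PersistentVolumeClaim
    apiGroup: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi              # larger than the source; accepted by the CephFS driver in this build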
(In reply to Humble Chirammal from comment #4)
> Neha, thanks for opening this bug as a follow-up to our discussion.
>
> I would like to share a couple of thoughts here:
>
> First of all, this scenario has been communicated (we will be supporting
> same-size restore) since the EPIC readout. This will be documented as well.
> We had also given this requirement to the UI team from the start, and the
> only reason they have not enabled the restriction until now is that their
> code is generic for both OCP and OCS drivers.
>
> We will check what else we can do here from the CSI driver side.
>
> But I would like to request a couple of experiments from your end:
>
> 1) Can you try the same for CephFS now with an OCS 4.6 build? -> it should
> work

As per comment #6, this works.

> 2) Can you also try editing or patching the PVC to match the size of the
> parent PVC while it is Pending and showing this error?

For a pending RBD PVC, editing the requested storage does not work:

$ oc patch PersistentVolumeClaim test-clone-pvc-clone -p '{"spec": {"resources": {"requests": {"storage": "5Gi"}}}}'
The PersistentVolumeClaim "test-clone-pvc-clone" is invalid:
* spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
        AccessModes: []core.PersistentVolumeAccessMode{"ReadWriteOnce"},
        Selector:    nil,
        Resources: core.ResourceRequirements{
                Limits: nil,
-               Requests: core.ResourceList{
-                       s"storage": {i: resource.int64Amount{value: 5368709120}, s: "5Gi", Format: "BinarySI"},
-               },
+               Requests: core.ResourceList{
+                       s"storage": {i: resource.int64Amount{value: 10737418240}, s: "10Gi", Format: "BinarySI"},
+               },
        },
        VolumeName:       "",
        StorageClassName: &"ocs-storagecluster-ceph-rbd",
        ... // 2 identical fields
  }
* spec.resources.requests.storage: Forbidden: field can not be less than previous value
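Since the pending claim cannot be shrunk in place, a likely workaround (not validated here) is to delete the pending clone and re-create it with the request set equal to the source size, e.g. for the claim from the original report:

$ oc delete pvc test-clone-cli -n test-clone
# edit clone.yaml so spec.resources.requests.storage matches the source PVC (10Gi here)
$ oc apply -f clone.yaml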
Thanks Jilju, yeah, so we are getting the expected result. :) Shrinking to a smaller value is blocked once the size is set; that is also the actual blocker for recovering from a failed resize. However, I wanted to double-confirm this at the very first provisioning.

The CephFS result is actually temporary until we have a core fix in CephFS for the clone size attribute. Once that is available we will disable it, just to make sure we have parity between our RBD and CephFS drivers. We will also explore whether we could support this for both.

But I don't want to introduce any confusion here: our current plan is to stick with the requirement listed and agreed at the start of feature planning, which is that we won't allow volume restore/clone to a bigger size.
Humble, have we reached some consensus here? Should we close this as WONT_FIX, or move it to the next release if there is a plan to do this in the future?
Targeting this for OCS 4.7 or beyond, as we need to get support from the upstream Kubernetes side.
Madhu, should we add doc_text for this known issue?
Still dependent on upstream Kubernetes changes; this can't be fixed in 4.9.
Yug, it seems that the Cinder CSI driver does a resize in NodeStageVolume() now. This looks like something that we can do as well. See https://github.com/kubernetes/cloud-provider-openstack/pull/1563 for details.
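Conceptually (a hedged sketch only; the linked PR does this inside the driver's NodeStageVolume() code rather than via shell, and the device path and staging path below are placeholders), the node plugin would grow the filesystem to the size of the block device while staging the volume:

# ext4: grow the filesystem to fill the already-mapped device
$ resize2fs /dev/rbd0
# xfs: must be grown via its mount point after the volume is mounted
$ xfs_growfs <staging_target_path>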
Sure.