Bug 2057495

Summary: Alibaba Disk CSI driver does not provision small PVCs
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Storage sub component: Kubernetes
QA Contact: Rohit Patil <ropatil>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: aos-bugs
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Alibaba Cloud supports only volumes of 20 GiB or larger. When a user created a PersistentVolumeClaim (PVC) smaller than 20 GiB, the Alibaba CSI driver shipped as part of OpenShift returned an error with the message 'The specified parameter "Size" is not valid'. We updated the Alibaba CSI driver to automatically increase all volume sizes to at least 20 GiB, so smaller PVCs are now dynamically provisioned. For example, a PVC requesting 1 byte results in a new dynamically provisioned 20 GiB volume. This could result in increased costs. Cluster admins should consider using a quota on PVC count for each namespace in restricted environments.
Story Points: ---
Clone Of:
Clones: 2076671
Environment:
Last Closed: 2022-08-10 10:50:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2076671, 2098655

Description Jan Safranek 2022-02-23 14:03:53 UTC
Description of problem:

Alibaba Cloud supports only volumes of 20 GiB or larger. Generic e2e tests (openshift-tests run openshift/conformance/parallel) create PVCs that are too small, and they fail; see:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/5604/pull-ci-openshift-installer-master-e2e-alibaba/1496424012073406464


At least these tests need to be fixed somehow:

[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails 
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs 
[sig-storage] PVC Protection Verify that PVC in active use by a pod is not removed immediately
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity
[sig-storage] PVC Protection Verify that scheduling of a pod that uses PVC that is being deleted fails and the pod becomes Unschedulable 
[sig-storage] PVC Protection Verify "immediate" deletion of a PVC that is not in active use by a pod
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods

Comment 2 Jan Safranek 2022-03-01 16:02:25 UTC
Assigning to Alibaba in case they want to implement an automatic increase of volume size to 20 GiB. As described above, we run Kubernetes e2e tests with the Alibaba Disk CSI driver in the default storage class, and the tests create really small PVCs (1 byte!). The tests expect the CSI driver to provision the smallest volume that can hold this 1 byte, which is 20 GiB in the Alibaba Disk case.

The CSI driver gets:

> time="2022-02-23T10:59:08Z" level=info msg="CreateVolume: Starting CreateVolume, pvc-53021639-69e6-4751-b36e-891273a7b3b6,
> name:\"pvc-53021639-69e6-4751-b36e-891273a7b3b6\"
> capacity_range:<required_bytes:1 >
> volume_capabilities:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > >
> parameters: ...

(edited for readability)

limit_bytes is zero, i.e. unspecified. Strictly from the CSI protocol perspective, the CSI driver can therefore provision a volume as large as it wants, as long as it is at least required_bytes.
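In CSI terms, required_bytes is a lower bound and limit_bytes an upper bound on the volume size, and a value of 0 means the field is unspecified. The following is a minimal Python sketch, not the Alibaba driver's actual code, of how a driver with a 20 GiB minimum could choose a size under those rules. The function choose_capacity and the exception CapacityOutOfRangeError are hypothetical; a real gRPC driver would report the unsatisfiable case with the OUT_OF_RANGE status code that the CSI spec defines for unsupported capacity.

```python
GiB = 1024 ** 3
MIN_DISK_BYTES = 20 * GiB  # Alibaba Disk minimum, per this bug


class CapacityOutOfRangeError(ValueError):
    """Hypothetical error: the backend minimum does not fit under limit_bytes."""


def choose_capacity(required_bytes: int, limit_bytes: int,
                    min_bytes: int = MIN_DISK_BYTES) -> int:
    """Pick a volume size that honors a CSI CapacityRange.

    required_bytes: the volume must be at least this big (0 = unspecified).
    limit_bytes: the volume must not be bigger than this (0 = unspecified).
    """
    # Grow the request to the smallest size the backend can provision.
    size = max(required_bytes, min_bytes)
    # A non-zero limit_bytes is a hard cap; refuse rather than exceed it.
    if limit_bytes and size > limit_bytes:
        raise CapacityOutOfRangeError(
            f"size {size} exceeds limit_bytes {limit_bytes}")
    return size


# The request from the log above: required_bytes=1, limit_bytes unset.
print(choose_capacity(1, 0) // GiB)  # 20
```

With limit_bytes unset, the 1-byte request is simply grown to the 20 GiB minimum; only an explicit limit below 20 GiB makes the request impossible to satisfy.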

I understand that this may bring some additional costs to customers: ordering 1 byte and paying for 20 GiB is quite a difference. Still, it's better than ordering 1 byte and getting nothing. What do you think? All the other CSI drivers we ship round the volume size up to the smallest size they support, which is typically 1 GiB.

Comment 3 Jan Safranek 2022-03-01 16:03:18 UTC
This blocks our CI: all StatefulSet tests fail. It's not blocking 4.10 in any way, but we should decide what to do about it soon.

Comment 4 Jan Safranek 2022-03-15 09:10:47 UTC
In addition, from the user's perspective it looks odd when all other CSI drivers increase the volume size to the smallest size they can provision and this one does not. For most of them the minimum is 1 GiB, but IBM, for example, has a 10 GiB minimum, and its driver increases the volume size too.
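The round-up behavior described in the comments above can be sketched as follows. This is a minimal Python illustration that assumes drivers allocate whole GiB; the 1 GiB, 10 GiB and 20 GiB minimums come from this bug, while the function provisioned_gib and the driver labels are hypothetical.

```python
GiB = 1024 ** 3

# Minimum provisionable sizes mentioned in this bug; labels are illustrative.
DRIVER_MIN_GIB = {
    "typical driver": 1,
    "IBM": 10,
    "Alibaba Disk": 20,
}


def provisioned_gib(required_bytes: int, min_gib: int) -> int:
    """Round required_bytes up to whole GiB (assumed granularity), then up to min_gib."""
    requested_gib = -(-required_bytes // GiB)  # ceiling division
    return max(requested_gib, min_gib)


# Prints 1, 10 and 20 GiB respectively for a 1-byte PVC.
for name, min_gib in DRIVER_MIN_GIB.items():
    print(f"{name}: 1-byte PVC -> {provisioned_gib(1, min_gib)} GiB")
```

Requests above the minimum are unaffected apart from the assumed whole-GiB rounding; for example, 21 GiB plus 1 byte becomes 22 GiB.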

Comment 5 Jan Safranek 2022-03-29 09:17:04 UTC
https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/pull/628 got merged upstream, cherry-picking into OCP now.

Comment 9 Rohit Patil 2022-05-04 11:48:12 UTC
Payload: 4.11.0-0.nightly-2022-04-26-181148
Flexy template: ipi-on-alicloud/versioned-installer-ci 
Verifications: PASS
#1 With default sc, pvc with 20Gi file system, dep, write data => Pass
#2 With default sc, pvc with 1Gi file system, dep, write data => Pass 
#3 With default sc, pvc with 20Gi block, dep, write data => Pass
#4 With default sc, pvc with 1Gi block, dep, write data => Pass
#5 With new sc, pvc with 20Gi fs, dep, write data => Pass
#6 With new sc, pvc with 1Gi fs, dep, write data => Pass
#7 With new sc, pvc with 20Gi block, dep, write data => Pass
#8 With new sc, pvc with 1Gi block, dep, write data => Pass
#9 With new sc volumeSizeAutoAvailable: "false", pvc with 1Gi fs, dep, write data => Pass
#10 47918 ali_csi.go => for fstypes (ext4, ext3, xfs), Golang automation
   File: https://github.com/openshift/openshift-tests-private/blob/master/test/extended/storage/ali_csi.go#L30

Upgrade payload: 4.10.0-0.nightly-2022-05-03-165256 => 4.11.0-0.nightly-2022-04-26-181148
#11 Check the sc parameter volumeSizeAutoAvailable: "true" =>
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-upgrade/job/upgrade-runner/9975/console
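Verification cases #1 to #8 above form a 2 x 2 x 2 matrix of storage class, volume mode and requested size. The following Python sketch enumerates them in the same order and flags the requests below the 20 GiB minimum, i.e. the cases that depend on the driver increasing the size. The variable names are illustrative, and the "dep, write data" steps are not modeled.

```python
from itertools import product

MIN_DISK_GIB = 20  # Alibaba Disk minimum, per this bug

storage_classes = ["default sc", "new sc"]
volume_modes = ["Filesystem", "Block"]
request_sizes_gib = [20, 1]

# product() varies the last list fastest, which reproduces the #1-#8 order above.
cases = list(product(storage_classes, volume_modes, request_sizes_gib))

for number, (sc, mode, size_gib) in enumerate(cases, start=1):
    # Requests below the minimum only succeed if the driver rounds them up.
    needs_round_up = size_gib < MIN_DISK_GIB
    print(f"#{number}: {sc}, {size_gib}Gi {mode}, round-up needed: {needs_round_up}")
```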

Earlier, PR 25 was showing as open, so I did not test immediately. After checking the PR, I found that it had been merged.
I synced and then tested.
https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/25 => merged

Comment 11 errata-xmlrpc 2022-08-10 10:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069