Description of problem:
Alibaba supports only volumes of 20 GiB or larger. The generic e2e tests (openshift-tests run openshift/conformance/parallel) create PVCs that are too small, so they fail.
At least these tests need to be fixed somehow:
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs
[sig-storage] PVC Protection Verify that PVC in active use by a pod is not removed immediately
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity
[sig-storage] PVC Protection Verify that scheduling of a pod that uses PVC that is being deleted fails and the pod becomes Unschedulable
[sig-storage] PVC Protection Verify "immediate" deletion of a PVC that is not in active use by a pod
[sig-apps] StatefulSet Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods
StatefulSets use 1 byte volumes: https://github.com/kubernetes/kubernetes/blob/296bf4f01668374ade252a751d4c3567917b9890/test/e2e/framework/statefulset/fixtures.go#L107
PVC protection tests use 1 GiB volumes: https://github.com/kubernetes/kubernetes/blob/296bf4f01668374ade252a751d4c3567917b9890/test/e2e/storage/pvc_protection.go#L82
(here it could be possible to use HostPath PVs, as PV protection does)
Assigning to Alibaba in case they want to implement an automatic increase of volume size to 20 GiB. As described above, we run Kubernetes e2e tests with the Alibaba Disk CSI driver as the default storage class, and the tests create really small PVCs (1 byte!). The tests expect the CSI driver to provision the smallest volume it supports for this 1-byte request, which is 20 GiB in the Alibaba Disk case.
The CSI driver gets:
> time="2022-02-23T10:59:08Z" level=info msg="CreateVolume: Starting CreateVolume, pvc-53021639-69e6-4751-b36e-891273a7b3b6,
> capacity_range:<required_bytes:1 >
> volume_capabilities:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > >
> parameters: ...
(edited for readability)
limit_bytes is zero, i.e. unspecified. Strictly from the CSI protocol perspective, the CSI driver can provision as large a volume as it wants.
I understand that this may bring some additional costs to the customers, ordering 1 byte and paying for 20 GiB is quite a difference. Still, it's better than ordering 1 byte and getting nothing. What do you think? All the other CSI drivers we ship round the volume size to the smallest size they support, which is typically 1 GiB.
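The rounding behavior being asked for can be sketched roughly as follows. This is only an illustration, not the Alibaba driver's actual code: the function name provisionSize and the constants are hypothetical, the 20 GiB minimum is taken from this report, and rounding up to a whole GiB is an assumption. It treats limit_bytes == 0 as "no upper bound", as described above.

```go
package main

import (
	"errors"
	"fmt"
)

const gib int64 = 1 << 30

// minVolumeBytes is the smallest disk the backend provisions
// (20 GiB according to this report; hypothetical constant name).
const minVolumeBytes = 20 * gib

// provisionSize picks the size to provision for a CSI capacity_range.
// requiredBytes is the lower bound; limitBytes is the upper bound,
// where 0 means "unspecified".
func provisionSize(requiredBytes, limitBytes int64) (int64, error) {
	// Round the request up to a whole GiB (assumed granularity).
	size := (requiredBytes + gib - 1) / gib * gib
	// Raise it to the backend minimum instead of rejecting it.
	if size < minVolumeBytes {
		size = minVolumeBytes
	}
	// Only fail if the caller set an explicit limit we cannot honor.
	if limitBytes > 0 && size > limitBytes {
		return 0, errors.New("smallest provisionable volume exceeds limit_bytes")
	}
	return size, nil
}

func main() {
	// The request from the log above: required_bytes:1, no limit_bytes.
	size, err := provisionSize(1, 0)
	fmt.Println(size/gib, err) // prints "20 <nil>"
}
```

With this approach a 1-byte PVC from the e2e tests ends up as a 20 GiB disk, while a request that sets limit_bytes below 20 GiB still fails explicitly.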
This blocks our CI: all StatefulSet tests fail. It's not blocking 4.10 in any way, but we should decide what to do about it soon.
In addition, from the user's perspective it looks odd when all the other CSI drivers increase the volume size to the smallest size they can provision. For most of them that is 1 GiB, but IBM, for example, has a 10 GiB minimum and it increases the volume size too.
https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/pull/628 got merged upstream, cherry-picking into OCP now.
Flexy template: ipi-on-alicloud/versioned-installer-ci
#1 With default sc, pvc with 20Gi file system, dep, write data => Pass
#2 With default sc, pvc with 1Gi file system, dep, write data => Pass
#3 With default sc, pvc with 20Gi block, dep, write data => Pass
#4 With default sc, pvc with 1Gi block, dep, write data => Pass
#5 With new sc, pvc with 20Gi fs, dep, write data => Pass
#6 With new sc, pvc with 1Gi fs, dep, write data => Pass
#7 With new sc, pvc with 20Gi bl, dep, write data => Pass
#8 With new sc, pvc with 1Gi bl, dep, write data => Pass
#9 With new sc volumeSizeAutoAvailable: "false", pvc with 1Gi fs, dep, write data => Pass
#10 47918 ali_csi.go => for fstypes(ext4,ext3,xfs) Golang Automation
upgrade Payload: 4.10.0-0.nightly-2022-05-03-165256 => 4.11.0-0.nightly-2022-04-26-181148
#11 to check the sc parameters volumeSizeAutoAvailable: "true" =>
Earlier, PR 25 was showing as open, so I did not test it immediately. After checking the PR, I found that it had been merged.
I synced and then tested.
https://github.com/openshift/alibaba-disk-csi-driver-operator/pull/25 => merged
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.