Bug 1965016
| Summary: | [Tracker for BZ #1969301] [RBD][Thick] If storage cluster utilization reaches 85% while a thick PVC is being provisioned, deleting the pending PVC does not stop the provisioning operation and the image is retained | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | ceph | Assignee: | Madhu Rajanna <mrajanna> |
| Status: | CLOSED WONTFIX | QA Contact: | Jilju Joy <jijoy> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.8 | CC: | assingh, bniver, kramdoss, madam, mhackett, mrajanna, muagarwa, ndevos, ocs-bugs, odf-bz-bot, owasserm, rcyriac, tdesala |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | v4.9.0-182.ci | Doc Type: | Bug Fix |
| Doc Text: | Provisioning attempts did not stop once the storage cluster utilization reached 85%, even after the PVC was deleted. If the storage cluster utilization reaches 85% while an RBD thick PVC is being provisioned, deleting the pending PVC does not automatically stop the provisioning attempt, and the RBD image is not deleted even after the pending PVC is removed. The best approach is not to start provisioning if the requested size exceeds the available storage. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| | 1969301 (view as bug list) | Environment: | |
| Last Closed: | 2021-10-20 11:24:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1969301 | | |
| Bug Blocks: | 1966894 | | |
Description
Jilju Joy
2021-05-26 14:37:04 UTC
We got a VolumeCreate request at 11:55:

>> 2021-05-26T11:55:30.029505495Z I0526 11:55:30.029431 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"rbdthick120", UID:"187bb8db-8f9b-4cfe-bdb1-797cdbe00d02", APIVersion:"v1", ResourceVersion:"1701250", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/rbdthick120"
>> 2021-05-26T11:55:30.035659473Z I0526 11:55:30.034289 1 controller.go:731] CreateVolumeRequest name:"pvc-187bb8db-8f9b-4cfe-bdb1-797cdbe00d02" capacity_range:<required_bytes:128849018880 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > parameters:<key:"clusterID" value:"openshift-storage" > parameters:<key:"csi.storage.k8s.io/pv/name" value:"pvc-187bb8db-8f9b-4cfe-bdb1-797cdbe00d02" > parameters:<key:"csi.storage.k8s.io/pvc/name" value:"rbdthick120" > parameters:<key:"csi.storage.k8s.io/pvc/namespace" value:"default" > parameters:<key:"imageFeatures" value:"layering" > parameters:<key:"imageFormat" value:"2" > parameters:<key:"pool" value:"ocs-storagecluster-cephblockpool" > parameters:<key:"thickProvision" value:"true" > secrets:<key:"userID" value:"csi-rbd-provisioner" > secrets:<key:"userKey" value:"AQBEfKtguieTDBAABY/VGkM5TB+JAJvQ6Er1KQ==" >
>> 2021-05-26T11:55:30.035659473Z I0526 11:55:30.034969 1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
>> 2021-05-26T11:55:30.035659473Z I0526 11:55:30.034978 1 connection.go:183] GRPC request: {"capacity_range":{"required_bytes":128849018880},"name":"pvc-187bb8db-8f9b-4cfe-bdb1-797cdbe00d02","parameters":{"clusterID":"openshift-storage","csi.storage.k8s.io/pv/name":"pvc-187bb8db-8f9b-4cfe-bdb1-797cdbe00d02","csi.storage.k8s.io/pvc/name":"rbdthick120","csi.storage.k8s.io/pvc/namespace":"default","imageFeatures":"layering","imageFormat":"2","pool":"ocs-

This request was not completed in the next 3 minutes, maybe because we did not have space, and at 11:58 we hit a DeadlineExceeded error:

>> 2021-05-26T11:58:00.036327187Z I0526 11:58:00.034855 1 connection.go:186] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
>> 2021-05-26T11:58:00.036327187Z I0526 11:58:00.035237 1 controller.go:752] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Background: rpc error: code = DeadlineExceeded desc = context deadline exceeded
>> 2021-05-26T11:58:00.036327187Z I0526 11:58:00.035398 1 controller.go:1109] Temporary error received, adding PVC 187bb8db-8f9b-4cfe-bdb1-797cdbe00d02 to claims in progress
>> 2021-05-26T11:58:00.036327187Z W0526 11:58:00.035431 1 controller.go:961] Retrying syncing claim "187bb8db-8f9b-4cfe-bdb1-797cdbe00d02", failure 0
>> 2021-05-26T11:58:00.036327187Z E0526 11:58:00.035464 1 controller.go:984] error syncing claim "187bb8db-8f9b-4cfe-bdb1-797cdbe00d02": failed to provision volume with StorageClass "rbdthick": rpc error: code = DeadlineExceeded desc = context deadline exceeded

But it looks like Kubernetes kept sending the CreateVolume request, which kept failing with the error:

>> Operation already in progress with the given volume.

This looks like a very common case with thick-provisioned volumes, but it is expected with the current design unless I am missing something here. We need a thick-provisioning-specific fix here. Proposing as a blocker because the cluster remains in the same state of storage utilization even after deleting the PVC.
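For context, the StorageClass and PVC below are an illustrative reconstruction of the objects behind the CreateVolume request quoted above, based only on the logged parameters (claim rbdthick120, 120Gi, pool ocs-storagecluster-cephblockpool, thickProvision=true). The provisioner string and the omitted secret references are assumed ODF defaults and may differ on a real cluster; this is a sketch, not the exact manifests used in this test.

```yaml
# Illustrative sketch reconstructed from the logged CreateVolume parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbdthick
provisioner: openshift-storage.rbd.csi.ceph.com   # assumed ODF default provisioner name
parameters:
  clusterID: openshift-storage
  pool: ocs-storagecluster-cephblockpool
  imageFormat: "2"
  imageFeatures: layering
  thickProvision: "true"            # triggers the thick (zero-fill) provisioning path
  csi.storage.k8s.io/fstype: ext4
  # provisioner/node-stage secret references omitted; in ODF they normally
  # point at the rook-csi-rbd-* secrets in the openshift-storage namespace
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbdthick120
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce                  # SINGLE_NODE_WRITER in the CSI request
  resources:
    requests:
      storage: 120Gi                 # required_bytes: 128849018880
  storageClassName: rbdthick
```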
Recovery is not possible because the manual attempt to delete the image failed, as described in comment #0.

Madhu is helping with this one.

What we currently do for thick provisioning from Ceph is this:
1. Get a PVC creation request (thick provisioned).
2. Create a thin-provisioned image.
3. Fill it with zeroes for the requested capacity.
4. Once it is filled, return the response back to the user.

Now in step 3, if there is not enough storage there are two options:
Option 1. Keep waiting for storage to become available (which is what is happening in this case).
Option 2. Return ENOSPC if not enough space is available; this can be achieved by using a flag in Ceph.

Even if we implement option 2, we still need to clean up, as there would be a stale image and other leftovers. The admin has to wait for space to be freed up after the cleanup.

One alternative to this is to check the space upfront, before even creating the image, and return from there if enough space is not available. There are some issues/corner cases with this approach given that Ceph is a distributed system, but I think we can live with those for now given that we don't have any other option. Moreover, for those cases we will implement option 2 as well (the cleanup option).

So, for now we are doing two things:
1. Short term: from the ceph-csi side, check the available storage and based on that either create or don't create the image.
2. Long term: implement option 2 as well.

(In reply to Mudit Agarwal from comment #10)
> What we currently do for thick provisioning from ceph is this:
> 1. Get a PVC creation request (thick provisioned)
> 2. Create a thin provisioned image
> 3. Fill it with zeroes for the requested capacity
> 4. Once it is filled return the response back to the user.
>
> Now in step 3, if there is not enough storage there are two options:
> Option 1. Keep waiting for storage to be available (which is happening in
> this case)
> Option 2. Return ENOSPC if not enough space is available, this can be
> achieved by using a flag in Ceph
>
> Even if we implement option 2, we still need to cleanup as there would be a
> stale image and other leftovers.
> Admin has to wait for space to be freed up after the clean up.
>
> One alternative to this is to check the space upfront before even creating
> the image and return from there if enough space is not there.
> There are some issues/corner cases with this approach given that ceph is a
> distributed system but I think we can live with those for now
> given that we don't have any other option.

There are two important scenarios:
1. An attempt to create a single RBD thick PVC that is capable of increasing the total storage cluster utilization (consider the replica size also) up to or beyond the limit which makes the storage cluster read-only. A code change to accept or deny the volume create request based on the available storage will fix the issue with scenario 1.
2. Creating an RBD thick PVC which is small in size, but while it is provisioning the storage cluster becomes read-only because other PVCs are also utilizing the storage. The probability of occurrence of this scenario is lower than that of scenario 1, but it is also important. We need to ensure that no stale image is present.

> Moreover, for those options we will implement option 2 as well (cleanup
> option)
>
> So, for now we are doing two things:
> 1. short term: From ceph-csi side check the available storage and based on
> that either create or don't create the image.
What is the maximum achievable storage utilization (if the thick PVC is created), in percentage, that we plan to set? We need to set a safe value for this because at 85% the storage cluster will become read-only.

> 2. long term: implement option 2 as well

Yes, for most of the cases we will deny the PVC creation if space is not there. This will avoid any stale image.
>> What is the maximum achievable storage utilization (if the thick PVC is created), in percentage, that we plan to set?

The plan is to check it up to 80%, is that ok?
Madhu, FYI
Administrators are encouraged to configure resource quotas for their users. With a storage quota, users cannot request more storage than they have been assigned (see the example ResourceQuota sketch at the end of this section):
- https://access.redhat.com/documentation/en-us/openshift_container_platform/4.5/html/applications/quotas
- https://kubernetes.io/docs/concepts/policy/resource-quotas/#storage-resource-quota

When provisioning fails, the image should get deleted. Provisioning does not fail at the moment, but hangs. This is expected to be addressed with https://github.com/ceph/ceph-csi/pull/2115

Removing the acks as this won't/can't be fixed in OCS in 4.8. Keeping it open and as a proposed blocker till we have the recommended recovery steps as well as other findings.

We need one more BZ as a tracker for BZ #1969301.

As discussed in offline mail as well as in separate meetings, there is nothing we can do about this from the csi side. Marking it as a tracker for the Ceph BZ, which will improve the situation here.

Already flagged this as a known issue for 4.8, moving it out to 4.9.

Hi Madhu,
What is the expected behavior now? The fix mentioned in comment #c13 is not merged.
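Regarding the resource-quota recommendation above, here is a minimal, hypothetical sketch of a namespace storage quota; the name, limits, and the per-StorageClass key for the "rbdthick" class are arbitrary examples chosen for illustration, not values taken from this bug.

```yaml
# Example namespace quota: caps total requested storage and the number of
# PVCs, plus storage requested through the rbdthick StorageClass specifically.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: default
spec:
  hard:
    requests.storage: 500Gi
    persistentvolumeclaims: "20"
    rbdthick.storageclass.storage.k8s.io/requests.storage: 200Gi
```

With a quota like this in place, a PVC that would exceed the remaining namespace allowance is rejected at admission time and never reaches the provisioner, which avoids the hung thick-provisioning attempt described in this bug for over-sized requests.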