Bug 1887556 - [rbd] Block volume mode PVC expansion pending on 'FileSystemResizePending' on OCS 4.5.1 - OCP 4.6 cluster
Summary: [rbd] Block volume mode PVC expansion pending on 'FileSystemResizePending' on OCS 4.5.1 - OCP 4.6 cluster
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat
Component: csi-driver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.1
Assignee: Mudit Agarwal
QA Contact: Jilju Joy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-12 19:31 UTC by Jilju Joy
Modified: 2020-10-27 14:09 UTC (History)
8 users

Fixed In Version: v4.5.1-130.ci
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 14:09:09 UTC
Target Upstream Version:


Attachments: (none)


Links
  System: GitHub openshift/ceph-csi pull 10
  Status: closed
  Summary: Bug 1887556: rbd: Bail out from nodeexpansion if its block mode pvc
  Last Updated: 2020-11-20 11:31:00 UTC

Description Jilju Joy 2020-10-12 19:31:41 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
-----------

RBD block volume mode PVC expansion does not succeed; it stays waiting on 'FileSystemResizePending'. This issue is seen on an OCP 4.6 - OCS 4.5.1 cluster. The PV capacity changed, but the capacity field in the PVC does not change. The PVC remains in the condition "FileSystemResizePending" with the message "Waiting for user to (re-)start a pod to finish file system resize of volume on node."
The PVC was already attached to a pod before the expansion.
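
For reference, the stuck condition can be read straight from the PVC status. A minimal check, using the PVC and namespace names from the failing run below (the jsonpath filter expression is illustrative):

$ oc -n namespace-test-3a6b08f22e5840f388d8dddbc037950c get pvc pvc-test-9898596005f1418fad089a2c761863a0 \
    -o jsonpath='{.status.conditions[?(@.type=="FileSystemResizePending")].message}'
Waiting for user to (re-)start a pod to finish file system resize of volume on node.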

The cluster was deployed with OCS 4.5 and OCP 4.5 GA versions and then upgraded to OCS 4.5.1 and OCP 4.6 before running the test. The test case passed on an OCS 4.5.1 - OCP 4.5 cluster.


Test case: tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expansion
must-gather : http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-45-oct12/jijoy-45-oct12_20201012T123140/logs/failed_testcase_ocs_logs_1602519643/test_pvc_expansion_ocs_logs/
Test case logs : http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-45-oct12/jijoy-45-oct12_20201012T123140/logs/ocs-ci-logs-1602519643/tests/manage/pv_services/pvc_resize/test_pvc_expansion.py/TestPvcExpand/test_pvc_expansion/


PVCs which remained in 'FileSystemResizePending' after an attempt to expand them to 25Gi:

NAMESPACE                                         NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
namespace-test-3a6b08f22e5840f388d8dddbc037950c   pvc-test-9898596005f1418fad089a2c761863a0   Bound    pvc-e1dea4aa-2ff6-47ca-8bca-543da3095a22   10Gi       RWO            ocs-storagecluster-ceph-rbd   6m3s
namespace-test-3a6b08f22e5840f388d8dddbc037950c   pvc-test-cf0afaaf6d96489e9ea6f8af722f5679   Bound    pvc-9b8eba03-d479-4643-8f5d-21ea497e4529   10Gi       RWX            ocs-storagecluster-ceph-rbd   6m3s


Status section in the YAML of PVC pvc-test-9898596005f1418fad089a2c761863a0:


status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-10-12T16:23:31Z"
    message: Waiting for user to (re-)start a pod to finish file system resize of
      volume on node.
    status: "True"
    type: FileSystemResizePending
  phase: Bound
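
On a successful expansion the status capacity catches up with the spec request; a quick way to compare the two for the PVC above (a sketch, using the same names as in the table):

$ oc -n namespace-test-3a6b08f22e5840f388d8dddbc037950c get pvc pvc-test-9898596005f1418fad089a2c761863a0 \
    -o jsonpath='spec={.spec.resources.requests.storage} status={.status.capacity.storage}{"\n"}'
spec=25Gi status=10Gi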
  
  
  
Error logs from the csi-rbdplugin container of pod csi-rbdplugin-bvh8m:

2020-10-12T16:24:58.777935340Z E1012 16:24:58.777895    4810 utils.go:163] ID: 67 Req-ID: 0001-0011-openshift-storage-0000000000000001-0433c7c5-0ca7-11eb-a758-0a580a81021e GRPC error: rbd: resize failed on path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e1dea4aa-2ff6-47ca-8bca-543da3095a22/ce1512aa-4412-485c-8fa8-141f4ccbd12b, error: <nil>
2020-10-12T16:26:02.900019402Z E1012 16:26:02.899951    4810 utils.go:163] ID: 73 Req-ID: 0001-0011-openshift-storage-0000000000000001-0433c7c5-0ca7-11eb-a758-0a580a81021e GRPC error: rbd: resize failed on path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-e1dea4aa-2ff6-47ca-8bca-543da3095a22/ce1512aa-4412-485c-8fa8-141f4ccbd12b, error: <nil>
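
These lines can be pulled from the plugin container with a command along these lines (the pod name is specific to this node, and the grep pattern is just illustrative):

$ oc -n openshift-storage logs csi-rbdplugin-bvh8m -c csi-rbdplugin | grep 'resize failed'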



Tested manually as well. The condition 'FileSystemResizePending' seems to remain forever (waited for an hour). Tried attaching a PVC in condition 'FileSystemResizePending' to a new pod (after deleting the existing pods). The new pod did not come up running; it remained in ContainerCreating state.

State:          Waiting
Reason:         ContainerCreating

Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   SuccessfulAttachVolume  51m                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e"
  Warning  FailedMapVolume         50m (x8 over 51m)     kubelet, compute-0       MapVolume.MarkVolumeAsMounted failed while expanding volume for volume "pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e" : Expander.NodeExpand failed to expand the volume : rpc error: code = Unknown desc = rbd: resize failed on path /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e/1aa6e5b5-5d3d-489b-9837-734f5360680b, error: <nil>
  Warning  FailedMount             11m (x17 over 49m)    kubelet, compute-0       Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[default-token-8nxk9 my-volume]: timed out waiting for the condition
  Normal   SuccessfulMountVolume   4m45s (x31 over 51m)  kubelet, compute-0       MapVolume.MapPodDevice succeeded for volume "pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e" volumeMapPath "/var/lib/kubelet/pods/1aa6e5b5-5d3d-489b-9837-734f5360680b/volumeDevices/kubernetes.io~csi"
  Normal   SuccessfulMountVolume   41s (x33 over 51m)    kubelet, compute-0       MapVolume.MapPodDevice succeeded for volume "pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-605aec52-988f-4a31-83d2-3e2c9021ce0e/dev"

=================================================================================

Version of all relevant components (if applicable):
----------------------------------------------------

$ oc -n openshift-storage get csv
NAME                         DISPLAY                       VERSION        REPLACES              PHASE
ocs-operator.v4.5.1-594.ci   OpenShift Container Storage   4.5.1-594.ci   ocs-operator.v4.5.0   Succeeded

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-rc.2   True        False         3h6m    Cluster version is 4.6.0-rc.2

# ceph version
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)


$ oc describe pod csi-rbdplugin-provisioner-7fc6f7bf87-2vw4l -n openshift-storage|grep  Image
    Image:         quay.io/rhceph-dev/ose-csi-external-provisioner@sha256:8a6c3a0bd6f2de58b632dbde2dd0847a8ce15ce0429a8bb9a3d5787ef38c1ea2
    Image ID:      quay.io/rhceph-dev/ose-csi-external-provisioner@sha256:46f51a9ccd4cf6d8543f2167fabffc34d15834ea8f023ed91eebbdbf3bfb3635
    Image:         quay.io/rhceph-dev/ose-csi-external-resizer@sha256:50db03c10e55f2f439ce910037ee874c9285157761b0d6b3d078d3349fa6bde3
    Image ID:      quay.io/rhceph-dev/ose-csi-external-resizer@sha256:1b0b15c0edf1aa3e2098b2230752b977c734b08514a1aa074fd3b5cff00b92d2
    Image:         quay.io/rhceph-dev/ose-csi-external-attacher@sha256:a13c104ea13944963e4ca94ae6d413a428cfd6a6d6938917a8deeb16d3c3f073
    Image ID:      quay.io/rhceph-dev/ose-csi-external-attacher@sha256:72e8623ee661e5120c225d7bf2c5716a74e391fee0c9cc2bd2a0e9b685d7163d
    Image:         quay.io/rhceph-dev/cephcsi@sha256:c5ab6ca51a15066b2bdb1e9cc79afd071014147f01858f73b804750dc2376a79
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:795bcc3f751015e78314d7f894d8812cb49f49ff2c20d83328154fc641d0ee73
    Image:         quay.io/rhceph-dev/cephcsi@sha256:c5ab6ca51a15066b2bdb1e9cc79afd071014147f01858f73b804750dc2376a79
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:795bcc3f751015e78314d7f894d8812cb49f49ff2c20d83328154fc641d0ee73

==========================================================================


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, PVC expansion does not succeed.


Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1


Is this issue reproducible?
Yes


Can this issue be reproduced from the UI?
Yes


If this is a regression, please provide more details to justify this:
Works in OCS 4.5.0

Steps to Reproduce:
1. Create an RBD block volume mode PVC and attach it to a pod.
2. Expand the PVC and wait for the process to complete (a command-level sketch follows).
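
A sketch of the two steps; the PVC name "block-pvc" and the "test-ns" namespace are hypothetical, while the storage class is the one from the failing run:

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc        # hypothetical name
  namespace: test-ns     # hypothetical namespace
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block
  storageClassName: ocs-storagecluster-ceph-rbd
  resources:
    requests:
      storage: 10Gi
EOF
# Attach the PVC to a pod via spec.volumeDevices (block mode), then expand it:
$ oc -n test-ns patch pvc block-pvc -p '{"spec":{"resources":{"requests":{"storage":"25Gi"}}}}'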



Actual results:
PVC expansion pending on 'FileSystemResizePending'


Expected results:
PVC expansion should succeed and the Capacity field in PVC should reflect the expanded size.
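
With the fix in place, the capacity check shown in the description should converge on the requested size; using the hypothetical names from the reproduction sketch above, a successful run would end with:

$ oc -n test-ns get pvc block-pvc -o jsonpath='{.status.capacity.storage}{"\n"}'
25Gi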


Additional info:

Comment 13 Mudit Agarwal 2020-10-13 11:08:05 UTC
Jilju tested the image with the backport PR on OCP 4.6/OCS 4.5; the fix is working fine. Providing the dev ack.

We need to test it with OCP 4.5 and OCP 4.4 as well; meanwhile I will raise the backport PR.

Comment 14 Humble Chirammal 2020-10-13 11:32:08 UTC
(In reply to Mudit Agarwal from comment #13)
> Jilju tested the image with the backport PR on OCP 4.6/OCS 4.5; the fix is
> working fine. Providing the dev ack.

Thanks Mudit and Jilju!

Comment 15 Humble Chirammal 2020-10-13 11:43:32 UTC
PR got merged to the release-4.5 branch. We will move this to ON_QA once the next OCS 4.5 build is available.

Comment 18 Jilju Joy 2020-10-13 16:41:16 UTC
(In reply to Mudit Agarwal from comment #13)
> Jilju tested the image with the backport PR on OCP 4.6/OCS 4.5; the fix is
> working fine. Providing the dev ack.
>
> We need to test it with OCP 4.5 and OCP 4.4 as well; meanwhile I will raise
> the backport PR.

Tested in OCP 4.5.13 - OCS 4.5.1-130.ci. 
Test cases passed.
1. tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expansion
2. tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expand_expanded_pvc

Comment 20 Jilju Joy 2020-10-13 19:09:04 UTC
Tested in version:

OCS operator	v4.5.1-600.ci
Cluster Version	4.6.0-0.nightly-2020-10-13-064047
Ceph Version	14.2.8-91.el8cp
rook_csi_ceph	cephcsi@sha256:00100822dbcdbc86286bdde8be6a459c698abd5c22d335bc121185f0a59a5b44
rook_csi_resizer  ose-csi-external-resizer@sha256:50db03c10e55f2f439ce910037ee874c9285157761b0d6b3d078d3349fa6bde3

Test cases passed:
tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expansion
tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expand_expanded_pvc

Comment 21 Humble Chirammal 2020-10-14 03:38:34 UTC
(In reply to Jilju Joy from comment #20)
> Tested in version:
> 
> OCS operator	v4.5.1-600.ci
> Cluster Version	4.6.0-0.nightly-2020-10-13-064047
> Ceph Version	14.2.8-91.el8cp
> rook_csi_ceph     cephcsi@sha256:00100822dbcdbc86286bdde8be6a459c698abd5c22d335bc121185f0a59a5b44
> rook_csi_resizer  ose-csi-external-resizer@sha256:50db03c10e55f2f439ce910037ee874c9285157761b0d6b3d078d3349fa6bde3
> 
> Test cases passed:
> tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expansion
> tests/manage/pv_services/pvc_resize/test_pvc_expansion.py::TestPvcExpand::test_pvc_expand_expanded_pvc

Thanks a lot Jilju for the quick verification!

Comment 23 Jilju Joy 2020-10-14 06:01:06 UTC
Based on comment #18, comment #20 and comment #22, marking this bug as verified.

