Bug 1956232
| Summary: | [RHEL7][RBD] FailedMount error when using restored PVC on app pod |
| --- | --- |
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage |
| Component: | csi-driver |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | unspecified |
| Version: | 4.7 |
| Target Milestone: | --- |
| Target Release: | OCS 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | 4.8.0-406.ci |
| Doc Type: | Bug Fix |
| Reporter: | Jilju Joy <jijoy> |
| Assignee: | Rakshith <rar> |
| QA Contact: | Jilju Joy <jijoy> |
| CC: | dwalveka, ebenahar, idryomov, madam, mrajanna, muagarwa, nberry, ocs-bugs, olakra, owasserm, rar |
| Keywords: | Automation |
| Story Points: | --- |
| Last Closed: | 2021-08-03 18:15:57 UTC |
| Type: | Bug |
| Regression: | --- |
| Bug Blocks: | 1962483, 1962484 |

Doc Text:

.Newly restored PVC can now be mounted on nodes

Previously, a bug in the Ceph-CSI driver caused a misleading 'RBD image not found' error when mounting a newly restored PVC whose parent snapshot had been deleted, on nodes running Red Hat Enterprise Linux versions earlier than 8.2 (which lack the deep-flattening feature). This issue was fixed by flattening the newly restored PVC before mounting it on such nodes.
Description
Jilju Joy
2021-05-03 08:57:56 UTC
As mentioned by Madhu, this will fail because of no kernel support.

Questions for QE:

1. This should exist in 4.6 too, have we tested?
2. Do we want to document it, do we know if customers have this kernel version?

> 2021-04-30T23:21:52.534270944Z I0430 23:21:52.534257 24684 rbd_util.go:814] ID: 2896 Req-ID: 0001-0011-openshift-storage-0000000000000001-6fc62615-aa0a-11eb-b36c-0a580a800411 setting disableInUseChecks on rbd volume to: false
> 2021-04-30T23:21:52.535308640Z I0430 23:21:52.535278 24684 omap.go:84] ID: 2896 Req-ID: 0001-0011-openshift-storage-0000000000000001-6fc62615-aa0a-11eb-b36c-0a580a800411 got omap values: (pool="ocs-storagecluster-cephblockpool", namespace="", name="csi.volume.6fc62615-aa0a-11eb-b36c-0a580a800411"): map[csi.imageid:5e6410ea0866 csi.imagename:csi-vol-6fc62615-aa0a-11eb-b36c-0a580a800411 csi.volname:pvc-215ee066-b305-4c33-af22-2f8b393d6e7a csi.volume.owner:namespace-test-051dbfaf22db424b8a7df29f8]
> 2021-04-30T23:21:52.535437394Z E0430 23:21:52.535423 24684 util.go:232] kernel 3.10.0-1160.25.1.el7.x86_64 does not support required features
Just to mention, the missing flatten support in kernel 3.10.0-1160.25.1.el7.x86_64, as seen in the log above, is what causes this on this setup.
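The "does not support required features" error in the log above comes from a kernel feature gate in Ceph-CSI: the krbd driver can only map images with unflattened parents if the kernel supports deep-flatten. A minimal sketch of that kind of version gate is below; the 5.1 mainline minimum is an assumption for illustration (the actual driver also accounts for distribution backports, e.g. RHEL 8.2 kernels), so treat the threshold and function name as hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a kernel version string supports deep-flatten,
# assuming a mainline minimum of 5.1 (illustrative, not the driver's full table).
supports_deep_flatten() {
    # $1 is a "major.minor.patch" kernel version string, e.g. "3.10.0"
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }
}

# Strip the distro suffix: "3.10.0-1160.25.1.el7.x86_64" -> "3.10.0"
kver=$(uname -r | cut -d- -f1)
if supports_deep_flatten "$kver"; then
    echo "kernel $kver: deep-flatten supported, clone can be mapped directly"
else
    echo "kernel $kver: no deep-flatten, image must be flattened before mapping"
fi
```

On the RHEL 7 workers in this bug (`3.10.0-…`) such a check fails, which is why the fix flattens the restored image before attempting the mount.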
(In reply to Mudit Agarwal from comment #4)
> As mentioned by Madhu, this will fail because of no kernel support.
>
> Questions for QE:
>
> 1. This should exist in 4.6 too, have we tested?

I found a single run done as part of OCS 4.6 testing over RHEL7 nodes - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j007icm1r3-t1/j007icm1r3-t1_20210218T132305/logs/

Jilju - I see the same test case failed there too, is it for the same reason?

(In reply to Mudit Agarwal from comment #4)
> As mentioned by Madhu, this will fail because of no kernel support.
>
> Questions for QE:
>
> 1. This should exist in 4.6 too, have we tested?
> 2. Do we want to document it, do we know if customers have this kernel version?

There is a failure instance of this test case in OCS 4.6 on the VSPHERE-UPI/RHEL configuration. Due to some other issues, must-gather logs were not collected after the failure.

Run: ocs-ci results for OCS4-6-Downstream-OCP4-6-VSPHERE-UPI-ENCRYPTION-1AZ-RHEL-VSAN-3M-3W-tier1 (BUILD ID: v4.6.0-149.ci RUN ID: 1604372378)

Elad,
>> Do we want to document it, do we know if customers have this kernel version?
Emphasizing because if we have customers then this becomes a serious issue and I want it to be backported to 4.6.z/4.7.z
Madhu, can we add doc text for this? We might want to tell them that they should not delete the parent PVC (in case the policy is not 'retain').

Updating the Steps to Reproduce:

1. Create an RBD PVC named "pvc-rbd" and attach it to an app pod.
2. Run I/O.
3. Create a snapshot from pvc-rbd. Ensure the snapshot is Ready.
4. Delete the PVC pvc-rbd.
5. Restore the snapshot and verify the restored PVC is Bound.
6. Delete the volume snapshot.
7. Attach the restored PVC to an app pod.

The test case deletes the snapshot before attaching the restored PVC to the app pod.

(In reply to Madhu Rajanna from comment #20)
> steps we need to verify
>
> Create PVC
> Create snapshot
> Create restore PVC from snapshot
> Delete snapshot (don't delete parent PVC)
> Mount restore PVC to app pod (mounting will fail)

Verified the above with a live cluster and a csi-driver image carrying the upstream patch (https://github.com/ceph/ceph-csi/pull/2045). The patch works. Mounting results in "rpc error: code = Internal desc = flatten in progress" while the flatten is running, but eventually succeeds.

> Create PVC
> Create snapshot
> Delete parent PVC
> Create restore PVC from snapshot
> Mount restore PVC to app pod (mounting should be successful)

This works as expected too.

> Create PVC
> Create PVC-PVC clone
> Delete parent PVC
> Mount PVC to app pod (mounting should be successful) --> This is already verified at #17

Madhu, in the doc, is it possible to mention a specific RHEL release or kernel version below which this error can occur? From comment #9, this is applicable to RHEL 8 also.

Looks good to me. Thanks Jilju/Madhu/Rakshith.

Doc text needs to be changed as we have fixed this issue now.

Verified using the test case tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py::TestSnapshotAtDifferentPvcUsageLevel::test_snapshot_at_different_usage_level. The test case deletes the parent PVC and snapshot before attaching the restored RBD PVC to the pod.
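The snapshot and restore steps above can be sketched as standard Kubernetes manifests. This is generic snapshot.storage.k8s.io usage, not taken from the test case itself; the class names, PVC names, and size below are illustrative assumptions.

```yaml
# Step 3: snapshot the RBD PVC (snapshot class name is an assumption)
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pvc-rbd-snapshot
spec:
  volumeSnapshotClassName: ocs-storagecluster-rbdplugin-snapclass
  source:
    persistentVolumeClaimName: pvc-rbd
---
# Step 5: restore the snapshot to a new PVC (storage class name is an assumption)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-rbd-restore
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: pvc-rbd-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

After steps 6 and 7 (delete the VolumeSnapshot, then mount pvc-rbd-restore in a pod), the mount previously failed on the RHEL 7 worker kernels described in this bug; with the fix, the restored image is flattened first and the mount succeeds.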
Test case logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-jun3/jijoy-jun3_20210603T181951/logs/ocs-ci-logs-1622752627/tests/manage/pv_services/pvc_snapshot/test_snapshot_at_different_pvc_utlilization_level.py/TestSnapshotAtDifferentPvcUsageLevel/test_snapshot_at_different_usage_level/logs

Worker nodes kernel version is 3.10.0-1160.25.1.el7.x86_64.

Verified in version:
OCS operator v4.8.0-407.ci
Cluster Version 4.8.0-0.nightly-2021-06-03-101158
Ceph Version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003