Bug 1954030
| Summary: | [Tracker for Ceph BZ #1968325] AWS \| reclaim capacity after snapshot deletion is very slow | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Avi Liani <alayani> |
| Component: | ceph | Assignee: | Patrick Donnelly <pdonnell> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.7 | CC: | bniver, kramdoss, madam, muagarwa, ocs-bugs, odf-bz-bot, pdonnell |
| Target Milestone: | --- | Keywords: | Automation, Performance |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Cloned To: | 1968325 (view as bug list) | Environment: | |
| Last Closed: | 2022-03-10 01:55:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1968325 | | |
Hi Patrick, this is marked as a blocker for OCS 4.8. Do we need to fix this in 4.8?

Thanks Patrick, moving it out of 4.9. Will create a Ceph clone if required.

BZ #1968325 is not planned for 5.0z1.

BZ #1968325 is targeted for RHCS 5.2.
Description of problem (please be detailed as possible and provide log snippets):

I created one PVC of 14 GiB on a CephFS volume and filled it up with data (~10 GiB). I took 100 snapshots of this PVC, rewriting all the data after each snapshot, so the total data written to the storage by the end of the test was ~3.3 TiB. I then deleted all the snapshots, the PVC, and the project they were created in (a hedged sketch of the cleanup commands is included at the end of this report).

Watching the `rados df` command in the rook-ceph-toolbox pod confirms that the data is actually being deleted from the backend, but very, very slowly - less than 1M/min (a monitoring sketch is included at the end of this report).

The cluster is on AWS with m5.4xlarge worker nodes and 2 TiB OSDs.

Version of all relevant components (if applicable):

Driver versions
================

OCP versions
==============
clientVersion:
  buildDate: "2021-04-09T04:34:49Z"
  compiler: gc
  gitCommit: 2513fdbb36e2ddf13bc0b17460151c03eb3a3547
  gitTreeState: clean
  gitVersion: 4.7.0-202104090228.p0-2513fdb
  goVersion: go1.15.7
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.7.6
releaseClientVersion: 4.7.7
serverVersion:
  buildDate: "2021-03-14T16:01:39Z"
  compiler: gc
  gitCommit: bafe72fb05eddc8246040b9945ec242b9f805935
  gitTreeState: clean
  gitVersion: v1.20.0+bafe72f
  goVersion: go1.15.7
  major: "1"
  minor: "20"
  platform: linux/amd64

Cluster version:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.6     True        False         4h16m   Cluster version is 4.7.6

OCS versions
==============
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-360.ci   OpenShift Container Storage   4.7.0-360.ci              Succeeded

Rook versions
===============
rook: 4.7-133.80f8b1112.release_4.7
go: go1.15.7

Ceph versions
===============
ceph version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No.

Is there any workaround available to the best of your knowledge?
Not that I am aware of.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Is this issue reproducible?
Yes.

Can this issue be reproduced from the UI?
No.

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCS 4.7.
2. Run the ocs-ci test tests/e2e/performance/test_pvc_multi_snapshot_performance.py::TestPvcMultiSnapshotPerformance::test_pvc_multiple_snapshot_performance[CephFS] - this takes ~3.5 hours (see the hedged invocation sketch at the end of this report).
3. Watch the reclaim process from the UI or CLI.

Actual results:
Capacity is reclaimed very slowly.

Expected results:
Capacity should be reclaimed quickly.

Additional info:
All must-gather info will be uploaded.
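For reference, a minimal sketch of the cleanup step described above, assuming the snapshots and the PVC live in a dedicated test project; the namespace and PVC names below are hypothetical placeholders, not the names used by the automated test:

```bash
# Hypothetical names - substitute the namespace/PVC created by the test run.
NS=pvc-snapshot-perf
PVC=pvc-cephfs-14gib

# Delete every VolumeSnapshot in the project, then the PVC, then the project itself.
oc delete volumesnapshot --all -n "$NS"
oc delete pvc "$PVC" -n "$NS"
oc delete project "$NS"
```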
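The backend reclaim rate mentioned above can be watched from the CLI. A minimal sketch, assuming a standard OCS deployment where the rook-ceph toolbox runs in the openshift-storage namespace with the app=rook-ceph-tools label (both are assumptions; adjust for your cluster):

```bash
# Locate the toolbox pod (namespace and label assumed from a typical OCS install).
TOOLBOX=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)

# Sample pool usage once a minute; the CephFS data pool's USED column should
# shrink as purged snapshot data is removed from the backend.
while true; do
  date
  oc -n openshift-storage exec "$TOOLBOX" -- rados df
  sleep 60
done
```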
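For step 2 of the reproduction, the test can be driven through ocs-ci. A hedged sketch, assuming ocs-ci's run-ci wrapper and placeholder cluster name/path; exact flags may differ between ocs-ci versions, so treat this as an approximation rather than the exact invocation used:

```bash
# Run from a checkout of the ocs-ci repository with its virtualenv activated.
# Cluster name and path are placeholders for an existing OCS 4.7 cluster.
run-ci \
  --cluster-name my-ocs-cluster \
  --cluster-path ~/clusters/my-ocs-cluster \
  "tests/e2e/performance/test_pvc_multi_snapshot_performance.py::TestPvcMultiSnapshotPerformance::test_pvc_multiple_snapshot_performance[CephFS]"
```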