Bug 1839807 - After deleting pools, Ceph OSD is causing high CPU and NVMe utilisation making cluster unusable
Summary: After deleting pools, Ceph OSD is causing high CPU and NVMe utilisation making cluster unusable
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 5.*
Assignee: Neha Ojha
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-05-25 15:54 UTC by karan singh
Modified: 2020-06-03 21:21 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-03 21:21:46 UTC
Embargoed:



Description karan singh 2020-05-25 15:54:41 UTC
Description of problem:

This has happened to me twice in the last two weeks and desperately needs a fix.

I am benchmarking Ceph object storage. As part of the workflow, I ingested 500 million 64 KB S3 objects into the cluster (EC 4+2 data pool).
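For illustration only, a rough sketch of this kind of ingest workload via the S3 API exposed by RGW. The endpoint, credentials, bucket name, and object count below are placeholders, not the actual benchmark tooling used:

```python
# Hypothetical sketch of the ingest workload: PUT many 64 KiB objects through
# the RGW S3 endpoint. Endpoint, credentials and bucket name are placeholders.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",   # placeholder RGW endpoint
    aws_access_key_id="ACCESS_KEY",               # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

payload = os.urandom(64 * 1024)                   # 64 KiB object body

# The real run ingested 500 million objects; a tiny count here for illustration.
for i in range(1000):
    s3.put_object(Bucket="bench-bucket", Key=f"obj-{i:012d}", Body=payload)
```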

For an unrelated reason, I had to delete all of the objects from the cluster, and the quickest way to do that was to delete the entire RGW data pool containing the 500 million objects. As soon as I did, CPU utilization reached 99% on all Ceph OSD nodes, and at the same time every NVMe (BlueStore) device on every OSD node hit 100% utilization. Upon further investigation, I found that:
- The NVMe devices are 100% utilized because of extremely heavy read IO (almost no write IO).
- This is causing very high IO wait on the CPUs, saturating them as well.
- The OSD data devices (HDD) were idle, not doing anything.

I left the system in this state for another 8 hours (overnight); the next morning it was still unusable, with 100% CPU utilization and 100% NVMe utilization. As I was running short on time, I had to purge the cluster (PVs, VGs, LVs) and re-deploy it from scratch.
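For reference, a minimal sketch of how the read-IO / iowait pattern described above can be sampled on an OSD node. This assumes python3-psutil is installed and uses example device-name prefixes; the actual observations were made with standard OS tools:

```python
# Sample per-device read/write IOPS and CPU iowait every 5 seconds on an OSD
# node (assumes python3-psutil is installed; device-name prefixes are examples).
import time
import psutil

prev = psutil.disk_io_counters(perdisk=True)
while True:
    time.sleep(5)
    cpu = psutil.cpu_times_percent(interval=None)   # iowait is Linux-only
    cur = psutil.disk_io_counters(perdisk=True)
    for dev, now in cur.items():
        if not dev.startswith(("nvme", "sd")):
            continue
        reads = now.read_count - prev[dev].read_count
        writes = now.write_count - prev[dev].write_count
        print(f"{dev}: {reads/5:.0f} r/s, {writes/5:.0f} w/s, iowait {cpu.iowait:.1f}%")
    prev = cur
```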



Version-Release number of selected component (if applicable):

RHCS 4.1


How reproducible:

ALWAYS

Steps to Reproduce:
1. Fill a Ceph cluster pool with a very large number of objects (e.g. 500 million)
2. Delete the pool storing those 500M objects (a scripted approximation of these steps is sketched below)
3. Check OSD node CPU utilization and NVMe (BlueStore) device utilization
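A rough scripted approximation of these steps using the librados Python bindings. This is a simplification: the original report went through RGW/S3 into an EC 4+2 data pool, whereas this creates a default (replicated) pool with a placeholder name; pool deletion also requires mon_allow_pool_delete to be enabled:

```python
# Simplified reproducer sketch using the librados Python bindings
# (python3-rados). Pool name and object count are placeholders; the original
# report used RGW/S3 with an EC 4+2 data pool.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

POOL = "bench-data"                      # placeholder pool name
if not cluster.pool_exists(POOL):
    cluster.create_pool(POOL)            # creates a replicated pool by default

ioctx = cluster.open_ioctx(POOL)
payload = b"x" * (64 * 1024)             # 64 KiB objects, as in the report
for i in range(100_000):                 # 500 million in the real run
    ioctx.write_full(f"obj-{i:012d}", payload)
ioctx.close()

# Step 2: delete the pool, then watch OSD-node CPU / NVMe utilization.
# Requires mon_allow_pool_delete=true on the monitors.
cluster.delete_pool(POOL)
cluster.shutdown()
```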

Actual results:

Deleting a large pool causes CPU / NVMe saturation, making the cluster unusable.


Expected results:

After deleting pools (i.e. all of the underlying PGs), the Ceph cluster should reclaim the capacity within several minutes (if not instantaneously) and should not impact CPU / NVMe device utilization.

Ideally, when a user deletes a pool, Ceph would just logically mark those blocks as free and overwrite them later when new data is written. (Just a thought.)
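For reference, PG removal after a pool delete already happens asynchronously in the background; what this report hits is the cost of that background work competing with everything else. A hedged sketch of how the deletion rate might be throttled, assuming the osd_delete_sleep family of options is available in this release (option names should be verified against the deployed Ceph version; this is not a confirmed fix):

```python
# Hedged mitigation sketch: inject sleeps between background deletion work
# items so PG removal competes less with client I/O. The option names are
# assumptions to verify against the deployed release (RHCS 4.1).
import subprocess

def ceph_config_set(who: str, option: str, value: str) -> None:
    """Run `ceph config set` for the given daemon class, option and value."""
    subprocess.run(["ceph", "config", "set", who, option, value], check=True)

# Seconds to sleep between deletion work items on HDD- and SSD-backed OSDs.
ceph_config_set("osd", "osd_delete_sleep_hdd", "5")
ceph_config_set("osd", "osd_delete_sleep_ssd", "1")
```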


Additional info:

Comment 1 RHEL Program Management 2020-05-25 15:54:49 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 karan singh 2020-05-25 18:53:50 UTC
I have previously observed similar behaviour from the system with a low object count (45M).

https://bugzilla.redhat.com/show_bug.cgi?id=1837493#c7

Comment 3 karan singh 2020-05-25 18:55:08 UTC
The exact same behaviour has been reported in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1770510

Comment 4 Josh Durgin 2020-06-03 21:21:46 UTC
Deletion of radosgw objects is very expensive. This will be mitigated by moving RGW's bucket index out of omap and, longer term, improved for small objects in general with SeaStore. In the short term the fastest thing to do is redeploy the cluster - there's no way to easily delete everything.

