Bug 1377875 - [support] OSD recovery causes pause in IO which lasts longer than expected
Summary: [support] OSD recovery causes pause in IO which lasts longer than expected
Keywords:
Status: CLOSED DUPLICATE of bug 1452780
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assignee: Matt Benjamin (redhat)
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-20 21:14 UTC by Mike Hackett
Modified: 2020-01-17 15:56 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 19:46:51 UTC
Embargoed:


Attachments:


Links:
Ceph Project Bug Tracker 13104 (last updated 2016-09-20 21:15:21 UTC)
Red Hat Bugzilla 1378994 - medium, CLOSED - [DOC] RGW docs should have clear warnings about large buckets (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1378995 - unspecified, CLOSED - [RFE] [rhcs-1.3.x] RGW resharding tool (last updated 2022-02-21 18:20:45 UTC)
Red Hat Bugzilla 1379397 - high, CLOSED - [DOCS] Request to include information on proper number of shards to configure when using rgw bucket sharding (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1452780 - private (last updated 2021-02-18 21:48:08 UTC)


Description Mike Hackett 2016-09-20 21:14:29 UTC
Description of problem:

An OSD node is taken offline for 15 minutes (with the noout flag set) and then brought back online. When recovery IO starts, RadosGW client IO halts on the cluster due to a large number of slow requests that are waiting for degraded objects. While the OSD node was offline, client IO to the cluster remained active for those 15 minutes.

This OSD node houses 2 SSDs and 14 SATA HDDs. The SSD OSDs back the radosgw.index pool.

The degraded objects were in the radosgw index pool, which must be accessed for every RGW operation, so a large range of RGW operations were affected.

Rack replication is in use with 3 racks, 6 SSDs per rack, 2 per OSD node.
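
A rough way to confirm from the CLI that the degraded objects sit in the index pool (not from the case data, just a sketch in Python that shells out to the ceph tool; the index pool name below is a placeholder for this cluster's actual radosgw index pool, and the JSON layout of 'ceph pg dump' varies between releases):

#!/usr/bin/env python3
# Sketch: poll the cluster and list degraded PGs that belong to the radosgw index pool.
import json
import subprocess
import time

INDEX_POOL = '.rgw.buckets.index'    # placeholder; use the cluster's actual index pool name

def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(('ceph',) + args + ('--format', 'json'))
    return json.loads(out.decode('utf-8'))

def pool_id(name):
    """Map the pool name to its numeric id so PG ids (pool.seq) can be matched."""
    for pool in ceph_json('osd', 'dump')['pools']:
        if pool['pool_name'] == name:
            return pool['pool']
    raise KeyError('pool %s not found' % name)

def degraded_pgs_in_pool(pid):
    """Return PG ids in the given pool whose state includes 'degraded'."""
    dump = ceph_json('pg', 'dump')
    # The pg_stats list is at the top level in older releases and under pg_map in newer ones.
    stats = dump.get('pg_stats') or dump.get('pg_map', {}).get('pg_stats', [])
    return [p['pgid'] for p in stats
            if p['pgid'].startswith('%d.' % pid) and 'degraded' in p['state']]

if __name__ == '__main__':
    pid = pool_id(INDEX_POOL)
    for _ in range(60):
        print('%s degraded index PGs: %s' % (time.ctime(), degraded_pgs_in_pool(pid)))
        time.sleep(10)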

Per the upstream tracker (http://tracker.ceph.com/issues/13104) this is expected behavior: the object is degraded and the OSD waits for it to be repaired. Writes to degraded objects (even when present on the primary) are not allowed in Hammer and earlier; this changed in Infernalis.

Initially the recovery threads were throttled to 1 across the entire cluster to limit client IO impact during recovery. A recommendation was then made to raise this value back to the default of 15 on the SSD OSDs, but this did not alleviate the problem, and the issue was seen again on the next node move.
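
For reference, a minimal sketch of how such a throttle can be applied at runtime to just the SSD OSDs with 'ceph tell ... injectargs'. The ticket does not name the exact option that was tuned; osd_recovery_max_active is assumed here because its Hammer default (15) matches the value mentioned above, and the OSD ids are placeholders:

#!/usr/bin/env python3
# Runtime-only change (lost when the OSD daemon restarts); persist it in ceph.conf if needed.
import subprocess

SSD_OSDS = [0, 1, 16, 17, 32, 33]     # placeholder ids for the SSD-backed OSDs
RECOVERY_MAX_ACTIVE = 15              # assumed option/value, see note above

for osd in SSD_OSDS:
    subprocess.check_call([
        'ceph', 'tell', 'osd.%d' % osd, 'injectargs',
        '--osd_recovery_max_active %d' % RECOVERY_MAX_ACTIVE,
    ])

The same value could instead be set persistently in the [osd] section of ceph.conf on the SSD nodes.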


Is recovery operating properly here as expected in Hammer?
Do we have any method to prevent this impact from occurring during an OSD node move? 


Version-Release number of selected component (if applicable):
ceph-0.94.5-14.el7cp.x86_64  

How reproducible:
Consistent

Steps to Reproduce:
1. Set noout on the cluster.
2. Write several GB to the cluster.
3. Down one of the OSD nodes.
4. Write several GB to the cluster.
5. Bring the OSD node back into the cluster to generate recovery.
6. While recovery is ongoing, generate further IO to the cluster and confirm that IO has halted. (A scripted sketch of these steps follows below.)
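
Sketch of the steps above (assumptions, not from the case: 'rados bench' against a placeholder pool stands in for the RGW client load, and stopping/starting the daemons on the target node depends on the deployment, so it is left as manual steps):

#!/usr/bin/env python3
# Reproduction sketch; pool name and node handling are placeholders.
import subprocess
import time

POOL = '.rgw.buckets'        # placeholder pool standing in for RGW client IO

def ceph(*args):
    subprocess.check_call(('ceph',) + args)

def write_load(seconds):
    """Steps 2 and 4: generate several GB of writes with rados bench."""
    subprocess.check_call(['rados', 'bench', '-p', POOL, str(seconds),
                           'write', '--no-cleanup'])

def timed_probe_write():
    """Step 6: a single small write, timed to see whether client IO stalls."""
    start = time.time()
    subprocess.check_call(['rados', '-p', POOL, 'put',
                           'probe-object', '/etc/hosts'])
    return time.time() - start

ceph('osd', 'set', 'noout')                        # step 1
write_load(120)                                    # step 2
# Step 3: stop the OSD daemons on one node (ssh + the node's service manager).
input('Stop the OSDs on the target node, then press Enter...')
write_load(120)                                    # step 4
# Step 5: start the OSD daemons again so recovery begins.
input('Start the OSDs again, then press Enter...')
for _ in range(60):                                # step 6: probe writes during recovery
    print('probe write took %.1f s' % timed_probe_write())
    time.sleep(5)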

Logs from issues are here:

https://api.access.redhat.com/rs/cases/01703018/attachments/ceb38cac-0a54-4781-9d9c-4498f37abddb

https://api.access.redhat.com/rs/cases/01703018/attachments/db1e4598-6e5e-4808-9592-a904738f408f

https://api.access.redhat.com/rs/cases/01703018/attachments/38fd0ded-07d5-4f4e-b71b-10601f9ee58b

Comment 81 Mike Hackett 2016-11-07 21:07:38 UTC
Adding needinfo back as the last update cleared it.

