Bug 1377875 - [support] OSD recovery causes pause in IO which lasts longer than expected
Summary: [support] OSD recovery causes pause in IO which lasts longer than expected
Keywords:
Status: CLOSED DUPLICATE of bug 1452780
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.0
Assignee: Matt Benjamin (redhat)
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-20 21:14 UTC by Mike Hackett
Modified: 2020-01-17 15:56 UTC
CC List: 16 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 19:46:51 UTC
Embargoed:


Attachments:


Links:
Ceph Project Bug Tracker 13104 (last updated 2016-09-20 21:15:21 UTC)
Red Hat Bugzilla 1378994 - medium, CLOSED - [DOC] RGW docs should have clear warnings about large buckets (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1378995 - unspecified, CLOSED - [RFE] [rhcs-1.3.x] RGW resharding tool (last updated 2022-02-21 18:20:45 UTC)
Red Hat Bugzilla 1379397 - high, CLOSED - [DOCS] Request to include information on proper number of shards to configure when using rgw bucket sharding (last updated 2021-02-22 00:41:40 UTC)
Red Hat Bugzilla 1452780 - private (last updated 2021-02-18 21:48:08 UTC)


Description Mike Hackett 2016-09-20 21:14:29 UTC
Description of problem:

An OSD node is taken offline for 15 minutes (with the noout flag set) and then brought back online. When recovery IO starts, RadosGW client IO halts on the cluster due to a large number of slow requests that are waiting for degraded objects. While the OSD node was offline, client IO to the cluster remained active for those 15 minutes.

This OSD node houses 2 SSDs and 14 SATA HDDs. The SSD OSDs back the radosgw.index pool.

The degraded objects were in the radosgw index pool, which must be accessed for every RGW operation, so a large range of RGW operations were affected.

Rack replication is in use with 3 racks, 6 SSDs per rack, 2 per OSD node.
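
A rough way to confirm from the CLI that the degraded objects sit in the index pool (not from the case data, just a sketch in Python that shells out to the ceph tool; the index pool name below is a placeholder for this cluster's actual radosgw index pool, and the JSON layout of 'ceph pg dump' varies between releases):

#!/usr/bin/env python3
# Sketch: poll the cluster and list degraded PGs that belong to the radosgw index pool.
import json
import subprocess
import time

INDEX_POOL = '.rgw.buckets.index'    # placeholder; use the cluster's actual index pool name

def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(('ceph',) + args + ('--format', 'json'))
    return json.loads(out.decode('utf-8'))

def pool_id(name):
    """Map the pool name to its numeric id so PG ids (pool.seq) can be matched."""
    for pool in ceph_json('osd', 'dump')['pools']:
        if pool['pool_name'] == name:
            return pool['pool']
    raise KeyError('pool %s not found' % name)

def degraded_pgs_in_pool(pid):
    """Return PG ids in the given pool whose state includes 'degraded'."""
    dump = ceph_json('pg', 'dump')
    # The pg_stats list is at the top level in older releases and under pg_map in newer ones.
    stats = dump.get('pg_stats') or dump.get('pg_map', {}).get('pg_stats', [])
    return [p['pgid'] for p in stats
            if p['pgid'].startswith('%d.' % pid) and 'degraded' in p['state']]

if __name__ == '__main__':
    pid = pool_id(INDEX_POOL)
    for _ in range(60):
        print('%s degraded index PGs: %s' % (time.ctime(), degraded_pgs_in_pool(pid)))
        time.sleep(10)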

Per the upstream tracker (http://tracker.ceph.com/issues/13104) this is expected behavior: the object is degraded and the OSD waits for it to be repaired. Writes to degraded objects (even when present on the primary) are not allowed in Hammer and earlier; this changed in Infernalis.

Initially the recovery threads were throttled to 1 across the entire cluster to limit client IO impact during recovery. A recommendation was then made to raise this value back to the default of 15 on the SSD OSDs, but this did not alleviate the problem, and the issue was seen again on the next node move.
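
For reference, a minimal sketch of how such a throttle can be applied at runtime to just the SSD OSDs with 'ceph tell ... injectargs'. The ticket does not name the exact option that was tuned; osd_recovery_max_active is assumed here because its Hammer default (15) matches the value mentioned above, and the OSD ids are placeholders:

#!/usr/bin/env python3
# Runtime-only change (lost when the OSD daemon restarts); persist it in ceph.conf if needed.
import subprocess

SSD_OSDS = [0, 1, 16, 17, 32, 33]     # placeholder ids for the SSD-backed OSDs
RECOVERY_MAX_ACTIVE = 15              # assumed option/value, see note above

for osd in SSD_OSDS:
    subprocess.check_call([
        'ceph', 'tell', 'osd.%d' % osd, 'injectargs',
        '--osd_recovery_max_active %d' % RECOVERY_MAX_ACTIVE,
    ])

The same value could instead be set persistently in the [osd] section of ceph.conf on the SSD nodes.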


Is recovery operating properly here as expected in Hammer?
Do we have any method to prevent this impact from occurring during an OSD node move? 


Version-Release number of selected component (if applicable):
ceph-0.94.5-14.el7cp.x86_64  

How reproducible:
Consistent

Steps to Reproduce:
1. Set noout on the cluster.
2. Write several GB to the cluster.
3. Down one of the OSD nodes.
4. Write several GB to the cluster.
5. Bring the OSD node back into the cluster to generate recovery.
6. While recovery is ongoing, generate further IO to the cluster and confirm that IO has halted. (A scripted sketch of these steps follows below.)
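
Sketch of the steps above (assumptions, not from the case: 'rados bench' against a placeholder pool stands in for the RGW client load, and stopping/starting the daemons on the target node depends on the deployment, so it is left as manual steps):

#!/usr/bin/env python3
# Reproduction sketch; pool name and node handling are placeholders.
import subprocess
import time

POOL = '.rgw.buckets'        # placeholder pool standing in for RGW client IO

def ceph(*args):
    subprocess.check_call(('ceph',) + args)

def write_load(seconds):
    """Steps 2 and 4: generate several GB of writes with rados bench."""
    subprocess.check_call(['rados', 'bench', '-p', POOL, str(seconds),
                           'write', '--no-cleanup'])

def timed_probe_write():
    """Step 6: a single small write, timed to see whether client IO stalls."""
    start = time.time()
    subprocess.check_call(['rados', '-p', POOL, 'put',
                           'probe-object', '/etc/hosts'])
    return time.time() - start

ceph('osd', 'set', 'noout')                        # step 1
write_load(120)                                    # step 2
# Step 3: stop the OSD daemons on one node (ssh + the node's service manager).
input('Stop the OSDs on the target node, then press Enter...')
write_load(120)                                    # step 4
# Step 5: start the OSD daemons again so recovery begins.
input('Start the OSDs again, then press Enter...')
for _ in range(60):                                # step 6: probe writes during recovery
    print('probe write took %.1f s' % timed_probe_write())
    time.sleep(5)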

Logs from issues are here:

https://api.access.redhat.com/rs/cases/01703018/attachments/ceb38cac-0a54-4781-9d9c-4498f37abddb

https://api.access.redhat.com/rs/cases/01703018/attachments/db1e4598-6e5e-4808-9592-a904738f408f

https://api.access.redhat.com/rs/cases/01703018/attachments/38fd0ded-07d5-4f4e-b71b-10601f9ee58b

Comment 81 Mike Hackett 2016-11-07 21:07:38 UTC
Adding needinfo back as the last update cleared it.

