1367186 – [RFE] mirroring to multiple secondaries from a single primary

Summary: [RFE] mirroring to multiple secondaries from a single primary

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RBD
Sub Component:
Version:	2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	rc
Target Release:	3.2
Assignee:	Jason Dillaman
QA Contact:	Vasishta
Docs Contact:	Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks:	1416922 1494421 1629656

Reported:	2016-08-15 19:40 UTC by Federico Lucifredi
Modified:	2019-01-03 19:01 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	Support for RBD mirroring to multiple secondary clusters. Mirroring RADOS Block Devices (RBD) from one primary cluster to multiple secondary clusters is now fully supported.
Clone Of:
Environment:
Last Closed:	2019-01-03 19:01:20 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	17028	0	None	None	None	2016-08-15 19:55:29 UTC
Red Hat Product Errata	RHBA-2019:0020	0	None	None	None	2019-01-03 19:01:49 UTC

Description Federico Lucifredi 2016-08-15 19:40:20 UTC

Description of problem:

Currently, we support use of RBD mirroring with a single secondary site.

We have an outstanding support exception for multiple secondaries where synchronization is one-way (meaning only one site is primary for all images). 

We want to test this configuration as we are supporting it for an important customer.

Comment 2 Jason Dillaman 2017-01-04 21:03:05 UTC

This is a QE BZ for creating test cases for this scenario.

Comment 4 Harish NV Rao 2017-01-18 11:02:57 UTC

@Federico and Jason, we have a few questions:
1) How many secondary sites are supported?
2) Is the delay between the primary and each secondary the same or different? Please let us know the delay specification.
3) Should this be tested with RHEL OSP? If so, which version of OSP, and which specific OSP use cases need to be tested?
4) How does this work process-wise? Do all secondary sites share one process on the primary, or is there a new process for each secondary site? In the former case, how does the primary distribute time and resources among the sites?
5) How is an image determined to be in a synced state? Is it when all secondary sites have synced, or something else?
6) Should this also be tested on Ubuntu?

Comment 5 Harish NV Rao 2017-01-18 11:03:30 UTC

@Jason, please check comment 4

Comment 6 Jason Dillaman 2017-01-18 14:05:28 UTC

@Harish:

1) While there is no technical upper limit imposed, the use-cases I've seen so far are aimed at 2 secondary sites. The OpenStack group has even discussed a ring topology where each region (and Ceph cluster) uses a unique set of pools (i.e. region 1 has r1_images, region 2 has r2_images, ...), and mirroring is configured bidirectionally between two sites in a pair-wise fashion (r1 r1_images <-> r2 r1_images, r2 r2_images <-> r3 r2_images, and r3 r3_images <-> r1 r3_images).

2) I'm not quite sure what you are asking here, but the two secondaries do not need to have the same latency to the primary site. However, if a secondary's replication throughput is lower than the rate of IO injected into the image's journal, the journal will grow.

3) OpenStack Ocata is gaining Cinder integration for enabling RBD mirroring on a per-image basis.

4) rbd-mirror is a pull operation, so the daemon would be on the non-primary site(s) and pulling data from the primary site.

5) With rbd-mirroring, there isn't a "synced" state since the goal is just to provide consistency. If the link is fast enough and the primary image isn't being written to, they will be "synced". The best way to create a sync point is to create a snapshot on the primary image -- when the snapshot appears on the non-primary site, they are "synced" up to the creation of the snapshot.
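
The snapshot-based sync point in item 5 can be sketched as a simple check. This is a minimal illustration, assuming the snapshot names for the image have already been collected from each non-primary site (for example with `rbd snap ls` against each cluster). The function name `sites_synced_to`, the site labels, and the snapshot names are hypothetical and are not part of rbd-mirror.

```python
from typing import Dict, Iterable, List


def sites_synced_to(snapshot: str, secondary_snaps: Dict[str, Iterable[str]]) -> List[str]:
    """Return the non-primary sites on which the sync-point snapshot has appeared."""
    return sorted(
        site for site, snaps in secondary_snaps.items() if snapshot in set(snaps)
    )


# Hypothetical snapshot listings for one mirrored image on two secondaries.
observed = {
    "site-b": ["daily-1", "syncpoint-1"],
    "site-c": ["daily-1"],
}

# site-c has not yet replayed the journal up to "syncpoint-1".
reached = sites_synced_to("syncpoint-1", observed)
print(reached)  # prints ['site-b']
```

With multiple secondaries, each site reaches the sync point independently, so a test would likely need to poll until every secondary is in the returned list.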

Comment 7 Federico Lucifredi 2017-01-20 18:23:15 UTC

1) let's start with supporting two secondaries.

2) the latency will nearly always be different by virtue of different geographic distances.

3) not until OSP 11. Test Ceph only this time around.

6) Do run a few tests on Ubuntu for "smoke testing", but there should be no platform differences of note here.

Comment 8 Harish NV Rao 2017-01-24 09:18:37 UTC

(In reply to Federico Lucifredi from comment #7)
> 1) let's start with supporting two secondaries.

That means there will not be any mirroring happening between the two secondary sites. Right?

Please let us know.

Comment 9 Harish NV Rao 2017-01-24 14:41:23 UTC

(In reply to Harish NV Rao from comment #8)
> (In reply to Federico Lucifredi from comment #7)
> > 1) let's start with supporting two secondaries.
> 
> That means there will not be any mirroring happening between the two
> secondary sites. Right?

To be more specific, does the following scope sound ok?
"There will be one primary and two secondary sites, with the secondary sites configured PURELY as backup sites. That is, there will not be any mirroring established between secondary sites, and the secondary sites do not host any pool or image for which they are 'primary'."

Comment 11 Federico Lucifredi 2017-01-24 21:27:58 UTC

That is correct - mirroring is from a primary to a secondary, and not between secondaries.

Comment 12 Rachana Patel 2017-01-25 18:29:57 UTC

(In reply to Federico Lucifredi from comment #11)
> That is correct - mirroring is from a primary to a secondary, and not
> between secondaries.

Please let us know whether this is a valid/expected use case:
Pool data1: Site A is primary, and Site B and Site C are secondary sites for mirroring.
At the same time, for pool data2: Site B is primary, and Site C and Site A are secondary sites.

Comment 15 Federico Lucifredi 2017-02-21 00:34:16 UTC

Rachana, the use case described in #12 is not currently supported, but it will be in future releases.

One primary with one secondary is the key use case for mirroring at the moment. One primary with multiple secondaries is supposed to work, and would be interesting to test. The intersecting primary/secondary pools in #12 are interesting but not a testing priority.
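
As a rough sketch of the difference between the configuration to be tested (one primary, multiple secondaries) and the intersecting layout from comment #12, the check below flags any per-pool layout in which more than one site acts as primary. The `PoolMirroring` type, the `scope_violations` function, and the dictionary layout are hypothetical and only for illustration; they do not correspond to any Ceph API or configuration format.

```python
from typing import Dict, List, NamedTuple


class PoolMirroring(NamedTuple):
    primary: str            # site that is primary for every image in the pool
    secondaries: List[str]  # sites that only receive mirrored data


def scope_violations(pools: Dict[str, PoolMirroring]) -> List[str]:
    """List the ways a layout departs from 'one primary, backup-only secondaries'."""
    problems = []
    primaries = {cfg.primary for cfg in pools.values()}
    if len(primaries) > 1:
        problems.append("more than one primary site: " + ", ".join(sorted(primaries)))
    for pool, cfg in sorted(pools.items()):
        if cfg.primary in cfg.secondaries:
            problems.append(f"{pool}: {cfg.primary} is both primary and secondary")
        for site in cfg.secondaries:
            if site in primaries and site != cfg.primary:
                problems.append(f"{pool}: secondary {site} is primary for another pool")
    return problems


# One primary (A) with two backup-only secondaries (B, C).
in_scope = {"data1": PoolMirroring("A", ["B", "C"])}

# The intersecting layout described in comment #12.
intersecting = {
    "data1": PoolMirroring("A", ["B", "C"]),
    "data2": PoolMirroring("B", ["C", "A"]),
}

print(scope_violations(in_scope))
for line in scope_violations(intersecting):
    print(line)
```

The first layout yields no violations, while the second is reported as out of scope because sites A and B are each primary for one pool and secondary for the other.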

Comment 35 Vasishta 2018-11-09 06:22:48 UTC

All planned test cases completed successfully (no blockers).
Moving BZ to VERIFIED state.

Regards,
Vasishta shastry
QE, Ceph

Comment 37 errata-xmlrpc 2019-01-03 19:01:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020
