Bug 1630308 - [Doc RFE] Prepare documentation ensuring RBD mirroring for multiple secondary sites with one-way synchronization is production ready
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Documentation
Version: 3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 3.2
Assignee: John Brier
QA Contact: Vasishta
URL:
Whiteboard:
Duplicates: 1416136
Depends On:
Blocks: 1629585
 
Reported: 2018-09-18 11:12 UTC by Anjana Suparna Sriram
Modified: 2019-01-23 09:59 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-23 09:59:42 UTC
Embargoed:



Description Anjana Suparna Sriram 2018-09-18 11:12:26 UTC
User Story:
As a storage admin, I want to know that RBD mirroring for multiple secondary sites with one-way synchronization (i.e. one site is primary for all images) is production ready.

Content Plan Reference: https://docs.google.com/document/d/1Nxnh6XxpTiDO2TANEw5pvXZ0nYUwf36zTaqxCm0014w/edit#heading=h.nh8311opzbco

Comment 9 Vasishta 2018-11-08 11:04:36 UTC
A small fine-tune we can make to the example commands:

In steps 3 and 4 of 'Configuring Image Mirroring' (one-way mirroring), the example commands need to have the pool name 'data' added (in the second code block of each step).
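
For illustration only (the exact command in those steps is an assumption on my part; the real ones are in the guide), the change would be from an image-only spec such as

$ rbd mirror image enable image2

to the full pool/image spec

$ rbd mirror image enable data/image2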

Comment 10 Vasishta 2018-11-08 11:05:25 UTC
Hi John,

I think we need to address the disaster recovery scenario for the multi-secondary configuration in section "4.7. Recovering from a Disaster".

I've copied you on some info in a mail thread; please let me know if you need anything else. Moving back to ASSIGNED state.


Regards,
Vasishta Shastry
QE, Ceph

Comment 11 John Brier 2018-11-12 01:21:30 UTC
(In reply to Vasishta from comment #10)
> Hi John,
> 
> I think we need to address disaster recovery scenario for multi-secondary
> scenario in section "4.7. Recovering from a Disaster".
> 
> I've copied you some info in a mail thread, please let me know if you need
> anything else, Moving back to ASSIGNED state.
> 
> 
> Regards,
> Vasishta Shastry
> QE, Ceph

Thanks Vasishta. I tried to come up with some instructions for how to do this based on the information in the email. [1] I substituted cluster names "local" for DC1 and "remote" for DC2. Please review them and provide suggestions/corrections as needed. 

1) Add the new primary (remote) as peer on the original primary (local).

$ rbd mirror pool peer add data client.remote@remote --cluster local

^^ I am not sure the "order" of this command is correct. It's not intuitive to me. Is it right?

2) Demote the image on local if it's still listed as primary

a.  To get status of a mirrored image:

rbd mirror image status <pool-name>/<image-name>

Example

To get the status of the image2 image in the data pool:

$ rbd mirror image status data/image2
image2:
  global_id:   2c928338-4a86-458b-9381-e68158da8970
  state:       up+replaying
  description: replaying, master_position=[object_number=6, tag_tid=2,
entry_tid=22598], mirror_position=[object_number=6, tag_tid=2,
entry_tid=29598], entries_behind_master=0
  last_update: 2016-04-28 18:47:39


b. If the state is not up+replaying, demote the image to non-primary:

^^ Is up+replaying equivalent to "primary"? If so, what would it say if it wasn't primary?

rbd mirror image demote <pool-name>/<image-name>

Example

To demote the image2 image in the data pool:

$ rbd mirror image demote data/image2

3) Initiate a full resync from the new primary (remote)

To request a resynchronization to the primary image:

rbd mirror image resync <pool-name>/<image-name>

Example

To request resynchronization of the image2 image in the data pool:

$ rbd mirror image resync data/image2

4) Once the resync is complete, demote the image on remote and promote it on local.

a. Demote the image to non-primary:

rbd mirror image demote <pool-name>/<image-name>

Example

To demote the image2 image in the data pool:

$ rbd mirror image demote data/image2

b. Promote the image to primary:

rbd mirror image promote <pool-name>/<image-name>

Example

To promote the image2 image in the data pool:

$ rbd mirror image promote data/image2


1) "You won't be able to perform a traditional failback. Instead, after
the failover from DC1 to DC2 or DC3, you would add the new primary DC
(DC2 or 3) as peer on DC1, demote the image on DC1 (if it's still
listed as primary), and initiate a full resync from the new primary DC
(DC2 or 3). Once the resync is complete, you can demote the image on
DC2 or 3 and promote it on DC1." - Jason Dillaman

Comment 12 Vasishta 2018-11-13 17:53:37 UTC
(In reply to John Brier from comment #11)
Hi John,

(i) All the steps you have come up with are for failback. Can we add a note to the failover description regarding what a user with multiple secondaries needs to do?

(ii) Prior to these steps, we need to ask users to get the rbd-mirror daemon up on local. For that, we need to ask users to follow Steps 2, 5, and 6 on local as well. Is it okay to add a note?
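
As a rough sketch only of what bringing the daemon up on local involves (the 'rbd-mirror' package and the 'ceph-rbd-mirror@<client-id>' systemd unit are what I'd expect here; the client ID 'admin' is just a placeholder for whatever user was created during configuration):

# yum install rbd-mirror
# systemctl enable ceph-rbd-mirror@admin
# systemctl start ceph-rbd-mirror@admin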

> 
> 1) Add the new primary (remote) as peer on the original primary (local).
> 
> $ rbd mirror pool peer add data client.remote@remote --cluster local
> 
> ^^ I am not sure the "order" of this command is correct. It's not intuitive
> to me. Is it right?

The order had worked for me, so I think it is okay.
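
For reference, the general form as I understand it (please verify against the guide) is:

rbd mirror pool peer add <pool-name> <client-name>@<remote-cluster-name> --cluster <cluster-name>

where --cluster picks the cluster the command runs against, so in your example the remote cluster's client is added as a peer of the data pool on local.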

> 
> 2) Demote the image on local if it's still listed as primary
> 
> a.  To get status of a mirrored image:
> 
> rbd mirror image status <pool-name>/<image-name>
> 
> Example
> 
> To get the status of the image2 image in the data pool:
> 
> $ rbd mirror image status data/image2
> image2:
>   global_id:   2c928338-4a86-458b-9381-e68158da8970
>   state:       up+replaying
>   description: replaying, master_position=[object_number=6, tag_tid=2,
> entry_tid=22598], mirror_position=[object_number=6, tag_tid=2,
> entry_tid=29598], entries_behind_master=0
>   last_update: 2016-04-28 18:47:39
> 
> 
> b. If the state is not up+replaying, demote the image to non-primary:
> 
> ^^ Is up+replaying equivalent to "primary?" If so, what would it say if it
> wasn't primary?
> 

To check whether an image is primary or not, I think the appropriate way would be to check 'rbd info <image-spec>' ('$ rbd info data/image2' in this case). Also, up+replaying is not equivalent to primary.
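
For example, a trimmed 'rbd info' output for a mirrored image would look something like this (field names as I recall them; treat the exact output as an assumption):

$ rbd info data/image2
rbd image 'image2':
        ...
        mirroring state: enabled
        mirroring global id: 2c928338-4a86-458b-9381-e68158da8970
        mirroring primary: false

The 'mirroring primary' line is what tells you whether the image is currently primary.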

> rbd mirror image demote <pool-name>/<image-name>
> 
> Example
> 
> To demote the image2 image in the data pool:
> 
> $ rbd mirror image demote data/image2
> 
> 3) initiate a full resync from the new primary (remote)
> 
> To request a resynchronization to the primary image:
> 
> rbd mirror image resync <pool-name>/<image-name>
> 
> Example
> 
> To request resynchronization of the image2 image in the data pool:
> 
> $ rbd mirror image resync data/image2
> 
> 4) Once the resync is complete, demote the image on remote and promote it on
> local.
> 
> a. Demote the image to non-primary:
> 
> rbd mirror image demote <pool-name>/<image-name>
> 
> Example
> 
> To demote the image2 image in the data pool:
> 
> $ rbd mirror image demote data/image2
> 
> b. Promote the image to primary:
> 
> rbd mirror image promote <pool-name>/<image-name>
> 
> Example
> 
> To promote the image2 image in the data pool:
> 
> $ rbd mirror image promote data/image2

(iii) After these steps, we need to ask users to resync the image on the second secondary (the one that was not promoted during failover).

(iv) In the steps you have formulated, can we add cluster details (local or remote)?
I think it will avoid confusion for users.
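
To make (iii) and (iv) concrete, here is a rough, untested sketch of the whole sequence with the cluster spelled out for each command (assuming 'local' is the original primary, 'remote' is the new primary after failover, and 'remote2' is the second secondary; the demote on local is only needed if the image is still listed as primary there):

On local:
$ rbd mirror pool peer add data client.remote@remote --cluster local
$ rbd info data/image2 --cluster local
$ rbd mirror image demote data/image2 --cluster local
$ rbd mirror image resync data/image2 --cluster local

Once the resync is complete:
$ rbd mirror image demote data/image2 --cluster remote
$ rbd mirror image promote data/image2 --cluster local

Then resync the image on the second secondary:
$ rbd mirror image resync data/image2 --cluster remote2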


Please let me know if you need more info on any of my requests/answers from (i) to (iv).


Regards,
Vasishta Shastry
QE, Ceph

Comment 17 Vasishta 2018-11-19 10:43:57 UTC
*** Bug 1416136 has been marked as a duplicate of this bug. ***

Comment 20 Vasishta 2018-11-29 11:21:40 UTC
Looks good to me.
Thank you, John and Jason.

Moving to VERIFIED state.

Regards,
Vasishta Shastry
QE, Ceph

Comment 22 Anjana Suparna Sriram 2019-01-23 09:59:42 UTC
Published on the customer portal as part of the RHCS 3.2 GA on 3 January 2019.

