Description of problem:
=======================
After failback, tried to resync the image on the second secondary site (site C), but the resync did not sync the image because split brain was detected.

Version-Release number of selected component (if applicable):
=============================================================
10.2.5-13.el7cp.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have 3 clusters. Site A is the primary; sites B and C are secondaries (site B has a bidirectional relation with A, while C has a one-directional relation).
2. Enable pool-level or image-level mirroring for a few images.
3. Create images and let them sync to the secondaries (A->B, A->C).
4. Demote the images on site A. Promote the mirrored images on site B.
5. Do some I/O on the images from site B.
6. Let the data sync to site A from site B (B->A).
7. Now demote the images on site B and promote the images on site A.
8. Issue the resync command on site C to sync the images with site A.

[root@magna099 ubuntu]# rbd mirror image resync i-mirror/image1 --cluster slave2
Flagged image for resync from primary

Actual results:
===============
Check the image status:

[root@magna099 ubuntu]# rbd mirror image status i-mirror/image1 --cluster slave2
image1:
  global_id: 37ce0781-30ef-416f-9c3a-5ed6124b55ec
  state: up+error
  description: error bootstrapping replay
  last_update: 2017-02-17 17:45:28

Expected results:
=================
The image on site C should resync from the primary on site A. Instead, image resync is not happening.
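For reference, steps 4-8 above can be expressed as a shell function. This is a minimal sketch, not a verified reproducer: the cluster names "master" (site A) and "slave1" (site B) are hypothetical, while "slave2", "i-mirror" and "image1" are taken from the commands above. By default it only prints the commands (set DRY_RUN=0 to execute them), and the I/O and wait-for-replication steps are left as manual comments.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 4-8: failover to site B, failback to site A,
# then resync on site C. Steps 1-3 (cluster and peer setup) are not shown.
# ASSUMPTION: "master" (site A) and "slave1" (site B) are hypothetical names;
# "slave2", "i-mirror" and "image1" come from the report.
# Commands are only printed unless DRY_RUN=0 is set.

POOL=${POOL:-i-mirror}
IMAGE=${IMAGE:-image1}
SITE_A=${SITE_A:-master}   # hypothetical name for site A
SITE_B=${SITE_B:-slave1}   # hypothetical name for site B
SITE_C=${SITE_C:-slave2}   # site C, as used in the report

# Print the command (dry run, the default) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# The body runs in a subshell with errexit, so a failed demote stops the
# sequence before the other site's copy is promoted.
reproduce_failback_resync() (
  set -e
  # Step 4: fail over from site A to site B.
  run rbd mirror image demote "$POOL/$IMAGE" --cluster "$SITE_A"
  run rbd mirror image promote "$POOL/$IMAGE" --cluster "$SITE_B"
  # Steps 5-6 (manual): write to the image on site B, wait for B->A replication.
  # Step 7: fail back from site B to site A.
  run rbd mirror image demote "$POOL/$IMAGE" --cluster "$SITE_B"
  run rbd mirror image promote "$POOL/$IMAGE" --cluster "$SITE_A"
  # Step 8: flag site C's non-primary copy for resync, then inspect its state.
  run rbd mirror image resync "$POOL/$IMAGE" --cluster "$SITE_C"
  run rbd mirror image status "$POOL/$IMAGE" --cluster "$SITE_C"
)
```

Sourcing the file and running `reproduce_failback_resync` prints the six `rbd` commands in order; `DRY_RUN=0 reproduce_failback_resync` would run them against real clusters.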
In the repro, step 1 reads "(site B has bidirectional relation with A while C has one-directional)". What is the difference there?
Multiple secondaries are not a blocker for release 2.2.
@Federico: "site B has bidirectional relation with A while C has one-directional" means that site B was configured to mirror primary images from site A, and site A was configured to mirror primary images from site B. Site C was configured to mirror primary images only from site A. This resync issue occurs whenever you hit a split-brain condition, regardless of whether multiple secondaries are in use.
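The layout described above can be sketched with `rbd mirror pool peer add`: a peer registered on cluster X tells the rbd-mirror daemon running on X which remote cluster to pull primary images from. In this sketch the cluster names "master" (A) and "slave1" (B), the CephX user "client.admin" and the choice of pool-mode mirroring are assumptions; "slave2" and "i-mirror" come from the report. It prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch of the peer layout: A <-> B bidirectional, A -> C one-directional.
# A peer added on cluster X means the rbd-mirror daemon on X pulls from it.
# ASSUMPTION: "master" (A), "slave1" (B), "client.admin" and pool-mode
# mirroring are hypothetical; "slave2" (C) and "i-mirror" are from the report.
# Commands are only printed unless DRY_RUN=0 is set.

POOL=${POOL:-i-mirror}
SITE_A=${SITE_A:-master}
SITE_B=${SITE_B:-slave1}
SITE_C=${SITE_C:-slave2}
CLIENT=${CLIENT:-client.admin}

# Print the command (dry run, the default) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

configure_mirroring() (
  set -e
  # Enable pool-level mirroring on all three clusters.
  for site in "$SITE_A" "$SITE_B" "$SITE_C"; do
    run rbd mirror pool enable "$POOL" pool --cluster "$site"
  done
  # A <-> B: each side registers the other, so each side's rbd-mirror daemon
  # replicates images that are primary on the other side.
  run rbd mirror pool peer add "$POOL" "$CLIENT@$SITE_A" --cluster "$SITE_B"
  run rbd mirror pool peer add "$POOL" "$CLIENT@$SITE_B" --cluster "$SITE_A"
  # A -> C: only C registers A; no cluster registers C as a peer.
  run rbd mirror pool peer add "$POOL" "$CLIENT@$SITE_A" --cluster "$SITE_C"
)
```

An rbd-mirror daemon still has to run on each cluster that has a peer registered (A, B and C here); because no cluster registers C as a peer, nothing replicates out of site C.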
Thanks Jason, understood. One-directional A->B with an optional A->C is the key use case. If we can get bi-directional A->B and B->A for different images/pools in this release, that is great. Do not worry about multiple secondaries at this late stage, we can punt those bugs to 2.3.
Executed the case below to verify the defect.

precondition
============
--> Have 3 clusters. Site A is the primary; sites B and C are secondaries
    (site B has a bidirectional relation with A, while C has a one-directional relation).
--> Enable pool-level or image-level mirroring for a few images.
--> Create images and let them sync to the secondaries (A->B, A->C).

1) Orderly shutdown
a) Failover
--> Demote the image on A, promote the image on B.
--> Shut down cluster A.
--> Do I/O on the image from cluster B.

b) Failback
--> Bring up cluster A and let the image sync to A.
--> Demote the image on B, promote the image on A.
--> Resync the image on C.
--> Do I/O on the image from cluster A and let it sync to clusters B and C.

2) Non-orderly shutdown
a) Failover
--> Bring down cluster A.
--> Force-promote the image on B.
--> **WORKAROUND** - restart rbd-mirror on cluster B.
--> Do I/O on the image from cluster B.

b) Failback
--> Bring cluster A back.
--> Demote the image on A, resync the image on A.
--> Demote the image on cluster B, promote the image on cluster A.
--> Resync the image from cluster C.

Resync worked in both cases, hence moving back to VERIFIED.
Verified with version 10.2.5-29.el7cp.x86_64.
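The two verification scenarios can be sketched per image as shell functions. The pool/image "i-mirror/image1" and "slave2" (C) are reused from the original report; the cluster names "master" (A) and "slave1" (B) and the systemd unit name "ceph-rbd-mirror@admin" used for the rbd-mirror workaround are hypothetical. Shutting clusters down, client I/O and waiting for replication remain manual. Commands are printed unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch of the orderly and non-orderly failover/failback checks.
# ASSUMPTION: "master" (A), "slave1" (B) and the unit "ceph-rbd-mirror@admin"
# are hypothetical; "slave2" (C) and "i-mirror/image1" are from the report.
# Cluster shutdown/startup, client I/O and waits for sync are manual.
# Commands are only printed unless DRY_RUN=0 is set.

SPEC=${SPEC:-i-mirror/image1}
SITE_A=${SITE_A:-master}
SITE_B=${SITE_B:-slave1}
SITE_C=${SITE_C:-slave2}
RBD_MIRROR_UNIT=${RBD_MIRROR_UNIT:-ceph-rbd-mirror@admin}

# Print the command (dry run, the default) or execute it.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "$*"
  fi
}

# 1) Orderly shutdown of site A.
orderly_failover_failback() (
  set -e
  # a) Failover: demote on A, promote on B (then shut down A, write on B).
  run rbd mirror image demote "$SPEC" --cluster "$SITE_A"
  run rbd mirror image promote "$SPEC" --cluster "$SITE_B"
  # b) Failback, once A is back up and has caught up with B.
  run rbd mirror image demote "$SPEC" --cluster "$SITE_B"
  run rbd mirror image promote "$SPEC" --cluster "$SITE_A"
  # Resync site C's copy from the new primary on site A.
  run rbd mirror image resync "$SPEC" --cluster "$SITE_C"
)

# 2) Non-orderly shutdown of site A.
nonorderly_failover_failback() (
  set -e
  # a) Failover: A is unreachable, so B's copy must be force-promoted.
  run rbd mirror image promote --force "$SPEC" --cluster "$SITE_B"
  # Workaround from the verification; must run on site B's rbd-mirror host.
  run systemctl restart "$RBD_MIRROR_UNIT"
  # b) Failback: A is back; its stale primary is demoted and resynced from B.
  run rbd mirror image demote "$SPEC" --cluster "$SITE_A"
  run rbd mirror image resync "$SPEC" --cluster "$SITE_A"
  # After A has fully resynced, swap roles back and resync C from A.
  run rbd mirror image demote "$SPEC" --cluster "$SITE_B"
  run rbd mirror image promote "$SPEC" --cluster "$SITE_A"
  run rbd mirror image resync "$SPEC" --cluster "$SITE_C"
)
```

Because the `systemctl restart` must happen on site B's host, in practice the non-orderly function would be split across hosts; it is kept in one function here only to show the order of operations.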
(In reply to Rachana Patel from comment #13)
> Executed below case to verify defect
>
> precondition
> ============
> --> have 3 cluster. Site A being primary and Site B and site C are secondary
> sites
> (site B has bidirectional relation with A while C has one-directional)
> --> enable pool level or image level mirroring for few images.
> --> create images and let it sync to secondary.(A->B, A->C)
>
>
> 1) orderly shutdown
> a)failover
> --> demote image on A, promote image on B
> --> shutdown cluster A
> --> I/O on image from cluster B
>
> b)Failback
> --> bring up cluster A and let image sync to A
> --> demote image on B , promote image on A
> --> resync image on C

This should be 'resync image on cluster C from cluster A'.

> --> do I/O on image from cluster A and let it sync to cluster B & C
>
> 2) nonorderly shutdown
> a)failover
> --> bring down cluster A
> --> force promote image on B
> --> **WORKAROUND** - restart rbd-mirror on cluster B
> --> do I/O on image from cluster B
>
> b)Failback
> --> bring cluster A back
> --> demote Image on A, resync Image on A
> --> demote image on cluster B, promote image on cluster A
> --> resync image from cluster C

It should be 'resync image from cluster A to cluster C'.

>
>
> resync worked in both cases, hence moving back to verified
> verified with version - 10.2.5-29.el7cp.x86_64
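To confirm that the resync on cluster C actually pulled from cluster A (rather than ending in the "up+error" state from the original report), one could poll `rbd mirror image status` until the image is replaying again. This is a sketch under assumptions: treating "up+replaying" as success, any state containing "error" as a failure, and the 10-second/60-try polling budget are illustrative choices, not documented thresholds.

```shell
#!/usr/bin/env bash
# Sketch: after `rbd mirror image resync` on site C, poll the image's
# mirroring state until it is replaying from the primary again.
# ASSUMPTION: "up+replaying" = success, "*error*" = failure, and the
# 10 s x 60 polling budget are illustrative choices.

# Print the value of the "state:" field from `rbd mirror image status` on stdin.
mirror_state() {
  awk '$1 == "state:" { print $2; exit }'
}

# Return 0 for a replaying image, 2 for an error state, 1 otherwise
# (for example while the image is still starting or syncing).
classify_state() {
  case "$1" in
    up+replaying) return 0 ;;
    *error*)      return 2 ;;
    *)            return 1 ;;
  esac
}

# Usage: wait_for_resync <pool>/<image> <cluster> [tries]
wait_for_resync() {
  local spec=$1 cluster=$2 tries=${3:-60} state rc i
  for ((i = 0; i < tries; i++)); do
    # The status call may fail while the non-primary image is being recreated.
    state=$(rbd mirror image status "$spec" --cluster "$cluster" 2>/dev/null | mirror_state)
    rc=0
    classify_state "$state" || rc=$?
    case $rc in
      0) echo "$spec on $cluster: $state"; return 0 ;;
      2) echo "$spec on $cluster: $state" >&2; return 2 ;;
    esac
    sleep 10
  done
  echo "$spec on $cluster: timed out (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, `wait_for_resync i-mirror/image1 slave2` returns 0 once site C's copy is replaying, 2 on an error state such as the "up+error" reported above, and 1 on timeout.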
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0514.html