Bug 1424881

Summary: [rbd-mirror]: Image syncing fails on 2nd secondary if 1st secondary has completed syncing before it (rbd-mirror daemon on primary was stopped and restarted in between)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Rachana Patel <racpatel>
Component: RBD
Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA
QA Contact: Rachana Patel <racpatel>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.2
CC: ceph-eng-bugs, flucifre, hnallurv, tserlin
Target Milestone: rc   
Target Release: 2.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.5-32.el7cp Ubuntu: ceph_10.2.5-24redhat1
Doc Type: If docs needed, set a value
Last Closed: 2017-03-14 15:49:54 UTC
Type: Bug

Description Rachana Patel 2017-02-20 02:30:23 UTC
Description of problem:
=======================
With multiple secondary sites configured, image syncing on the 2nd secondary fails if syncing on the 1st secondary finishes first and the rbd-mirror daemon on the primary was stopped and restarted in between.


Version-Release number of selected component (if applicable):
=============================================================
10.2.5-13.el7cp.x86_64

How reproducible:
=================
always


Steps to Reproduce:
==================
1. Set up 3 clusters: Site A ('master') is the primary; Site B ('slave1') and Site C ('slave2') are secondary sites.
(Site B has a bidirectional relationship with A, while C has a one-directional one.)
2. Enable pool-level mirroring.
3. Create an image, but do not enable journaling.
4. Do some I/O on the image using bench-write.
5. Enable journaling and keep doing I/O on the image.
6. When the sync reaches 20+%, stop rbd-mirror on the 'master' cluster (Site A).
7. Keep checking the image status on Site B and Site C.
8. After a few seconds, start the daemon on Site A again.
9. Stop I/O on the image.
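
Step 7 (repeatedly checking the image status on both secondaries) can be automated. The following is only a rough sketch, assuming Python 3.7+ and the `rbd` CLI on the path; the helper names `fetch_state` and `watch`, the 20-second interval, and the poll limit are illustrative assumptions and not part of the original report. The cluster names and the image spec `con/re1` are the ones used in this report.

```python
import subprocess
import time


def fetch_state(cluster, image_spec="con/re1"):
    """Run `rbd mirror image status` and return the value of its `state:` line."""
    out = subprocess.run(
        ["rbd", "mirror", "image", "status", image_spec, "--cluster", cluster],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "state":
            return value.strip()
    return "unknown"


def watch(clusters, fetch=fetch_state, interval=20.0, max_polls=30,
          sleep=time.sleep):
    """Poll each cluster until it leaves `up+syncing` or the polls run out.

    Returns a dict mapping cluster name to the last state seen.
    `fetch` and `sleep` are injectable so the loop can be tried without a cluster.
    """
    last = {}
    pending = list(clusters)
    for _ in range(max_polls):
        for cluster in list(pending):
            last[cluster] = fetch(cluster)
            if last[cluster] != "up+syncing":
                pending.remove(cluster)  # finished or failed; stop polling it
        if not pending:
            break
        sleep(interval)
    return last
```

Calling `watch(["slave1", "slave2"])` while the steps above are carried out would be expected to end with `up+error` for `slave2` on an affected build.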

Actual results:
===============

Syncing on Site B succeeds, but on Site C it fails after some time:
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 37%
  last_update: 2017-02-19 19:03:37
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 50%
  last_update: 2017-02-19 19:03:57
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+error
  description: error bootstrapping replay
  last_update: 2017-02-19 19:04:19
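
For anyone scripting a check around this output, the status text can be turned into a small record. This is a minimal sketch, assuming the output always has the shape shown above (an image-name line followed by indented `key: value` lines); the function names `parse_mirror_status` and `is_failed` are hypothetical helpers, not part of any Ceph tooling.

```python
import re


def parse_mirror_status(text):
    """Parse one `rbd mirror image status` output block into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":") and " " not in line:
            # Header line carrying the image name, e.g. "re1:"
            status["image"] = line[:-1]
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    # Extract the percentage from descriptions such as "... COPY_OBJECT 37%".
    match = re.search(r"(\d+)%", status.get("description", ""))
    status["progress"] = int(match.group(1)) if match else None
    return status


def is_failed(status):
    """True when the reported state ends in `+error`, as in the last sample."""
    return status.get("state", "").endswith("+error")
```

Applied to the three samples above, the parsed `progress` goes from 37 to 50 and is then absent, at which point `is_failed` becomes true.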


Expected results:
=================
The image should sync to all secondary sites.


Additional info:

Comment 4 Federico Lucifredi 2017-02-21 00:41:11 UTC
Multiple secondaries are not a blocker for release 2.2.

Comment 5 Jason Dillaman 2017-02-21 00:51:10 UTC
I believe you should also be able to hit this condition if you delete an old snapshot from the primary image while the non-primary cluster is performing a full image-sync.

Comment 6 Federico Lucifredi 2017-02-21 00:53:02 UTC
Timeframe for a fix?

Comment 15 Rachana Patel 2017-02-27 19:46:13 UTC
Verified with build 10.2.5-34.el7cp.x86_64.
Working as expected, hence moving to verified.

Comment 17 errata-xmlrpc 2017-03-14 15:49:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html