Bug 1424881 - [rbd-mirror]: Image syncing fails on the 2nd secondary if the 1st secondary completed syncing before it (the rbd-mirror daemon on the primary was stopped/started in between)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 2.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.2
Assignee: Jason Dillaman
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-20 02:30 UTC by Rachana Patel
Modified: 2017-07-30 15:26 UTC
CC: 4 users

Fixed In Version: RHEL: ceph-10.2.5-32.el7cp Ubuntu: ceph_10.2.5-24redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-14 15:49:54 UTC
Embargoed:




Links
- Ceph Project Bug Tracker 18990 (private: 0; priority: None; status: None; summary: None; last updated: 2017-02-20 12:53:44 UTC)
- Red Hat Product Errata RHBA-2017:0514 (private: 0; priority: normal; status: SHIPPED_LIVE; summary: Red Hat Ceph Storage 2.2 bug fix and enhancement update; last updated: 2017-03-21 07:24:26 UTC)

Description Rachana Patel 2017-02-20 02:30:23 UTC
Description of problem:
=======================
With multiple secondary sites, image syncing on the 2nd secondary fails if the 1st secondary finished syncing first (with the rbd-mirror daemon on the primary stopped and restarted in between).


Version-Release number of selected component (if applicable):
=============================================================
10.2.5-13.el7cp.x86_64

How reproducible:
=================
always


Steps to Reproduce:
==================
1. Have 3 clusters: Site A ('master') is primary; Site B ('slave1') and Site C ('slave2') are secondary sites
(Site B has a bidirectional relationship with A, while C is one-directional)
2. Enable pool-level mirroring
3. Create an image, but do not enable journaling
4. Do some I/O using bench-write
5. Enable journaling and keep doing I/O on the image
6. When the sync reaches 20+%, stop rbd-mirror on the 'master' cluster (Site A)
7. Keep checking the image status on Site B and Site C
8. After a few seconds, start the daemon on Site A
9. Stop I/O on the image (a shell sketch of these steps follows)
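
A minimal shell sketch of the above, assuming pool 'con' and image 're1' (taken from the status output below). The --cluster names match the report; the image size, bench-write total, and systemd unit name are assumptions:

    # Step 2: enable pool-level mirroring on all three clusters
    rbd mirror pool enable con pool --cluster master
    rbd mirror pool enable con pool --cluster slave1
    rbd mirror pool enable con pool --cluster slave2

    # Peering: A <-> B bidirectional, A -> C one-directional
    rbd mirror pool peer add con client.admin@slave1 --cluster master
    rbd mirror pool peer add con client.admin@master --cluster slave1
    rbd mirror pool peer add con client.admin@master --cluster slave2

    # Step 3: create the image without journaling (exclusive-lock only)
    rbd create con/re1 --size 10G --image-feature exclusive-lock --cluster master

    # Step 4: drive some I/O
    rbd bench-write con/re1 --io-total 2G --cluster master

    # Step 5: enable journaling while I/O continues
    rbd feature enable con/re1 journaling --cluster master

    # Step 6: at roughly 20% sync, stop the rbd-mirror daemon on Site A (unit name assumed)
    systemctl stop ceph-rbd-mirror.target

    # Step 7: poll image status on both secondaries
    rbd mirror image status con/re1 --cluster slave1
    rbd mirror image status con/re1 --cluster slave2

    # Step 8: restart the daemon on Site A
    systemctl start ceph-rbd-mirror.target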

Actual results:
===============

Syncing on Site B is successful, but on Site C it fails after some time:
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 37%
  last_update: 2017-02-19 19:03:37
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 50%
  last_update: 2017-02-19 19:03:57
[root@magna099 ubuntu]# rbd mirror image status con/re1 --cluster slave2
re1:
  global_id:   1d65a791-cc3e-4fb2-a02e-aec830f0113c
  state:       up+error
  description: error bootstrapping replay
  last_update: 2017-02-19 19:04:19


Expected results:
=================
Image should sync to all secondary sites


Additional info:

Comment 4 Federico Lucifredi 2017-02-21 00:41:11 UTC
Multiple secondaries are not a blocker for release 2.2.

Comment 5 Jason Dillaman 2017-02-21 00:51:10 UTC
I believe you should also be able to hit this condition if you delete an old snapshot from the primary image while the non-primary cluster is performing a full image-sync.
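
A hedged sketch of that variant, reusing the pool/image names from this report; the snapshot name 'old' is hypothetical:

    # Take a snapshot on the primary image ahead of time
    rbd snap create con/re1@old --cluster master
    # While the secondary still reports up+syncing for the full image sync...
    rbd mirror image status con/re1 --cluster slave2
    # ...delete the old snapshot from the primary image
    rbd snap rm con/re1@old --cluster master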

Comment 6 Federico Lucifredi 2017-02-21 00:53:02 UTC
Timeframe for a fix?

Comment 15 Rachana Patel 2017-02-27 19:46:13 UTC
Verified with build 10.2.5-34.el7cp.x86_64.
Working as expected, hence moving to VERIFIED.

Comment 17 errata-xmlrpc 2017-03-14 15:49:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html

