Bug 1495521

Summary: Rbd-mirror: Re-sync request against the "master" cluster failed to delete the image and sync the image from the "slave" cluster.
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Parikshith <pbyregow>
Component: RBD-Mirror Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA QA Contact: Parikshith <pbyregow>
Severity: high Docs Contact:
Priority: high    
Version: 3.0 CC: bniver, ceph-eng-bugs, ceph-qe-bugs, hnallurv, kdreyer, pbyregow
Target Milestone: rc   
Target Release: 3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.1-7.el7cp Ubuntu: ceph_12.2.1-10redhat1xenial Doc Type: If docs needed, set a value
Last Closed: 2017-12-05 23:45:31 UTC Type: Bug

Description Parikshith 2017-09-26 09:02:10 UTC
Description of problem:


Version-Release number of selected component (if applicable):
12.2.0-2.el7cp

How reproducible:


Steps to Reproduce:
1. Wrote a file to the image on master, with a 10-minute replay delay set on slave.
2. Before the delay elapsed, master went down abruptly (stopped the mirror service).
3. The data written on master had not yet been synced to slave.

image-2:
  global_id:   97551f32-2ac3-4df7-93a4-95a73d93b3e8
  state:       up+replaying
  description: replaying, master_position=[object_number=7, tag_tid=2, entry_tid=25599], mirror_position=[], entries_behind_master=25602
  last_update: 2017-09-23 16:15:02

4. Force promoted the slave to become primary.

5. After 10 minutes, the data was synced on slave (entries_behind_master became 0).
    During this time the image was in the 'up+replaying' state; after the delay the state changed to 'up+stopped' (description: force promoted).

image-2:
  global_id:   97551f32-2ac3-4df7-93a4-95a73d93b3e8
  state:       up+replaying
  description: replaying, master_position=[object_number=7, tag_tid=2, entry_tid=25599], mirror_position=[object_number=7, tag_tid=2, entry_tid=25599], entries_behind_master=0
  last_update: 2017-09-23 16:25:54
  
6. Brought back the master, demoted it and did a re-sync.

$rbd mirror image status data/image-2 --cluster master
image-2:
  global_id:   f84a899f-909e-4e23-9428-ee31c5ca14fa
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=3, entry_tid=3], mirror_position=[object_number=3, tag_tid=3, entry_tid=3], entries_behind_master=0
  last_update: 2017-09-23 16:27:58

7. Checked the size of the images on both clusters (a weird snapshot was created).

Master:
$sudo rbd du -p data --cluster master
warning: fast-diff map is not enabled for image-2. operation may be slow.
NAME                                                                                          PROVISIONED USED
image-2@.rbd-mirror.bc392583-5662-4721-8726-55573351bd8f.a96731e9-d9a0-4c7b-8b96-e844e8502421       1024M 152M
image-2                                                                                             1024M    0
<TOTAL>                                                                                             1024M 152M

slave:
$sudo rbd du -p data --cluster slave
warning: fast-diff map is not enabled for image-2. operation may be slow.
NAME                                                                                          PROVISIONED USED
image-2@.rbd-mirror.bc392583-5662-4721-8726-55573351bd8f.a96731e9-d9a0-4c7b-8b96-e844e8502421       1024M    0
image-2                                                                                             1024M    0
<TOTAL>                                                                                             1024M    0
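
The report does not show the commands that were run for steps 4 and 6. A minimal sketch of how that sequence could be driven from the shell follows. The image spec `data/image-2` and the cluster names `master` and `slave` are taken from the output above; the function name `resync_after_failover` and the `RBD` override variable are hypothetical, added only so the sequence can be dry-run without a cluster.

```shell
#!/bin/sh
# Sketch only: the original report does not list the commands it used.
# RBD can be overridden (for example RBD="echo rbd") to print the commands
# instead of executing them. This variable is illustrative, not part of rbd.
RBD="${RBD:-rbd}"

# resync_after_failover IMAGE_SPEC
# Hypothetical helper mirroring steps 4 and 6 of the report.
resync_after_failover() {
    image="$1"

    # Step 4: force-promote the non-primary image on the "slave" cluster.
    $RBD mirror image promote --force "$image" --cluster slave || return 1

    # Step 6: once "master" is back, demote its stale copy of the image ...
    $RBD mirror image demote "$image" --cluster master || return 1

    # ... and request a full re-sync from the new primary.
    $RBD mirror image resync "$image" --cluster master || return 1

    # Show the resulting mirroring state on "master".
    $RBD mirror image status "$image" --cluster master
}
```

Setting `RBD="echo rbd"` and then calling `resync_after_failover data/image-2` prints the four commands in order, which is a cheap way to review the sequence before running it for real.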

Actual results:
After the re-sync request on the "master" cluster, it failed to delete the image and sync the image from the "slave" cluster.

Expected results:
Primary and secondary images should be the same size.
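
The size check from step 7 could be automated along these lines. This is a sketch that assumes the plain-text `rbd du` layout shown in this report (NAME in the first column, USED in the last); the helper names `du_used` and `compare_total_used` are hypothetical. It compares the `<TOTAL>` row because, in the output above, the image head reports 0 on both clusters and the difference only shows up in the snapshot and total rows.

```shell
#!/bin/sh
# du_used ROW_NAME
# Reads plain-text `rbd du` output on stdin and prints the USED column (the
# last field) of the first row whose NAME column is exactly ROW_NAME.
du_used() {
    awk -v name="$1" '$1 == name { print $NF; exit }'
}

# compare_total_used OUTPUT_A OUTPUT_B
# Takes two captured `rbd du` outputs as strings and compares the USED value
# of their <TOTAL> rows. Returns non-zero when they differ.
compare_total_used() {
    a="$(printf '%s\n' "$1" | du_used '<TOTAL>')"
    b="$(printf '%s\n' "$2" | du_used '<TOTAL>')"
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "mismatch: $a vs $b"
        return 1
    fi
}
```

One way to use it is `compare_total_used "$(sudo rbd du -p data --cluster master)" "$(sudo rbd du -p data --cluster slave)"`. The comparison is on the printed strings, so it only detects differences at the granularity `rbd du` displays.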

Additional info:

Comment 2 Jason Dillaman 2017-09-26 20:04:23 UTC
Upstream master branch PR: https://github.com/ceph/ceph/pull/17979

Comment 13 errata-xmlrpc 2017-12-05 23:45:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387