
Bug 1421311

Summary: [rbd-mirror] : renaming of image is not synced to secondary sites
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RBD
Version: 2.2
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Target Milestone: rc
Target Release: 2.2
Hardware: x86_64
OS: Linux
Reporter: Rachana Patel <racpatel>
Assignee: Jason Dillaman <jdillama>
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
CC: ceph-eng-bugs
Last Closed: 2017-02-12 17:42:38 UTC
Type: Bug

Description Rachana Patel 2017-02-11 00:10:01 UTC
Description of problem:
=======================
Renamed multiple images on the primary site. All renames were synced to the secondary sites except one. For that image, the rename was not synced, and the mirror status description on the secondary sites says 'failed to commit journal event'.


Version-Release number of selected component (if applicable):
==============================================================
10.2.5-13.el7cp.x86_64


How reproducible:
=================
only once/intermittent


Steps to Reproduce:
===================
1. Had a Ceph setup where one site is primary for mirroring and 2 sites are secondary (each site has one MON, 3 OSDs and 1 rbd-mirror node).
2. Enabled pool-level mirroring on pool data1.
3. Created a few images in pool data1. All images were synced to both secondaries.
4. Renamed all those images on the primary site.


Actual results:
===============
One rename was not synced to the secondary sites; all others were.

primary site:-
-------------
rename data1/dataset10 to data1/dataset10new


secondary site
--------------
[root@magna099 ubuntu]# rbd mirror image status data1/dataset10 --cluster slave2
dataset10:
global_id: 1aefcc7a-1f08-40be-9073-2715d49bdc9f
state: up+error
description: failed to commit journal event
last_update: 2017-02-09 19:53:34
[root@magna099 ubuntu]# rbd ls data1 --cluster slave2 | grep 10
dataset10
dataset101
dataset102

[root@magna100 ubuntu]# rbd mirror image status data1/dataset10new --cluster slave1
rbd: error opening image dataset10new: (2) No such file or directory
[root@magna100 ubuntu]# rbd mirror image status data1/dataset10 --cluster slave1
dataset10:
global_id: 1aefcc7a-1f08-40be-9073-2715d49bdc9f
state: up+error
description: failed to commit journal event
last_update: 2017-02-09 19:49:56


Expected results:
=================
The rename should be synced to the secondary sites.


Additional info:

Comment 2 Jason Dillaman 2017-02-12 17:42:38 UTC
The issue occurred when a "snap protect" was used against an image that did not support the layering feature. This recorded an error in the journal, which resulted in a split-brain, as expected.

*** This bug has been marked as a duplicate of bug 1365034 ***

Comment 3 Jason Dillaman 2017-02-12 17:49:28 UTC
Journal records:

# journal_id: 377a238e1f29
89 {"tag_id":101,"commit_tid":1,"type":7,"entry":"AgEXAAAABwAAAAEAAAAAAAAABwAAAHNuYXAxMDA="}
93 {"tag_id":101,"commit_tid":2,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
89 {"tag_id":102,"commit_tid":3,"type":7,"entry":"AgEWAAAABwAAAAEAAAAAAAAABgAAAHNuYXA5MA=="}
93 {"tag_id":102,"commit_tid":4,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
89 {"tag_id":103,"commit_tid":5,"type":7,"entry":"AgEXAAAABwAAAAEAAAAAAAAABwAAAHNuYXAxMDA="}
93 {"tag_id":103,"commit_tid":6,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
98 {"tag_id":104,"commit_tid":7,"type":10,"entry":"AgEcAAAACgAAAAEAAAAAAAAADAAAAGRhdGFzZXQxMG5ldw=="}
89 {"tag_id":104,"commit_tid":8,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
86 {"tag_id":105,"commit_tid":9,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAgAcAAAA="}
90 {"tag_id":105,"commit_tid":10,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
87 {"tag_id":106,"commit_tid":11,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAAAUAAAA="}
90 {"tag_id":106,"commit_tid":12,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
87 {"tag_id":107,"commit_tid":13,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAgAcAAAA="}
90 {"tag_id":107,"commit_tid":14,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}

The first uncommitted event entry is the request for snap protect (type 7); the second uncommitted event entry records the failure result code "-ENOSYS" (the last four bytes of the decoded base64 entry string are 0xDA 0xFF 0xFF 0xFF, which read as a little-endian signed 32-bit integer give -38, i.e. -ENOSYS).
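
The decoding described above can be reproduced with a short Python sketch. It takes the JSON part of one type 3 record from the dump, base64-decodes the "entry" field, and reads the last four bytes as a little-endian signed 32-bit integer. The helper name result_code is made up for illustration and is not part of any Ceph tool; the sketch also assumes only that the result code sits in the trailing four bytes, as stated above, and does not interpret the rest of the entry.

```python
import base64
import json
import struct

# JSON part of the second journal record above (type 3, commit_tid 2),
# copied verbatim. The raw string keeps the JSON "\/" escapes intact.
RECORD = r'{"tag_id":101,"commit_tid":2,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}'


def result_code(record_json: str) -> int:
    """Return the trailing little-endian int32 of a record's base64 entry.

    Hypothetical helper for illustration only.
    """
    record = json.loads(record_json)         # json.loads turns "\/" into "/"
    raw = base64.b64decode(record["entry"])  # binary event payload
    (code,) = struct.unpack("<i", raw[-4:])  # last four bytes, signed, little-endian
    return code


if __name__ == "__main__":
    raw = base64.b64decode(json.loads(RECORD)["entry"])
    print(raw[-4:].hex())       # daffffff
    print(result_code(RECORD))  # -38
```

On Linux, errno 38 is ENOSYS, so -38 corresponds to -ENOSYS. The numeric value of ENOSYS differs on some other platforms, which is why the sketch prints the raw number instead of looking it up in Python's errno module.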