Description of problem:
=======================
Renamed multiple images on the primary site. All renames were synced to the secondary sites except one. For that image the rename was not synced, and the mirror status description on the secondary sites says 'failed to commit journal event'.

Version-Release number of selected component (if applicable):
==============================================================
10.2.5-13.el7cp.x86_64

How reproducible:
=================
only once/intermittent

Steps to Reproduce:
===================
1. Had a Ceph cluster where one site is primary for mirroring and 2 sites are secondary (each site has one MON, 3 OSDs and 1 rbd-mirror node).
2. Enabled pool-level mirroring on pool data1.
3. Created a few images in pool data1. All images were synced to both secondaries.
4. Renamed all those images on the primary site.

Actual results:
===============
One rename was not synced to the secondary sites.

primary site:-
-------------
rename data1/dataset10 to data1/dataset10new

secondary site
--------------
[root@magna099 ubuntu]# rbd mirror image status data1/dataset10 --cluster slave2
dataset10:
  global_id:   1aefcc7a-1f08-40be-9073-2715d49bdc9f
  state:       up+error
  description: failed to commit journal event
  last_update: 2017-02-09 19:53:34

[root@magna099 ubuntu]# rbd ls data1 --cluster slave2 | grep 10
dataset10
dataset101
dataset102

[root@magna100 ubuntu]# rbd mirror image status data1/dataset10new --cluster slave1
rbd: error opening image dataset10new: (2) No such file or directory

[root@magna100 ubuntu]# rbd mirror image status data1/dataset10 --cluster slave1
dataset10:
  global_id:   1aefcc7a-1f08-40be-9073-2715d49bdc9f
  state:       up+error
  description: failed to commit journal event
  last_update: 2017-02-09 19:49:56

Expected results:
=================
The rename should sync to the secondary sites.

Additional info:
Issue occurred when a "snap protect" was used against an image that did not support the layering feature. This recorded an error in the journal which resulted in a split-brain as expected.

*** This bug has been marked as a duplicate of bug 1365034 ***
Journal records:

# journal_id: 377a238e1f29
89 {"tag_id":101,"commit_tid":1,"type":7,"entry":"AgEXAAAABwAAAAEAAAAAAAAABwAAAHNuYXAxMDA="}
93 {"tag_id":101,"commit_tid":2,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
89 {"tag_id":102,"commit_tid":3,"type":7,"entry":"AgEWAAAABwAAAAEAAAAAAAAABgAAAHNuYXA5MA=="}
93 {"tag_id":102,"commit_tid":4,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
89 {"tag_id":103,"commit_tid":5,"type":7,"entry":"AgEXAAAABwAAAAEAAAAAAAAABwAAAHNuYXAxMDA="}
93 {"tag_id":103,"commit_tid":6,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}
98 {"tag_id":104,"commit_tid":7,"type":10,"entry":"AgEcAAAACgAAAAEAAAAAAAAADAAAAGRhdGFzZXQxMG5ldw=="}
89 {"tag_id":104,"commit_tid":8,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
86 {"tag_id":105,"commit_tid":9,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAgAcAAAA="}
90 {"tag_id":105,"commit_tid":10,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
87 {"tag_id":106,"commit_tid":11,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAAAUAAAA="}
90 {"tag_id":106,"commit_tid":12,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}
87 {"tag_id":107,"commit_tid":13,"type":11,"entry":"AgEUAAAACwAAAAEAAAAAAAAAAAAAgAcAAAA="}
90 {"tag_id":107,"commit_tid":14,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAAAAAAAA"}

The first uncommitted event entry is the request for snap protect (type 7). The second uncommitted event entry records the failure result code of "-ENOSYS" (the last four bytes of the base64 entry string are 0xDA 0xFF 0xFF 0xFF ---> -38 ---> -ENOSYS).
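
For anyone who wants to double-check the result code, here is a minimal Python sketch that decodes the tail of the second journal record. It only reads the last four bytes of the base64 payload as a little-endian signed 32-bit integer, as described above, and does not try to parse the rest of the entry. The helper name trailing_result_code and the constant ENOSYS_LINUX are made up for this illustration, and the value 38 assumes Linux errno numbering.

```python
import base64
import json
import struct

# Second journal record from the listing, copied verbatim.
# A raw string keeps the JSON-escaped "\/" sequences intact for json.loads.
RECORD = r'{"tag_id":101,"commit_tid":2,"type":3,"entry":"AgEYAAAAAwAAAAEAAAAAAAAAAQAAAAAAAADa\/\/\/\/"}'

# Assumption: Linux errno numbering, where ENOSYS is 38.
ENOSYS_LINUX = 38


def trailing_result_code(record_json):
    """Return (last four raw bytes, value as little-endian int32) of the entry payload.

    Hypothetical helper for illustration only; it does not decode the
    full event structure, just the trailing result field.
    """
    entry = json.loads(record_json)["entry"]  # json.loads turns "\/" into "/"
    raw = base64.b64decode(entry)
    tail = raw[-4:]
    (result,) = struct.unpack("<i", tail)  # "<i" = little-endian signed 32-bit
    return tail, result


tail, result = trailing_result_code(RECORD)
print("tail bytes :", tail.hex())
print("result code:", result)
print("is -ENOSYS :", result == -ENOSYS_LINUX)
```

Running the same helper against the type 3 records for commit_tid 8, 10, 12 and 14 should give a result of 0, since their payloads end in zero bytes.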