Created attachment 1165928 [details]
Master Node log

Description of problem:
I am hitting a split-brain on the Slave Node while more than one image is being resynced.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-12.el7cp

How reproducible:
Hit it once

Steps to Reproduce:
1. Create an image on the Master Node; do not enable journaling.
2. Write about 10G of data to the image.
3. Once the write completes, enable the journaling feature on the Master Node (resync starts).
4. Disable journaling on a different, already existing image (before that it was synced with the Slave Node).
5. Start bench-write on the image, write some new data, and then stop it.
6. Enable journaling again (resync starts):
   rbd feature enable RBD/testing3 journaling --cluster master

Actual results:
After enabling journaling again I am seeing split-brain. At this point both images were trying to sync:
- The older image, which was already syncing, stopped at 55% (refer to step 3).
- The new sync failed, reporting split-brain (refer to step 6).

Expected results:
There should not be a split-brain.

Additional info:
Log of both resyncs (new and old) from the Slave
Log of the Master Node
--------------------------------------------------------------------------------
systemctl status -l ceph-rbd-mirror@master
● ceph-rbd-mirror - Ceph rbd mirror daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-08 15:14:30 IST; 1h 42min ago
 Main PID: 86496 (rbd-mirror)
   CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror
           └─86496 /usr/bin/rbd-mirror -f --cluster master --id master --setuser ceph --setgroup ceph

Jun 08 15:14:30 cephqe3.lab.eng.blr.redhat.com systemd[1]: Started Ceph rbd mirror daemon.
Jun 08 15:14:30 cephqe3.lab.eng.blr.redhat.com systemd[1]: Starting Ceph rbd mirror daemon...
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.217415 7f4fa57fa700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f4f7408bb10 handle_get_remote_tag_class: failed to retrieve remote client: (2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.217475 7f4fd569f700 -1 rbd::mirror::ImageReplayer: 0x7f4f7400c650 [1/e923d0ee-37b7-483e-9621-ecb70c545eee] operator(): start failed: (2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.230753 7f4fb7fff700 -1 JournalMetadata: operator(): failed to watch journal(2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.230778 7f4fb7fff700 -1 JournalMetadata: failed to initialize immutable metadata: (2) No such file or directory

systemctl status -l ceph-rbd-mirror@slave
● ceph-rbd-mirror - Ceph rbd mirror daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-08 10:58:10 UTC; 11min ago
 Main PID: 678 (rbd-mirror)
   CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror
           └─678 /usr/bin/rbd-mirror -f --cluster slave --id slave --setuser ceph --setgroup ceph

Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.512360 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b740019f0 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.827056 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:08:05 magna003 rbd-mirror[678]: 2016-06-08 11:08:05.371161 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74006240 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:08:05 magna003 rbd-mirror[678]: 2016-06-08 11:08:05.695272 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:08:49 magna003 rbd-mirror[678]: 2016-06-08 11:08:49.085661 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74003160 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:08:49 magna003 rbd-mirror[678]: 2016-06-08 11:08:49.549062 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:09:21 magna003 rbd-mirror[678]: 2016-06-08 11:09:21.401637 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74003160 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:09:30 magna003 rbd-mirror[678]: 2016-06-08 11:09:30.704921 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:09:48 magna003 rbd-mirror[678]: 2016-06-08 11:09:48.849965 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74005d30 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:09:49 magna003 rbd-mirror[678]: 2016-06-08 11:09:49.190766 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
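
For triage, the repeated failures in the journal output above can be tallied per replayer. The following is only a rough sketch, not part of rbd-mirror or any Ceph tooling: the function name `summarize`, the regular expression, and the `SAMPLE` list are illustrative assumptions based solely on the line format seen in this report, and the bracketed `[1/<uuid>]` value is treated as an opaque key.

```python
import re
from collections import Counter

# Hypothetical pattern, derived only from the log lines quoted in this report:
# "... [1/<uuid>] operator(): start failed: (17) File exists"
START_FAILED = re.compile(
    r"\[(?P<key>[^\]]+)\] operator\(\): start failed: "
    r"\((?P<errno>\d+)\) (?P<msg>.*)$"
)


def summarize(lines):
    """Return (split_brain_count, Counter keyed by (replayer key, errno, message))."""
    split_brain = 0
    failures = Counter()
    for line in lines:
        # BootstrapRequest reports the split-brain condition on its own line.
        if "split-brain detected" in line:
            split_brain += 1
        # ImageReplayer reports the resulting start failure with an errno.
        match = START_FAILED.search(line.rstrip())
        if match:
            key = (match.group("key"), int(match.group("errno")), match.group("msg"))
            failures[key] += 1
    return split_brain, failures


# Two lines copied from the slave log above, used as sample input.
SAMPLE = [
    "Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.512360 "
    "7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b740019f0 "
    "handle_get_remote_tags: split-brain detected -- skipping image replay",
    "Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.827056 "
    "7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 "
    "[1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists",
]

if __name__ == "__main__":
    count, failures = summarize(SAMPLE)
    print("split-brain detections:", count)
    for (key, errno, msg), n in failures.items():
        print(f"{key}: errno {errno} ({msg}) x{n}")
```

Feeding it the full journal output for each daemon would show whether the start failures on the slave all belong to the same replayer.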
Created attachment 1165929 [details]
New Resync Log

This is a tar file; please rename it to open.
Created attachment 1165930 [details]
Older Resync file

This is a tar file; please rename it to open.
@Tanay: I am having a hard time understanding your test case. Can you provide the exact commands you ran?
@Tanay: Also, how were these logs generated? The "older" log (which shows the in-progress sync at 55%) ends abruptly mid-sync. Did rbd-mirror crash?
Moving to CLOSED/NOTABUG for now, since the only way to reproduce this was to put the system in an inconsistent state, and this behavior is expected in that case. If a similar issue appears after retesting BZ #1344274, we can re-evaluate and open a new BZ.