Bug 1343941
| Field | Value |
|---|---|
| Summary | Hitting a Split-Brain when multiple images are getting synced in parallel |
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | RBD |
| Status | CLOSED NOTABUG |
| Severity | urgent |
| Priority | unspecified |
| Version | 2.0 |
| Target Milestone | rc |
| Target Release | 2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Tanay Ganguly <tganguly> |
| Assignee | Jason Dillaman <jdillama> |
| QA Contact | Tanay Ganguly <tganguly> |
| CC | ceph-eng-bugs, hnallurv, hyelloji, kurs, mlawrenc, tganguly |
| Doc Type | If docs needed, set a value |
| Last Closed | 2016-06-14 11:41:17 UTC |
| Type | Bug |
| Bug Blocks | 1343229 |

Attachments:
Created attachment 1165929
New Resync Log
This is a tar file, please rename to open

Created attachment 1165930
Older Resync file
This is a tar file, please rename to open

@Tanay: I am having a hard time understanding your test case. Can you provide the exact commands you ran?

@Tanay: also, how were these logs generated? The "older" log (which shows the in-progress sync @ 55%) just abruptly ends mid-sync. Did rbd-mirror crash?

Moving to CLOSED/NOTABUG for now since the only way to reproduce it was to put the system in an inconsistent state. In such a case, it is expected to see this behavior. If a similar issue appears after retesting BZ #1344274, we can re-evaluate and open a new BZ.

Created attachment 1165928
Master Node log

Description of problem:
I am hitting a split-brain on the Slave Node while more than one image is being resynced.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-12.el7cp

How reproducible:
Hit it once

Steps to Reproduce:
1. Create an image on the Master Node; don't enable journaling.
2. Write about 10G of data to the image.
3. Once the write completes, enable the journaling feature on the Master Node (resync starts).
4. Disable journaling on a different, existing image that was previously synced with the Slave Node.
5. Start bench-write on the image, write some new data, and then stop it.
6. Enable journaling again (resync starts):
   rbd feature enable RBD/testing3 journaling --cluster master

Actual results:
After enabling journaling again, I am seeing a split-brain. At that point both images were trying to sync:
- The older image's sync stopped at 55% (refer to step 3).
- The new sync reported split-brain (refer to step 6).

Expected results:
There should not be a split-brain.

Additional info:
Logs of both resyncs (new and old) from the Slave Node
Log of the Master Node
--------------------------------------------------------------------------------
systemctl status -l ceph-rbd-mirror@master
● ceph-rbd-mirror - Ceph rbd mirror daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-08 15:14:30 IST; 1h 42min ago
 Main PID: 86496 (rbd-mirror)
   CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror
           └─86496 /usr/bin/rbd-mirror -f --cluster master --id master --setuser ceph --setgroup ceph

Jun 08 15:14:30 cephqe3.lab.eng.blr.redhat.com systemd[1]: Started Ceph rbd mirror daemon.
Jun 08 15:14:30 cephqe3.lab.eng.blr.redhat.com systemd[1]: Starting Ceph rbd mirror daemon...
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.217415 7f4fa57fa700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f4f7408bb10 handle_get_remote_tag_class: failed to retrieve remote client: (2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.217475 7f4fd569f700 -1 rbd::mirror::ImageReplayer: 0x7f4f7400c650 [1/e923d0ee-37b7-483e-9621-ecb70c545eee] operator(): start failed: (2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.230753 7f4fb7fff700 -1 JournalMetadata: operator(): failed to watch journal(2) No such file or directory
Jun 08 16:29:03 cephqe3.lab.eng.blr.redhat.com rbd-mirror[86496]: 2016-06-08 16:29:03.230778 7f4fb7fff700 -1 JournalMetadata: failed to initialize immutable metadata: (2) No such file or directory

systemctl status -l ceph-rbd-mirror@slave
● ceph-rbd-mirror - Ceph rbd mirror daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-06-08 10:58:10 UTC; 11min ago
 Main PID: 678 (rbd-mirror)
   CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror
           └─678 /usr/bin/rbd-mirror -f --cluster slave --id slave --setuser ceph --setgroup ceph

Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.512360 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b740019f0 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.827056 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:08:05 magna003 rbd-mirror[678]: 2016-06-08 11:08:05.371161 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74006240 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:08:05 magna003 rbd-mirror[678]: 2016-06-08 11:08:05.695272 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:08:49 magna003 rbd-mirror[678]: 2016-06-08 11:08:49.085661 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74003160 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:08:49 magna003 rbd-mirror[678]: 2016-06-08 11:08:49.549062 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:09:21 magna003 rbd-mirror[678]: 2016-06-08 11:09:21.401637 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74003160 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:09:30 magna003 rbd-mirror[678]: 2016-06-08 11:09:30.704921 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
Jun 08 11:09:48 magna003 rbd-mirror[678]: 2016-06-08 11:09:48.849965 7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7f7b74005d30 handle_get_remote_tags: split-brain detected -- skipping image replay
Jun 08 11:09:49 magna003 rbd-mirror[678]: 2016-06-08 11:09:49.190766 7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 [1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: (17) File exists
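
For anyone triaging similar output, the journal excerpts above can be tallied with a short script. This is only a rough sketch, not part of Ceph or of this report: the names `summarize`, `LINE_RE`, and `ERRNO_RE` are hypothetical, and the patterns assume that every rbd-mirror journal line follows the same layout as the excerpts in this bug (syslog prefix, daemon timestamp, thread id, log level, message). It counts "split-brain detected" messages and maps the trailing numeric error codes to their symbolic names through Python's standard `errno` module.

```python
import errno
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this report:
#   <syslog prefix> rbd-mirror[PID]: <date> <time> <thread> <level> <message>
LINE_RE = re.compile(
    r"rbd-mirror\[\d+\]: "
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ "  # daemon timestamp
    r"[0-9a-f]+ +-?\d+ "                           # thread id and log level
    r"(?P<msg>.*)$"
)

# Trailing "(17) File exists" style suffix; "operator()" does not match
# because the parentheses must contain digits.
ERRNO_RE = re.compile(r"\((?P<code>\d+)\) [A-Za-z ]+$")


def summarize(lines):
    """Return (split_brain_count, Counter of errno symbols) for journal lines."""
    split_brain = 0
    errors = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip())
        if match is None:
            continue  # systemd status lines and other non-daemon output
        msg = match.group("msg")
        if "split-brain detected" in msg:
            split_brain += 1
        err = ERRNO_RE.search(msg)
        if err is not None:
            code = int(err.group("code"))
            # Fall back to the raw number if the platform has no symbol for it.
            errors[errno.errorcode.get(code, str(code))] += 1
    return split_brain, errors


if __name__ == "__main__":
    sample = [
        "Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.512360 "
        "7f7baaffd700 -1 rbd::mirror::image_replayer::BootstrapRequest: "
        "0x7f7b740019f0 handle_get_remote_tags: split-brain detected -- "
        "skipping image replay",
        "Jun 08 11:07:34 magna003 rbd-mirror[678]: 2016-06-08 11:07:34.827056 "
        "7f7bdaffd700 -1 rbd::mirror::ImageReplayer: 0x7f7b74004370 "
        "[1/696a499b-9cc1-44d5-8e08-2b581ef24aba] operator(): start failed: "
        "(17) File exists",
    ]
    count, errors = summarize(sample)
    print(count, dict(errors))
```

Feeding it the saved `systemctl status -l` or journal output from each node separately would show, per node, how often the replayer hit split-brain versus a missing journal object. The counts say nothing about the root cause, which still requires the full rbd-mirror logs attached to this bug.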