| Summary: | [rbd-mirror] : image is shown as non-primary image on both cluster and unable to write to image after promotion/demotion of image on slave cluster(demotion/promotion on master cluster) | ||
|---|---|---|---|
| Product: | Red Hat Ceph Storage | Reporter: | Rachana Patel <racpatel> |
| Component: | Documentation | Assignee: | Bara Ancincova <bancinco> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Rachana Patel <racpatel> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 2.1 | CC: | ceph-eng-bugs, hnallurv, jdillama, kdreyer, kurs, uboppana |
| Target Milestone: | rc | ||
| Target Release: | 2.1 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-28 09:37:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Discussed at the program meeting... there is no code to fix here, it is a documentation issue. Doc text is as per comment 12, hence moving to verified |
Description of problem: ======================== Demote image from master cluster and promote it from slave/remote cluster. Write to image using write-bench and then again demoted image.(command status was successful) but unable to promote image from master cluster. After 2-3 unsucceful try of promoting from master and demoting from slave cluster, it is showing image as primary on both cluster and unable to do write to image. Version-Release number of selected component (if applicable): ================================================================ 10.2.3-8.el7cp.x86_64 How reproducible: ================= Haven't tried. will update this later Steps to Reproduce: =================== 1. created 2 cluster- Master and slave- with 3 MON and 3 OSD node(9 OSD total).each cluster has one rbd-client. 2. setup rbd-mirroring between cluster 3. created image on master node . write to image and allowed it to sync to slave. 4. demoted image from matser cluster [root@magna025 ubuntu]# rbd mirror image demote pool1/bugre1 --cluster master Image demoted to non-primary [root@magna025 ubuntu]# rbd mirror image status pool1/bugre1 --cluster master bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+stopped description: remote image is non-primary or local image is primary last_update: 2016-10-24 19:02:55 5. promoted same image from slave cluster [root@magna078 ubuntu]# rbd mirror image promote pool1/bugre1 --cluster slave Image promoted to primary [root@magna078 ubuntu]# rbd mirror image status pool1/bugre1 --cluster slave bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+stopped description: remote image is non-primary or local image is primary last_update: 2016-10-24 19:04:28 6. write to image from slave cluster [root@magna078 ubuntu]# rbd bench-write pool1/bugre1 --io-total 100M --cluster slave bench-write io_size 4096 io_threads 16 bytes 104857600 pattern sequential SEC OPS OPS/SEC BYTES/SEC 1 1234 1247.82 5111090.57 2 1448 720.63 2951705.20 on master :- bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+replaying description: replaying, master_position=[object_number=2, tag_tid=4, entry_tid=2182], mirror_position=[object_number=3, tag_tid=4, entry_tid=63], entries_behind_master=2119 last_update: 2016-10-24 19:08:26 7. Demote image from slave cluster [root@magna078 ubuntu]# rbd mirror image demote pool1/bugre1 --cluster slave Image demoted to non-primary 8. promote same image from master cluster [root@magna025 ubuntu]# rbd mirror image promote pool1/bugre1 --cluster master rbd: error promoting image to primary 2016-10-24 19:10:21.043848 7faedb03bd80 -1 librbd: image is still primary within a remote cluster --> On failure checked status on slave cluster and redo demotion [root@magna078 ubuntu]# rbd mirror image status pool1/bugre1 --cluster slave bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+stopped description: remote image is non-primary or local image is primary last_update: 2016-10-24 19:10:28 [root@magna078 ubuntu]# rbd mirror image demote pool1/bugre1 --cluster slave rbd: error demoting image to non-primary 2016-10-24 19:10:52.691154 7f85d50fed80 -1 librbd: image is not currently the primary --> it failed..try twice again. --> tried to promote image from master cluster and that also failed(tried twice) [root@magna025 ubuntu]# rbd mirror image promote pool1/bugre1 --cluster master rbd: error promoting image to primary 2016-10-24 19:10:25.701081 7f614bf13d80 -1 librbd: image is still primary within a remote cluster [root@magna025 ubuntu]# rbd mirror image promote pool1/bugre1 --cluster master rbd: error promoting image to primary2016-10-24 19:11:08.763724 7f4628b2ad80 -1 librbd: image is still primary within a remote cluster Actual results: ================ after few min check status on both cluster and both is showing image as primary and writing to image is failing from both cluster [root@magna078 ubuntu]# rbd mirror image status pool1/bugre1 --cluster slave bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+stopped description: remote image is non-primary or local image is primary last_update: 2016-10-24 19:15:58 [root@magna025 ubuntu]# rbd mirror image status pool1/bugre1 --cluster master bugre1: global_id: 2e0b038f-f863-4cc1-a6a0-36a0f7ded729 state: up+stopped description: remote image is non-primary or local image is primary last_update: 2016-10-24 19:37:25 master cluster ============= [root@magna025 ubuntu]# rbd bench-write pool1/bugre1 --io-total 100M --cluster master bench-write io_size 4096 io_threads 16 bytes 104857600 pattern sequential SEC OPS OPS/SEC BYTES/SEC 2016-10-24 19:55:29.448677 7f7b8affd700 -1 librbd::journal::StandardPolicy: local image not promoted 2016-10-24 19:55:29.448686 7f7b8affd700 -1 librbd::exclusive_lock::AcquireRequest: failed to allocate journal tag: (1) Operation not permitted 2016-10-24 19:55:29.489704 7f7b8affd700 -1 librbd::ExclusiveLock: failed to acquire exclusive lock:(1) Operation not permitted ^C [root@magna025 ubuntu]# systemctl status -l ceph-rbd-mirror@master ● ceph-rbd-mirror - Ceph rbd mirror daemon Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2016-10-21 18:09:38 UTC; 3 days ago Main PID: 11347 (rbd-mirror) CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror └─11347 /usr/bin/rbd-mirror -f --cluster master --id master --setuser ceph --setgroup ceph Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909158 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c0939d0 [3/0ecb8cd7-8e05-4169-a947-955a364acae4] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909161 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c094f10 [3/d73bea26-e095-4cc2-b48c-63fda1773123] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909165 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c096450 [3/e3dc42e6-99ed-44b4-8de2-5a195ff897f5] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909168 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c097990 [3/1e1d4355-7655-4755-8219-de1fd6b70db1] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909171 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c098ed0 [3/d98a3b15-820b-4b57-9d8f-51c36be8cdf8] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909174 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c09b950 [3/b422e2f9-1323-4c7b-9318-d041381b8427] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909177 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c09ce90 [3/fb51b78f-c884-4287-ae0b-7b4791c1150f] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909180 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c08a460 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909183 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c0a2370 [3/faf4090f-2fff-4552-b8c7-52b7f1886e3f] start: already running Oct 24 19:55:24 magna025 rbd-mirror[11347]: 2016-10-24 19:55:24.909187 7fd2cbfff700 -1 rbd::mirror::ImageReplayer: 0x7fd24c0a38b0 [3/c69ef283-987e-4712-841a-3d554e435349] start: already running slave cluster: ============== [root@magna078 ubuntu]# rbd bench-write pool1/bugre1 --io-total 100M --cluster slave bench-write io_size 4096 io_threads 16 bytes 104857600 pattern sequential SEC OPS OPS/SEC BYTES/SEC 2016-10-24 19:54:54.810612 7fa0ed6f9700 -1 librbd::journal::StandardPolicy: local image not promoted 2016-10-24 19:54:54.810624 7fa0ed6f9700 -1 librbd::exclusive_lock::AcquireRequest: failed to allocate journal tag: (1) Operation not permitted 2016-10-24 19:54:54.839937 7fa0ed6f9700 -1 librbd::ExclusiveLock: failed to acquire exclusive lock:(1) Operation not permitted ^C [root@magna078 ubuntu]# systemctl status -l ceph-rbd-mirror@slave ● ceph-rbd-mirror - Ceph rbd mirror daemon Loaded: loaded (/usr/lib/systemd/system/ceph-rbd-mirror@.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2016-10-21 18:44:27 UTC; 3 days ago Main PID: 11803 (rbd-mirror) CGroup: /system.slice/system-ceph\x2drbd\x2dmirror.slice/ceph-rbd-mirror └─11803 /usr/bin/rbd-mirror -f --cluster slave --id slave --setuser ceph --setgroup ceph Oct 24 18:47:05 magna078 rbd-mirror[11803]: 2016-10-24 18:47:05.795258 7fcebcf90700 -1 rbd::mirror::image_replayer::CreateImageRequest: 0x7fce000bca90 handle_create_image: failed to create local image: (17) File exists Oct 24 18:47:05 magna078 rbd-mirror[11803]: 2016-10-24 18:47:05.795264 7fcebcf90700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7fceac02af30 handle_create_local_image: failed to create local image: (17) File exists Oct 24 18:47:05 magna078 rbd-mirror[11803]: 2016-10-24 18:47:05.797140 7fcebcf90700 -1 rbd::mirror::ImageReplayer: 0x7fce38020d20 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] operator(): start failed: (17) File exists Oct 24 19:02:55 magna078 rbd-mirror[11803]: 2016-10-24 19:02:55.426754 7fce6e7fc700 -1 rbd::mirror::image_sync::ImageCopyRequest: 0x7fce1c10e2d0 compute_snap_map: failed to locate snapshot: .rbd-mirror.0f454219-dd15-49ba-968a-f822833d46e8.db782658-bd31-443c-b95e-5ab40d38831e Oct 24 19:02:55 magna078 rbd-mirror[11803]: 2016-10-24 19:02:55.426765 7fce6e7fc700 -1 rbd::mirror::ImageSync: 0x7fcdd8105c10 handle_copy_image: failed to copy image: (2) No such file or directory Oct 24 19:02:55 magna078 rbd-mirror[11803]: 2016-10-24 19:02:55.426773 7fce6e7fc700 -1 rbd::mirror::image_replayer::BootstrapRequest: 0x7fceac055ee0 handle_image_sync: failed to sync remote image: (2) No such file or directory Oct 24 19:02:55 magna078 rbd-mirror[11803]: 2016-10-24 19:02:55.434806 7fcebcf90700 -1 rbd::mirror::ImageReplayer: 0x7fce38020d20 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] operator(): start failed: (2) No such file or directory Oct 24 19:19:05 magna078 rbd-mirror[11803]: 2016-10-24 19:19:05.905403 7fcebcf90700 -1 rbd::mirror::ImageReplayer: 0x7fce38020d20 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] start: already running Oct 24 19:21:05 magna078 rbd-mirror[11803]: 2016-10-24 19:21:05.922989 7fcebcf90700 -1 rbd::mirror::ImageReplayer: 0x7fce38020d20 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] start: already running Oct 24 19:45:06 magna078 rbd-mirror[11803]: 2016-10-24 19:45:06.060784 7fcebcf90700 -1 rbd::mirror::ImageReplayer: 0x7fce38020d20 [3/2e0b038f-f863-4cc1-a6a0-36a0f7ded729] start: already running Expected results: ================ - As mentioned in step 7 - demotion was successful so it should not show image as primary - should not show same status on both cluster (claiming self primary or other image as non primary) - User should be able to write on image.(if it is primary) Additional info: