Created attachment 1166230 [details]
Crash Log

Description of problem:
A crash occurs on the master node when journaling is disabled on an image after repeated bench-write runs.

Version-Release number of selected component (if applicable):
rbd-mirror-10.2.1-12.el7cp.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Create an image without journaling enabled.
2. Write some data to it.
3. Enable journaling so the image resyncs to the slave node.
4. Start bench-write on the image and kill it after a while. Repeat this step 3-4 times.
5. While this is in progress, disable journaling from the master node.

Actual results:
A crash is seen on the master node.

Expected results:
Disabling journaling should complete gracefully.

Additional info:
Log attached

-------------------------------------------------------------------------
    -2> 2016-06-09 15:19:48.724907 7fc2c86f7700 1 -- 10.70.44.40:0/334889691 <== osd.3 10.70.44.50:6829/133371 16 ==== osd_op_reply(54 journal.136a2ae8944a [call] v0'0 uv1178 ondisk = 0) v7 ==== 140+0+385 (2263775508 0 3014552106) 0x7fc27c001940 con 0x7fc2b0014d70
    -1> 2016-06-09 15:19:48.725028 7fc2edbd3d80 5 librbd::Operations: 0x7fc2f8775da0 snap_remove: snap_name=.rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a.c1691508-4630-4524-95e4-e9a8b0b79e3a
     0> 2016-06-09 15:19:48.725769 7fc2edbd3d80 -1 *** Caught signal (Aborted) **
 in thread 7fc2edbd3d80 thread_name:rbd

 ceph version 10.2.1-12.el7cp (939056d19a2a523223611ef08194666b41086b03)
 1: (()+0x1feafa) [0x7fc2ede06afa]
 2: (()+0xf100) [0x7fc2da1bf100]
 3: (gsignal()+0x37) [0x7fc2d820c5f7]
 4: (abort()+0x148) [0x7fc2d820dce8]
@Tanay: where is the full log? Are several processes sharing the same log file in your setup? It looks like the crash was in the rbd CLI while updating the features, not the rbd-mirror daemon as implied.
(In reply to Jason Dillaman from comment #2)
> @Tanay: where is the full log? Are several processes sharing the same log
> file in your setup? It looks like the crash was in the rbd CLI while
> updating the features, not the rbd-mirror daemon as implied.

I shared the full log. No, I was not executing anything else; I waited for bench-write to complete and then started disabling journaling. I am not sure about the rbd CLI crash; the log looks like this:

    -1> 2016-06-10 10:59:19.884312 7f879a61cd80 5 librbd::Operations: 0x7f87a58d10e0 snap_remove: snap_name=.rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a.edaf3ce8-fbfd-4fb9-9f75-1effbd754200
     0> 2016-06-10 10:59:19.885214 7f879a61cd80 -1 *** Caught signal (Aborted) **
 in thread 7f879a61cd80 thread_name:rbd
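One way to check which process wrote the crash entry is to read the thread_name field on the "Caught signal" line. The following is only a sketch: the regular expression is derived from the two log excerpts in this report, and the helper name crashed_thread is hypothetical. A thread name of rbd is consistent with the crash being in the rbd CLI, as suggested in comment #2.

```python
import re

# Pattern derived from the log excerpts in this report (an assumption, not an
# official log format definition). \s+ allows the line break that the log
# places between "**" and "in thread".
SIGNAL_RE = re.compile(
    r"Caught signal \((?P<signal>[^)]+)\) \*\*\s+in thread "
    r"(?P<thread>[0-9a-f]+) thread_name:(?P<thread_name>\S+)"
)


def crashed_thread(log_text):
    """Return (signal, thread id, thread name) for the first crash entry, or None."""
    match = SIGNAL_RE.search(log_text)
    if match is None:
        return None
    return match.group("signal"), match.group("thread"), match.group("thread_name")


# Crash entry copied from the comment above.
excerpt = (
    "0> 2016-06-10 10:59:19.885214 7f879a61cd80 -1 *** Caught signal (Aborted) **\n"
    " in thread 7f879a61cd80 thread_name:rbd"
)
print(crashed_thread(excerpt))
```

If several processes share one log file, the thread id on the preceding snap_remove line (7f879a61cd80 here) can be compared with the id in the crash entry to confirm both came from the same thread.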
@Jason, I am able to reproduce it again with some simpler steps.

1. Create an image without journaling enabled. (PFA; I am using the attached script to create it. I wrote this script to replicate the functionality of RBD_Import; it imports a block device.)
2. Write some data again using bench-write.
3. Let the write complete and the sync begin on the slave node (it was about 35% complete).
4. Disable journaling.

After disabling, I see the crash on the master node.
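For reference, the reproduction can be sketched as a sequence of rbd CLI invocations. This is a dry run that only prints the commands; the exact commands used by the reporter are not given in this report, so the image spec and size are hypothetical, the attached import script is replaced by a plain rbd create, and the journaling enable step is taken from step 3 of the original description. It also assumes mirroring is already configured for the pool and that the image has the exclusive-lock feature, which journaling requires.

```python
import shlex


def repro_commands(image="mirror-pool/test-image", size_mb=1024):
    """Build the rbd CLI invocations for the reproduction, in order (dry run only).

    The image spec and size are hypothetical placeholders.
    """
    return [
        # Create the image; journaling is not requested here.
        ["rbd", "create", "--size", str(size_mb), image],
        # Write some data before journaling is enabled.
        ["rbd", "bench-write", image],
        # Enable journaling so the image starts syncing to the slave node.
        ["rbd", "feature", "enable", image, "journaling"],
        # Write again; in the original steps this run is killed and repeated.
        ["rbd", "bench-write", image],
        # Disable journaling while the sync is still in progress
        # (the step at which the crash was reported).
        ["rbd", "feature", "disable", image, "journaling"],
    ]


for cmd in repro_commands():
    print(" ".join(shlex.quote(part) for part in cmd))
```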
Created attachment 1166484 [details]
Log and script

This is a tar file.
Monti, This defect is for rbd mirroring. It needs to be fixed for 2.0. I am setting the target release as 2.0 and adding this to 2.0 GA tracker bz.
Monti, please change the target release to 2.0. The rules engine pushes it back to 2.1 when I try to change it from 2.1 to 2.0 (comment 7).
@Tanay: just want to explicitly confirm that this was a crash in the rbd CLI, not the rbd-mirror daemon (that is what the logs show) and whether or not re-running the rbd CLI command was successful.
It appears the rbd CLI will continue to crash until the sync is complete unless you delete the image's rbd-mirror snapshot before disabling journaling.
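A minimal sketch of that workaround follows. It assumes the rbd-mirror snapshots can be recognized by the .rbd-mirror. name prefix seen in the crash log and that the snapshot names have already been collected (for example from rbd snap ls). The helper name workaround_commands and the image spec are hypothetical, and the function only builds the commands without running them.

```python
# Prefix taken from the snap_name values in the crash log of this report.
MIRROR_SNAP_PREFIX = ".rbd-mirror."


def workaround_commands(image, snap_names):
    """Build rbd commands that remove rbd-mirror snapshots, then disable journaling."""
    mirror_snaps = [name for name in snap_names if name.startswith(MIRROR_SNAP_PREFIX)]
    # Remove each rbd-mirror snapshot first; user snapshots are left alone.
    commands = [["rbd", "snap", "rm", f"{image}@{name}"] for name in mirror_snaps]
    # Only then disable journaling on the image.
    commands.append(["rbd", "feature", "disable", image, "journaling"])
    return commands


snaps = [
    "user-snap",  # hypothetical user snapshot, kept untouched
    ".rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a.edaf3ce8-fbfd-4fb9-9f75-1effbd754200",
]
for cmd in workaround_commands("mirror-pool/test-image", snaps):
    print(" ".join(cmd))
```

How removing the snapshot affects a sync that is still running on the slave node is not covered in this report, so this is best treated as a stopgap until the fix is available.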
Upstream Jewel PR: https://github.com/ceph/ceph/pull/9654
Harish, you need to change the flag ceph-2.Z from "?" to "" and set ceph-2.0 to "?".
Marking it as Verified with ceph version 10.2.2-5.el7cp.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html