Bug 1344274 - crash while bench-write and disabling Journal in parallel
Status: CLOSED ERRATA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD
Version: 2.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 2.0
Assigned To: Jason Dillaman
QA Contact: Tanay Ganguly
Depends On:
Blocks: 1343229
Reported: 2016-06-09 06:01 EDT by Tanay Ganguly
Modified: 2017-07-31 16:59 EDT

See Also:
Fixed In Version: ceph-10.2.2-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 15:41:05 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Crash Log (86.55 KB, text/plain)
2016-06-09 06:01 EDT, Tanay Ganguly
Log and script (146.95 KB, application/x-gzip)
2016-06-10 01:55 EDT, Tanay Ganguly


External Trackers
Tracker: Ceph Project Bug Tracker; ID: 16235; Priority: None; Status: None; Summary: None; Last Updated: 2016-06-10 13:25 EDT
Tracker: Red Hat Product Errata; ID: RHBA-2016:1755; Priority: normal; Status: SHIPPED_LIVE; Summary: Red Hat Ceph Storage 2.0 bug fix and enhancement update; Last Updated: 2016-08-23 19:23:52 EDT

Description Tanay Ganguly 2016-06-09 06:01:32 EDT
Created attachment 1166230
Crash Log

Description of problem:
A crash occurs on the master node when the journal is disabled while bench-write is being run continuously.

Version-Release number of selected component (if applicable):
rbd-mirror-10.2.1-12.el7cp.x86_64

How reproducible:
Once

Steps to Reproduce:
1. Create an image without enabling journaling.
2. Write some data to it.
3. Enable journaling so that the image resyncs to the slave node.
4. Start bench-write on the image and kill it after a while.
   Repeat step 4 three or four times.
5. While this is in progress, disable journaling from the master node.
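
The steps above can be sketched as a sequence of rbd CLI invocations. This is only a sketch under assumptions, not the reporter's script: the pool name `rbd`, the image name `img1`, the image size, and creating the image with only the `layering` feature are all hypothetical choices, and pool mirroring between the two clusters is assumed to be configured already. `exclusive-lock` is enabled together with `journaling` because journaling depends on it. The code only builds and prints the command lines; it does not run them, and killing each bench-write run is left as a manual action.

```python
import shlex


def repro_commands(pool="rbd", image="img1", size_mb=10240, bench_runs=3):
    """Build rbd command lines for the reproduction steps.

    Pool, image, size, and run count are hypothetical placeholders.
    """
    spec = f"{pool}/{image}"
    cmds = [
        # Step 1: create the image without the journaling feature.
        ["rbd", "create", "--size", str(size_mb),
         "--image-feature", "layering", spec],
        # Step 2: write some data to it.
        ["rbd", "bench-write", spec],
        # Step 3: enable journaling so the image is mirrored to the slave node.
        ["rbd", "feature", "enable", spec, "exclusive-lock", "journaling"],
    ]
    # Step 4: repeated bench-write runs (each one is killed by hand after a while).
    cmds += [["rbd", "bench-write", spec] for _ in range(bench_runs)]
    # Step 5: disable journaling while the writes and the sync are in progress.
    cmds.append(["rbd", "feature", "disable", spec, "journaling"])
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands():
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without a Ceph cluster; the timing of step 5 relative to step 4 still has to be arranged by hand.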

Actual results:
A crash is seen on the master node.

Expected results:
Disabling the journal should be graceful.

Additional info:
Log attached

-------------------------------------------------------------------------
    -2> 2016-06-09 15:19:48.724907 7fc2c86f7700  1 -- 10.70.44.40:0/334889691 <== osd.3 10.70.44.50:6829/133371 16 ==== osd_op_reply(54 journal.136a2ae8944a [call] v0'0 uv1178 ondisk = 0) v7 ==== 140+0+385 (2263775508 0 3014552106) 0x7fc27c001940 con 0x7fc2b0014d70
    -1> 2016-06-09 15:19:48.725028 7fc2edbd3d80  5 librbd::Operations: 0x7fc2f8775da0 snap_remove: snap_name=.rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a.c1691508-4630-4524-95e4-e9a8b0b79e3a
     0> 2016-06-09 15:19:48.725769 7fc2edbd3d80 -1 *** Caught signal (Aborted) **
 in thread 7fc2edbd3d80 thread_name:rbd

 ceph version 10.2.1-12.el7cp (939056d19a2a523223611ef08194666b41086b03)
 1: (()+0x1feafa) [0x7fc2ede06afa]
 2: (()+0xf100) [0x7fc2da1bf100]
 3: (gsignal()+0x37) [0x7fc2d820c5f7]
 4: (abort()+0x148) [0x7fc2d820dce8]
Comment 2 Jason Dillaman 2016-06-09 08:12:55 EDT
@Tanay: where is the full log? Are several processes sharing the same log file in your setup?  It looks like the crash was in the rbd CLI while updating the features, not the rbd-mirror daemon as implied.
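
One way to check which process crashed is to read the signal and thread name out of the crash dump lines. The sketch below assumes the two-line format seen in the excerpt in this report (`*** Caught signal (...) **` followed by `in thread <id> thread_name:<name>`); the function name and regular expressions are illustrative, and treating `thread_name:rbd` as pointing to the rbd CLI is an interpretation of the comment above, not something the log format guarantees.

```python
import re

# Patterns modelled on the crash excerpt in this report (an assumption about the format).
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \((?P<sig>[^)]+)\) \*\*")
THREAD_RE = re.compile(r"in thread (?P<tid>[0-9a-f]+) thread_name:(?P<name>\S+)")


def crash_summary(log_text):
    """Return signal, thread id, and thread name of the first crash found, or None."""
    sig = SIGNAL_RE.search(log_text)
    thr = THREAD_RE.search(log_text)
    if sig is None or thr is None:
        return None
    return {
        "signal": sig.group("sig"),
        "thread_id": thr.group("tid"),
        "thread_name": thr.group("name"),
    }


if __name__ == "__main__":
    excerpt = (
        "     0> 2016-06-09 15:19:48.725769 7fc2edbd3d80 -1 "
        "*** Caught signal (Aborted) **\n"
        " in thread 7fc2edbd3d80 thread_name:rbd\n"
    )
    print(crash_summary(excerpt))
```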
Comment 3 Tanay Ganguly 2016-06-10 01:48:41 EDT
(In reply to Jason Dillaman from comment #2)
> @Tanay: where is the full log? Are several processes sharing the same log
> file in your setup?  It looks like the crash was in the rbd CLI while
> updating the features, not the rbd-mirror daemon as implied.


I have shared the full log. No, I was not executing anything else; I waited for bench-write to complete and then started disabling the journal.

I am not sure about the rbd CLI crash; the log looks like this:

    -1> 2016-06-10 10:59:19.884312 7f879a61cd80  5 librbd::Operations: 0x7f87a58d10e0 snap_remove: snap_name=.rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a.edaf3ce8-fbfd-4fb9-9f75-1effbd754200
     0> 2016-06-10 10:59:19.885214 7f879a61cd80 -1 *** Caught signal (Aborted) **
 in thread 7f879a61cd80 thread_name:rbd
Comment 4 Tanay Ganguly 2016-06-10 01:51:23 EDT
@Jason,

I am able to reproduce it again with some simpler steps.

1. Create an image without journaling enabled (please find attached the script I use to create it).
   I wrote this script to replicate the functionality of RBD_Import; it imports a block device.

2. Write some data again using bench-write.
3. Let the write complete and the sync begin on the slave node (it was about 35% complete).
4. Disable the journal.


After disabling the journal, I see the crash on the master node.
Comment 5 Tanay Ganguly 2016-06-10 01:55 EDT
Created attachment 1166484 [details]
Log and script

This is a tar file.
Comment 6 Harish NV Rao 2016-06-10 02:43:14 EDT
Monti,
This defect is for rbd mirroring. It needs to be fixed for 2.0. I am setting the target release as 2.0 and adding this to 2.0 GA tracker bz.
Comment 8 Harish NV Rao 2016-06-10 10:43:18 EDT
Monti, please change the target release to 2.0. The rules engine pushes it back to 2.1 when I try changing it from 2.1 to 2.0 (comment 7).
Comment 9 Jason Dillaman 2016-06-10 10:50:06 EDT
@Tanay: just want to explicitly confirm that this was a crash in the rbd CLI, not the rbd-mirror daemon (that is what the logs show) and whether or not re-running the rbd CLI command was successful.
Comment 10 Jason Dillaman 2016-06-10 13:26:33 EDT
It appears the rbd CLI will continue to crash until the sync is complete unless you delete the image's rbd-mirror snapshot before disabling journaling.
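
The workaround described above could be sketched as follows: pick out the image's rbd-mirror snapshots, remove them, and only then disable journaling. This is a hedged illustration, not a confirmed procedure. The `.rbd-mirror.` prefix is taken from the snapshot names in the logs in this report, the pool and image names are hypothetical, the snapshot list is assumed to have been collected beforehand (for example from `rbd snap ls`), and whether the CLI allows these snapshots to be removed directly in a given release is not stated in this report. The code only builds the command lines.

```python
# Prefix observed in the snap_remove log lines of this report.
MIRROR_SNAP_PREFIX = ".rbd-mirror."


def workaround_commands(pool, image, snap_names):
    """Build commands that remove rbd-mirror snapshots, then disable journaling."""
    spec = f"{pool}/{image}"
    cmds = [
        ["rbd", "snap", "rm", f"{spec}@{name}"]
        for name in snap_names
        if name.startswith(MIRROR_SNAP_PREFIX)
    ]
    # Journaling is disabled only after the mirror snapshots are gone.
    cmds.append(["rbd", "feature", "disable", spec, "journaling"])
    return cmds


if __name__ == "__main__":
    snaps = [
        "user-snap",  # hypothetical user snapshot, left untouched
        ".rbd-mirror.3e563921-8f1c-45bd-bcd9-7fb0b4bfdc9a."
        "c1691508-4630-4524-95e4-e9a8b0b79e3a",
    ]
    for cmd in workaround_commands("rbd", "img1", snaps):
        print(" ".join(cmd))
```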
Comment 11 Jason Dillaman 2016-06-12 15:28:16 EDT
Upstream Jewel PR: https://github.com/ceph/ceph/pull/9654
Comment 12 Gregory Meno 2016-06-14 14:14:42 EDT
Harish, you need to change the flag ceph-2.Z from ? to "" and ceph-2.0 to ?.
Comment 14 Tanay Ganguly 2016-06-28 06:48:15 EDT
Marking it as Verified.

ceph version 10.2.2-5.el7cp
Comment 16 errata-xmlrpc 2016-08-23 15:41:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html
