Bug 1340998 - Seeing a BT while writing and re-sizing on a RBD Image in parallel, with Journaling Enabled
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.0
Assignee: Jason Dillaman
QA Contact: Tanay Ganguly
 
Reported: 2016-05-31 05:11 UTC by Tanay Ganguly
Modified: 2017-07-30 15:26 UTC
CC List: 6 users

Fixed In Version: RHEL: ceph-10.2.2-1.el7cp Ubuntu: ceph_10.2.2-3redhat1xenial
Last Closed: 2016-08-23 19:40:16 UTC


Attachments
RBD Log (141.28 KB, text/plain)
2016-05-31 05:11 UTC, Tanay Ganguly
Resize Script (486 bytes, text/x-python)
2016-05-31 05:12 UTC, Tanay Ganguly


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Ceph Project Bug Tracker | 15791 | 0 | None | None | None | 2016-06-06 16:27:30 UTC
Ceph Project Bug Tracker | 16077 | 0 | None | None | None | 2016-06-06 16:34:09 UTC
Red Hat Product Errata | RHBA-2016:1755 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 2.0 bug fix and enhancement update | 2016-08-23 23:23:52 UTC

Description Tanay Ganguly 2016-05-31 05:11:28 UTC
Created attachment 1163026
RBD Log

Description of problem:
While reproducing BZ 1325932 (https://bugzilla.redhat.com/show_bug.cgi?id=1325932), I hit a crash; this time journaling was enabled on the image.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
Reproduced 2 times.

If it does not reproduce easily, repeat the same steps: start bench-write and run the resize script in parallel.

Steps to Reproduce:
1. Create an image, take a snapshot, protect it, and clone it.
rbd image 'NEW_CLone':
        size 2000 GB in 512000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1254862ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags: 
        parent: cephfs_data/NEW@snap1
        overlap: 2000 GB
        journal: 1254862ae8944a
        mirroring state: disabled

2. Run the resize script and bench-write in parallel:
rbd bench-write -p cephfs_data --image NEW_CLone --io-size 1024 --io-pattern rand
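The `rbd info` output in step 1 is internally consistent: `order 22` means each backing object is 2^22 bytes (4 MiB, which `rbd` prints as 4096 kB), and a 2000 GB image split into 4 MiB objects needs 512000 objects. The small check below reproduces that arithmetic. It assumes the GB/kB figures are binary units (GiB/KiB), which is what makes the numbers divide exactly; the helper names are illustrative and not part of any Ceph API.

```python
# Sketch: reproduce the object arithmetic shown by `rbd info` above.
# Assumption: rbd's "GB"/"kB" figures are binary units (GiB/KiB), which is
# what makes 2000 GB / 4096 kB come out to exactly 512000 objects.

GIB = 1024 ** 3
KIB = 1024


def object_size(order: int) -> int:
    """Size in bytes of one backing RADOS object for a given image order."""
    return 1 << order


def object_count(image_size: int, order: int) -> int:
    """Number of objects needed to back image_size bytes (rounded up)."""
    obj = object_size(order)
    return (image_size + obj - 1) // obj


if __name__ == "__main__":
    size = 2000 * GIB  # "size 2000 GB"
    order = 22         # "order 22 (4096 kB objects)"
    print(object_size(order) // KIB, "kB objects")
    print(object_count(size, order), "objects")
```

Rounding up in `object_count` covers image sizes that are not a whole multiple of the object size; for this image the division is exact.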

Actual results:
The process crashes: librbd aborts (SIGABRT) in the tp_librbd thread.

Expected results:
There should not be a crash

Additional info:
Log excerpt (the full log is attachment 1163026):

-----------------------------------------------------------------

    -4> 2016-05-31 04:42:02.168457 7fb750ff9700 -1 librbd::AioCompletion: 0x7fb73c09f980 fail: (22) Invalid argument
    -3> 2016-05-31 04:42:02.168477 7fb750ff9700 -1 librbd::AioCompletion: completed invalid aio_type: 0
    -2> 2016-05-31 04:42:02.168482 7fb750ff9700 -1 librbd::journal::Replay: AIO modify op failed: (22) Invalid argument
    -1> 2016-05-31 04:42:02.168487 7fb750ff9700 -1 librbd::Journal: failed to commit journal event to disk: (22) Invalid argument
     0> 2016-05-31 04:42:02.169581 7fb750ff9700 -1 *** Caught signal (Aborted) **
 in thread 7fb750ff9700 thread_name:tp_librbd


---------------------------------------------------

Comment 2 Tanay Ganguly 2016-05-31 05:12:14 UTC
Created attachment 1163027
Resize Script
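The attached script's contents are not reproduced in this report. Purely to illustrate the kind of loop that races with bench-write, here is a hypothetical sketch that repeatedly shrinks and regrows an image. It assumes an object exposing `resize(size)` and `size()`, which python-rbd's `rbd.Image` provides; the function name `resize_loop`, the size schedule, and the in-memory `FakeImage` stand-in are assumptions made so the sketch runs without a cluster.

```python
# Hypothetical sketch of a resize loop; NOT the attached script.
# `image` is anything with resize(size) and size(); with python-rbd this
# would be an rbd.Image opened on the clone (cluster setup omitted).
from typing import List, Protocol


class Resizable(Protocol):
    def resize(self, size: int) -> None: ...
    def size(self) -> int: ...


def resize_loop(image: Resizable, sizes: List[int], rounds: int) -> List[int]:
    """Cycle the image through `sizes` `rounds` times; return sizes observed."""
    observed: List[int] = []
    for _ in range(rounds):
        for target in sizes:
            image.resize(target)           # shrink or grow the image
            observed.append(image.size())  # record the size it now reports
    return observed


class FakeImage:
    """In-memory stand-in so the sketch runs without a Ceph cluster."""

    def __init__(self, size: int) -> None:
        self._size = size

    def resize(self, size: int) -> None:
        self._size = size

    def size(self) -> int:
        return self._size


if __name__ == "__main__":
    GIB = 1024 ** 3
    img = FakeImage(2000 * GIB)
    # Shrinking to 1 GiB while bench-write still targets the full 2000 GB
    # range can push its writes past the end of the image (see Comment 3).
    print(resize_loop(img, [1 * GIB, 2000 * GIB], rounds=2))
```

Taking the image as a parameter keeps the loop independent of cluster setup; against a real cluster one would pass an `rbd.Image` instead of `FakeImage`.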

Comment 3 Jason Dillaman 2016-05-31 11:34:20 UTC
@Tanay: while it shouldn't crash, you really shouldn't be sending IO outside the bounds of the image (e.g. you shrink the image to a point where bench-write is writing outside the image extents).
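The point about IO outside the image bounds can be made concrete. After a shrink, a random write whose offset lies at or past the new end of the image cannot be satisfied, and the log in the description reports the failing op as `(22) Invalid argument`, i.e. `EINVAL`. The sketch below is a hypothetical model, not librbd source: `clip_extent` and `out_of_bounds_fraction` are illustrative names, and the rule of rejecting an extent that starts at or beyond the end while clipping one that merely runs past it is an assumption chosen for illustration.

```python
# Hypothetical model of a bounds check for a write after the image shrank.
# Not librbd source; the reject/clip rule is an illustrative assumption.
import errno
import random
from typing import Tuple


def clip_extent(offset: int, length: int, image_size: int) -> Tuple[int, int]:
    """Return (error, clipped_length) for a write of `length` at `offset`.

    error is 0 on success or -errno.EINVAL when the write starts at or
    beyond the end of the image; a write that only runs past the end is
    clipped so that it ends exactly at image_size.
    """
    if length == 0:
        return 0, 0
    if offset >= image_size:
        return -errno.EINVAL, 0
    return 0, min(length, image_size - offset)


def out_of_bounds_fraction(old_size: int, new_size: int, io_size: int,
                           writes: int, seed: int = 0) -> float:
    """Fraction of random io_size-aligned writes, chosen over old_size,
    that would be rejected once the image has been shrunk to new_size."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(writes):
        offset = rng.randrange(0, old_size // io_size) * io_size
        err, _ = clip_extent(offset, io_size, new_size)
        if err:
            rejected += 1
    return rejected / writes
```

Under this model, shrinking an image to half its size makes roughly half of uniformly random 1024-byte writes (bench-write's `--io-size 1024 --io-pattern rand`) start past the end. The comment's point is that such requests should fail with an error rather than crash the process.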

Comment 4 Harish NV Rao 2016-05-31 12:48:15 UTC
@Jason: this is not a graceful exit and gives a bad user experience. Requesting that this be fixed in 2.0; resetting the target release.

Comment 9 Jason Dillaman 2016-06-12 23:54:01 UTC
Upstream merged Jewel PR: https://github.com/ceph/ceph/pull/9611

Comment 13 Tanay Ganguly 2016-06-28 11:31:56 UTC
Marking it as Verified.

ceph version 10.2.2-5.el7cp

Comment 16 errata-xmlrpc 2016-08-23 19:40:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html

