Created attachment 1163026 (RBD Log)

Description of problem:
While reproducing BZ https://bugzilla.redhat.com/show_bug.cgi?id=1325932, I am hitting a crash, but this time I have enabled journaling.

Version-Release number of selected component (if applicable):
ceph version 10.2.1-6.el7cp

How reproducible:
2 times. If it does not reproduce easily, repeat the same steps: start bench-write and run the resize in parallel.

Steps to Reproduce:
1. Create an image, take a snapshot, protect it, and create a clone.

rbd image 'NEW_CLone':
        size 2000 GB in 512000 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.1254862ae8944a
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags:
        parent: cephfs_data/NEW@snap1
        overlap: 2000 GB
        journal: 1254862ae8944a
        mirroring state: disabled

2. Run the resize script and bench-write in parallel.

rbd bench-write -p cephfs_data --image NEW_CLone --io-size 1024 --io-pattern rand

Actual results:
A crash.

Expected results:
There should not be a crash.

Additional info:
Logs
-----------------------------------------------------------------
    -4> 2016-05-31 04:42:02.168457 7fb750ff9700 -1 librbd::AioCompletion: 0x7fb73c09f980 fail: (22) Invalid argument
    -3> 2016-05-31 04:42:02.168477 7fb750ff9700 -1 librbd::AioCompletion: completed invalid aio_type: 0
    -2> 2016-05-31 04:42:02.168482 7fb750ff9700 -1 librbd::journal::Replay: AIO modify op failed: (22) Invalid argument
    -1> 2016-05-31 04:42:02.168487 7fb750ff9700 -1 librbd::Journal: failed to commit journal event to disk: (22) Invalid argument
     0> 2016-05-31 04:42:02.169581 7fb750ff9700 -1 *** Caught signal (Aborted) **
 in thread 7fb750ff9700 thread_name:tp_librbd
---------------------------------------------------
Created attachment 1163027 (Resize Script)
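The resize script itself (attachment 1163027) is not reproduced in this thread. As a rough stand-in, the sketch below runs the bench-write command from step 2 in the background while repeatedly resizing the image. It is a minimal sketch under assumptions: the size sequence is hypothetical (sizes are in MB, rbd's default unit; 2048000 MB is the original 2000 GB), --allow-shrink is the rbd resize flag needed to reduce an image's size, and setting RBD="echo rbd" prints the commands instead of running them against a cluster.

```shell
#!/usr/bin/env bash
# Rough stand-in for the attached resize script (attachment 1163027), NOT the
# original. Pool and image names come from the report; the size list is a
# hypothetical choice.
set -euo pipefail

POOL=${POOL:-cephfs_data}
IMAGE=${IMAGE:-NEW_CLone}
# Set RBD="echo rbd" to print the commands instead of executing them.
RBD=${RBD:-rbd}

repro() {
  # Random 1 KiB writes across the image, as in step 2 of the report.
  $RBD bench-write -p "$POOL" --image "$IMAGE" --io-size 1024 --io-pattern rand &
  local bench_pid=$!

  # Hypothetical sequence: shrink to 1000 GB, grow back, shrink to 500 GB,
  # grow back, all while bench-write is still issuing IO.
  for size in 1024000 2048000 512000 2048000; do
    # --allow-shrink is required for rbd to reduce the image size.
    $RBD resize -p "$POOL" --image "$IMAGE" --size "$size" --allow-shrink
  done

  # Let bench-write finish before returning.
  wait "$bench_pid"
}
```

Usage: source the script and call repro against a test cluster, or run RBD="echo rbd" repro for a dry run that only prints the commands.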
@Tanay: While it shouldn't crash, you really shouldn't be sending IO outside the bounds of the image (e.g., shrinking the image to a point where bench-write is still writing beyond the new image extents).
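To make the out-of-bounds case concrete: once the image is shrunk, a write whose offset is at or past the new size lies entirely outside the image, and one that straddles the new end lies partly outside it. The shell function below is a minimal sketch of that arithmetic in bytes. clip_write_len is a hypothetical name, not part of rbd or librbd, and this is not how the upstream fix handles the case.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of rbd/librbd: prints how many bytes of a
# write at OFFSET with LENGTH fall inside an image of IMAGE_SIZE bytes.
clip_write_len() {
  local image_size=$1 offset=$2 length=$3
  if (( offset >= image_size )); then
    echo 0                            # starts at/past the end: fully out of bounds
  elif (( offset + length > image_size )); then
    echo $(( image_size - offset ))   # straddles the end: only this much is in bounds
  else
    echo "$length"                    # fits entirely inside the image
  fi
}

# Example: 1024-byte writes (as in bench-write --io-size 1024) after the
# image has been shrunk to 1 GiB.
new_size=$(( 1024 * 1024 * 1024 ))
clip_write_len "$new_size" $(( new_size - 100 )) 1024   # prints 100
clip_write_len "$new_size" "$new_size" 1024             # prints 0
```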
@Jason: This is not a graceful exit and gives a bad user experience. Requesting this to be fixed in 2.0; resetting the target release.
Upstream merged Jewel PR: https://github.com/ceph/ceph/pull/9611
Marking it as Verified with ceph version 10.2.2-5.el7cp.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1755.html