Bug 1334182

Summary: crash in librbd when write size is large (~99429910 bytes)
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tanay Ganguly <tganguly>
Component: RBD
Assignee: Jason Dillaman <jdillama>
Status: CLOSED ERRATA
QA Contact: Tanay Ganguly <tganguly>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.0
CC: ceph-eng-bugs, hnallurv, kdreyer, kurs
Target Milestone: rc   
Target Release: 2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-23 19:37:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description   Flags
Crash         none
Script        none

Description Tanay Ganguly 2016-05-09 06:22:40 UTC
Created attachment 1155149 [details]
Crash

Description of problem:
librbd crashes when a large write (about 99429910 bytes) is issued to an RBD image.

Version-Release number of selected component (if applicable):
10.0.2.1

How reproducible:
Always

Steps to Reproduce:
1. Create an RBD image with all features enabled:
rbd image 'testing3':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.128b238e1f29
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, journaling
        flags: 
        journal: 128b238e1f29
        mirroring state: disabled

2. Write ~90 MB of data to the image (see the attached script).
3. Observe the crash.
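
The attached script is not reproduced in this report. As a rough illustration of step 2 only, a minimal sketch using the Python rados/rbd bindings might look like the following. The pool name "rbd", the ceph.conf path, the filler payload, the write offset of 0, and the helper names are all assumptions made for illustration; only the image name 'testing3' and the 99429910-byte length come from this report.

```python
def build_payload(length):
    """Return `length` bytes of filler data (the real script's data source is unknown)."""
    return b"x" * length


def write_large_buffer(pool="rbd", image_name="testing3", length=99429910,
                       conffile="/etc/ceph/ceph.conf"):
    """Issue one large write to an existing RBD image (hypothetical helper)."""
    # Imported lazily so build_payload() stays usable without a Ceph client installed.
    import rados
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)      # pool name is an assumption
        try:
            image = rbd.Image(ioctx, image_name)
            try:
                print(ioctx)                  # I/O context object
                print(image.size())           # image size in bytes
                data = build_payload(length)
                print(len(data))              # size of the single write
                image.write(data, 0)          # offset 0 is an assumption
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling write_large_buffer() against a test cluster that already has the image from step 1 would issue the single large write; nothing runs on import.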

Actual results:
librbd crashes with a failed assertion.

Expected results:
The write should complete without a crash.

Additional info:
Log attached.

<rados.Ioctx object at 0x7f1ae13a3520>
107374182400
99429910
librbd/AioCompletion.cc: In function 'void librbd::AioCompletion::fail(CephContext*, int)' thread 7f1aba364700 time 2016-05-08 19:02:46.422238
librbd/AioCompletion.cc: 142: FAILED assert(pending_count == 0)
 ceph version 10.2.0-1.el7cp (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
 1: (()+0x2765b5) [0x7f1ad043d5b5]
 2: (()+0x749b7) [0x7f1ad023b9b7]
 3: (()+0xe8253) [0x7f1ad02af253]
 4: (()+0x721f9) [0x7f1ad02391f9]
 5: (()+0xa2ac4) [0x7f1ad0269ac4]
 6: (()+0x26713e) [0x7f1ad042e13e]
 7: (()+0x268010) [0x7f1ad042f010]
 8: (()+0x7dc5) [0x7f1ae0d10dc5]
 9: (clone()+0x6d) [0x7f1ae033528d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
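
For reference, the figures printed before the assertion line up with the image layout from step 1. The short calculation below only checks that arithmetic; it assumes the write starts at offset 0 (the report does not state the offset) and makes no claim about the cause of the assertion.

```python
import math

ORDER = 22                        # from "order 22 (4096 kB objects)"
OBJECT_SIZE = 1 << ORDER          # 4194304 bytes per object
IMAGE_SIZE = 107374182400         # image size in bytes, as printed in the log
WRITE_SIZE = 99429910             # write size in bytes, as printed in the log

image_mib = IMAGE_SIZE // (1024 * 1024)                 # 102400 MB
object_count = IMAGE_SIZE // OBJECT_SIZE                # 25600 objects
# Objects spanned by one write of WRITE_SIZE bytes, assuming offset 0.
objects_touched = math.ceil(WRITE_SIZE / OBJECT_SIZE)   # 24

print(image_mib, object_count, objects_touched)
```

Under that assumption, a single request of this size spans 24 of the image's 4 MB objects.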

Comment 1 Tanay Ganguly 2016-05-09 06:25:14 UTC
Created attachment 1155151 [details]
Script

Comment 4 Ken Dreyer (Red Hat) 2016-05-10 13:00:05 UTC
From Jason's email today:
> This should be 2.z -- once bz 1331267 merges this error won't
> reproduce but it is still an issue.

Re-targeting to 2.1.

Comment 5 Jason Dillaman 2016-06-12 23:57:53 UTC
Merged, upstream Jewel PR: https://github.com/ceph/ceph/pull/9611

Comment 8 Tanay Ganguly 2016-06-21 11:42:17 UTC
Working Fine.

Marking it as Verified.
ceph version 10.2.2-5.el7cp

Comment 10 errata-xmlrpc 2016-08-23 19:37:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html