Bug 1253486 - ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
ffsb osd thrash test - osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of ...
Status: ON_QA
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RBD (Show other bugs)
1.3.0
Unspecified Unspecified
medium Severity medium
: rc
: 3.1
Assigned To: Jason Dillaman
Vasu Kulkarni
: TestOnly
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2015-08-13 15:42 EDT by Vasu Kulkarni
Modified: 2017-11-06 12:39 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 10889 None None None Never
Ceph Project Bug Tracker 12691 None None None Never
Ceph Project Bug Tracker 9779 None None None Never

  None (edit)
Description Vasu Kulkarni 2015-08-13 15:42:48 EDT
Description of problem:

I think this is a Z-stream kernel which should be now on CDN
Linux magna079 3.10.0-229.9.1.el7.x86_64 #1 SMP Sat Jul 4 17:21:43 EDT 2015 x86_64 x86_64 x86_64 GNU/Linu

Run OSD trhash ffsb test and in 1/3 runs I am seeing the following crash


Version-Release number of selected component (if applicable):


How reproducible:
1/3

Steps to Reproduce:
1. Run Krbd: osd thrash test


Actual results:

2015-08-12T20:37:51.883 INFO:tasks.ceph.osd.4.magna078.stderr:osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
2015-08-12T20:37:51.884 INFO:tasks.ceph.osd.4.magna078.stderr: ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
2015-08-12T20:37:51.885 INFO:tasks.ceph.osd.4.magna078.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xaf6715]
2015-08-12T20:37:51.886 INFO:tasks.ceph.osd.4.magna078.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1914) [0x84dd24]
2015-08-12T20:37:51.887 INFO:tasks.ceph.osd.4.magna078.stderr: 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x45d7) [0x852677]
2015-08-12T20:37:51.887 INFO:tasks.ceph.osd.4.magna078.stderr: 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x68a) [0x7ee58a]
2015-08-12T20:37:51.888 INFO:tasks.ceph.osd.4.magna078.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x652185]
2015-08-12T20:37:51.889 INFO:tasks.ceph.osd.4.magna078.stderr: 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x2e9) [0x6526a9]
2015-08-12T20:37:51.890 INFO:tasks.ceph.osd.4.magna078.stderr: 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x882) [0xae5ec2]
2015-08-12T20:37:51.890 INFO:tasks.ceph.osd.4.magna078.stderr: 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xae7fe0]
2015-08-12T20:37:51.891 INFO:tasks.ceph.osd.4.magna078.stderr: 9: (()+0x7df5) [0x7f34a4edbdf5]
2015-08-12T20:37:51.892 INFO:tasks.ceph.osd.4.magna078.stderr: 10: (clone()+0x6d) [0x7f34a39be1ad]
2015-08-12T20:37:51.893 INFO:tasks.ceph.osd.4.magna078.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-08-12T20:37:51.893 INFO:tasks.ceph.osd.4.magna078.stderr:2015-08-12 20:37:51.880548 7f34899e1700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7f34899e1700 time 2015-08-12 20:37:51.878724
2015-08-12T20:37:51.894 INFO:tasks.ceph.osd.4.magna078.stderr:osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
2015-08-12T20:37:51.895 INFO:tasks.ceph.osd.4.magna078.stderr:
2015-08-12T20:37:51.896 INFO:tasks.ceph.osd.4.magna078.stderr: ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
2015-08-12T20:37:51.896 INFO:tasks.ceph.osd.4.magna078.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xaf6715]
2015-08-12T20:37:51.897 INFO:tasks.ceph.osd.4.magna078.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1914) [0x84dd24]
2015-08-12T20:37:51.898 INFO:tasks.ceph.osd.4.magna078.stderr: 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x45d7) [0x852677]
2015-08-12T20:37:51.898 INFO:tasks.ceph.osd.4.magna078.stderr: 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x68a) [0x7ee58a]
2015-08-12T20:37:51.899 INFO:tasks.ceph.osd.4.magna078.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x652185]
2015-08-12T20:37:51.900 INFO:tasks.ceph.osd.4.magna078.stderr: 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x2e9) [0x6526a9]
2015-08-12T20:37:51.901 INFO:tasks.ceph.osd.4.magna078.stderr: 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x882) [0xae5ec2]
2015-08-12T20:37:51.901 INFO:tasks.ceph.osd.4.magna078.stderr: 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xae7fe0]
2015-08-12T20:37:51.902 INFO:tasks.ceph.osd.4.magna078.stderr: 9: (()+0x7df5) [0x7f34a4edbdf5]
2015-08-12T20:37:51.903 INFO:tasks.ceph.osd.4.magna078.stderr: 10: (clone()+0x6d) [0x7f34a39be1ad]
2015-08-12T20:37:51.904 INFO:tasks.ceph.osd.4.magna078.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-08-12T20:37:51.905 INFO:tasks.ceph.osd.4.magna078.stderr:
2015-08-12T20:37:51.957 INFO:tasks.ceph.osd.4.magna078.stderr:  -209> 2015-08-12 20:37:51.878713 7f34899e1700 -1 osd.4 pg_epoch: 197 pg[0.e( v 194'11837 (106'8625,194'11837] local-les=181 n=198 ec=1 les/c 181/165 180/180/173) [3,5]/[4,5] r=0 lpr=180 pi=164-179/4 rops=5 bft=3 crt=188'11821 lcod 194'11836 mlcod 194'11836 active+remapped+backfilling] bad op order, already applied 199013 > this 198728
2015-08-12T20:37:51.958 INFO:tasks.ceph.osd.4.magna078.stderr:   -21> 2015-08-12 20:37:51.880548 7f34899e1700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)' thread 7f34899e1700 time 2015-08-12 20:37:51.878724
2015-08-12T20:37:51.959 INFO:tasks.ceph.osd.4.magna078.stderr:osd/ReplicatedPG.cc: 2348: FAILED assert(0 == "out of order op")
2015-08-12T20:37:51.960 INFO:tasks.ceph.osd.4.magna078.stderr:
2015-08-12T20:37:51.961 INFO:tasks.ceph.osd.4.magna078.stderr: ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
2015-08-12T20:37:51.962 INFO:tasks.ceph.osd.4.magna078.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xaf6715]
2015-08-12T20:37:51.962 INFO:tasks.ceph.osd.4.magna078.stderr: 2: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1914) [0x84dd24]
2015-08-12T20:37:51.963 INFO:tasks.ceph.osd.4.magna078.stderr: 3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x45d7) [0x852677]
2015-08-12T20:37:51.964 INFO:tasks.ceph.osd.4.magna078.stderr: 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x68a) [0x7ee58a]
2015-08-12T20:37:51.965 INFO:tasks.ceph.osd.4.magna078.stderr: 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x652185]
2015-08-12T20:37:51.966 INFO:tasks.ceph.osd.4.magna078.stderr: 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x2e9) [0x6526a9]
2015-08-12T20:37:51.967 INFO:tasks.ceph.osd.4.magna078.stderr: 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x882) [0xae5ec2]
2015-08-12T20:37:51.968 INFO:tasks.ceph.osd.4.magna078.stderr: 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xae7fe0]
2015-08-12T20:37:51.968 INFO:tasks.ceph.osd.4.magna078.stderr: 9: (()+0x7df5) [0x7f34a4edbdf5]
2015-08-12T20:37:51.969 INFO:tasks.ceph.osd.4.magna078.stderr: 10: (clone()+0x6d) [0x7f34a39be1ad]
2015-08-12T20:37:51.970 INFO:tasks.ceph.osd.4.magna078.stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2015-08-12T20:37:51.971 INFO:tasks.ceph.osd.4.magna078.stderr:
2015-08-12T20:37:51.973 INFO:tasks.ceph.osd.4.magna078.stderr:terminate called after throwing an instance of 'ceph::FailedAssertion'
2015-08-12T20:37:51.973 INFO:tasks.ceph.osd.4.magna078.stderr:*** Caught signal (Aborted) **
2015-08-12T20:37:51.974 INFO:tasks.ceph.osd.4.magna078.stderr: in thread 7f34899e1700
2015-08-12T20:37:51.975 INFO:tasks.ceph.osd.4.magna078.stderr: ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
2015-08-12T20:37:51.975 INFO:tasks.ceph.osd.4.magna078.stderr: 1: ceph-osd() [0x9f63f2]
2015-08-12T20:37:51.976 INFO:tasks.ceph.osd.4.magna078.stderr: 2: (()+0xf130) [0x7f34a4ee3130]
2015-08-12T20:37:51.977 INFO:tasks.ceph.osd.4.magna078.stderr: 3: (gsignal()+0x37) [0x7f34a38fd5d7]
2015-08-12T20:37:51.977 INFO:tasks.ceph.osd.4.magna078.stderr: 4: (abort()+0x148) [0x7f34a38fecc8]
2015-08-12T20:37:51.978 INFO:tasks.ceph.osd.4.magna078.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f34a42019b5]
2015-08-12T20:37:51.979 INFO:tasks.ceph.osd.4.magna078.stderr: 6: (()+0x5e926) [0x7f34a41ff926]
2015-08-12T20:37:51.979 INFO:tasks.ceph.osd.4.magna078.stderr: 7: (()+0x5e953) [0x7f34a41ff953]
2015-08-12T20:37:51.980 INFO:tasks.ceph.osd.4.magna078.stderr: 8: (()+0x5eb73) [0x7f34a41ffb73]
2015-08-12T20:37:51.981 INFO:tasks.ceph.osd.4.magna078.stderr: 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xaf690a]
2015-08-12T20:37:51.981 INFO:tasks.ceph.osd.4.magna078.stderr: 10: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x1914) [0x84dd24]
2015-08-12T20:37:51.982 INFO:tasks.ceph.osd.4.magna078.stderr: 11: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x45d7) [0x852677]
2015-08-12T20:37:51.983 INFO:tasks.ceph.osd.4.magna078.stderr: 12: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x68a) [0x7ee58a]
2015-08-12T20:37:51.983 INFO:tasks.ceph.osd.4.magna078.stderr: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x405) [0x652185]
2015-08-12T20:37:51.984 INFO:tasks.ceph.osd.4.magna078.stderr: 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x2e9) [0x6526a9]
2015-08-12T20:37:51.985 INFO:tasks.ceph.osd.4.magna078.stderr: 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x882) [0xae5ec2]
2015-08-12T20:37:51.985 INFO:tasks.ceph.osd.4.magna078.stderr: 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xae7fe0]
2015-08-12T20:37:51.986 INFO:tasks.ceph.osd.4.magna078.stderr: 17: (()+0x7df5) [0x7f34a4edbdf5]
2015-08-12T20:37:51.986 INFO:tasks.ceph.osd.4.magna078.stderr: 18: (clone()+0x6d) [0x7f34a39be1ad]
2015-08-12T20:37:51.987 INFO:tasks.ceph.osd.4.magna078.stderr:2015-08-12 20:37:51.961266 7f34899e1700 -1 *** Caught signal (Aborted) **

Expected results:


Additional info:
Comment 2 Ken Dreyer (Red Hat) 2015-12-11 17:04:48 EST
After discussing this issue with Sam, and reviewing the upstream source, apparently this is an intentional assert in the OSD and Sam wants to keep it. It reflects a bug in kRBD.

Josh, can you fix this or re-assign as appropriate?
Comment 3 Josh Durgin 2015-12-11 17:11:39 EST
This is should be fixed by the osd client sync Ilya's working on. It's purely a kernel client change, I'm not sure how we want to mark that here.

We hadn't been able to reproduce in a way that told us enough information to fix it in the existing kernel osd client yet.
Comment 4 Ken Dreyer (Red Hat) 2015-12-17 15:40:20 EST
Ok, thanks. Is there a BZ tracking Ilya's osd client sync?
Comment 5 Josh Durgin 2015-12-17 19:32:13 EST
No bz for it yet, just the upstream ticket.
Comment 7 Ken Dreyer (Red Hat) 2016-08-03 15:48:58 EDT
RHEL 7.3 kRBD bug is 1344930
Comment 8 Ken Dreyer (Red Hat) 2017-08-08 16:12:19 EDT
Sounds like we just need a re-test here with RHEL 7.4.

Note You need to log in before you can comment on or make changes to this bug.