Created attachment 1029884
Core dump

Description of problem:
Seeing a crash when an rbd resize loop and rbd bench-write run in parallel on the same image.

Version-Release number of selected component (if applicable):
ceph version 0.94.1

How reproducible:
100%

Steps to Reproduce:
1. Create an RBD image:

rbd image 'FlattenClone_new6':
	size 11462 MB in 2866 objects
	order 22 (4096 kB objects)
	block_name_prefix: rbd_data.a4aea515f007c
	format: 2
	features: layering, exclusive, object map
	flags:
	parent: Tanay-RBD/Flatten6@Flattensnap6
	overlap: 10240 MB

2. Start the resize operation on that image:

#!/bin/python
import os
import random
import time

size = 11000
i = 0
new_size = 0
sh_size = 0
while i < 50:
    # Grow the image by a random 1-500 MB
    x = random.randint(1, 500)
    new_size = size + x
    cmd1 = 'rbd resize Tanay-RBD/FlattenClone_new6 --size %s' % new_size
    print('cmd is %s' % cmd1)
    os.system(cmd1)
    time.sleep(5)
    # Shrink it back by a random 1-100 MB
    x = random.randint(1, 100)
    sh_size = new_size - x
    cmd2 = 'rbd resize Tanay-RBD/FlattenClone_new6 --size %s --allow-shrink' % sh_size
    print('cmd2 is %s' % cmd2)
    os.system(cmd2)
    i = i + 1

3. While the resize loop is running, start writing I/O to the same image:

rbd bench-write --io-pattern rand Tanay-RBD/FlattenClone_new6

Actual results:
Seeing an rbd crash. I think this is again some locking contention.
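A minimal sketch, assuming Python 3.5+ and a disposable test cluster where the `rbd` CLI can reach the image, of how steps 2 and 3 could be driven from one script: a background thread runs the same grow/shrink loop as the reproducer above while `rbd bench-write` runs in the foreground. The helper names (`resize_commands`, `resize_loop`, `reproduce`) are hypothetical, not part of Ceph. `subprocess` reports a child killed by a signal as a negative return code, so an abort like the one in this report shows up as `-signal.SIGABRT` for whichever `rbd` process hit it.

```python
import random
import signal
import subprocess
import threading
import time

# Assumption: pool/image name copied from the report; adjust for your cluster.
IMAGE = "Tanay-RBD/FlattenClone_new6"


def resize_commands(image, base_mb=11000, iterations=50, seed=None):
    """Build the grow/shrink argv pairs used by the reproducer loop."""
    rng = random.Random(seed)
    cmds = []
    for _ in range(iterations):
        grown = base_mb + rng.randint(1, 500)   # grow by 1-500 MB
        shrunk = grown - rng.randint(1, 100)    # shrink back by 1-100 MB
        cmds.append(["rbd", "resize", image, "--size", str(grown)])
        cmds.append(["rbd", "resize", image, "--size", str(shrunk),
                     "--allow-shrink"])
    return cmds


def resize_loop(cmds, results, pause_s=5.0):
    """Run each resize, recording return codes; sleep only after a grow."""
    for n, cmd in enumerate(cmds):
        results.append(subprocess.run(cmd, check=False).returncode)
        if n % 2 == 0:  # even indices are the grow commands
            time.sleep(pause_s)


def reproduce(image=IMAGE):
    """Start the resize loop in the background, then bench-write the image."""
    resize_rcs = []
    resizer = threading.Thread(
        target=resize_loop, args=(resize_commands(image), resize_rcs),
        daemon=True)
    resizer.start()
    bench = subprocess.run(
        ["rbd", "bench-write", "--io-pattern", "rand", image], check=False)
    resizer.join()
    # A negative return code means the child died from that signal.
    aborted = (bench.returncode == -signal.SIGABRT
               or -signal.SIGABRT in resize_rcs)
    return bench.returncode, resize_rcs, aborted
```

`reproduce()` only reports whether any `rbd` child aborted; it does not identify the lock involved, which still needs the core dump.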
snippet:
    -1> 2015-05-26 06:49:32.404746 7feb4ffff700  1 -- 10.12.27.17:0/1019198 <== osd.38 10.12.27.18:6816/2589 314 ==== watch-notify(notify (1) cookie 63247104 notify 46956877447660 ret 0) v3 ==== 84+0+0 (33845051 68 0 0) 0x7feb44003340 con 0x3c4a9d0
     0> 2015-05-26 06:49:32.408340 7feb6dd5c700 -1 *** Caught signal (Aborted) **
 in thread 7feb6dd5c700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: rbd() [0x5a06f2]
 2: (()+0xf130) [0x7feb76335130]
 3: (gsignal()+0x37) [0x7feb7536e5d7]
 4: (abort()+0x148) [0x7feb7536fcc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7feb75c729b5]
 6: (()+0x5e926) [0x7feb75c70926]
 7: (()+0x5e953) [0x7feb75c70953]
 8: (()+0x5eb73) [0x7feb75c70b73]
 9: (()+0x15e80a) [0x7feb7959980a]
 10: (()+0x110d15) [0x7feb7954bd15]
 11: (()+0x53bad) [0x7feb7948ebad]
 12: (()+0x497cc) [0x7feb794847cc]
 13: (()+0x4b1ac) [0x7feb794861ac]
 14: (()+0x4730c) [0x7feb7948230c]
 15: (()+0x44bfa) [0x7feb7947fbfa]
 16: (()+0x3f149) [0x7feb7947a149]
 17: (()+0x488dc) [0x7feb794838dc]
 18: (()+0x3dc69) [0x7feb79478c69]
 19: (()+0x3eafa) [0x7feb79479afa]
 20: (()+0x3f149) [0x7feb7947a149]
 21: (()+0x39f006) [0x7feb797da006]
 22: (()+0x3f149) [0x7feb7947a149]
 23: (()+0xb30e8) [0x7feb794ee0e8]
 24: (()+0x7df5) [0x7feb7632ddf5]
 25: (clone()+0x6d) [0x7feb7542f1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
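The NOTE at the end of the trace applies mainly to the frames printed as `(()+0xOFFSET) [0xADDRESS]`. For those, glibc prints the offset relative to the object's load address, so ADDRESS minus OFFSET recovers that load address: frames 9-23 all give 0x7feb7943b000 (one shared object), frames 2 and 24 give 0x7feb76326000, and frames 6-8 give 0x7feb75c12000. A minimal sketch of that grouping, plus building `addr2line` commands once each load address has been matched to a library path (for example with `info sharedlibrary` in gdb against the attached core). The helper names and the library path in the test are hypothetical placeholders, and symbol names require the matching debuginfo.

```python
import re
from collections import defaultdict

# Matches anonymous frames of the form "N: (()+0xOFFSET) [0xADDRESS]".
FRAME_RE = re.compile(r"(\d+): \(\(\)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]")


def group_frames_by_base(backtrace):
    """Map each inferred load address (ADDRESS - OFFSET) to (frame, offset) pairs."""
    groups = defaultdict(list)
    for frame, offset, addr in FRAME_RE.findall(backtrace):
        base = int(addr, 16) - int(offset, 16)
        groups[base].append((int(frame), int(offset, 16)))
    return dict(groups)


def addr2line_commands(groups, lib_for_base):
    """Build addr2line argv lists; lib_for_base maps load address -> library path."""
    cmds = []
    for base, frames in sorted(groups.items()):
        lib = lib_for_base.get(base)
        if lib is None:  # load address not yet matched to a library
            continue
        offsets = [hex(off) for _, off in frames]
        # -C demangles C++ names, -f prints the function name, -e picks the object.
        cmds.append(["addr2line", "-C", "-f", "-e", lib] + offsets)
    return cmds
```

Passing the whole `snippet:` text to `group_frames_by_base` yields three groups; only the anonymous frames are considered, so `rbd() [0x5a06f2]` and the named frames are left to `objdump`/gdb as the NOTE suggests.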
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent 500
  max_new 1000
  log_file
--- end dump of recent events ---

Expected results:
There should be no crash.

Additional info:
BT and core dump attached.
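In the levels dump, each Ceph subsystem is listed as "log-file level/in-memory level". With `rbd` at 0/5, only level-0 rbd messages reach the log file, while messages up to level 5 are kept in the in-memory buffer (bounded by `max_recent`) that is dumped on a crash. A minimal sketch of a parser for this dump, useful for checking which subsystems would need a higher log-file level (for example `debug rbd`) on a rerun; the function names are hypothetical, and the threshold entries such as `(syslog threshold)` are deliberately skipped.

```python
import re

# Each entry is "<log-file level>/<in-memory level> <subsystem>".
LEVEL_RE = re.compile(r"(-?\d+)/\s*(-?\d+) ([a-z_]+)")


def parse_levels(dump):
    """Return {subsystem: (log_level, memory_level)} from a levels dump."""
    return {name: (int(log), int(mem)) for log, mem, name in LEVEL_RE.findall(dump)}


def memory_only(levels):
    """Subsystems that keep more detail in memory than they write to the log file."""
    return sorted(name for name, (log, mem) in levels.items() if mem > log)
```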
Created attachment 1029885
bt
*** This bug has been marked as a duplicate of bug 1223731 ***