Bug 1225005 - Seeing a rbd crash while write + resize operations
Summary: Seeing a rbd crash while write + resize operations
Keywords:
Status: CLOSED DUPLICATE of bug 1223731
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 1.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 1.3.0
Assignee: Josh Durgin
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-05-26 12:02 UTC by Tanay Ganguly
Modified: 2017-07-30 15:28 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-26 13:59:33 UTC
Embargoed:


Attachments
Core dump (3.75 MB, application/x-gzip)
2015-05-26 12:02 UTC, Tanay Ganguly
bt (107.76 KB, text/plain)
2015-05-26 12:02 UTC, Tanay Ganguly

Description Tanay Ganguly 2015-05-26 12:02:00 UTC
Created attachment 1029884
Core dump

Description of problem:
Seeing a crash when an image is resized while rbd bench-write runs against it in parallel.

Version-Release number of selected component (if applicable):
ceph version 0.94.1

How reproducible:
100 %

Steps to Reproduce:
1. Create a rbd image.

FlattenClone_new6':
        size 11462 MB in 2866 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.a4aea515f007c
        format: 2
        features: layering, exclusive, object map
        flags:
        parent: Tanay-RBD/Flatten6@Flattensnap6
        overlap: 10240 MB
2. Start the resize operation on that image.

#!/usr/bin/env python
# Repeatedly grow, then shrink, the clone while I/O runs against it.
import os
import random
import time

IMAGE = 'Tanay-RBD/FlattenClone_new6'
BASE_SIZE = 11000  # MB

for i in range(50):
        # Grow to a random size above the base.
        new_size = BASE_SIZE + random.randint(1, 500)
        cmd1 = 'rbd resize %s --size %s' % (IMAGE, new_size)
        print('cmd is %s' % cmd1)
        os.system(cmd1)
        time.sleep(5)
        # Shrink again by a random amount.
        sh_size = new_size - random.randint(1, 100)
        cmd2 = 'rbd resize %s --size %s --allow-shrink' % (IMAGE, sh_size)
        print('cmd2 is %s' % cmd2)
        os.system(cmd2)
3. Once it starts, start writing IO on the same rbd image.
rbd bench-write --io-pattern rand Tanay-RBD/FlattenClone_new6
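
Steps 2 and 3 can also be driven from a single script so that the resize loop and the write load are guaranteed to overlap. The following is a minimal sketch, not the script used for this report: the helper names (`resize_commands`, `run_parallel`) and the `runner` parameter are hypothetical, and only the `rbd resize` and `rbd bench-write` invocations are taken from the steps above.

```python
import random
import subprocess
import threading
import time

IMAGE = "Tanay-RBD/FlattenClone_new6"  # image name taken from the report


def resize_commands(base_size=11000, rounds=50, rng=None):
    """Yield (grow, shrink) argv pairs mirroring the resize loop in step 2."""
    rng = rng or random.Random()
    for _ in range(rounds):
        new_size = base_size + rng.randint(1, 500)
        shrunk = new_size - rng.randint(1, 100)
        grow = ["rbd", "resize", IMAGE, "--size", str(new_size)]
        shrink = ["rbd", "resize", IMAGE, "--size", str(shrunk),
                  "--allow-shrink"]
        yield grow, shrink


def run_parallel(runner=subprocess.call, rounds=50, pause=5, rng=None):
    """Run bench-write in a thread while the resize loop runs in this one.

    `runner` is a hypothetical hook: pass a stub to dry-run the sequence
    without touching a cluster.
    """
    bench = ["rbd", "bench-write", "--io-pattern", "rand", IMAGE]
    writer = threading.Thread(target=runner, args=(bench,))
    writer.start()
    for grow, shrink in resize_commands(rounds=rounds, rng=rng):
        runner(grow)
        time.sleep(pause)
        runner(shrink)
    writer.join()
```

Calling `run_parallel()` with no arguments executes the real commands. A call such as `run_parallel(runner=print, rounds=1, pause=0)` only prints the argument lists, which is a cheap way to check the sequence before pointing it at a cluster.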

Actual results:
Seeing an rbd crash. I think this is lock contention again.

snippet:
    -1> 2015-05-26 06:49:32.404746 7feb4ffff700  1 -- 10.12.27.17:0/1019198 <== osd.38 10.12.27.18:6816/2589 314 ==== watch-notify(notify (1) cookie 63247104 notify 46956877447660 ret 0) v3 ==== 84+0+0 (3384505168 0 0) 0x7feb44003340 con 0x3c4a9d0
     0> 2015-05-26 06:49:32.408340 7feb6dd5c700 -1 *** Caught signal (Aborted) **
 in thread 7feb6dd5c700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: rbd() [0x5a06f2]
 2: (()+0xf130) [0x7feb76335130]
 3: (gsignal()+0x37) [0x7feb7536e5d7]
 4: (abort()+0x148) [0x7feb7536fcc8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7feb75c729b5]
 6: (()+0x5e926) [0x7feb75c70926]
 7: (()+0x5e953) [0x7feb75c70953]
 8: (()+0x5eb73) [0x7feb75c70b73]
 9: (()+0x15e80a) [0x7feb7959980a]
 10: (()+0x110d15) [0x7feb7954bd15]
 11: (()+0x53bad) [0x7feb7948ebad]
 12: (()+0x497cc) [0x7feb794847cc]
 13: (()+0x4b1ac) [0x7feb794861ac]
 14: (()+0x4730c) [0x7feb7948230c]
 15: (()+0x44bfa) [0x7feb7947fbfa]
 16: (()+0x3f149) [0x7feb7947a149]
 17: (()+0x488dc) [0x7feb794838dc]
 18: (()+0x3dc69) [0x7feb79478c69]
 19: (()+0x3eafa) [0x7feb79479afa]
 20: (()+0x3f149) [0x7feb7947a149]
 21: (()+0x39f006) [0x7feb797da006]
 22: (()+0x3f149) [0x7feb7947a149]
 23: (()+0xb30e8) [0x7feb794ee0e8]
 24: (()+0x7df5) [0x7feb7632ddf5]
 25: (clone()+0x6d) [0x7feb7542f1ad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none   
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush  
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer 
   0/ 1 timer  
   0/ 1 filer  
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados  
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client 
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc   
   1/ 5 paxos  
   0/ 5 tp
   1/ 5 auth   
   1/ 5 crypto 
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok   
   1/ 1 throttle
   0/ 0 refs   
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file
--- end dump of recent events ---
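
Most frames in the backtrace above carry no symbol name, only an offset into the containing object, for example `(()+0x15e80a)`. As a rough triage aid, the sketch below splits each frame line into its parts so that the unresolved offsets can be listed and then looked up against a matching debuginfo build. The regular expression and the `parse_frames` helper are illustrative assumptions based only on the line shapes visible in this dump.

```python
import re

# Matches lines such as " 9: (()+0x15e80a) [0x7feb7959980a]" or
# " 3: (gsignal()+0x37) [0x7feb7536e5d7]". Lines like " 1: rbd() [0x5a06f2]"
# have no "+offset" part and are skipped.
FRAME_RE = re.compile(
    r"^\s*(?P<num>\d+): \((?P<sym>.*)\(\)\+(?P<off>0x[0-9a-f]+)\) "
    r"\[(?P<addr>0x[0-9a-f]+)\]\s*$"
)


def parse_frames(text):
    """Return one dict per frame that has a symbol+offset form."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append({
                "num": int(m.group("num")),
                "symbol": m.group("sym") or None,  # None when unresolved
                "offset": int(m.group("off"), 16),
                "address": int(m.group("addr"), 16),
            })
    return frames


sample = """\
 1: rbd() [0x5a06f2]
 3: (gsignal()+0x37) [0x7feb7536e5d7]
 9: (()+0x15e80a) [0x7feb7959980a]
"""
frames = parse_frames(sample)
unresolved = [hex(f["offset"]) for f in frames if f["symbol"] is None]
print(unresolved)
```

The dump does not say which shared object each offset belongs to, so this step only narrows down what to look up. The attached core dump and full backtrace remain the authoritative source.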

Expected results:
There should be no crash

Additional info:
BT and coredump attached.

Comment 2 Tanay Ganguly 2015-05-26 12:02:34 UTC
Created attachment 1029885
bt

Comment 3 Jason Dillaman 2015-05-26 13:59:33 UTC

*** This bug has been marked as a duplicate of bug 1223731 ***

