Bug 1170657 - Crash starting libvirt rbd pool
Summary: Crash starting libvirt rbd pool
Status: CLOSED DUPLICATE of bug 1144794
Alias: None
Product: Fedora
Classification: Fedora
Component: ceph
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Boris Ranto
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2014-12-04 14:40 UTC by Cole Robinson
Modified: 2014-12-23 12:32 UTC
CC List: 7 users

Last Closed: 2014-12-23 12:32:01 UTC
Type: Bug



Description Cole Robinson 2014-12-04 14:40:23 UTC
FWIW this is my first time trying libvirt + ceph integration.

$ rpm -q libvirt-daemon ceph
libvirt-daemon-1.2.9.1-2.fc21.x86_64
ceph-0.80.7-1.fc21.x86_64

$ cat test.xml
<pool type='rbd'>
  <name>f20-ceph</name>
  <source>
    <host name='192.168.124.101'/>
    <name>rbd</name>
  </source>
</pool>

$ sudo virsh pool-define test.xml
$ sudo virsh pool-start f20-ceph
<libvirt crashes>
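For reference, the fields libvirt's rbd backend reads from this pool definition (the pool name, the monitor host, and the RADOS pool to enumerate) can be checked with a short standalone parse. This is a minimal sketch using Python's standard library, not part of the original report; the variable names are illustrative.

```python
# Parse the reproducer's pool XML and print the values the rbd storage
# backend passes to librados when opening the connection.
import xml.etree.ElementTree as ET

POOL_XML = """
<pool type='rbd'>
  <name>f20-ceph</name>
  <source>
    <host name='192.168.124.101'/>
    <name>rbd</name>
  </source>
</pool>
"""

root = ET.fromstring(POOL_XML)
pool_name = root.findtext("name")          # libvirt pool name ("f20-ceph")
source = root.find("source")
monitor = source.find("host").get("name")  # ceph monitor address
rbd_pool = source.findtext("name")         # RADOS pool holding the rbd images

print(pool_name, monitor, rbd_pool)
```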

Backtrace:

#0  __GI___pthread_rwlock_unlock (rwlock=0x7fffc4003810)
    at pthread_rwlock_unlock.c:34
#1  0x00007fffe09c5b42 in ceph::HeartbeatMap::~HeartbeatMap() ()
   from /lib64/librados.so.2
#2  0x00007fffe09b32b3 in CephContext::~CephContext() ()
   from /lib64/librados.so.2
#3  0x00007fffe08f1fa0 in librados::RadosClient::~RadosClient() ()
   from /lib64/librados.so.2
#4  0x00007fffe08f21a9 in librados::RadosClient::~RadosClient() ()
   from /lib64/librados.so.2
#5  0x00007fffe1bffdec in virStorageBackendRBDCloseRADOSConn (
    ptr=ptr@entry=0x7fffe6527ac0) at storage/storage_backend_rbd.c:259
#6  0x00007fffe1c00a49 in virStorageBackendRBDRefreshPool (
    conn=<optimized out>, pool=0x7fffd80ecfb0)
    at storage/storage_backend_rbd.c:414
#7  0x00007fffe1bef97d in storagePoolCreate (obj=0x7fffc4000930, 
    flags=<optimized out>) at storage/storage_driver.c:798
#8  0x00007ffff75621ee in virStoragePoolCreate (
    pool=pool@entry=0x7fffc4000930, flags=0) at libvirt.c:13029
#9  0x000055555558659e in remoteDispatchStoragePoolCreate (
    server=<optimized out>, msg=<optimized out>, args=0x7fffc40008c0, 
    rerr=0x7fffe6527cb0, client=<optimized out>) at remote_dispatch.h:12779
#10 remoteDispatchStoragePoolCreateHelper (server=<optimized out>, 
    client=<optimized out>, msg=<optimized out>, rerr=0x7fffe6527cb0, 
    args=0x7fffc40008c0, ret=<optimized out>) at remote_dispatch.h:12757
#11 0x00007ffff75b5689 in virNetServerProgramDispatchCall (msg=0x555555800760, 
    client=0x555555803380, server=0x5555557e8ca0, prog=0x5555557fcb80)
    at rpc/virnetserverprogram.c:437
#12 virNetServerProgramDispatch (prog=0x5555557fcb80, 
    server=server@entry=0x5555557e8ca0, client=0x555555803380, 
    msg=0x555555800760) at rpc/virnetserverprogram.c:307
#13 0x0000555555599ce8 in virNetServerProcessMsg (msg=<optimized out>, 
    prog=<optimized out>, client=<optimized out>, srv=0x5555557e8ca0)
    at rpc/virnetserver.c:172
#14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5555557e8ca0)
    at rpc/virnetserver.c:193
#15 0x00007ffff74b8f6e in virThreadPoolWorker (
    opaque=opaque@entry=0x5555557dcd40) at util/virthreadpool.c:145
#16 0x00007ffff74b890e in virThreadHelper (data=<optimized out>)
    at util/virthread.c:197
#17 0x00007ffff42bf52a in start_thread (arg=0x7fffe6528700)
    at pthread_create.c:310
#18 0x00007ffff3ffb77d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment 1 Darryl Bond 2014-12-19 05:02:57 UTC
I suspect the problem is a bit wider than libvirt; it segfaults with any ceph rbd command:
# rbd --cluster backup ls
rbdtest
rbdtest1
zztest
*** Caught signal (Segmentation fault) **
 in thread 7fa8161c4780
 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: rbd() [0x424922]
 2: /lib64/libpthread.so.0() [0x3a9bc100d0]
 3: (pthread_rwlock_unlock()+0x13) [0x3a9bc0c153]
 4: (librados::RadosClient::~RadosClient()+0x122) [0x3625faa032]
 5: (librados::RadosClient::~RadosClient()+0x9) [0x3625faa1a9]
 6: (librados::Rados::shutdown()+0x36) [0x3625f877d6]
 7: (main()+0x4c7) [0x40f447]
 8: (__libc_start_main()+0xf0) [0x3a9b41ffe0]
 9: rbd() [0x4162e7]
2014-12-19 14:59:35.975456 7fa8161c4780 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8161c4780

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: rbd() [0x424922]
 2: /lib64/libpthread.so.0() [0x3a9bc100d0]
 3: (pthread_rwlock_unlock()+0x13) [0x3a9bc0c153]
 4: (librados::RadosClient::~RadosClient()+0x122) [0x3625faa032]
 5: (librados::RadosClient::~RadosClient()+0x9) [0x3625faa1a9]
 6: (librados::Rados::shutdown()+0x36) [0x3625f877d6]
 7: (main()+0x4c7) [0x40f447]
 8: (__libc_start_main()+0xf0) [0x3a9b41ffe0]
 9: rbd() [0x4162e7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -76> 2014-12-19 14:59:35.964919 7fa8161c4780  5 asok(0x132e4d0) register_command perfcounters_dump hook 0x132e2f0
   -75> 2014-12-19 14:59:35.964925 7fa8161c4780  5 asok(0x132e4d0) register_command 1 hook 0x132e2f0
   -74> 2014-12-19 14:59:35.964927 7fa8161c4780  5 asok(0x132e4d0) register_command perf dump hook 0x132e2f0
   -73> 2014-12-19 14:59:35.964932 7fa8161c4780  5 asok(0x132e4d0) register_command perfcounters_schema hook 0x132e2f0
   -72> 2014-12-19 14:59:35.964933 7fa8161c4780  5 asok(0x132e4d0) register_command 2 hook 0x132e2f0
   -71> 2014-12-19 14:59:35.964936 7fa8161c4780  5 asok(0x132e4d0) register_command perf schema hook 0x132e2f0
   -70> 2014-12-19 14:59:35.964940 7fa8161c4780  5 asok(0x132e4d0) register_command config show hook 0x132e2f0
   -69> 2014-12-19 14:59:35.964942 7fa8161c4780  5 asok(0x132e4d0) register_command config set hook 0x132e2f0
   -68> 2014-12-19 14:59:35.964944 7fa8161c4780  5 asok(0x132e4d0) register_command config get hook 0x132e2f0
   -67> 2014-12-19 14:59:35.964946 7fa8161c4780  5 asok(0x132e4d0) register_command log flush hook 0x132e2f0
   -66> 2014-12-19 14:59:35.964948 7fa8161c4780  5 asok(0x132e4d0) register_command log dump hook 0x132e2f0
   -65> 2014-12-19 14:59:35.964949 7fa8161c4780  5 asok(0x132e4d0) register_command log reopen hook 0x132e2f0
   -64> 2014-12-19 14:59:35.966748 7fa8161c4780 10 monclient(hunting): build_initial_monmap
   -63> 2014-12-19 14:59:35.966797 7fa8161c4780  1 librados: starting msgr at :/0
   -62> 2014-12-19 14:59:35.966809 7fa8161c4780  1 librados: starting objecter
   -61> 2014-12-19 14:59:35.966837 7fa8161c4780  1 -- :/0 messenger.start
   -60> 2014-12-19 14:59:35.966856 7fa8161c4780  1 librados: setting wanted keys
   -59> 2014-12-19 14:59:35.966858 7fa8161c4780  1 librados: calling monclient init
   -58> 2014-12-19 14:59:35.966872 7fa8161c4780 10 monclient(hunting): init
   -57> 2014-12-19 14:59:35.966873 7fa8161c4780  5 adding auth protocol: cephx
   -56> 2014-12-19 14:59:35.966874 7fa8161c4780 10 monclient(hunting): auth_supported 2 method cephx
   -55> 2014-12-19 14:59:35.966960 7fa8161c4780  2 auth: KeyRing::load: loaded key file /etc/ceph/backup.client.admin.keyring
   -54> 2014-12-19 14:59:35.966999 7fa8161c4780 10 monclient(hunting): _reopen_session rank -1 name 
   -53> 2014-12-19 14:59:35.967018 7fa8161c4780 10 monclient(hunting): picked mon.noname-c con 0x134dca0 addr 10.6.60.103:6789/0
   -52> 2014-12-19 14:59:35.967039 7fa8161c4780 10 monclient(hunting): _send_mon_message to mon.noname-c at 10.6.60.103:6789/0
   -51> 2014-12-19 14:59:35.967042 7fa8161c4780  1 -- :/1013543 --> 10.6.60.103:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x134ec60 con 0x134dca0
   -50> 2014-12-19 14:59:35.967047 7fa8161c4780 10 monclient(hunting): renew_subs
   -49> 2014-12-19 14:59:35.967049 7fa8161c4780 10 monclient(hunting): authenticate will time out at 2014-12-19 15:04:35.967049
   -48> 2014-12-19 14:59:35.967815 7fa812f7c700  1 -- 10.6.168.29:0/1013543 learned my addr 10.6.168.29:0/1013543
   -47> 2014-12-19 14:59:35.968556 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 1 ==== mon_map v1 ==== 485+0+0 (753345867 0 0) 0x7fa804000bd0 con 0x134dca0
   -46> 2014-12-19 14:59:35.968594 7fa814f80700 10 monclient(hunting): handle_monmap mon_map v1
   -45> 2014-12-19 14:59:35.968600 7fa814f80700 10 monclient(hunting):  got monmap 1, mon.noname-c is now rank -1
   -44> 2014-12-19 14:59:35.968602 7fa814f80700 10 monclient(hunting): dump:
epoch 1
fsid 4e3aa077-8708-4b78-911c-f6e2fc9e3d82
last_changed 0.000000
created 0.000000
0: 10.6.60.101:6789/0 mon.ceph1
1: 10.6.60.102:6789/0 mon.ceph2
2: 10.6.60.103:6789/0 mon.ceph3

   -43> 2014-12-19 14:59:35.968611 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (1855063449 0 0) 0x7fa8040010c0 con 0x134dca0
   -42> 2014-12-19 14:59:35.968623 7fa814f80700 10 monclient(hunting): my global_id is 6053
   -41> 2014-12-19 14:59:35.968694 7fa814f80700 10 monclient(hunting): _send_mon_message to mon.ceph3 at 10.6.60.103:6789/0
   -40> 2014-12-19 14:59:35.968697 7fa814f80700  1 -- 10.6.168.29:0/1013543 --> 10.6.60.103:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- ?+0 0x7fa8080015b0 con 0x134dca0
   -39> 2014-12-19 14:59:35.969300 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 206+0+0 (2636603957 0 0) 0x7fa8040010c0 con 0x134dca0
   -38> 2014-12-19 14:59:35.969349 7fa814f80700 10 monclient(hunting): _send_mon_message to mon.ceph3 at 10.6.60.103:6789/0
   -37> 2014-12-19 14:59:35.969353 7fa814f80700  1 -- 10.6.168.29:0/1013543 --> 10.6.60.103:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- ?+0 0x7fa808003760 con 0x134dca0
   -36> 2014-12-19 14:59:35.969986 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 393+0+0 (130919758 0 0) 0x7fa8040008c0 con 0x134dca0
   -35> 2014-12-19 14:59:35.970025 7fa814f80700  1 monclient(hunting): found mon.ceph3
   -34> 2014-12-19 14:59:35.970027 7fa814f80700 10 monclient: _send_mon_message to mon.ceph3 at 10.6.60.103:6789/0
   -33> 2014-12-19 14:59:35.970029 7fa814f80700  1 -- 10.6.168.29:0/1013543 --> 10.6.60.103:6789/0 -- mon_subscribe({monmap=0+}) v2 -- ?+0 0x134f030 con 0x134dca0
   -32> 2014-12-19 14:59:35.970061 7fa8161c4780  5 monclient: authenticate success, global_id 6053
   -31> 2014-12-19 14:59:35.970081 7fa8161c4780  5 asok(0x132e4d0) register_command objecter_requests hook 0x134ef40
   -30> 2014-12-19 14:59:35.970126 7fa8161c4780 10 monclient: renew_subs
   -29> 2014-12-19 14:59:35.970128 7fa8161c4780 10 monclient: _send_mon_message to mon.ceph3 at 10.6.60.103:6789/0
   -28> 2014-12-19 14:59:35.970142 7fa8161c4780  1 -- 10.6.168.29:0/1013543 --> 10.6.60.103:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x134f030 con 0x134dca0
   -27> 2014-12-19 14:59:35.970148 7fa8161c4780 10 monclient: renew_subs
   -26> 2014-12-19 14:59:35.970149 7fa8161c4780 10 monclient: _send_mon_message to mon.ceph3 at 10.6.60.103:6789/0
   -25> 2014-12-19 14:59:35.970150 7fa8161c4780  1 -- 10.6.168.29:0/1013543 --> 10.6.60.103:6789/0 -- mon_subscribe({monmap=2+,osdmap=0}) v2 -- ?+0 0x13506b0 con 0x134dca0
   -24> 2014-12-19 14:59:35.970178 7fa8161c4780  1 librados: init done
   -23> 2014-12-19 14:59:35.970435 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 5 ==== mon_map v1 ==== 485+0+0 (753345867 0 0) 0x7fa8040013b0 con 0x134dca0
   -22> 2014-12-19 14:59:35.970444 7fa814f80700 10 monclient: handle_monmap mon_map v1
   -21> 2014-12-19 14:59:35.970450 7fa814f80700 10 monclient:  got monmap 1, mon.ceph3 is now rank 2
   -20> 2014-12-19 14:59:35.970452 7fa814f80700 10 monclient: dump:
epoch 1
fsid 4e3aa077-8708-4b78-911c-f6e2fc9e3d82
last_changed 0.000000
created 0.000000
0: 10.6.60.101:6789/0 mon.ceph1
1: 10.6.60.102:6789/0 mon.ceph2
2: 10.6.60.103:6789/0 mon.ceph3

   -19> 2014-12-19 14:59:35.970461 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 6 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (864314294 0 0) 0x7fa804001650 con 0x134dca0
   -18> 2014-12-19 14:59:35.970465 7fa814f80700 10 monclient: handle_subscribe_ack sent 2014-12-19 14:59:35.967048 renew after 2014-12-19 15:02:05.967048
   -17> 2014-12-19 14:59:35.971107 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 7 ==== osd_map(531..531 src has 1..531) v3 ==== 18619+0+0 (964452946 0 0) 0x7fa8040008c0 con 0x134dca0
   -16> 2014-12-19 14:59:35.971198 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== mon.2 10.6.60.103:6789/0 8 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (864314294 0 0) 0x7fa804005cb0 con 0x134dca0
   -15> 2014-12-19 14:59:35.971203 7fa814f80700 10 monclient: handle_subscribe_ack sent 0.000000, ignoring
   -14> 2014-12-19 14:59:35.971246 7fa8161c4780  1 -- 10.6.168.29:0/1013543 --> 10.6.60.101:6803/3501 -- osd_op(client.6053.0:1 rbd_directory [read 0~0] 2.30a98c1c ack+read e531) v4 -- ?+0 0x1351140 con 0x1351730
   -13> 2014-12-19 14:59:35.973241 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== osd.17 10.6.60.101:6803/3501 1 ==== osd_op_reply(1 rbd_directory [read 0~53] v0'0 uv12 ondisk = 0) v6 ==== 180+0+53 (2004202641 0 3088624428) 0x7fa800000c30 con 0x1351730
   -12> 2014-12-19 14:59:35.973308 7fa8161c4780  1 -- 10.6.168.29:0/1013543 --> 10.6.60.101:6803/3501 -- osd_op(client.6053.0:2 rbd_directory [call rbd.dir_list] 2.30a98c1c ack+read e531) v4 -- ?+0 0x1351140 con 0x1351730
   -11> 2014-12-19 14:59:35.974438 7fa814f80700  1 -- 10.6.168.29:0/1013543 <== osd.17 10.6.60.101:6803/3501 2 ==== osd_op_reply(2 rbd_directory [call] v0'0 uv12 ondisk = 0) v6 ==== 180+0+4 (3904276009 0 0) 0x7fa800000c30 con 0x1351730
   -10> 2014-12-19 14:59:35.974531 7fa8161c4780  1 -- 10.6.168.29:0/1013543 mark_down 0x1351730 -- 0x13514a0
    -9> 2014-12-19 14:59:35.974613 7fa811c77700  2 -- 10.6.168.29:0/1013543 >> 10.6.60.101:6803/3501 pipe(0x13514a0 sd=4 :48468 s=4 pgs=496 cs=1 l=1 c=0x1351730).reader couldn't read tag, (0) Success
    -8> 2014-12-19 14:59:35.974623 7fa8161c4780 10 monclient: shutdownshutdown
    -7> 2014-12-19 14:59:35.974629 7fa811c77700  2 -- 10.6.168.29:0/1013543 >> 10.6.60.101:6803/3501 pipe(0x13514a0 sd=4 :48468 s=4 pgs=496 cs=1 l=1 c=0x1351730).fault (0) Success
    -6> 2014-12-19 14:59:35.974651 7fa8161c4780  1 -- 10.6.168.29:0/1013543 mark_down 0x134dca0 -- 0x134e5b0
    -5> 2014-12-19 14:59:35.974686 7fa8161c4780  5 asok(0x132e4d0) unregister_command objecter_requests
    -4> 2014-12-19 14:59:35.974703 7fa8161c4780  1 -- 10.6.168.29:0/1013543 mark_down_all
    -3> 2014-12-19 14:59:35.974700 7fa812e7b700  2 -- 10.6.168.29:0/1013543 >> 10.6.60.103:6789/0 pipe(0x134e5b0 sd=3 :37066 s=4 pgs=720 cs=1 l=1 c=0x134dca0).fault (0) Success
    -2> 2014-12-19 14:59:35.975107 7fa8161c4780  1 -- 10.6.168.29:0/1013543 shutdown complete.
    -1> 2014-12-19 14:59:35.975113 7fa8161c4780  1 librados: shutdown
     0> 2014-12-19 14:59:35.975456 7fa8161c4780 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8161c4780

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: rbd() [0x424922]
 2: /lib64/libpthread.so.0() [0x3a9bc100d0]
 3: (pthread_rwlock_unlock()+0x13) [0x3a9bc0c153]
 4: (librados::RadosClient::~RadosClient()+0x122) [0x3625faa032]
 5: (librados::RadosClient::~RadosClient()+0x9) [0x3625faa1a9]
 6: (librados::Rados::shutdown()+0x36) [0x3625f877d6]
 7: (main()+0x4c7) [0x40f447]
 8: (__libc_start_main()+0xf0) [0x3a9b41ffe0]
 9: rbd() [0x4162e7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file 
--- end dump of recent events ---
Segmentation fault (core dumped)

Comment 2 Boris Ranto 2014-12-22 08:12:17 UTC
This appears to be a duplicate of bz1144794. It should be fixed in the latest ceph release (version 1:0.80.7-2.fc21.x86_64), available in the testing repo. Could you please retest with the updated packages?

Comment 3 Darryl Bond 2014-12-22 21:05:55 UTC
Ceph 1:0.80.7-2.fc21 rbd is working for me.

Comment 4 Boris Ranto 2014-12-23 12:32:01 UTC
Thanks for testing, closing as duplicate of the bz.

*** This bug has been marked as a duplicate of bug 1144794 ***

