Description of problem:
qemu-kvm crashes with double free or corruption in cephx code after hotfix in bz1296722

(gdb) bt
#0  0x00007fa2519a05d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fa2519a1cc8 in __GI_abort () at abort.c:90
#2  0x00007fa2519e0e07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fa251ae98c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fa2519e81fd in malloc_printerr (ptr=<optimized out>, str=0x7fa251ae99a0 "double free or corruption (!prev)", action=3) at malloc.c:4972
#4  _int_free (av=0x7fa251d25760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3804
#5  0x00007fa25bf7a424 in PK11_DestroyContext (context=0x7fa23c594870, freeit=1) at pk11cxt.c:68
#6  0x00007fa255cb0b7f in nss_aes_operation (op=op@entry=261, mechanism=<optimized out>, key=<optimized out>, param=<optimized out>, in=..., out=..., error=0x7fa247b23c30) at auth/Crypto.cc:246
#7  0x00007fa255cb163a in CryptoAESKeyHandler::decrypt (this=<optimized out>, in=..., out=..., error=<optimized out>) at auth/Crypto.cc:320
#8  0x00007fa255ca18cc in decrypt (cct=0x7fa247b23c30, error=0x7fa247b23c30, out=..., in=..., this=0x7fa247b23960) at auth/Crypto.h:114
#9  decode_decrypt_enc_bl<ceph::buffer::list> (cct=cct@entry=0x7fa25e00f930, t=..., key=..., bl_enc=..., error="") at auth/cephx/CephxProtocol.h:436
#10 0x00007fa255ca2160 in decode_decrypt<ceph::buffer::list> (cct=0x7fa25e00f930, t=..., key=..., iter=..., error="") at auth/cephx/CephxProtocol.h:474
#11 0x00007fa255c9c0ac in CephXTicketHandler::verify_service_ticket_reply (this=this@entry=0x7fa23c001d98, secret=..., indata=...) at auth/cephx/CephxProtocol.cc:162
#12 0x00007fa255c9db9b in CephXTicketManager::verify_service_ticket_reply (this=this@entry=0x7fa23c001a00, secret=..., indata=...) at auth/cephx/CephxProtocol.cc:276
#13 0x00007fa255c91d11 in CephxClientHandler::handle_response (this=0x7fa23c001950, ret=<optimized out>, indata=...) at auth/cephx/CephxClientHandler.cc:118
#14 0x00007fa255b2f3d1 in MonClient::handle_auth (this=this@entry=0x7fa25e015610, m=m@entry=0x7f9e24d1ad50) at mon/MonClient.cc:507
#15 0x00007fa255b312e9 in MonClient::ms_dispatch (this=0x7fa25e015610, m=0x7f9e24d1ad50) at mon/MonClient.cc:281
#16 0x00007fa255c2760a in ms_deliver_dispatch (m=0x7f9e24d1ad50, this=0x7fa25e02efd0) at msg/Messenger.h:567
#17 DispatchQueue::entry (this=0x7fa25e02f198) at msg/simple/DispatchQueue.cc:185
#18 0x00007fa255c5525d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
#19 0x00007fa25b4bfdf5 in start_thread (arg=0x7fa247b25700) at pthread_create.c:308
#20 0x00007fa251a611ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3 with hotfix in bz1296722
ceph-common-0.94.1-19.el7cp.0.hotfix.bz1296722.x86_64
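For reference, a minimal sketch of the kind of control flow that reaches PK11_DestroyContext() twice for the same context and triggers the glibc "double free or corruption" abort seen in frames #3-#5. This is a hypothetical illustration only, not code taken from the Ceph sources; the function name aes_op_sketch and its parameters are invented for the example.

#include <pk11pub.h>

static void aes_op_sketch(PK11SymKey *key, SECItem *param,
                          const unsigned char *in, int in_len,
                          unsigned char *out, int out_max)
{
  PK11Context *ectx =
      PK11_CreateContextBySymKey(CKM_AES_CBC_PAD, CKA_DECRYPT, key, param);
  if (!ectx)
    return;

  int written = 0;
  if (PK11_CipherOp(ectx, out, &written, out_max, in, in_len) != SECSuccess) {
    PK11_DestroyContext(ectx, PR_TRUE);   /* first destroy, on the error path */
    /* bug in the sketch: the early "return" is missing, so control falls through */
  }

  unsigned int final_len = 0;
  PK11_DigestFinal(ectx, out + written, &final_len,
                   (unsigned int)(out_max - written));

  PK11_DestroyContext(ectx, PR_TRUE);     /* second destroy of the same ectx:
                                             glibc aborts in _int_free() */
}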
(gdb) f 6
#6  0x00007fa255cb0b7f in nss_aes_operation (op=op@entry=261, mechanism=<optimized out>, key=<optimized out>, param=<optimized out>, in=..., out=..., error=0x7fa247b23c30) at auth/Crypto.cc:246
246        PK11_DestroyContext(ectx, PR_TRUE);
(gdb) p *ectx
$9 = {operation = 0, key = 0x7fa23c11b9a0, slot = 0x7fa25e0258a0, session = 19383812, sessionLock = 0x7fa23c526bc0, ownSession = 1, cx = 0x0, savedData = 0x0, savedLength = 140334768384896, param = 0x7fa23c011c30, init = 0, type = 4229, fortezzaHack = 0}
(gdb) p *ectx->sessionLock
$10 = {mutex = {__data = {__lock = 1012046864, __count = 32674, __owner = 1009215584, __nusers = 32674, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\020\234R<\242\177\000\000`h'<\242\177\000\000\377\377\377\377", '\000' <repeats 19 times>, __align = 140334773476368}, notified = {length = 0, cv = {{cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}}, link = 0x0}, locked = 0, owner = 140334964299520}
(gdb)

- From this backtrace, frame 6 shows the context's sessionLock as held by thread __owner = 1009215584, but no thread with that id appears anywhere in the full backtrace of all threads in this core.
- The lock contents therefore look like garbage, which suggests the ceph code passed NSS a stale or already-corrupted context, and that is why it is crashing.
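One possible way such garbage can reach nss_aes_operation() -- stated purely as an assumption for illustration, since this core alone does not establish the root cause -- is a stale key handler: if the dispatch thread still holds a pointer to NSS key/slot state that another code path has already released, the PK11Context built from it sits on freed memory, and fields such as sessionLock/__owner decode to values matching no live thread. A self-contained sketch; the key_handler_sketch type and decrypt_with_stale_handler function are invented for this example and are not the actual Ceph types:

#include <pk11pub.h>

struct key_handler_sketch {            /* hypothetical stand-in for a key handler */
  PK11SymKey   *key  = nullptr;
  PK11SlotInfo *slot = nullptr;
  ~key_handler_sketch() {
    if (key)  PK11_FreeSymKey(key);    /* releases the NSS objects...            */
    if (slot) PK11_FreeSlot(slot);     /* ...any raw pointer kept elsewhere now dangles */
  }
};

/* If a caller decrypts through a handler object that has already been
 * destroyed, the context is created from a freed PK11SymKey/slot; malloc
 * later detects the heap damage and aborts, e.g. inside PK11_DestroyContext()
 * as in frame #5 above. */
void decrypt_with_stale_handler(key_handler_sketch *stale, SECItem *param,
                                const unsigned char *in, int in_len,
                                unsigned char *out, int out_max)
{
  PK11Context *ctx = PK11_CreateContextBySymKey(CKM_AES_CBC_PAD, CKA_DECRYPT,
                                                stale->key, param);  /* use-after-free if *stale is gone */
  if (!ctx)
    return;
  int n = 0;
  PK11_CipherOp(ctx, out, &n, out_max, in, in_len);
  PK11_DestroyContext(ctx, PR_TRUE);
}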
Verified: RBD sanity and QEMU regression runs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0721.html