Bug 1327540

Summary: qemu-kvm crashes with double free or corruption in cephx code after hotfix in bz1296722
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vikhyat Umrao <vumrao>
Component: RADOS
Assignee: Ali Maredia <amaredia>
Status: CLOSED ERRATA
QA Contact: Vasu Kulkarni <vakulkar>
Severity: high
Docs Contact:
Priority: high
Version: 1.3.2
CC: bhubbard, ceph-eng-bugs, chhudson, dzafman, flucifre, jbiao, kchai, kdreyer, sjust, tganguly, vakulkar
Target Milestone: rc   
Target Release: 1.3.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHEL: ceph-0.94.5-12.el7cp Ubuntu: ceph_0.94.5-6redhat1trusty
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-06 18:40:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vikhyat Umrao 2016-04-15 10:37:51 UTC
Description of problem:
qemu-kvm crashes with double free or corruption in cephx code after hotfix in bz1296722


(gdb) bt
#0  0x00007fa2519a05d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fa2519a1cc8 in __GI_abort () at abort.c:90
#2  0x00007fa2519e0e07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fa251ae98c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fa2519e81fd in malloc_printerr (ptr=<optimized out>, str=0x7fa251ae99a0 "double free or corruption (!prev)", action=3) at malloc.c:4972
#4  _int_free (av=0x7fa251d25760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3804
#5  0x00007fa25bf7a424 in PK11_DestroyContext (context=0x7fa23c594870, freeit=1) at pk11cxt.c:68
#6  0x00007fa255cb0b7f in nss_aes_operation (op=op@entry=261, mechanism=<optimized out>, key=<optimized out>, param=<optimized out>, in=..., out=..., error=0x7fa247b23c30) at auth/Crypto.cc:246
#7  0x00007fa255cb163a in CryptoAESKeyHandler::decrypt (this=<optimized out>, in=..., out=..., error=<optimized out>) at auth/Crypto.cc:320
#8  0x00007fa255ca18cc in decrypt (cct=0x7fa247b23c30, error=0x7fa247b23c30, out=..., in=..., this=0x7fa247b23960) at auth/Crypto.h:114
#9  decode_decrypt_enc_bl<ceph::buffer::list> (cct=cct@entry=0x7fa25e00f930, t=..., key=..., bl_enc=..., error="") at auth/cephx/CephxProtocol.h:436
#10 0x00007fa255ca2160 in decode_decrypt<ceph::buffer::list> (cct=0x7fa25e00f930, t=..., key=..., iter=..., error="") at auth/cephx/CephxProtocol.h:474
#11 0x00007fa255c9c0ac in CephXTicketHandler::verify_service_ticket_reply (this=this@entry=0x7fa23c001d98, secret=..., indata=...) at auth/cephx/CephxProtocol.cc:162
#12 0x00007fa255c9db9b in CephXTicketManager::verify_service_ticket_reply (this=this@entry=0x7fa23c001a00, secret=..., indata=...) at auth/cephx/CephxProtocol.cc:276
#13 0x00007fa255c91d11 in CephxClientHandler::handle_response (this=0x7fa23c001950, ret=<optimized out>, indata=...) at auth/cephx/CephxClientHandler.cc:118
#14 0x00007fa255b2f3d1 in MonClient::handle_auth (this=this@entry=0x7fa25e015610, m=m@entry=0x7f9e24d1ad50) at mon/MonClient.cc:507
#15 0x00007fa255b312e9 in MonClient::ms_dispatch (this=0x7fa25e015610, m=0x7f9e24d1ad50) at mon/MonClient.cc:281
#16 0x00007fa255c2760a in ms_deliver_dispatch (m=0x7f9e24d1ad50, this=0x7fa25e02efd0) at msg/Messenger.h:567
#17 DispatchQueue::entry (this=0x7fa25e02f198) at msg/simple/DispatchQueue.cc:185
#18 0x00007fa255c5525d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
#19 0x00007fa25b4bfdf5 in start_thread (arg=0x7fa247b25700) at pthread_create.c:308
#20 0x00007fa251a611ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113



Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 1.3 with hotfix in bz1296722
ceph-common-0.94.1-19.el7cp.0.hotfix.bz1296722.x86_64

Comment 2 Vikhyat Umrao 2016-04-15 10:40:26 UTC
(gdb) f 6
#6  0x00007fa255cb0b7f in nss_aes_operation (op=op@entry=261, mechanism=<optimized out>, key=<optimized out>, param=<optimized out>, in=..., out=..., error=0x7fa247b23c30) at auth/Crypto.cc:246
246	    PK11_DestroyContext(ectx, PR_TRUE);

(gdb) p *ectx
$9 = {operation = 0, key = 0x7fa23c11b9a0, slot = 0x7fa25e0258a0, session = 19383812, sessionLock = 0x7fa23c526bc0, ownSession = 1, cx = 0x0, savedData = 0x0, savedLength = 140334768384896, 
  param = 0x7fa23c011c30, init = 0, type = 4229, fortezzaHack = 0}

(gdb) p *ectx->sessionLock
$10 = {mutex = {__data = {__lock = 1012046864, __count = 32674, __owner = 1009215584, __nusers = 32674, __kind = -1, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = "\020\234R<\242\177\000\000`h'<\242\177\000\000\377\377\377\377", '\000' <repeats 19 times>, __align = 140334773476368}, notified = {length = 0, cv = {{cv = 0x0, times = 0}, {cv = 0x0, times = 0}, 
      {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}, {cv = 0x0, times = 0}}, link = 0x0}, locked = 0, owner = 140334964299520}
(gdb) 

- In frame 6, the session lock's mutex appears to be held by a thread with __owner = 1009215584, but no thread with that ID appears in the full backtrace of this core.

- This looks like a garbage value passed in by the Ceph code, which would explain the crash.
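
One way to sanity-check the garbage-value reading of the sessionLock dump above is to reinterpret the printed fields as raw 64-bit words. The sketch below is only an illustration. It assumes a little-endian x86_64 layout in which __lock/__count and __owner/__nusers are adjacent 32-bit fields (this matches the __size bytes and the __align value gdb printed). The variable names are made up for the example; the numbers are copied from the gdb output.

```python
import struct

# Values copied from the gdb output of `p *ectx->sessionLock` and `p *ectx`.
lock, count = 1012046864, 32674
owner, nusers = 1009215584, 32674
align = 140334773476368          # __align from the same union
pr_owner = 140334964299520       # 'owner' field printed after 'locked'
saved_length = 140334768384896   # ectx->savedLength

# Assumption: four adjacent little-endian 32-bit fields, as on x86_64 glibc.
raw = struct.pack("<IIII", lock, count, owner, nusers)
word0, word1 = struct.unpack("<QQ", raw)

print(hex(word0))          # first 8 bytes of the mutex
print(hex(word1))          # next 8 bytes of the mutex
print(word0 == align)      # cross-check against __align
print(hex(pr_owner))       # compare with the start_thread arg in frame 19
print(hex(saved_length))   # compare with the heap addresses in *ectx
```

Under that assumption the two mutex words decode to 0x7fa23c529c10 and 0x7fa23c276860, in the same address range as the heap pointers elsewhere in the dump (for example key = 0x7fa23c11b9a0). So __owner = 1009215584 is more likely the low half of a pointer than a real thread ID, which would be consistent with the lock memory having been freed or overwritten, although this alone does not prove it. The lock's owner field decodes to 0x7fa247b25700, the same value as the start_thread arg in frame 19.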

Comment 36 Vasu Kulkarni 2016-04-28 17:47:24 UTC
Verified: RBD sanity and QEMU regression runs.

Comment 40 errata-xmlrpc 2016-05-06 18:40:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0721.html