Description of problem:

Looks like http://tracker.ceph.com/issues/6480

*** Error in `/usr/libexec/qemu-kvm': invalid fastbin entry (free): 0x00007fe37806cde0 ***

Program terminated with signal 6, Aborted.
#0  0x00007fe52d47c5d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fe52d47c5d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fe52d47dcc8 in __GI_abort () at abort.c:90
#2  0x00007fe52d4bce07 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fe52d5c58c8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fe52d4c41fd in malloc_printerr (ptr=<optimized out>, str=0x7fe52d5c3001 "invalid fastbin entry (free)", action=3) at malloc.c:4972
#4  _int_free (av=0x7fe378000020, p=<optimized out>, have_lock=0) at malloc.c:3804
#5  0x00007fe537a722b0 in PK11_GetBestSlotMultipleWithAttributes (type=type@entry=0x7fe3d20d6168, mechanismInfoFlags=mechanismInfoFlags@entry=0x0, keySize=keySize@entry=0x0, mech_count=mech_count@entry=1, wincx=0x0) at pk11slot.c:2119
#6  0x00007fe537a7233f in PK11_GetBestSlot (type=4229, wincx=<optimized out>) at pk11slot.c:2142
#7  0x00007fe53178b4bc in nss_aes_operation (op=260, secret=..., in=..., out=..., error="") at auth/Crypto.cc:110
#8  0x00007fe53178a220 in CryptoKey::encrypt (this=this@entry=0x7fe3780750e8, cct=cct@entry=0x7fe539a25930, in=..., out=..., error="") at auth/Crypto.cc:358
#9  0x00007fe531782f8f in encode_encrypt_enc_bl<ceph::buffer::list> (error="", out=..., key=..., t=..., cct=0x7fe539a25930) at auth/cephx/CephxProtocol.h:465
#10 encode_encrypt<ceph::buffer::list> (cct=0x7fe539a25930, t=..., key=..., out=..., error="") at auth/cephx/CephxProtocol.h:490
#11 0x00007fe53178241e in CephxSessionHandler::sign_message (this=0x7fe3780750d0, m=0x7fe1900c94f0) at auth/cephx/CephxSessionHandler.cc:48
#12 0x00007fe53171b036 in Pipe::writer (this=0x7fe1900673d0) at msg/simple/Pipe.cc:1812
#13 0x00007fe5317273fd in Pipe::Writer::entry (this=<optimized out>) at msg/simple/Pipe.h:62
#14 0x00007fe536f95df5 in start_thread (arg=0x7fe3d20d7700) at pthread_create.c:308
#15 0x00007fe52d53d1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):

ceph-common-0.94.1-13.el7cp.x86_64

How reproducible:

Intermittent, but it seems to repeat in the same instances under OpenStack rather than affect all instances.

Additional info:

This looks like a race and may indicate a threading issue with libnss, but that has not yet been positively identified.

https://github.com/ceph/ceph/commit/973cd1c00a7811e95ff0406a90386f6ead5491c4 is an optimization for Infernalis that should stop this issue from being seen; backporting it may be a solution.
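To illustrate why that optimization might help: in the backtrace, every sign_message call goes through nss_aes_operation and into PK11_GetBestSlot, so each Pipe writer thread repeats the slot lookup for every message. My understanding (an assumption, not verified against the commit) is that the Infernalis change does this setup once per key and reuses it. The sketch below shows only that caching pattern. It does not use NSS; Slot, acquire_slot, CachedKey and the toy checksum are hypothetical stand-ins, not Ceph or NSS code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a library-owned object that is expensive
// (and, per this report, possibly racy) to look up on every operation.
struct Slot {
  int id;
};

// Counts how often the lookup runs, so the effect of caching is visible.
static int slot_lookups = 0;

// Hypothetical stand-in for the per-operation lookup seen in frame #6.
std::shared_ptr<const Slot> acquire_slot() {
  ++slot_lookups;
  return std::make_shared<const Slot>(Slot{1});
}

// Key wrapper that performs the lookup once, at construction time.
class CachedKey {
public:
  explicit CachedKey(std::string secret)
      : secret_(std::move(secret)), slot_(acquire_slot()) {}

  // Toy "signature": a simple checksum, NOT real cryptography.
  // The method is const and reads only state fixed at construction,
  // so the hot path never re-enters acquire_slot().
  unsigned sign(const std::string &msg) const {
    unsigned h = static_cast<unsigned>(slot_->id);
    for (unsigned char c : secret_) h = h * 31u + c;
    for (unsigned char c : msg) h = h * 31u + c;
    return h;
  }

private:
  std::string secret_;
  std::shared_ptr<const Slot> slot_;
};

// Signs `count` copies of a message and returns how many were signed.
int sign_many(const CachedKey &key, int count) {
  int done = 0;
  for (int i = 0; i < count; ++i) {
    (void)key.sign("payload");
    ++done;
  }
  return done;
}
```

With this shape, constructing one CachedKey and calling sign_many(key, 1000) triggers a single lookup rather than a thousand. If the race really is in the lookup path, that would narrow the window considerably, but it would not by itself prove the underlying libnss threading problem is gone.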
For the record, the patches Josh cherry-picked for this issue are:

  auth: return error code from encrypt/decrypt; make error string optional
  auth: optimize crypto++ key context
  auth/Crypto: optimize libnss key
  auth: refactor crypto key context
  auth/cephx: optimize signature check
  auth/cephx: move signature calc into helper
  auth/Crypto: avoid memcpy on libnss crypto operation
  auth: make CryptoHandler implementations totally private

which are part of https://github.com/ceph/ceph/pull/3896/commits

Let's file an upstream ticket to ensure these get backported to Hammer upstream as well.
(In reply to Ken Dreyer (Red Hat) from comment #12)
> Let's file an upstream ticket to ensure these get backported to Hammer
> upstream as well.

http://tracker.ceph.com/issues/6480 attached under "External Trackers"
Ubuntu build with this patch is ceph_0.94.3.3-1redhat1trusty
(In reply to Ken Dreyer (Red Hat) from comment #22)
> Ubuntu build with this patch is ceph_0.94.3.3-1redhat1trusty

I had to bump the version number, so it's ceph_0.94.3.3-2redhat1trusty
Marking this bug as Verified, as this was tested as part of the 1.3.1 Async Release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0133
I have checked the errata, and the issue is fixed in version ceph-0.94.3-6.el7cp.