Bug 657254

Summary: imapx backend still doing stuff when camel_shutdown() is called
Product: [Fedora] Fedora
Component: evolution
Version: 14
Hardware: Unspecified
OS: Unspecified
Status: CLOSED UPSTREAM
Severity: medium
Priority: medium
Keywords: Reopened
Reporter: ritz <rkhadgar>
Assignee: Matthew Barnes <mbarnes>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dcbw, djuran, emaldona, kdudka, kengert, lucilanga, mamers.sdtb, mbarnes, mcrha, nicolas.mailhot
Doc Type: Bug Fix
Last Closed: 2011-05-19 06:18:39 UTC

Comment 1 Milan Crha 2010-11-25 10:50:18 UTC
Same as or similar to bug #655327? I do not think this is an evolution issue, unless evolution itself is causing some memory corruption. That is why I moved the other bug to nss, and I would move this one there as well.

Comment 2 Milan Crha 2010-12-15 09:39:47 UTC
*** Bug 663215 has been marked as a duplicate of this bug. ***

Comment 3 Milan Crha 2010-12-15 09:40:29 UTC
The above duplicate contains a nice investigation of the crash, which might be useful for nss developers.

Comment 4 mario rathinho 2011-01-08 22:30:19 UTC
Package: evolution-2.91.4-1.fc15
Architecture: i686
OS Release: Fedora release 15 (Rawhide)


How to reproduce
-----
1. Just Quit evolution


Comment
-----
I just quit evolution, and after this step a crash report was generated.

Comment 5 Dan Williams 2011-01-10 20:12:28 UTC
I'm still getting this every time I quit evolution.  I ran under valgrind:

==12257== Invalid read of size 8
==12257==    at 0x335F4341D9: pk11_Finalize (pk11cxt.c:957)
==12257==    by 0x335F4350EC: PK11_DigestBegin (pk11cxt.c:611)
==12257==    by 0x33604098F1: ssl3_ComputeRecordMAC (ssl3con.c:1913)
==12257==    by 0x3360409E1E: ssl3_SendRecord (ssl3con.c:2074)
==12257==    by 0x336040AFAE: ssl3_SendApplicationData (ssl3con.c:2357)
==12257==    by 0x336041E9A9: ssl_SecureSend (sslsecur.c:1241)
==12257==    by 0x3360422711: ssl_Write (sslsock.c:1652)
==12257==    by 0x321405DAD1: write_to_prfd (camel-tcp-stream-raw.c:392)
==12257==    by 0x32104515B9: camel_stream_write (camel-stream.c:166)
==12257==    by 0x32104515B9: camel_stream_write (camel-stream.c:166)
==12257==    by 0x3210451E2A: camel_stream_printf (camel-stream.c:336)
==12257==    by 0x1BA39398: imapx_command_start (camel-imapx-server.c:855)
==12257==  Address 0xc63cdb0 is not stack'd, malloc'd or (recently) free'd
==12257== 
==12257== 
==12257== Process terminating with default action of signal 11 (SIGSEGV)
==12257==  Access not within mapped region at address 0xC63CDB0
==12257==    at 0x335F4341D9: pk11_Finalize (pk11cxt.c:957)
==12257==    by 0x335F4350EC: PK11_DigestBegin (pk11cxt.c:611)
==12257==    by 0x33604098F1: ssl3_ComputeRecordMAC (ssl3con.c:1913)
==12257==    by 0x3360409E1E: ssl3_SendRecord (ssl3con.c:2074)
==12257==    by 0x336040AFAE: ssl3_SendApplicationData (ssl3con.c:2357)
==12257==    by 0x336041E9A9: ssl_SecureSend (sslsecur.c:1241)
==12257==    by 0x3360422711: ssl_Write (sslsock.c:1652)
==12257==    by 0x321405DAD1: write_to_prfd (camel-tcp-stream-raw.c:392)
==12257==    by 0x32104515B9: camel_stream_write (camel-stream.c:166)
==12257==    by 0x32104515B9: camel_stream_write (camel-stream.c:166)
==12257==    by 0x3210451E2A: camel_stream_printf (camel-stream.c:336)
==12257==    by 0x1BA39398: imapx_command_start (camel-imapx-server.c:855)
==12257==  If you believe this happened as a result of a stack
==12257==  overflow in your program's main thread (unlikely but
==12257==  possible), you can try to increase the size of the
==12257==  main thread stack using the --main-stacksize= flag.
==12257==  The main thread stack size used in this run was 8388608.

Comment 6 Dan Williams 2011-01-10 20:18:19 UTC
Valgrind probably covers up some of the errors, but this time the offending line is close to the CipherOp from the original crash:

    case CKA_SIGN:
-->	crv=PK11_GETTAB(context->slot)->C_SignFinal(context->session,
	                                            buffer, &count);
	break;

so it's still trying to access the ops table and crashing because that's not valid...

Comment 7 Dan Williams 2011-01-10 21:47:37 UTC
I believe that the SECMODModule that provides the function list is getting unloaded too early, causing functionList to be a dangling pointer.  I've checked reference counting on the PK11SlotInfo and the slot that contains the functionList is clearly still referenced when the segfault occurs.  That leaves only the actual module that provides the functionList being unloaded as the cause.  To that end, running evolution as such:

NSS_DISABLE_UNLOAD=1 evolution

allows evolution to cleanly exit.
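
The hypothesis in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Module`, `Slot`, `FunctionTable`, `module_unload()` and `slot_sign_final()` are hypothetical names invented for the illustration, not NSS types or functions. The sketch shows why a healthy reference count on the slot does not help if the slot merely borrows the function table from a module that can be unloaded independently.

```c
/* Toy model only: none of these names are real NSS identifiers. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*SignFinalFn)(int session);

typedef struct {
    SignFinalFn sign_final;    /* stands in for one function list entry */
} FunctionTable;

typedef struct {
    FunctionTable *functions;  /* set to NULL once the module is unloaded */
} Module;

typedef struct {
    Module *module;            /* borrowed: the slot does not own the module */
    int     refcount;          /* protects the slot only, not the module */
} Slot;

static int toy_sign_final(int session) { return session + 1; }

static FunctionTable toy_table = { toy_sign_final };

static void module_load(Module *m)   { m->functions = &toy_table; }
static void module_unload(Module *m) { m->functions = NULL; }

/* Calls through the module's table, but reports failure instead of
 * dereferencing a table that has already gone away. */
static bool slot_sign_final(const Slot *s, int session, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->refcount <= 0 || s->module->functions == NULL)
        return false;
    *out = s->module->functions->sign_final(session);
    return true;
}
```

In this model the slot's reference count is still 1 after `module_unload()`, yet the call can no longer succeed. The real code has no such NULL check to fall back on, which would be consistent with the segfault seen above.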

Comment 8 Dan Williams 2011-01-10 21:59:10 UTC
Further investigation appears to indicate that Evolution is calling nss_Shutdown() before it's done with crypto ops, which unloads all modules and thus causes the function table to be bogus for any crypto ops coming later:

#0  SECMOD_UnloadModule (mod=0x6c76f0) at pk11load.c:576
#1  0x00007ffff788bd00 in SECMOD_SlotDestroyModule (module=0x6c76f0, fromSlot=<value optimized out>) at pk11util.c:872
#2  0x00007ffff788c122 in SECMOD_DestroyModuleListElement (element=<value optimized out>) at pk11util.c:889
#3  0x00007ffff788c5a5 in SECMOD_DestroyModuleList (list=<value optimized out>) at pk11util.c:905
#4  0x00007ffff788c645 in SECMOD_Shutdown () at pk11util.c:104
#5  0x00007ffff784dac0 in nss_Shutdown () at nssinit.c:1037
#6  0x0000003210458ea5 in camel_shutdown () at camel.c:248
#7  0x0000003216c3dc09 in mail_backend_ready_to_quit (activity=0xc94a80 [EActivity]) at e-mail-backend.c:229
#8  0x0000003216c3e50c in mail_backend_prepare_for_quit_cb (shell=<value optimized out>, activity=0xc94a80 [EActivity], backend=
    0x64c340 [EMailShellBackend]) at e-mail-backend.c:282
#9  0x000000320c00eace in g_closure_invoke (closure=0xa551e0, return_value=0x0, n_param_values=2, param_values=0xe25690, invocation_hint=
    0x7fffffffcbe0) at gclosure.c:767
#10 0x000000320c02101b in signal_emit_unlocked_R (node=<value optimized out>, detail=0, instance=0x645c40, emission_return=0x0, instance_and_params=
    0xe25690) at gsignal.c:3252
#11 0x000000320c02ab4a in g_signal_emit_valist (instance=<value optimized out>, signal_id=<value optimized out>, detail=<value optimized out>, 
    var_args=0x7fffffffcdd0) at gsignal.c:2983
#12 0x000000320c02acf3 in g_signal_emit (instance=<value optimized out>, signal_id=<value optimized out>, detail=<value optimized out>)
    at gsignal.c:3040
#13 0x0000003215c19ed8 in shell_prepare_for_quit (shell=0x645c40 [EShell]) at e-shell.c:372
#14 0x0000003215c1ae22 in e_shell_quit (shell=0x645c40 [EShell], reason=E_SHELL_QUIT_ACTION) at e-shell.c:1935

Comment 9 Dan Williams 2011-01-10 22:17:41 UTC
Back to evolution...

The imapx backend is still doing stuff in the background when camel_shutdown() is called, which does the NSS cleanup.  After camel_shutdown() anything involving NSS is likely to fail, since camel_shutdown() calls NSS_Shutdown().

Something in evo should be making sure that all outstanding operations are completed before calling camel_shutdown().
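
One way to express the "wait for outstanding operations" idea is a small counter guarded by a mutex and a condition variable. The following is a minimal sketch, not the fix that went upstream: `OpTracker` and the `op_tracker_*` functions are hypothetical names and do not exist in Camel or Evolution. Background operations would bracket their work with begin/end, and the quit path would drain the tracker before running the teardown step (camel_shutdown() in this report).

```c
/* Toy model only: OpTracker is a hypothetical helper, not a Camel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  idle;     /* signalled when pending drops to zero */
    unsigned        pending;  /* operations currently in flight */
    bool            closing;  /* set once shutdown has begun */
} OpTracker;

static void op_tracker_init(OpTracker *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->idle, NULL);
    t->pending = 0;
    t->closing = false;
}

/* Returns false if shutdown has already started; the caller must then
 * skip the operation and not touch the crypto layer. */
static bool op_tracker_begin(OpTracker *t)
{
    bool ok;
    pthread_mutex_lock(&t->lock);
    ok = !t->closing;
    if (ok)
        t->pending++;
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Marks one operation as finished and wakes a waiting drain call. */
static void op_tracker_end(OpTracker *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->pending > 0);
    if (--t->pending == 0)
        pthread_cond_broadcast(&t->idle);
    pthread_mutex_unlock(&t->lock);
}

/* Refuses new operations, then blocks until the in-flight ones finish.
 * Only after this returns would it be safe to tear down shared state. */
static void op_tracker_drain(OpTracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->closing = true;
    while (t->pending > 0)
        pthread_cond_wait(&t->idle, &t->lock);
    pthread_mutex_unlock(&t->lock);
}
```

Setting `closing` before waiting matters: without it, a new operation could start between the moment the counter reaches zero and the moment the teardown runs, reintroducing the same race.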

Comment 10 Dan Williams 2011-01-10 22:19:30 UTC
*** Bug 655327 has been marked as a duplicate of this bug. ***

Comment 11 Milan Crha 2011-01-11 08:29:30 UTC
Thanks for all the investigation. I was looking at this last week, thinking it might be evo's fault, but I didn't get as far as you did (my knowledge of nss is very close to zero). I can also confirm this with the imap provider (not imapx); it only depends on an SSL connection being used.

I'm moving this upstream [1], because I can reproduce it with 2.91.5 as well.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=638808

Comment 12 Milan Crha 2011-01-14 07:28:28 UTC
*** Bug 669335 has been marked as a duplicate of this bug. ***

Comment 13 David Juran 2011-05-18 17:32:18 UTC
Reopening, since at least abrt believes that this crash also happens with evolution-2.32.2-1, which has the patch from https://bugzilla.gnome.org/show_bug.cgi?id=638808 applied.

Comment 14 Milan Crha 2011-05-19 06:18:39 UTC
Thanks for the update. Let's reopen the upstream bug, since that is where the fix comes from. Please CC yourself there for any further information.