Description of problem: http://permalink.gmane.org/gmane.comp.web.curl.library/25764 Version-Release number of selected component (if applicable): nss-3.12.4-3.fc11.i586 How reproducible: occasionally, depends also on count of threads Actual results: Invalid write within the PEM module probably causes SEGFAULT within SSL_ForceHandshake(). Expected results: No SEGFAULT, no invalid writes. Additional info: Formerly reported at https://bugzilla.redhat.com/504257#c7
The invalid writes come from the code added by attachment #344296 [details] - one of the patches for bug #501080.
Here is a scratch build of NSS with the mentioned patch partially reverted. It should not trigger the invalid writes: http://koji.fedoraproject.org/koji/taskinfo?taskID=1798840 Unfortunately it brings back some memory leaks fixed before. So it looks like we need to free the memory somewhere else. For now I am not sure if the invalid writes are actually related to the SEGFAULT or not.
As for the memory leaks, they seem to be real. It allocates the memory three times per one call of ssl3_SignHashes() while storing the nss_ZAlloc() return value to the same pointer (the same field in the same internal object): #1 0x00007ffff7dc9792 in pem_PopulateModulusExponent (io=...) at prsa.c:165 #2 0x00007ffff7dc767f in pem_FetchPrivKeyAttribute (io=..., type=...) at pobject.c:318 #3 0x00007ffff7dc7f2a in pem_FetchAttribute (io=..., type=...) at pobject.c:541 #4 0x00007ffff7dc86c2 in pem_mdObject_GetAttributeSize (mdObject=..., fwObject=..., mdSession=..., fwSession=..., mdToken=..., fwToken=..., mdInstance=..., fwInstance=..., attribute=..., pError=...) at pobject.c:830 #5 0x00007ffff7dce14c in nssCKFWObject_GetAttributeSize (fwObject=..., attribute=..., pError=...) at object.c:581 #6 0x00007ffff7dd782a in NSSCKFWC_GetAttributeValue (fwInstance=..., hSession=..., hObject=..., pTemplate=..., ulCount=...) at wrap.c:2322 #7 0x00007ffff7dc424a in pemC_GetAttributeValue (hSession=..., hObject=..., pTemplate=..., ulCount=...) at ../../../../../dist/public/nss/nssck.api:666 #8 0x00007ffff6f0d5a6 in PK11_GetAttributes (arena=..., slot=..., obj=..., attr=..., count=...) at pk11obj.c:225 #9 0x00007ffff6eefc2f in PK11_GetPrivateModulusLen (key=...) at pk11akey.c:840 #10 0x00007ffff6f0e19b in PK11_SignatureLen (key=...) at pk11obj.c:562 #11 0x00007ffff7498316 in ssl3_SignHashes (hash=..., key=..., buf=..., isTLS=...) at ssl3con.c:817 #1 0x00007ffff7dc9bbe in pem_mdCryptoOperationRSAPriv_Create (proto=..., mdMechanism=..., mdKey=..., pError=...) at prsa.c:283 #2 0x00007ffff7dca05c in pem_mdMechanismRSA_SignInit (mdMechanism=..., fwMechanism=..., pMechanism=..., mdSession=..., fwSession=..., mdToken=..., fwToken=..., mdInstance=..., fwInstance=..., mdKey=..., fwKey=..., pError=...) at prsa.c:601 #3 0x00007ffff7ddc95c in nssCKFWMechanism_SignInit (fwMechanism=..., pMechanism=..., fwSession=..., fwObject=...) at mechanism.c:659 #4 0x00007ffff7dd95e2 in NSSCKFWC_SignInit (fwInstance=..., hSession=..., pMechanism=..., hKey=...) at wrap.c:3749 #5 0x00007ffff7dc46b0 in pemC_SignInit (hSession=..., pMechanism=..., hKey=...) at ../../../../../dist/public/nss/nssck.api:1115 #6 0x00007ffff6f0e913 in PK11_Sign (key=..., sig=..., hash=...) at pk11obj.c:760 #7 0x00007ffff74983f3 in ssl3_SignHashes (hash=..., key=..., buf=..., isTLS=...) at ssl3con.c:851 #1 0x00007ffff7dc9792 in pem_PopulateModulusExponent (io=...) at prsa.c:165 #2 0x00007ffff7dc767f in pem_FetchPrivKeyAttribute (io=..., type=...) at pobject.c:318 #3 0x00007ffff7dc7f2a in pem_FetchAttribute (io=..., type=...) at pobject.c:541 #4 0x00007ffff7dc9d69 in pem_mdCryptoOperationRSA_GetFinalLength (mdOperation=..., fwOperation=..., mdSession=..., fwSession=..., mdToken=..., fwToken=..., mdInstance=..., fwInstance=..., pError=...) at prsa.c:353 #5 0x00007ffff7ddd8ef in nssCKFWCryptoOperation_GetFinalLength (fwOperation=..., pError=...) at crypto.c:178 #6 0x00007ffff7dd1013 in nssCKFWSession_UpdateFinal (fwSession=..., type=..., state=..., inBuf=..., inBufLen=..., outBuf=..., outBufLen=...) at session.c:2219 #7 0x00007ffff7dd9781 in NSSCKFWC_Sign (fwInstance=..., hSession=..., pData=..., ulDataLen=..., pSignature=..., pulSignatureLen=...) at wrap.c:3819 #8 0x00007ffff7dc4700 in pemC_Sign (hSession=..., pData=..., ulDataLen=..., pSignature=..., pulSignatureLen=...) at ../../../../../dist/public/nss/nssck.api:1141 #9 0x00007ffff6f0e9b7 in PK11_Sign (key=..., sig=..., hash=...) at pk11obj.c:768 #10 0x00007ffff74983f3 in ssl3_SignHashes (hash=..., key=..., buf=..., isTLS=...) at ssl3con.c:851
(In reply to comment #3) Oops, once again, now for humans... #1 pem_PopulateModulusExponent at prsa.c:165 #2 pem_FetchPrivKeyAttribute at pobject.c:318 #3 pem_FetchAttribute at pobject.c:541 #4 pem_mdObject_GetAttributeSize at pobject.c:830 #5 nssCKFWObject_GetAttributeSize at object.c:581 #6 NSSCKFWC_GetAttributeValue at wrap.c:2322 #7 pemC_GetAttributeValue at ... #8 PK11_GetAttributes at pk11obj.c:225 #9 PK11_GetPrivateModulusLen at pk11akey.c:840 #10 PK11_SignatureLen at pk11obj.c:562 #11 ssl3_SignHashes at ssl3con.c:817 #1 pem_mdCryptoOperationRSAPriv_Create at prsa.c:283 #2 pem_mdMechanismRSA_SignInit at prsa.c:601 #3 nssCKFWMechanism_SignInit at mechanism.c:659 #4 NSSCKFWC_SignInit at wrap.c:3749 #5 pemC_SignInit at ... #6 PK11_Sign at pk11obj.c:760 #7 ssl3_SignHashes at ssl3con.c:851 #1 pem_PopulateModulusExponent at prsa.c:165 #2 pem_FetchPrivKeyAttribute at pobject.c:318 #3 pem_FetchAttribute at pobject.c:541 #4 pem_mdCryptoOperationRSA_GetFinalLength at prsa.c:353 #5 nssCKFWCryptoOperation_GetFinalLength at crypto.c:178 #6 nssCKFWSession_UpdateFinal at session.c:2219 #7 NSSCKFWC_Sign at wrap.c:3819 #8 pemC_Sign at ... #9 PK11_Sign at pk11obj.c:768 #10 ssl3_SignHashes at ssl3con.c:851
I ran this build under valgrind, the invalid writes seem to be gone. However I still saw the core dump and a valgrind invalid read error: ==10352== Thread 60: ==10352== Invalid read of size 4 ==10352== at 0x47F8BA1: ssl_DefRecv (ssldef.c:91) ==10352== by 0x47F38C5: ssl3_GatherCompleteHandshake (ssl3gthr.c:90) ==10352== by 0x47F64BA: ssl_GatherRecord1stHandshake (sslcon.c:1258) ==10352== by 0x47FC6F4: ssl_Do1stHandshake (sslsecur.c:151) ==10352== by 0x47FDDA6: SSL_ForceHandshake (sslsecur.c:407) ==10352== by 0x47FDE76: SSL_ForceHandshakeWithTimeout (sslsecur.c:428) ==10352== by 0x437BBC7: Curl_nss_connect (nss.c:1214) ==10352== by 0x4372181: Curl_ssl_connect (sslgen.c:185) ==10352== by 0x43514F8: Curl_http_connect (http.c:1792) ==10352== by 0x43589AD: Curl_protocol_connect (url.c:3056) ==10352== by 0x435DC7E: Curl_connect (url.c:4690) ==10352== by 0x4366A03: Curl_perform (transfer.c:2481) ==10352== Address 0x8 is not stack'd, malloc'd or (recently) free'd ==10352== ==10352== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==10352== Access not within mapped region at address 0x8 ==10352== at 0x47F8BA1: ssl_DefRecv (ssldef.c:91) ==10352== by 0x47F38C5: ssl3_GatherCompleteHandshake (ssl3gthr.c:90) ==10352== by 0x47F64BA: ssl_GatherRecord1stHandshake (sslcon.c:1258) ==10352== by 0x47FC6F4: ssl_Do1stHandshake (sslsecur.c:151) ==10352== by 0x47FDDA6: SSL_ForceHandshake (sslsecur.c:407) ==10352== by 0x47FDE76: SSL_ForceHandshakeWithTimeout (sslsecur.c:428) ==10352== by 0x437BBC7: Curl_nss_connect (nss.c:1214) ==10352== by 0x4372181: Curl_ssl_connect (sslgen.c:185) ==10352== by 0x43514F8: Curl_http_connect (http.c:1792) ==10352== by 0x43589AD: Curl_protocol_connect (url.c:3056) ==10352== by 0x435DC7E: Curl_connect (url.c:4690) ==10352== by 0x4366A03: Curl_perform (transfer.c:2481)
(In reply to comment #5) > I ran this build under valgrind, the invalid writes seem to be gone. However I > still saw the core dump and a valgrind invalid read error: I've just opened a separate bug for the SIGSEGV issue: https://bugzilla.redhat.com/534176#c1
I can reproduce this with the code that contains all of our fixes previously. ==19026== Thread 32: ==19026== Invalid write of size 1 ==19026== at 0x4028B8F: memcpy (mc_replace_strmem.c:402) ==19026== by 0x721C3AA: nsslibc_memcpy (string3.h:52) ==19026== by 0x720C728: pem_PopulateModulusExponent (prsa.c:211) ==19026== by 0x720BB86: pem_FetchPrivKeyAttribute (pobject.c:318) ==19026== by 0x720BF5F: pem_FetchAttribute (pobject.c:541) ==19026== by 0x720C199: pem_mdCryptoOperationRSA_GetFinalLength (prsa.c:352) ==19026== by 0x721AA5C: nssCKFWCryptoOperation_GetFinalLength (crypto.c:178) ==19026== by 0x72102D9: nssCKFWSession_UpdateFinal (session.c:2219) ==19026== by 0x72156DB: NSSCKFWC_Sign (wrap.c:3819) ==19026== by 0x7207E92: pemC_Sign (nssck.api:1141) ==19026== by 0x4885D4D: PK11_Sign (pk11obj.c:768) ==19026== by 0x47F9BF1: ssl3_SignHashes (ssl3con.c:851) ==19026== Address 0x770ea4f is 39 bytes inside a block of size 72 free'd ==19026== at 0x4026BCA: free (vg_replace_malloc.c:323) ==19026== by 0x49A1D66: PR_Free (in /lib/libnspr4.so) ==19026== by 0x721B77E: nss_ZFreeIf (arena.c:975) ==19026== by 0x720C6F6: pem_PopulateModulusExponent (prsa.c:207) ==19026== by 0x720BB86: pem_FetchPrivKeyAttribute (pobject.c:318) ==19026== by 0x720BF5F: pem_FetchAttribute (pobject.c:541) ==19026== by 0x720C199: pem_mdCryptoOperationRSA_GetFinalLength (prsa.c:352) ==19026== by 0x721AA5C: nssCKFWCryptoOperation_GetFinalLength (crypto.c:178) ==19026== by 0x72102D9: nssCKFWSession_UpdateFinal (session.c:2219) ==19026== by 0x72156DB: NSSCKFWC_Sign (wrap.c:3819) ==19026== by 0x7207E92: pemC_Sign (nssck.api:1141) ==19026== by 0x4885D4D: PK11_Sign (pk11obj.c:768) ==19026== Let me know what else I can do to help.
The code looks sane: nss_ZFreeIf(io->u.key.key.exponent2.data); io->u.key.key.exponent2.data = (void *) nss_ZAlloc(NULL, lpk->u.rsa.exponent2.len); io->u.key.key.exponent2.size = lpk->u.rsa.exponent2.len; nsslibc_memcpy(io->u.key.key.exponent2.data, lpk->u.rsa.exponent2.data, lpk->u.rsa.exponent2.len); It seems to free, allocate, then copy into the new buffer. Kamil, do you see anything wrong here?
(In reply to comment #7) > I can reproduce this with the code that contains all of our fixes previously. It does not surprise me. That's actually why the bug is still opened. Something wrong is still going on here... > ==19026== Thread 32: > ==19026== Invalid write of size 1 Does the invalid write cause SIGSEGV and/or malfunction? > Let me know what else I can do to help. A minimal example creating that threads and causing the failure would be nice, though it seems like a tough challenge. (In reply to comment #8) > The code looks sane: > > nss_ZFreeIf(io->u.key.key.exponent2.data); I've added the line above - look at the attachment #344296 [details]. If I remove the line, it will cause a memory leak instead, which I already did in the scratch build from comment #2. > It seems to free, allocate, then copy into the new buffer. > > Kamil, do you see anything wrong here? There is basically nothing wrong in the code you listed. However Rob thinks the scenario "free, allocate" should never happen (bug #501080 comment #6). I too have a hard time to see the backtrace from comment #3. Rob, any idea?
What I meant was: there isn't a way to re-load a key so there should be nothing to free, it's a no-op. Shouldn't hurt anything though.
Rob, you don't understand me. I am saying it is *not* no-op. I've traced the way to re-load key with gdb and captured as the backtraces in comment #3 (and more readable in comment #4). I know it *is* reloaded, but only can't explain why... And let me note the so called "no-op" causes invalid write on the key reload :-)
> Does the invalid write cause SIGSEGV and/or malfunction? No it does not but it seems scary to me. I'm more than happy to help you reproduce and debug. I'll have a look at the code more and see if I can find exactly what is going on.
Just to clarify, the weird "key reloading" is 100% reproducible (no needs for threading). The invalid writes occur occasionally. So I would start with explaining/fixing the mentioned reallocation, which is (per Rob's reply) not supposed to be done ... and perhaps solving that may fix the bug as well.
Thanks Kamil. I don't know the NSS code very well so if you have some suggestions I'd be more than happy to help.
> I don't know the NSS code very well... Nor do I. That's why I CC'd the two gurus actually :-) Let's give them some time and I believe they'll bring some light on that.
> Nor do I. That's why I CC'd the two gurus actually :-) Let's give them some > time and I believe they'll bring some light on that. Sounds great Kamil. Let me know if/how I can help.
What this code does is takes the DER-encoded private key (lpk) and pulls out its components and copies them into the object passed in (io). I guess my assumption was that for a signing operation you'd do this once and the same object would be used more than once, rather than re-cracking the key over and over. The code preceeding this in pobject.c is: case CKA_MODULUS: if (0 == kp->modulus.size) { pem_PopulateModulusExponent(io); } So there is no modulus set in the object, I wonder if it is safe to assume the rest of the key is empty in the object as well (which is why I wondered why we needed the ZFreeIf()).
> The code preceeding this in pobject.c is: > > case CKA_MODULUS: > if (0 == kp->modulus.size) { > pem_PopulateModulusExponent(io); > } However that's not the only way to reallocate the data. Look once again at that backtrace. The second allocation does not occur in pem_PopulateModulusExponent() at all. It happens in pem_mdCryptoOperationRSAPriv_Create() instead ... and yes on the same internal object and on the same field inside. Any idea?
Ok, pushing the boundaries of my understanding here but I note a few odd things: - The modulus is copied in pem_mdCryptoOperationRSAPriv_Create() before decoding the key and the other parts of the key aren't set (the primes and exponents). - Not sure why lpk is re-created in pem_PopulateModulusExponent as it should be cached in the object assuming it was created by pem_mdCryptoOperationRSAPriv_Create() Seems like one could call pem_PopulateModulusExponent() from pem_mdCryptoOperationRSAPriv_Create()
I am switching the version back to rawhide to protect the bug from upcoming bug zapper, and reassigning back to NSS maintainers since they're going to care about the PEM module now.
upstream bug https://bugzilla.mozilla.org/show_bug.cgi?id=402712
(In reply to comment #21) > upstream bug https://bugzilla.mozilla.org/show_bug.cgi?id=402712 Please ignore the link above, I've mistakenly placed it here. Sorry.
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
(In reply to comment #19) > Ok, pushing the boundaries of my understanding here but I note a few odd > things: > > - The modulus is copied in pem_mdCryptoOperationRSAPriv_Create() before > decoding the key and the other parts of the key aren't set (the primes and > exponents). > - Not sure why lpk is re-created in pem_PopulateModulusExponent as it should be > cached in the object assuming it was created by > pem_mdCryptoOperationRSAPriv_Create() I see that pem_FetchPubKeyAttribute and pem_FetchPrivKeyAttribute callpem_mdCryptoOperationRSAPriv_Create when they test their ineternal object and fiend that pieces haven't been set. pem_PopulateModulusExponent can fail (due mem alloc failure) that is ignored. > > Seems like one could call pem_PopulateModulusExponent() from > pem_mdCryptoOperationRSAPriv_Create() If we do that notice that pem_PopulateModulusExponent frees the lpk before returning whereas mdCryptoOperationRSAPriv_Create saves it. I'm quite confused.
What confuses me is looking at the calling chain(s) leading to pem_PopulateModulusExponent. Wouldn't it be nice if we could assume that pem_mdCryptoOperationRSAPriv_Create() is the first trigger of pem_PopulateModulusExponent? Cache it and not having to recreate it. The problem is that pem_Initilailize takes parameters and that will trigger loading certs and keys into memory and into the internal global objects list. So it's gets called even before any client tries to start a crypto operation.
Yes, my question as stated is a bit confusing. Let me restate it. Many external calls can trigger the PEM module to populate the key. Is there one call that we can be sure would be the very first one to cause the populating of the key?
pem_PopluateModulus is a private pem call. A simple 'grep' in the current pem tree will tell you when it's called. According the the patch you've attacked upstream, I only see it called in pem_FetchPubKeyAttribute() and pem_FetchPrivKeyAttribute(). These functions can be called for many top level C_ calls, including C_FindObjects(). It'll be almost assured that one of these functions are called before anyone tries to call pem_mdCryptoOperationRSAPriv_Create(). It's all moot. pem_PopulateModulus frees the key at the end because it is essentially through with it. pem_mdCryptoOperationRSAPriv_Create() creates a private key to use in an RSA operation (Sign or decrypt). It's freed later when you destroy the operation (not that the opertion context inherits it. There may be some advantage to creating the private key once and storing it in the i/o object, but we probably should answer a couple of unclear questions: 1) why is are the components being populated more than once. I can see a race condition where two different threads try to fetch the same key. It is probably worth a lock to prevent it, but Kamel says it happens even in a single threaded environment, which make me wonder if something else is going one. 2) Is pem_PopulateModulus really being called twice for the same object (NOTE: it's always called twice for the same key, once for the private key instance and once for the public key instance --- even though it's backed by the private key in both cases). 3) what could cause the invalid write? Even a race condition seems unlikely lpk is it's own private copy allocated separately for each thread, so the .len can't change. .data could (you switch from the allocation from one thread to allocation on the next thread), that shouldn't cause an invalid write. bob
This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX. (Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.) Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version. Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping