Red Hat Bugzilla – Bug 501138
Tracker: bugs introduced by migration to NSS
Last modified: 2010-06-14 11:30:08 EDT
Description of problem:
The curl package is broken by migration to NSS. This bug consolidates all bugs need to be resolved to make curl working.
Version-Release number of selected component (if applicable):
All current Fedora releases are affected.
Users need to compile OpenSSL based curl by hand.
NSS based curl fully equivalent to OpenSSL based curl
There are patches for almost all bugs. They just need to be reviewed and applied.
Created attachment 344307 [details]
patch using UnloadUserModule()/DestroyModule() to free internal objects
The attached patch makes curl leak-less in most cases. The PEM module is loaded on new connection and unloaded/destroyed on connection close. Unloading the module is now AFAIK the only way to free internal objects (now means Bug 501080 is resolved). I've conducted some testing against curl/nss with all proposed patches applied. It gives me zero memory leaks when using CA certificate only.
Rob, do you think the patch could break anything? Do we have another way to free internal objects?
Well, I would contend that this bug is a bit misleading since curl is not completely broken with the NSS/pem code and most users do in fact not recompile with OpenSSL (or if they are they aren't filing bugs against curl to this effect).
As to freeing objects, it depends. Why do you need to free them, because you're done with the certificate/key? I suppose a good enhancement would be to run through the list of certs and keys when trying to add new ones to be sure it isn't already loaded. I doubt it does this currently.
(In reply to comment #2)
> Well, I would contend that this bug is a bit misleading since curl is not
> completely broken with the NSS/pem code and most users do in fact not recompile
> with OpenSSL (or if they are they aren't filing bugs against curl to this
I didn't say "completely" :-) It is sufficient for one-shot applications. But each application performing SSL transfers in a loop crashes on memory exhaustion sooner or later. I am not talking now about memory leaks on application's exit, it's minor problem. I am talking about the fast memory allocation during application's run-time (IMO caused primarily by the bug 501118).
> As to freeing objects, it depends. Why do you need to free them, because you're
> done with the certificate/key? I suppose a good enhancement would be to run
> through the list of certs and keys when trying to add new ones to be sure it
> isn't already loaded. I doubt it does this currently.
I can't see any code freeing internal objects and/or the global array for pointers to them in the current version of PEM. Is it intentionally? I think the patch attached to bug 501080 does the right think anyway. I was only not sure with the patch for curl for loading/unloading the module per each connection. Do we have another way to free certificates which are no more used?
I think the right thing to do would be to remove the token. That should trigger all the appropriate PKCS#11 calls to free objects. Not sure it will actually do the rights thing though.
Does it mean the test case in bug 501080 is missing something? It gives me zero memory leaks with the patch applied. (not counting undestroyed mutex resolved by bug 501058 or by adding params="" to configstring)
I've prepared scratch builds of leak-less curl/nss so it can be easily tested together. Note the bug fix for #501191 is not included yet.
This tracking bug depends on 500636. It looks like a circular dependency to me. I think 500636, which is a libcurl bug and refers to nss work which this bug tracks, should depend on this bug instead. What do you think?
Yes, it makes sense. Fixing now...
Shall we make scratch builds which include the 501191 patches as well?
It depends on how long will it take to release the update. Since I have no feedback for the current scratch builds yet, I am in no hurry to create a new one, but feel free to build it :-)
Just for summary. All nss' patches are reviewed already and now waiting for Rob's r+, am I right?
Do you have the SRPMs for your scratch builds somewhere koji's garbage collected the scratch builds and I missed seeing this bug to test before.
Created attachment 346105 [details]
spec files and patches used for scratch builds
It sounds like good enough reason to create new scratch builds (including the bug fixes which came afterwards):
The SRPMs are too big to attach, so I am only attaching the spec files and patches used for them. Maybe I can place the SRPMs somewhere on web if I find a proper place for them. Note there is still present a SIGSEGV bug while using a client certificate stored on a smartcard. I'll fix it later this week:
Any feedback is welcome. Thanks in advance!
The packages work for a simple C test case but fails with a simple pycurl test case after one successful iteration with the following traceback:
[badger@clingman curl]$ ./break-pycurl.py
Traceback (most recent call last):
File "./break-pycurl.py", line 12, in <module>
pycurl.error: (60, 'Peer certificate cannot be authenticated with known CA certificates')
This is the same symptom as https://bugzilla.redhat.com/show_bug.cgi?id=488791
Created attachment 346129 [details]
Simple pycurl test case
Toshio, thanks for testing it! I can confirm your results with the py test case. I'll try to solve it properly this time. Meanwhile I suggest you using the new (testing) build of nss with the stable build of curl. The test passes in that case and the memory leak situation should be much better.
Thanks Kamil! I can confirm that the new build of nss and current stable build of curl works and leaks *much* less than before.
Created attachment 346145 [details]
more efficient test case
Well, the patch for curl was a really bad idea. It is not thread-safe enough, that's why pycurl is broken by the patch. We need to implement a thread-safe way to free PEM internal objects and maybe also reduce their count if possible (as Rob proposed).
I am attaching a test case which gives me better performance since it allows curl to reuse already opened connections. It also appends a call of the global_cleanup(). The test case gives me not so bad memory leak summary (per 2000 connections) with valgrind:
definitely lost: 120 bytes in 2 blocks.
possibly lost: 13,872 bytes in 49 blocks.
still reachable: 936,942 bytes in 478 blocks.
suppressed: 0 bytes in 0 blocks.
Note it also reports 575 occurrences of invalid memory access, but it seems to be caused by the py binding. The py equipment can be a bit out-dated on my box.
Yeah, reusing the curl object can be nice. I coded my test case to use multiple curl objects, however, because I'm using this in a library which is called by a multithreaded app. Keeping a single curl object to do all the requests would necessitate locking around the object to prevent c.perform() being called in one thread while a different thread is changing the parameters sent via c.setopt(). Instead, the library creates a curl object whenever it needs to make a request. Less efficient but simpler code for this application.
Created attachment 346271 [details]
the original Toshio's test including c.close() and pycurl.global_cleanup()
In that case the original test case will be closer to reality. I am only changing it to call c.close() and pycurl.global_cleanup(). It gives me the same leak summary on application exit.
I know it's still a bit hungry on memory during the run time, but it will take same additional time to fix it. So I'll ask NSS guys to release the update including only these 6 bug fixes for now.
Toshio, ready to test yet another improvement of pem/nss?
Looks good. All the test cases I've tried have worked. Memory usage stays constant and https requests are being made as expected.
Toshio, thanks for testing it!
The patch review process will follow at bug 509705 if anybody interested. I am sure this is not the last patch for pem/nss from me. I can still see a lot of opened questions there.
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.