Bug 1688958
Summary: | Memory leak: libcurl leaks 120 bytes on each connection [rhel-7.9.z] | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Dmitry Oksenchuk <oksenchuk> | ||||||
Component: | nss | Assignee: | nss-nspr-maint <nss-nspr-maint> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Alicja Kario <hkario> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 7.6 | CC: | dconsoli, dueno, hkario, jreznik, kdudka, nmavrogi, nss-nspr-maint, rrelyea, ssorce | ||||||
Target Milestone: | rc | Keywords: | Triaged, ZStream | ||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | nss-3.53.1-3.el7_9 | Doc Type: | If docs needed, set a value | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
: | 1802547 (view as bug list) | Environment: | |||||||
Last Closed: | 2020-09-29 21:18:42 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 1804262 | ||||||||
Bug Blocks: | 1802547 | ||||||||
Attachments: |
|
Description
Dmitry Oksenchuk
2019-03-14 19:03:16 UTC
Thanks for the analysis! I have not looked at all the details you provided and the corresponding source code. However you are using libcurl-7.29.0-51.el7 together with nss-pem-1.0.3-5.el7. This combination of builds is known to cause severe performance regression: bug #1659108 Could you please try to upgrade to nss-pem-1.0.3-5.el7_6.1, which went out 3 days ago? https://access.redhat.com/errata/RHBA-2019:0500 Thank you Kamil! nss-pem-1.0.3-5.el7_6.1 is not available in CentOS yet. I have tried nss-pem-1.0.4.20181221.153303.gbc177c0.gl_list-1.el7 mentioned in bug #1659108 and it still leaks. I will try to write a small example that reproduces the problem. Created attachment 1544742 [details]
curl_test.cpp: small example that reproduces the issue
Created attachment 1544755 [details]
massif.out.3203.txt: profiling information gathered by Massif for 10000 requests
I attached a small example that reproduces the problem. The code is based on curl_http_transport.cc: https://github.com/google/google-api-cpp-client/blob/master/src/googleapis/client/transport/curl_http_transport.cc. To compile and run the example execute: $ g++ -o curl_test -lcurl -O1 -ggdb -std=c++11 curl_test.cpp $ ./curl_test 10000 # number of https requests Also I attached memory profiling information gathered by Massif. If you compare snapshots at the beginning of the program execution and at the end, you will notice that more and more bytes are allocated by the following functions: ->42.14% (1,337,344B) 0x6AB610B: PL_ArenaAllocate (plarena.c:127) | ->40.07% (1,271,808B) 0xB89A47C: nss_zalloc_arena_locked (in /usr/lib64/libnsspem.so) | | ->40.07% (1,271,808B) 0xB89A56E: nss_ZAlloc (in /usr/lib64/libnsspem.so) | | ->19.36% (614,400B) 0xB88DD5B: nssCKFWMutex_Create (in /usr/lib64/libnsspem.so) | | | ->19.36% (614,400B) 0xB88D567: nssCKFWInstance_CreateMutex (in /usr/lib64/libnsspem.so) | | | ->19.29% (612,352B) 0xB88DF4B: nssCKFWObject_Create (in /usr/lib64/libnsspem.so) | | | | ->19.29% (612,352B) 0xB88F4FE: nssCKFWSession_CreateObject (in /usr/lib64/libnsspem.so) | | | | ->19.29% (612,352B) 0xB894359: NSSCKFWC_CreateObject (in /usr/lib64/libnsspem.so) | | | | ->19.29% (612,352B) 0x65A2210: PK11_CreateNewObject (pk11obj.c:404) | | | | ->19.29% (612,352B) 0x65A40E6: pk11_CreateGenericObjectHelper (pk11obj.c:1642) | | | | ->19.29% (612,352B) 0x4E375DB: nss_create_object.isra.10 (nss.c:405) | | | | ->19.29% (612,352B) 0x4E37697: nss_load_cert (nss.c:457) | | | | ->19.29% (612,352B) 0x4E705BB: nss_connect_common (nss.c:1243) | | | | ->19.29% (612,352B) 0x4E66AEC: Curl_ssl_connect_nonblocking (sslgen.c:229) | | | | ->19.29% (612,352B) 0x4E3DDDB: Curl_http_connect (http.c:1352) | | | | ->19.29% (612,352B) 0x4E4D733: Curl_protocol_connect (url.c:3328) | | | | ->19.29% (612,352B) 0x4E60E59: multi_runsingle (multi.c:1159) | | | | ->19.29% (612,352B) 0x4E613AF: curl_multi_perform (multi.c:1757) | | | | ->19.29% (612,352B) 0x4E58621: curl_easy_perform (easy.c:480) | | | | ->19.29% (612,352B) 0x4016D2: main (curl_test.cpp:60) | | | | | ->19.36% (614,400B) 0xB88DEEE: nssCKFWObject_Create (in /usr/lib64/libnsspem.so) | | | ->19.36% (614,400B) 0xB88F4FE: nssCKFWSession_CreateObject (in /usr/lib64/libnsspem.so) | | | ->19.36% (614,400B) 0xB894359: NSSCKFWC_CreateObject (in /usr/lib64/libnsspem.so) | | | ->19.36% (614,400B) 0x65A2210: PK11_CreateNewObject (pk11obj.c:404) | | | ->19.36% (614,400B) 0x65A40E6: pk11_CreateGenericObjectHelper (pk11obj.c:1642) | | | ->19.36% (614,400B) 0x4E375DB: nss_create_object.isra.10 (nss.c:405) | | | ->19.36% (614,400B) 0x4E37697: nss_load_cert (nss.c:457) | | | ->19.36% (614,400B) 0x4E705BB: nss_connect_common (nss.c:1243) | | | ->19.36% (614,400B) 0x4E66AEC: Curl_ssl_connect_nonblocking (sslgen.c:229) | | | ->19.36% (614,400B) 0x4E3DDDB: Curl_http_connect (http.c:1352) | | | ->19.36% (614,400B) 0x4E4D733: Curl_protocol_connect (url.c:3328) | | | ->19.36% (614,400B) 0x4E60E59: multi_runsingle (multi.c:1159) | | | ->19.36% (614,400B) 0x4E613AF: curl_multi_perform (multi.c:1757) | | | ->19.36% (614,400B) 0x4E58621: curl_easy_perform (easy.c:480) | | | ->19.36% (614,400B) 0x4016D2: main (curl_test.cpp:60) At the end of the program execution 1,271,808 bytes were allocated for 10,000 requests - about 127 bytes per request. I haven't found any workaround. There are three possible fixes in my opinion: 1. Switch to openssl hoping it doesn't leak. 2. Don't use arena in nss-pem. But it may bring a performance degradation. 3. Try to find object before creating one in nss_load_cert(). Every time the object is created for system certificate file, so we definitely can use previously created one. Let me know if you need any additional information regarding this issue. Sorry for the delay! I have been busy with security issues elsewhere... (In reply to Dmitry Oksenchuk from comment #7) > There are three possible fixes in my opinion: > 1. Switch to openssl hoping it doesn't leak. This is now implemented in Fedora 27+ per the following change: https://fedoraproject.org/wiki/Changes/libcurlBackToOpenSSL Unfortunately, this is not an option for RHEL-7. > 2. Don't use arena in nss-pem. But it may bring a performance degradation. Do you have any patch in mind? As I understand it, the leaked memory is not allocated in nss-pem code directly. It is nss code what allocates the mutex, isn't it? > 3. Try to find object before creating one in nss_load_cert(). Every time the > object is created for system certificate file, so we definitely can use > previously created one. This is actually what nss-pem does internally -- see the AddObjectIfNeeded() function, introduced while fixing the following bug: bug #509705 Unfortunately, nss breaks when pem_CreateObject() returns a single address multiple times, because CKFW hashes the addresses returned out of the C_CreateObject() callback. So, to work around this limitation, nss-pem pretends object creation even when it internally re-uses an already existing object. This is tracked as the following bug: bug #733685 (In reply to Kamil Dudka from comment #8) > As I understand it, the leaked memory is not allocated in nss-pem code > directly. It is nss code what allocates the mutex, isn't it? Note that, despite /usr/lib64/libnsspem.so appears in the backtrace next to nssCKFW* symbols, the actual code is not maintained in nss-pem. It is nss code that nss-pem links _statically_ via -lnssckfw. (In reply to Kamil Dudka from comment #8) > > 2. Don't use arena in nss-pem. But it may bring a performance degradation. > > Do you have any patch in mind? > > As I understand it, the leaked memory is not allocated in nss-pem code > directly. It is nss code what allocates the mutex, isn't it? I was thinking about passing NULL instead of arena to nssCKFWObject_Create() in nssCKFWSession_CreateObject(). nss_ZNEW() creates object on the heap if arena is NULL. However, this looks like a dirty hack because nssCKFWSession_CreateObject() returns CKR_GENERAL_ERROR if arena is NULL. (In reply to Kamil Dudka from comment #8) > > 3. Try to find object before creating one in nss_load_cert(). Every time the > > object is created for system certificate file, so we definitely can use > > previously created one. > > This is actually what nss-pem does internally -- see the AddObjectIfNeeded() > function, introduced while fixing the following bug: > > bug #509705 > > Unfortunately, nss breaks when pem_CreateObject() returns a single address > multiple times, because CKFW hashes the addresses returned out of the > C_CreateObject() callback. So, to work around this limitation, nss-pem > pretends object creation even when it internally re-uses an already existing > object. This is tracked as the following bug: > > bug #733685 Maybe PK11_FindGenericObjects() / PK11_GetNextGenericObject() could be used in nss_load_cert() to find previously created object. nss_create_object() in this case should use PK11_CreateGenericObject() instead of PK11_CreateManagedGenericObject() to leave the object alive for future search. This brings back leak fixed by the Managed function but it will be one leak per certificate file instead of one leak per https request. Comment for PK11_CreateGenericObject() says that Mozilla uses the interface this way: * The old interface is preserved because applications like Mozilla purposefully * leak the reference to be found later with PK11_FindGenericObjects. New * applications should use the new interface PK11_CreateManagedGenericObject. (In reply to Dmitry Oksenchuk from comment #10) > I was thinking about passing NULL instead of arena to nssCKFWObject_Create() > in nssCKFWSession_CreateObject(). nss_ZNEW() creates object on the heap if > arena is NULL. I do not understand how allocating the objects on heap instead of arena could ever help here. Unless you explicitly free the allocated objects somewhere (which you can do for arena-allocated objects, too), it would only turn the "run-time" memory leak into regular memory leak observable by valgrind. On of the advantages of using arenas is that all the allocated memory is released once the arena is destroyed. In this case, the arena is created in nssCKFWSession_Create() and destroyed in nssCKFWSession_Destroy(). The arena is likely not destroyed soon enough in the mentioned scenario, so the allocated memory grows. Still it is better to free the memory later than never. > However, this looks like a dirty hack because nssCKFWSession_CreateObject() > returns CKR_GENERAL_ERROR if arena is NULL. As already explained, this code is not part of nss-pem. It would need to be discussed with nss maintainers. > Maybe PK11_FindGenericObjects() / PK11_GetNextGenericObject() could be used > in nss_load_cert() to find previously created object. This interface would not work for curl. As it works now, the PK11_CreateGenericObject() interface is kind of abused. curl just passes in a file name, nss-pem opens the file, parses it, and checks (using brute force) whether an object with the same DER contents is already loaded. In order to check anything at curl's side, it would be necessary to let curl process the certificate by itself first before passing it to nss-pem. I am afraid it would kill the only advantage of using nss-pem in curl. (In reply to Kamil Dudka from comment #11) > I do not understand how allocating the objects on heap instead of arena > could ever help here. Unless you explicitly free the allocated objects > somewhere (which you can do for arena-allocated objects, too), it would only > turn the "run-time" memory leak into regular memory leak observable by > valgrind. It is explicitly freed by Curl_llist_destroy(): nss_ZFreeIf (arena.c:955) nssCKFWMutex_Destroy (mutex.c:143) nssCKFWObject_Destroy (object.c:248) NSSCKFWC_DestroyObject (wrap.c:2093) pemC_DestroyObject (nssck.api:582) PK11_DestroyObject (pk11obj.c:55) PK11_DestroyGenericObject (pk11obj.c:1590) Curl_llist_remove (llist.c:132) Curl_llist_destroy (llist.c:149) The problem is that nss_ZFreeIf() just zeros memory without freeing it if arena is not NULL: https://github.com/servo/nss/blob/master/lib/base/arena.c#L1018 I don't know when the arena is supposed to be destroyed but it seems like it's destroyed only on exit. (In reply to Dmitry Oksenchuk from comment #12) > The problem is that nss_ZFreeIf() just zeros memory without freeing it if > arena is not NULL: > https://github.com/servo/nss/blob/master/lib/base/arena.c#L1018 Indeed. Sorry for missing this important fact. Then the correct fix would be to make nss_ZFreeIf() work as documented: /* * NSS_ZFreeIf * * If the specified pointer is non-null, then the region of memory * to which it points -- which must have been allocated with * nss_ZAlloc -- will be zeroed and released. This routine * returns a PRStatus value; if successful, it will return PR_SUCCESS. * If unsuccessful, it will set an error on the error stack and return * PR_FAILURE. * * The error may be one of the following values: * NSS_ERROR_INVALID_POINTER * * Return value: * PR_SUCCESS * PR_FAILURE */ Returning PR_SUCCESS while keeping the memory allocated violates the contract. Upstream patch for nss has been checked in, the NSS code will land as part of the NSS rebase. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: nss and nspr security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4076 |