Description of problem:

s3backer is a FUSE filesystem that exposes an Amazon S3 bucket as a file in your local filesystem. It relies on libcurl for communication with Amazon, and RHEL's libcurl relies on NSS as its TLS provider. Subjecting s3backer to prolonged stress on RHEL/CentOS 6.4 with the stock libcurl results in unbounded memory consumption. Analysis with valgrind's memcheck tool shows no significant memory leaks. However, recompiling libcurl to use OpenSSL as its TLS provider instead of NSS makes the issue go away.

Version-Release number of selected component (if applicable):

nss-3.15.3-3.el6_5.x86_64
libcurl-7.19.7-37.el6_4.x86_64

How reproducible:

It happens every time.

Steps to Reproduce:

1. Spin up a RHEL/CentOS 6.x AWS instance and create an Amazon S3 bucket. In my case, the bucket has a 1M object size. Note that the only thing that is probably necessary here is an S3-like service, but to ensure reproducibility, I am providing precise parameters.

2. Install s3backer.

3. Mount the S3 bucket with a command such as the following (sensitive details such as keys have been sanitized):

sudo s3backer --readAhead=0 --blockSize=1M --size=500G --listBlocks --vhost --baseURL=https://s3.amazonaws.com/ --timeout=15 --blockCacheThreads=10 --accessId=<accessId> --accessKey=<accessKey> --blockCacheSize=10 <bucket> <mountpoint>

4. Run dd on the mounted S3 bucket:

sudo dd if=/dev/urandom of=<mountpoint>/file bs=1M count=102400 iflag=nonblock,noatime oflag=nonblock,noatime

The workload where I originally observed this involved running a filesystem on top of s3backer that is not supported by Red Hat, but the same behavior can be reproduced by running dd on top of s3backer. The growth becomes increasingly obvious when using /proc/$PID/status to view memory usage over the span of 2-4 hours. While the system running s3backer need not be on AWS, access to S3 from outside Amazon is quite slow and I highly recommend placing it there. A long-running program that exercises the same code paths in libcurl and NSS as s3backer should also exhibit this problem.

Actual results:

VmRSS slowly grows until either the dd stops or the OOM killer triggers.

Expected results:

VmRSS should stabilize, with VmHWM not changing after the first hour or so.

Additional info:

Analysis with valgrind's massif tool shows multiple backtraces containing NSS in the same snapshot, and their share of memory usage grows as a function of instructions executed. Here is an excerpt of one of the backtraces:

| ->21.93% (13,056,000B) 0xE91C59A: ???
| | ->21.93% (13,056,000B) 0xE9218C7: ???
| | ->21.93% (13,056,000B) 0xE928520: ???
| | ->21.93% (13,056,000B) 0x37C56470C8: PK11_CreateNewObject (pk11obj.c:378)
| | ->21.93% (13,056,000B) 0x37C5647361: PK11_CreateGenericObject (pk11obj.c:1415)
| | ->21.93% (13,056,000B) 0x37C823F49E: nss_create_object (nss.c:349)
| | ->21.93% (13,056,000B) 0x37C823F625: nss_load_cert (nss.c:383)
| | ->21.93% (13,056,000B) 0x37C8240E0E: Curl_nss_connect (nss.c:1095)
| | ->21.93% (13,056,000B) 0x37C8238480: Curl_ssl_connect (sslgen.c:185)
| | ->21.93% (13,056,000B) 0x37C8216EC9: Curl_http_connect (http.c:1796)
| | ->21.93% (13,056,000B) 0x37C821D680: Curl_protocol_connect (url.c:3077)
| | ->21.93% (13,056,000B) 0x37C8223B3A: Curl_connect (url.c:4743)
| | ->21.93% (13,056,000B) 0x37C822BBAE: Curl_perform (transfer.c:2523)
| | ->21.93% (13,056,000B) 0x409E30: http_io_perform_io (http_io.c:1437)
| | ->18.49% (11,009,280B) 0x40CB36: http_io_write_block (http_io.c:1348)
| | | ->18.49% (11,009,280B) 0x407422: ec_protect_write_block (ec_protect.c:433)
| | | ->18.49% (11,009,280B) 0x404CFD: block_cache_worker_main (block_cache.c:1090)
| | | ->18.49% (11,009,280B) 0x379120784F: start_thread (pthread_create.c:301)
| | | ->18.49% (11,009,280B) 0x3790AE894B: clone (clone.S:115)

Recompiling libcurl with OpenSSL as its TLS provider makes the issue go away, which strongly suggests a problem either in NSS or in how libcurl uses NSS. I am not familiar with the NSS codebase, and I am concerned about the possibility of introducing a security hole should I attempt to patch this myself. Red Hat employees in #rhel on freenode informed me that Red Hat employs multiple NSS developers and suggested that I file a report here for them.
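For reference, a minimal watcher along these lines can be used to track the VmRSS/VmHWM growth described above (an illustrative sketch only, not part of the original reproduction steps; the one-minute poll interval is an arbitrary choice):

/* vmrss-watch.c -- poll the VmRSS and VmHWM lines of /proc/<pid>/status.
 * Build: gcc -o vmrss-watch vmrss-watch.c
 * Usage: ./vmrss-watch <pid>
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[64], line[256];

    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof path, "/proc/%s/status", argv[1]);

    for (;;) {
        FILE *fp = fopen(path, "r");
        if (!fp) {
            perror(path);
            return 1;
        }
        /* Print only the resident-set lines. */
        while (fgets(line, sizeof line, fp)) {
            if (!strncmp(line, "VmRSS:", 6) || !strncmp(line, "VmHWM:", 6))
                fputs(line, stdout);
        }
        fclose(fp);
        fflush(stdout);
        sleep(60);  /* arbitrary one-minute poll interval */
    }
}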
I have experienced this exact issue while developing a proprietary application.

->03.41% (5,580,460B) 0x11C1A628: ???
->01.37% (2,236,192B) 0x11C0A7CA: ???
| ->01.37% (2,236,192B) 0x11C0FB57: ???
| ->01.37% (2,236,192B) 0x11C167B0: ???
| ->01.37% (2,236,192B) 0x7901888: PK11_CreateNewObject (pk11obj.c:378)
| ->01.37% (2,236,192B) 0x7901B21: PK11_CreateGenericObject (pk11obj.c:1504)
| ->01.37% (2,236,192B) 0x52EB49E: nss_create_object (nss.c:349)
| ->01.37% (2,236,192B) 0x52EB625: nss_load_cert (nss.c:383)
| ->01.37% (2,236,192B) 0x52ECE0E: Curl_nss_connect (nss.c:1095)
| ->01.37% (2,236,192B) 0x52E4480: Curl_ssl_connect (sslgen.c:185)
| ->01.37% (2,236,192B) 0x52C2EC9: Curl_http_connect (http.c:1796)
| ->01.37% (2,236,192B) 0x52C9680: Curl_protocol_connect (url.c:3077)
| ->01.37% (2,236,192B) 0x52CFB3A: Curl_connect (url.c:4743)
| ->01.37% (2,236,192B) 0x52D7BAE: Curl_perform (transfer.c:2523)
| ->01.09% (1,783,232B) 0x5B6B5D: CSwiftCurl::login() (swiftcurl.C:186)
| | ->01.09% (1,783,232B) in 2 places, all below massif's threshold (01.00%)
| |
| ->00.28% (452,960B) in 1+ places, all below ms_print's threshold (01.00%)

Some additional details which may be of use:

I have noticed that the rate of the leak is a function of the number of threads performing simultaneous requests. If I use a single thread, the leak never materializes. I have traced the curl code, and it appears to call PK11_DestroyGenericObject() correctly. I strongly suspect that this is a race condition in PK11_DestroyGenericObject(). The following C program seems to reproduce a leak with a similar call stack somewhat reliably:

//=========leak-test.c=======
#include <curl/curl.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 200
#define NUM_GETS_PER_THREAD 5

/* Discard the response body; the callback must return the number of
 * bytes it consumed. */
size_t do_nothing(char *data, size_t size, size_t nmemb, void *userp)
{
    (void)data;
    (void)userp;
    return size * nmemb;
}

void *thread_main(void *arg)
{
    CURL *c;
    int i;

    (void)arg;
    c = curl_easy_init();
    for (i = 0; i < NUM_GETS_PER_THREAD; i++) {
        curl_easy_setopt(c, CURLOPT_URL, "https://bugzilla.redhat.com");
        curl_easy_setopt(c, CURLOPT_WRITEFUNCTION, do_nothing);
        curl_easy_perform(c);
        curl_easy_reset(c);
    }
    curl_easy_cleanup(c);
    return 0;
}

int main(void)
{
    pthread_t all_threads[NUM_THREADS];
    int i;

    curl_global_init(CURL_GLOBAL_ALL);
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_create(&all_threads[i], NULL, thread_main, NULL);
    }
    for (i = 0; i < NUM_THREADS; i++) {
        pthread_join(all_threads[i], NULL);
    }
    curl_global_cleanup();
    return 0;
}
//============================

$ gcc -pthread leak-test.c -o leak-test -lcurl
$ time valgrind --num-callers=20 --leak-check=full ./leak-test
==13090== Memcheck, a memory error detector
==13090== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==13090== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==13090== Command: ./leak-test
==13090==
==13090==
==13090== HEAP SUMMARY:
==13090==     in use at exit: 33,523 bytes in 512 blocks
==13090==   total heap usage: 1,449,900 allocs, 1,449,388 frees, 1,293,593,227 bytes allocated
==13090==
==13090== 400 bytes in 10 blocks are possibly lost in loss record 79 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x761556B: _PR_Getfd (prfdcach.c:112)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762D1C8: PR_OpenFile (ptio.c:3579)
==13090==    by 0x8AB76F44: ???
==13090==    by 0x8AB7475A: ???
==13090==    by 0x8AB79B58: ???
==13090==    by 0x8AB807B1: ???
==13090==    by 0x6CDC889: PK11_CreateNewObject (pk11obj.c:378)
==13090==    by 0x6CDCB22: PK11_CreateGenericObject (pk11obj.c:1504)
==13090==    by 0x4E6E49F: nss_create_object (nss.c:349)
==13090==    by 0x4E6E626: nss_load_cert (nss.c:383)
==13090==    by 0x4E6FE0F: Curl_nss_connect (nss.c:1095)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==
==13090== 480 bytes in 10 blocks are possibly lost in loss record 80 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x7615559: _PR_Getfd (prfdcach.c:109)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762D1C8: PR_OpenFile (ptio.c:3579)
==13090==    by 0x8AB76F44: ???
==13090==    by 0x8AB7475A: ???
==13090==    by 0x8AB79B58: ???
==13090==    by 0x8AB807B1: ???
==13090==    by 0x6CDC889: PK11_CreateNewObject (pk11obj.c:378)
==13090==    by 0x6CDCB22: PK11_CreateGenericObject (pk11obj.c:1504)
==13090==    by 0x4E6E49F: nss_create_object (nss.c:349)
==13090==    by 0x4E6E626: nss_load_cert (nss.c:383)
==13090==    by 0x4E6FE0F: Curl_nss_connect (nss.c:1095)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==
==13090== 520 bytes in 13 blocks are possibly lost in loss record 88 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x761556B: _PR_Getfd (prfdcach.c:112)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762CF71: PR_ImportTCPSocket (ptio.c:4575)
==13090==    by 0x4E70048: Curl_nss_connect (nss.c:1335)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==    by 0x516BB6C: clone (in /lib64/libc-2.12.so)
==13090==
==13090== 624 bytes in 13 blocks are possibly lost in loss record 89 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x7615559: _PR_Getfd (prfdcach.c:109)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762CF71: PR_ImportTCPSocket (ptio.c:4575)
==13090==    by 0x4E70048: Curl_nss_connect (nss.c:1335)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==    by 0x516BB6C: clone (in /lib64/libc-2.12.so)
==13090==
==13090== 7,160 bytes in 179 blocks are possibly lost in loss record 95 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x761556B: _PR_Getfd (prfdcach.c:112)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762E000: PR_Socket (ptio.c:3503)
==13090==    by 0x4E6F637: Curl_nss_connect (nss.c:1200)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==    by 0x516BB6C: clone (in /lib64/libc-2.12.so)
==13090==
==13090== 8,592 bytes in 179 blocks are possibly lost in loss record 96 of 96
==13090==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==13090==    by 0x7615559: _PR_Getfd (prfdcach.c:109)
==13090==    by 0x762CDF4: pt_SetMethods (ptio.c:3301)
==13090==    by 0x762E000: PR_Socket (ptio.c:3503)
==13090==    by 0x4E6F637: Curl_nss_connect (nss.c:1200)
==13090==    by 0x4E67481: Curl_ssl_connect (sslgen.c:185)
==13090==    by 0x4E45ECA: Curl_http_connect (http.c:1796)
==13090==    by 0x4E4C681: Curl_protocol_connect (url.c:3077)
==13090==    by 0x4E52B3B: Curl_connect (url.c:4743)
==13090==    by 0x4E5ABAF: Curl_perform (transfer.c:2523)
==13090==    by 0x400987: thread_main (in /usr/local/projects/accesslogger/leak-test)
==13090==    by 0x541E9D0: start_thread (in /lib64/libpthread-2.12.so)
==13090==    by 0x516BB6C: clone (in /lib64/libc-2.12.so)
==13090==
==13090== LEAK SUMMARY:
==13090==    definitely lost: 0 bytes in 0 blocks
==13090==    indirectly lost: 0 bytes in 0 blocks
==13090==      possibly lost: 17,776 bytes in 404 blocks
==13090==    still reachable: 15,747 bytes in 108 blocks
==13090==         suppressed: 0 bytes in 0 blocks
==13090== Reachable blocks (those to which a pointer was found) are not shown.
==13090== To see them, rerun with: --leak-check=full --show-reachable=yes
==13090==
==13090== For counts of detected and suppressed errors, rerun with: -v
==13090== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 12 from 9)

real    3m11.968s
user    0m53.931s
sys     1m49.370s

Increasing either NUM_THREADS or NUM_GETS_PER_THREAD seems to increase the size of the leak.
Also, the above was run on RHEL 6.5:

$ rpm -q nss libcurl
nss-3.15.3-6.el6_5.x86_64
libcurl-7.19.7-37.el6_4.x86_64
A dive into the NSS code, and I think I've found the race:

==== nss-3.15.1/nss/lib/pk11wrap/pk11obj.c:1421 ====

/*
 * remove an object from the list. If the object isn't already in
 * a list unlink becomes a noop.
 */
SECStatus
PK11_UnlinkGenericObject(PK11GenericObject *object)
{
    if (object->prev != NULL) {
        object->prev->next = object->next;
    }
    if (object->next != NULL) {
        object->next->prev = object->prev;
    }
    object->next = NULL;
    object->prev = NULL;
    return SECSuccess;
}

/*
 * This function removes a single object from the list and destroys it.
 * For an already unlinked object there is no difference between
 * PK11_DestroyGenericObject and PK11_DestroyGenericObjects
 */
SECStatus
PK11_DestroyGenericObject(PK11GenericObject *object)
{
    if (object == NULL) {
        return SECSuccess;
    }

    PK11_UnlinkGenericObject(object);
    if (object->slot) {
        PK11_FreeSlot(object->slot);
    }
    PORT_Free(object);
    return SECSuccess;
}

========

Imagine the following execution, where Thread1 runs PK11_UnlinkGenericObject(some_obj) while Thread2 runs PK11_UnlinkGenericObject(some_obj->next):

Thread1                                  Thread2
PK11_UnlinkGenericObject(some_obj)       PK11_UnlinkGenericObject(some_obj->next)

if (object->prev != NULL) {              if (object->prev != NULL) {
    object->prev->next = object->next;       object->prev->next = object->next;
}                                        }

    // Both threads have now updated their predecessor's next pointer, so
    // some_obj->prev->next points to some_obj->next (rather than to
    // some_obj->next->next like it should)

if (object->next != NULL) {              if (object->next != NULL) {
    object->next->prev = object->prev;       object->next->prev = object->prev;
}                                        }
object->next = NULL;                     object->next = NULL;
object->prev = NULL;                     object->prev = NULL;
return SECSuccess;                       return SECSuccess;

    // Now some_obj->prev->next->next == NULL; we have lost the tail end
    // of the list

Now, I'm far from an expert on the NSS code, and there may be some other synchronization here that I'm missing, but my understanding is that you can't do this lock-free without CAS. If I come up with a patch, I'll be sure to post it here.
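To make the interleaving above concrete, here is a small standalone program that replays those steps sequentially on a toy doubly-linked list (a simplified stand-in, not the real PK11GenericObject) and shows the tail being dropped under one possible interleaving:

/* unlink-race.c -- replay the two-thread unlink interleaving described
 * above, step by step, on a toy list A <-> B <-> C <-> D.
 * Build: gcc -o unlink-race unlink-race.c
 */
#include <stdio.h>

struct node {
    const char  *name;
    struct node *prev;
    struct node *next;
};

static void walk(struct node *n)
{
    for (; n != NULL; n = n->next)
        printf(" %s", n->name);
    putchar('\n');
}

int main(void)
{
    /* Thread1 unlinks B; Thread2 unlinks C (== B->next). */
    struct node a = { "A", NULL, NULL }, b = { "B", NULL, NULL },
                c = { "C", NULL, NULL }, d = { "D", NULL, NULL };
    a.next = &b; b.prev = &a; b.next = &c;
    c.prev = &b; c.next = &d; d.prev = &c;

    printf("before:"); walk(&a);

    /* Step 1 of PK11_UnlinkGenericObject, both threads: */
    b.prev->next = b.next;   /* Thread1: A->next = C                    */
    c.prev->next = c.next;   /* Thread2: C->prev is still B, so this
                                sets B->next = D                        */

    /* Step 2, both threads: */
    b.next->prev = b.prev;   /* Thread1: B->next is now D, so D->prev = A */
    c.next->prev = c.prev;   /* Thread2: D->prev = B                      */

    /* Cleanup, both threads: */
    b.next = b.prev = NULL;
    c.next = c.prev = NULL;

    printf("after: "); walk(&a);   /* prints "A C": D fell off the list */
    return 0;
}

Running it prints "before: A B C D" and "after: A C". Node D is still allocated but no longer reachable through the forward links, which is exactly the kind of silent loss that would show up as unbounded growth rather than a valgrind-reported leak.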
I am also facing the same problem. I have a medium EC2 instance in AWS with s3backer configured. Once s3backer starts, it keeps consuming system memory up to the maximum, and we have to reboot the instance to get it freed. Please let me know if there is any fix or solution to this problem.

Additional posts: https://code.google.com/p/s3backer/issues/detail?id=47

Instance details:

OS: CentOS Linux release 6.0 (Final) x86_64

s3backer command: /usr/local/s3backer/bin/s3backer --blockSize=64k --size=50g --ssl --vhost --listBlocks --debug-http mymachine-production-backup /s3backer

[root@ip-10-XXX-XXX-X ~]# rpm -qa | grep -e nss -e libcurl
openssh-server-5.3p1-20.el6_0.3.x86_64
openssl-devel-1.0.1e-16.el6_5.7.x86_64
libcurl-devel-7.19.7-26.el6_2.4.x86_64
nss-sysinit-3.14.0.0-12.el6.x86_64
openssh-clients-5.3p1-20.el6_0.3.x86_64
openssl-1.0.1e-16.el6_5.7.x86_64
nss-util-3.14.0.0-2.el6.x86_64
nss-3.14.0.0-12.el6.x86_64
openssh-5.3p1-20.el6_0.3.x86_64
libcurl-7.19.7-26.el6_2.4.x86_64
nss-softokn-freebl-3.12.9-11.el6.x86_64
nss-tools-3.14.0.0-12.el6.x86_64
nss-softokn-3.12.9-11.el6.x86_64
This is correct. Currently, threads own their own generic object lists. If they share those lists, they need to protect them themselves. NSS does not use global generic object lists (unlike, say, keys, which live in global lists within NSS).
So we should file a bug with libcurl then?
(In reply to Charles Stanley from comment #9)
> So we should file a bug with libcurl then?

You could do that, or reassign this bug to libcurl.
(In reply to Bob Relyea from comment #8)
> This is correct. Currently, threads own their own generic object lists.
> If they share those lists, they need to protect them themselves. NSS does
> not use global generic object lists (unlike, say, keys, which live in
> global lists within NSS).

libcurl maintains its own list of objects per connection (and a connection itself is never used by multiple threads at the same time). What exactly should be protected by libcurl at the level of the NSS API?

This bug is about calling PK11_DestroyGenericObject() concurrently, but is it the only problem that can happen here?

Is it fine to call PK11_DestroyGenericObject() while another thread is executing PK11_CreateGenericObject()?

Is it actually safe to call PK11_CreateGenericObject() concurrently?

Is it fine to call PK11_DestroyGenericObject() while another thread is executing other PK11_*() functions?

Is it fine to call PK11_DestroyGenericObject() while another thread is executing SSL_ForceHandshake()?

Bob, could you please describe what exactly needs to be synchronized here?
You can call PK11_XXXGenericObject on multiple threads, but if you share the data structure returned by PK11_CreateGenericObject() with other threads, you need to protect it across the various PK11_XXXGeneric calls. So it's not calling PK11_DestroyGenericObject() on multiple threads that is the issue; it's calling PK11_DestroyGenericObject() with the same object on multiple threads that's an issue.

It's reasonable to assume that this should work, and it's a reasonable RFE to have NSS protect the generic object data structure, but currently it's expected that you wouldn't share generic objects across threads (the initial purpose was to get it, use it, and free it all on the same stack).

bob
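For illustration, the caller-side protection described above would look something like this if an application really did share a single generic object across threads. This is a sketch only; the locked_* wrapper names are illustrative and not part of NSS or libcurl:

/* Build against NSS headers, e.g.:
 *   -I/usr/include/nss3 -I/usr/include/nspr4
 */
#include <pthread.h>
#include <pk11pub.h>   /* PK11_CreateGenericObject() and friends */

static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serialize creation of a shared generic object. */
static PK11GenericObject *
locked_create(PK11SlotInfo *slot, CK_ATTRIBUTE *tmpl, int count, PRBool token)
{
    PK11GenericObject *obj;
    pthread_mutex_lock(&obj_lock);
    obj = PK11_CreateGenericObject(slot, tmpl, count, token);
    pthread_mutex_unlock(&obj_lock);
    return obj;
}

/* Serialize destruction, so two threads can never race on the same
 * object's list pointers. */
static SECStatus
locked_destroy(PK11GenericObject *obj)
{
    SECStatus rv;
    pthread_mutex_lock(&obj_lock);
    rv = PK11_DestroyGenericObject(obj);
    pthread_mutex_unlock(&obj_lock);
    return rv;
}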
(In reply to Bob Relyea from comment #14)
> You can call PK11_XXXGenericObject on multiple threads, but if you share
> the data structure returned by PK11_CreateGenericObject() with other
> threads, you need to protect it across the various PK11_XXXGeneric calls.
> So it's not calling PK11_DestroyGenericObject() on multiple threads that
> is the issue; it's calling PK11_DestroyGenericObject() with the same
> object on multiple threads that's an issue.
>
> It's reasonable to assume that this should work, and it's a reasonable RFE
> to have NSS protect the generic object data structure, but currently it's
> expected that you wouldn't share generic objects across threads (the
> initial purpose was to get it, use it, and free it all on the same stack).
>
> bob

Thanks for the reply, Bob! I forgot to say that I am pretty sure curl never uses the pointer returned by PK11_CreateGenericObject() in a different thread than the owning one.

Based on the above analysis, the problem seems to be that NSS internally links the objects with each other. Consequently, even if each thread manipulates its own object, a collision may happen during the update of the linking pointers. As long as linking the objects into a list is an implementation detail of NSS, it is not clear what libcurl should protect at the level of the NSS API.
Hmm, then something is wrong with the analysis, because NSS doesn't reuse those pointers either, nor does it put those objects on global lists. We can still look into adding locks to these objects themselves, but that wouldn't solve the original issue (which may be a memory corruption rather than a thread issue).
Fair enough. I agree that the analysis does not apply to the problem reported in this bug. The memory consumption growing per connection is a known issue of NSS:

https://bugzilla.mozilla.org/402712#c58

As for the valgrind reports, does the amount of "possibly lost" memory go to zero if you call PR_Cleanup() at the end of the main() function?
I observed a similar leak with my single-threaded demo program. So this issue is not related to threading.

$ valgrind --leak-check=full --num-callers=20 ./main
==25607== Memcheck, a memory error detector
==25607== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==25607== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25607== Command: ./main
==25607==
==25607==
==25607== HEAP SUMMARY:
==25607==     in use at exit: 119,019 bytes in 245 blocks
==25607==   total heap usage: 136,228 allocs, 135,983 frees, 89,561,004 bytes allocated
==25607==
==25607== 40 bytes in 1 blocks are possibly lost in loss record 45 of 99
==25607==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==25607==    by 0x824E60B: _PR_Getfd (prfdcach.c:112)
==25607==    by 0x8266094: pt_SetMethods (ptio.c:3303)
==25607==    by 0x8266468: PR_OpenFile (ptio.c:3581)
==25607==    by 0x9938C67: blapi_SHVerifyFile (shvfy.c:424)
==25607==    by 0x9938FFE: blapi_SHVerify (shvfy.c:358)
==25607==    by 0x9923438: bl_startup_tests (fipsfreebl.c:1541)
==25607==    by 0x99620D5: ??? (in /lib64/libfreebl3.so)
==25607==    by 0x9910A7A: ??? (in /lib64/libfreebl3.so)
==25607==
==25607== 48 bytes in 1 blocks are possibly lost in loss record 49 of 99
==25607==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==25607==    by 0x824E5F9: _PR_Getfd (prfdcach.c:109)
==25607==    by 0x8266094: pt_SetMethods (ptio.c:3303)
==25607==    by 0x8266468: PR_OpenFile (ptio.c:3581)
==25607==    by 0x9938C67: blapi_SHVerifyFile (shvfy.c:424)
==25607==    by 0x9938FFE: blapi_SHVerify (shvfy.c:358)
==25607==    by 0x9923438: bl_startup_tests (fipsfreebl.c:1541)
==25607==    by 0x99620D5: ??? (in /lib64/libfreebl3.so)
==25607==    by 0x9910A7A: ??? (in /lib64/libfreebl3.so)
==25607==
==25607== 2,000 bytes in 50 blocks are possibly lost in loss record 86 of 99
==25607==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==25607==    by 0x824E60B: _PR_Getfd (prfdcach.c:112)
==25607==    by 0x8266094: pt_SetMethods (ptio.c:3303)
==25607==    by 0x8266468: PR_OpenFile (ptio.c:3581)
==25607==    by 0xB273F84: ???
==25607==    by 0xB27176A: ???
==25607==    by 0xB276B98: ???
==25607==    by 0xB27D7F1: ???
==25607==    by 0x7915419: PK11_CreateNewObject (pk11obj.c:381)
==25607==    by 0x79156B2: PK11_CreateGenericObject (pk11obj.c:1550)
==25607==    by 0x4E6E6BF: nss_create_object (nss.c:353)
==25607==    by 0x4E6E846: nss_load_cert (nss.c:387)
==25607==    by 0x4E70513: nss_connect_common (nss.c:1168)
==25607==    by 0x4E676E8: Curl_ssl_connect_nonblocking (sslgen.c:201)
==25607==    by 0x4E423BD: https_connecting (http.c:1830)
==25607==    by 0x4E4C931: Curl_protocol_connect (url.c:3078)
==25607==    by 0x4E609B5: multi_runsingle (multi.c:1069)
==25607==    by 0x4E61013: multi_socket (multi.c:1952)
==25607==    by 0x4E610DE: curl_multi_socket_action (multi.c:2070)
==25607==    by 0x40298C: perform(murl_t*, long) (murl.c:309)
==25607==
==25607== 2,400 bytes in 50 blocks are possibly lost in loss record 98 of 99
==25607==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==25607==    by 0x824E5F9: _PR_Getfd (prfdcach.c:109)
==25607==    by 0x8266094: pt_SetMethods (ptio.c:3303)
==25607==    by 0x8266468: PR_OpenFile (ptio.c:3581)
==25607==    by 0xB273F84: ???
==25607==    by 0xB27176A: ???
==25607==    by 0xB276B98: ???
==25607==    by 0xB27D7F1: ???
==25607==    by 0x7915419: PK11_CreateNewObject (pk11obj.c:381)
==25607==    by 0x79156B2: PK11_CreateGenericObject (pk11obj.c:1550)
==25607==    by 0x4E6E6BF: nss_create_object (nss.c:353)
==25607==    by 0x4E6E846: nss_load_cert (nss.c:387)
==25607==    by 0x4E70513: nss_connect_common (nss.c:1168)
==25607==    by 0x4E676E8: Curl_ssl_connect_nonblocking (sslgen.c:201)
==25607==    by 0x4E423BD: https_connecting (http.c:1830)
==25607==    by 0x4E4C931: Curl_protocol_connect (url.c:3078)
==25607==    by 0x4E609B5: multi_runsingle (multi.c:1069)
==25607==    by 0x4E61013: multi_socket (multi.c:1952)
==25607==    by 0x4E610DE: curl_multi_socket_action (multi.c:2070)
==25607==    by 0x40298C: perform(murl_t*, long) (murl.c:309)
==25607==
==25607== LEAK SUMMARY:
==25607==    definitely lost: 0 bytes in 0 blocks
==25607==    indirectly lost: 0 bytes in 0 blocks
==25607==      possibly lost: 4,488 bytes in 102 blocks
==25607==    still reachable: 114,531 bytes in 143 blocks
==25607==         suppressed: 0 bytes in 0 blocks
==25607== Reachable blocks (those to which a pointer was found) are not shown.
==25607== To see them, rerun with: --leak-check=full --show-reachable=yes
==25607==
==25607== For counts of detected and suppressed errors, rerun with: -v
==25607== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 12 from 9)
(In reply to zjx20 from comment #18)
> I observed a similar leak with my single-threaded demo program. So this
> issue is not related to threading.

"This issue" in your case has nothing to do with this bug report. The possibly lost memory is just the file descriptor cache allocated by NSPR, which should not grow per connection. You will get rid of the memory leak reports by calling PR_Cleanup() at the end of the main() function. See the following curl commit for inspiration:

https://github.com/bagder/curl/commit/24c3cdce
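For reference, the shape of that advice in a minimal program (a sketch only; error handling is omitted, and the include/link paths assume the stock RHEL-6 nspr-devel layout):

/* Build (paths per the stock RHEL-6 packages):
 *   gcc demo.c -o demo -I/usr/include/nspr4 -lcurl -lnspr4
 */
#include <curl/curl.h>
#include <prinit.h>    /* PR_Cleanup(), from NSPR */

int main(void)
{
    CURL *c;

    curl_global_init(CURL_GLOBAL_ALL);

    c = curl_easy_init();
    curl_easy_setopt(c, CURLOPT_URL, "https://bugzilla.redhat.com/");
    curl_easy_perform(c);
    curl_easy_cleanup(c);

    curl_global_cleanup();
    PR_Cleanup();   /* shut NSPR down so its fd cache is released and
                       not reported by valgrind as "possibly lost" */
    return 0;
}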
I believe the problem here is in PK11_DestroyGenericObject() (which is called by nss_destroy_object() in curl). As you can see in comment #4, PK11_DestroyGenericObject() unlinks the object from its list, reduces the link count in its slot, and then calls PORT_Free() to release the actual storage: three pointers and an integer. Unfortunately, the object referred to by object->objectID isn't destroyed. We're missing a call to PK11_DestroyObject(object->slot, object->objectID) immediately before PK11_FreeSlot():

SECStatus
PK11_DestroyGenericObject(PK11GenericObject *object)
{
    if (object == NULL) {
        return SECSuccess;
    }

    PK11_UnlinkGenericObject(object);
    if (object->slot) {
        PK11_DestroyObject(object->slot, object->objectID);
        PK11_FreeSlot(object->slot);
    }
    PORT_Free(object);
    return SECSuccess;
}

(This is similar to SECKEY_DestroyPublicKey(), but we don't check for permanent objects because the objects we want to delete in nss_destroy_object() are, in fact, permanent.)

With that patch, all the tests pass when building the nss rpm, and the test programs that use libcurl no longer leak. However, looking at the callers of PK11_DestroyGenericObject(), I think some of them might be assuming that the underlying object is not actually destroyed, so perhaps a PK11_CompletelyDestroyGenericObject() might be needed.
(In reply to john.haxby from comment #20)
> With that patch, all the tests pass when building the nss rpm, and the
> test programs that use libcurl no longer leak.

Which test programs? Do they leak memory without the patch? Please report the memory leaks together with a reproducer and valgrind's output. I believe this bug is not about leaking memory at the program's exit; it is about unbounded memory usage during its run time.
OK, "unbounded memory usage" :)  The objectIDs that are used to refer to the allocated storage are no longer accessible to the program because the generic objects that contained them were destroyed. From the program I'm about to attach (which is based on that in comment #4):

/*
 * Test for unbounded memory usage (aka memory leak) with curl and
 * nss.
 *
 * Create a local https server like this:
 *
 *    openssl req -x509 -nodes -days 365 \
 *        -subj '/C=XX/ST=Here/L=There/O=MyOrg/CN=localhost' \
 *        -newkey rsa:1024 -keyout server.key -out server.crt
 *    openssl s_server -key server.key -cert server.crt -www > /dev/null &
 *
 * Compile this program:
 *
 *    cc -Os -Wall -g -o nss-leak-test nss-leak-test.c -lcurl
 *
 * Run (for example)
 *
 *    ./nss-leak-test -n 10000
 *
 * and note that the storage increases while the program is running.
 * Running under valgrind shows no leaks. Run like this to show
 * possibly lost storage:
 *
 *    valgrind --leak-check=full --num-callers=30 ./nss-leak-test -c 10 -a 9
 *
 * This eschews the normal clean-up and shows lots of possibly lost
 * memory, some with a (partial) stack like this:
 *
 *    [...]
 *    by 0x5D77EE9: PK11_CreateNewObject (pk11obj.c:381)
 *    by 0x5D79CE2: PK11_CreateGenericObject (pk11obj.c:1550)
 *    by 0x4E406CC: nss_create_object (nss.c:371)
 *    by 0x4E4077D: nss_load_cert (nss.c:416)
 *    by 0x4E90BFB: nss_load_ca_certificates (nss.c:1326)
 *    by 0x4E90BFB: nss_setup_connect (nss.c:1588)
 *    by 0x4E90BFB: nss_connect_common (nss.c:1798)
 *    [...]
 *
 * The object allocated in nss_create_object() is destroyed in
 * nss_destroy_object(), which calls PK11_DestroyGenericObject(), which
 * doesn't, in fact, destroy the object.
 *
 * If you turn off verification then this particular stack doesn't
 * appear and you don't get the leak.
 */

Without the patch, the test program grows from about 8.7MB RSS to about 20MB in 10,000 iterations; the number of possibly lost blocks increases by about five on each iteration, and everything else stays much the same. I encourage you to try it, and to try it with the nss patch (which I'll also attach rather than just describing it).
Created attachment 1070920 [details]
Reproducer for issue (based on comment #4)
I realise, of course, that curl probably should not reload the server certificates on every connection, but that's a separate issue and would merely avoid the problem.
As far as I remember, the excessive memory usage is not caused by the NSS library itself but by the PEM PKCS #11 module that libcurl uses to load certs/keys from PEM files:

https://bugzilla.mozilla.org/show_bug.cgi?id=402712#c58
Created attachment 1070947 [details]
PK11_DestroyGenericObject patch
Almost, but not quite. I should have included a slightly longer stack trace:

by 0xABE8B29: NSSCKFWC_CreateObject (wrap.c:1958)
by 0x5D77EE9: PK11_CreateNewObject (pk11obj.c:381)
by 0x5D79CE2: PK11_CreateGenericObject (pk11obj.c:1550)
by 0x4E406CC: nss_create_object (nss.c:371)
by 0x4E4077D: nss_load_cert (nss.c:416)
by 0x4E90BFB: nss_load_ca_certificates (nss.c:1326)

With the patch, PK11_DestroyGenericObject() calls PK11_GETTAB(slot)->C_DestroyObject(slot->session, object), aka NSSCKFWC_DestroyObject(). Basically, PK11_CreateGenericObject() calls PK11_GETTAB(slot)->C_CreateObject(), and so it seems that PK11_DestroyGenericObject() should call PK11_GETTAB(slot)->C_DestroyObject(). Does that make sense?
I should add that with the patch, my test program and its predecessor in comment #4 do not change size at all: the possibly-lost blocks do not change with the number of times around the loop. My only concern is that PK11_CreateGenericObject() and PK11_DestroyGenericObject() are not supposed to be symmetric, in spite of their names.
For completeness: https://bugzilla.mozilla.org/show_bug.cgi?id=1202413
You have something here, John, but it's a bit more complex. We need to keep an 'owner' tag on the generic object. If we create it and it's a session object, then PK11_DestroyGenericObject() should destroy it. If we find it with PK11_FindGenericObjects(), or we create it as a token object (which means we mean it to be persistent), then PK11_DestroyGenericObject() should not destroy it.

We need to add a new field to struct PK11GenericObjectStr, a PRBool called owner. owner needs to be set in PK11_CreateGenericObject() as:

PR_TRUE if token == PR_FALSE
PR_FALSE if token == PR_TRUE

and in PK11_FindGenericObjects() it needs to be set to PR_FALSE. Then we call PK11_DestroyObject() in PK11_DestroyGenericObject() if owner == PR_TRUE.

bob
For completeness, there is one other place that creates PK11GenericObjects: pk11_newMergeLog() in pk11merge.c. Those should be set to PR_FALSE.

bob
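If I understand the proposal correctly, the destroy path would end up looking roughly like this (a sketch of the design described in the two comments above, not the actual upstream patch; the field layout is abbreviated):

/* Sketch of the ownership scheme -- illustration only. */
struct PK11GenericObjectStr {
    struct PK11GenericObjectStr *prev;
    struct PK11GenericObjectStr *next;
    PK11SlotInfo                *slot;
    CK_OBJECT_HANDLE             objectID;
    PRBool                       owner;  /* new: PR_TRUE only for session
                                            objects we created ourselves */
};

SECStatus
PK11_DestroyGenericObject(PK11GenericObject *object)
{
    if (object == NULL) {
        return SECSuccess;
    }

    PK11_UnlinkGenericObject(object);
    if (object->slot) {
        /* Destroy the underlying object only if we own it; token
         * (persistent) objects and objects obtained through
         * PK11_FindGenericObjects() are left alone. */
        if (object->owner) {
            PK11_DestroyObject(object->slot, object->objectID);
        }
        PK11_FreeSlot(object->slot);
    }
    PORT_Free(object);
    return SECSuccess;
}

This keeps John's fix for the leaking session objects while preserving the existing behaviour for callers that expect the underlying token object to survive.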
I have tested the patch in an nss build for the latest RHEL-6 against a user's code. The code uses libcurl with https. The leak does not go away; it is just "significantly" reduced. I think there is something odd in the patch anyway, and we (the user and I) have reverted it for the moment.
Created attachment 1118263 [details]
valgrind.txt

Hopefully this valgrind log can help in getting an idea of where the problem is.
Quoting from https://github.com/curl/curl/issues/696:

"""
Several of our customers reported memory leaks when using Zabbix with web-scenarios. We tracked down the problem to libcurl / libnss and the CURLOPT_SSL_VERIFYPEER = 1 option on CentOS 6, CentOS 7 and the corresponding RHEL versions. Eventually the OOM killer kicks in and kills Zabbix processes.

The attached program demonstrates the issue (see the included README for running instructions). If you run it on the aforementioned operating systems with the default libcurl from their repositories, then the heap memory segment will grow continuously (run it for several minutes and use the included topmap.sh script to monitor memory segments). Valgrind doesn't show any lost memory.
"""

The program in question is single-threaded. The problem can be reliably reproduced on CentOS 6.7 and CentOS 7.2 (our customers use RHEL). Could somebody confirm that this is the same issue as discussed in this thread?
Created attachment 1133104 [details]
A single-threaded program demonstrating the issue.

See "leak.txt" and "smaps" files in "leak_v2.zip".
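For readers without the attachment: the pattern described boils down to a loop like the following (my own minimal sketch, not the attached program; the URL and iteration count are placeholders):

/* Repeated easy transfers with peer verification on.  On the affected
 * libcurl/NSS builds, RSS grows per iteration even though valgrind
 * reports nothing definitely lost.
 * Build: gcc grow.c -o grow -lcurl
 */
#include <curl/curl.h>

/* Discard the response body. */
static size_t discard(char *p, size_t sz, size_t n, void *u)
{
    (void)p; (void)u;
    return sz * n;
}

int main(void)
{
    int i;

    curl_global_init(CURL_GLOBAL_ALL);
    for (i = 0; i < 1000; i++) {
        CURL *c = curl_easy_init();
        curl_easy_setopt(c, CURLOPT_URL, "https://example.com/");
        curl_easy_setopt(c, CURLOPT_SSL_VERIFYPEER, 1L);
        curl_easy_setopt(c, CURLOPT_WRITEFUNCTION, discard);
        curl_easy_perform(c);
        curl_easy_cleanup(c);   /* fresh handle => fresh TLS connect,
                                   which reloads the CA certs each time */
    }
    curl_global_cleanup();
    return 0;
}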
Yes, that sounds exactly like the problem. I have a similar reproducer attached to this bug; read from comment #20 onwards.

I'm beginning to think that the right way to fix this bug is to go back to OpenSSL. Again:

* Thu Sep 06 2007 Jindrich Novy <jnovy> 7.16.4-5
- add support for the NSS PKCS#11 pem reader so the command-line is the same
  for both OpenSSL and NSS by Rob Crittenden (rcritten)
- switch to NSS again

* Mon Sep 03 2007 Jindrich Novy <jnovy> 7.16.4-4
- revert back to use OpenSSL (#266021)

* Mon Aug 27 2007 Jindrich Novy <jnovy> 7.16.4-3
- don't use openssl, use nss instead
John, thanks for the confirmation. Unless there are very good reasons not to switch back to OpenSSL, I don't see why RH shouldn't do it (at least until this problem is solved):

* this is a high severity issue;
* it can be reproduced with the simplest libcurl "hello world" program;
* several products are affected severely in production environments;
* there has been very little progress towards a fix for this issue upstream;
* there are alternative TLS backends that do not have this issue.
NSS maintainers seem busy. I will have a look at this in a week or so. Thanks for being patient!
(In reply to Kamil Dudka from comment #42)
> NSS maintainers seem busy. I will have a look at this in a week or so.
> Thanks for being patient!

My 24/7 program has been affected by the leak for more than one year (since comment #18). Please fix that by switching to OpenSSL, thank you!
(In reply to zjx20 from comment #44)
> Please fix that by switching to OpenSSL, thank you!

I am afraid this will never happen in RHEL-6, but you are free to recompile curl against OpenSSL on your own responsibility.
(In reply to Kamil Dudka from comment #45)
> (In reply to zjx20 from comment #44)
> > Please fix that by switching to OpenSSL, thank you!
>
> I am afraid this will never happen in RHEL-6, but you are free to recompile
> curl against OpenSSL on your own responsibility.

Good to know it will land in RHEL-7; we are using RHEL-7 :)
We are going to fix this bug in RHEL-7, too, but there is currently no plan to change the TLS backend in RHEL (not even in Fedora).
(In reply to Kamil Dudka from comment #47)
> We are going to fix this bug in RHEL-7, too, but there is currently no plan
> to change the TLS backend in RHEL (not even in Fedora).

Good to know. If you want a volunteer to test the fix, count me in. Thanks.
Kamil, could you upgrade the severity to high? According to the Severity field help: "high-impact issues in which the customer's operation is disrupted, but there is some capacity to produce."

Some customers have to restart Zabbix servers pre-emptively on a daily basis to avoid OOM killer events, or disable SSL peer verification.
(In reply to Sandis Neilands from comment #49)
> Some customers have to restart Zabbix servers pre-emptively on a daily basis
> to avoid OOM killer events, or disable SSL peer verification.

If you are a customer, do not hesitate to contact Product Support. Bugzilla is a bug tracking tool only. Priority of our work is driven by customer requests.
Is there any update since last month?
(In reply to zjx20 from comment #51)
> Is there any update since last month?

I am currently working on decoupling nss-pem from nss in the "public" branch:

https://git.fedorahosted.org/cgit/nss-pem.git/log/?h=public

The next step is going to be covering the current functionality with a test suite, to prevent regressions while redesigning the internal structures of nss-pem.
Any update on this? Someone else has just discovered this problem again :( Has the work stalled? I don't see any changes since April.
(In reply to john.haxby from comment #56)
> Has the work stalled? I don't see any changes since April.

Nope. The development of nss-pem continues on GitHub:

https://github.com/kdudka/nss-pem

Pull requests are welcome!
Clearing needinfo; the question was answered a while ago in comment 40.
The patch for this bug has already been reverted upstream because it broke Firefox. It needs to be reverted immediately in all versions of RHEL, too.
Daiki, could you please handle the backout?

First, could you check where we have added this patch? Only to 6.9?
(In reply to Kai Engert (:kaie) from comment #67)
> Daiki, could you please handle the backout?
>
> First, could you check where we have added this patch? Only to 6.9?

Yes, only to 6.9.
What is happening with this bug? I see https://github.com/kdudka/nss-pem was last updated at the beginning of March. Presumably this bug also affects RHEL 7, Fedora, etc., but it seems to have completely stalled. Has the time come to say that using curl+nss in long-lived applications is not supported?
Hi John,

Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle

I'm not saying this won't be fixed in RHEL 6, but it's more likely to be sorted in RHEL 7 (https://bugzilla.redhat.com/show_bug.cgi?id=1395803), which is still in production phase 1. Just trying to set realistic expectations given where we are in RHEL 6's life cycle. If a solution can be sorted upstream, then we might be able to get it in for 6.10.
Thanks Chris. I wasn't really expecting a fix in RHEL6, but I didn't want it to be forgotten for RHEL7.
Closing this BZ for RHEL 6 per Comment 75. Work will continue on this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1395803 for RHEL 7 which is still in production phase 1.