The same issue as described in Bug #1057388 exists when libcurl-devel-7.29.0-35.el7.x86_64 is used together with nss-3.21.3-2.el7_3.x86_64.
+++ This bug was initially created as a clone of Bug #1057388 +++
Description of problem:
s3backer is a FUSE filesystem that exposes an Amazon S3 bucket as a file in your local filesystem. It relies on libcurl for communication with Amazon. RHEL's libcurl relies on nss as its TLS provider. Subjecting s3backer to prolonged stress on RHEL/CentOS 6.4 with the stock libcurl will result in unbounded memory consumption. Analysis with valgrind's memcheck tool shows no significant memory leaks. However, recompiling libcurl to use OpenSSL as its TLS provider instead of NSS makes the issue go away.
Version-Release number of selected component (if applicable):
How reproducible:
It happens every time.
Steps to Reproduce:
1. Spin up a RHEL/CentOS 6.x AWS instance and create an Amazon S3 bucket. In my case, the bucket has a 1M object size. Note that the only thing that is probably necessary here is an S3-like service, but to ensure reproducibility, I am providing precise parameters.
2. Install s3backer.
3. Mount the S3 bucket with a command such as the following; note sensitive details such as keys have been sanitized:
sudo s3backer --readAhead=0 --blockSize=1M --size=500G --listBlocks --vhost --baseURL=https://s3.amazonaws.com/ --timeout=15 --blockCacheThreads=10 --accessId=<accessId> --accessKey=<accessKey> --blockCacheSize=10 <bucket> <mountpoint>
4. Run dd on the mounted s3 bucket:
sudo dd if=/dev/urandom of=<mountpoint>/file bs=1M count=102400 iflag=nonblock,noatime oflag=nonblock,noatime
The workload where I originally observed this involved running a file system on top of s3backer that is not supported by Red Hat. However, the same behavior can be reproduced by running dd on top of s3backer. It becomes increasingly obvious when using /proc/$PID/status to view memory usage over a span of 2-4 hours. While the system running s3backer need not be on AWS, access to S3 from outside Amazon is quite slow, and I highly recommend placing it there. A long-running program that exercises the same code paths in libcurl and nss as s3backer should also exhibit this problem.
Actual results:
VmRSS slowly grows until either the dd stops or the OOM killer triggers.
Expected results:
VmRSS should stabilize with VmHWM not changing after the first hour or so.
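One way to watch for the growth described here is to sample VmRSS and VmHWM from /proc/<pid>/status at a fixed interval. The following is a minimal sketch and not part of the original report; the function names mem_sample and mem_watch and the 60-second default interval are illustrative assumptions.

```shell
#!/bin/sh
# mem_sample PID: print "VmHWM_kB VmRSS_kB" for one process (Linux /proc only).
mem_sample() {
    awk '/^VmHWM:/ {hwm=$2} /^VmRSS:/ {rss=$2} END {print hwm, rss}' "/proc/$1/status"
}

# mem_watch PID [INTERVAL]: print "epoch_seconds VmHWM_kB VmRSS_kB" every
# INTERVAL seconds (default 60) until the process goes away.
mem_watch() {
    while [ -r "/proc/$1/status" ]; do
        echo "$(date +%s) $(mem_sample "$1")"
        sleep "${2:-60}"
    done
}
```

For example, mem_watch "$(pidof s3backer)" 60 > s3backer-mem.log could be left running alongside the dd; a VmRSS column that keeps climbing after the first hour would match the behavior reported here.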
Additional info:
Analysis with Valgrind's massif tool shows multiple backtraces containing NSS in the same snapshot, and their share of memory usage grows as a function of instructions executed. Here is an excerpt of one of the backtraces:
| ->21.93% (13,056,000B) 0xE91C59A: ???
| | ->21.93% (13,056,000B) 0xE9218C7: ???
| | ->21.93% (13,056,000B) 0xE928520: ???
| | ->21.93% (13,056,000B) 0x37C56470C8: PK11_CreateNewObject (pk11obj.c:378)
| | ->21.93% (13,056,000B) 0x37C5647361: PK11_CreateGenericObject (pk11obj.c:1415)
| | ->21.93% (13,056,000B) 0x37C823F49E: nss_create_object (nss.c:349)
| | ->21.93% (13,056,000B) 0x37C823F625: nss_load_cert (nss.c:383)
| | ->21.93% (13,056,000B) 0x37C8240E0E: Curl_nss_connect (nss.c:1095)
| | ->21.93% (13,056,000B) 0x37C8238480: Curl_ssl_connect (sslgen.c:185)
| | ->21.93% (13,056,000B) 0x37C8216EC9: Curl_http_connect (http.c:1796)
| | ->21.93% (13,056,000B) 0x37C821D680: Curl_protocol_connect (url.c:3077)
| | ->21.93% (13,056,000B) 0x37C8223B3A: Curl_connect (url.c:4743)
| | ->21.93% (13,056,000B) 0x37C822BBAE: Curl_perform (transfer.c:2523)
| | ->21.93% (13,056,000B) 0x409E30: http_io_perform_io (http_io.c:1437)
| | ->18.49% (11,009,280B) 0x40CB36: http_io_write_block (http_io.c:1348)
| | | ->18.49% (11,009,280B) 0x407422: ec_protect_write_block (ec_protect.c:433)
| | | ->18.49% (11,009,280B) 0x404CFD: block_cache_worker_main (block_cache.c:1090)
| | | ->18.49% (11,009,280B) 0x379120784F: start_thread (pthread_create.c:301)
| | | ->18.49% (11,009,280B) 0x3790AE894B: clone (clone.S:115)
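To compare how the share of a given frame changes between snapshots, the percentage and byte count can be pulled out of the massif tree text. This is a minimal sketch that assumes the tree line format shown in the excerpt above; the function name frame_share is hypothetical.

```shell
# frame_share SYMBOL: read massif snapshot tree text on stdin and print
# "percent bytes" (commas stripped) for every line whose frame mentions SYMBOL.
frame_share() {
    grep -F -- "$1" |
        sed -n 's/.*->\([0-9.]*\)% (\([0-9,]*\)B).*/\1 \2/p' |
        tr -d ','
}
```

A possible use is ms_print massif.out.<pid> | frame_share PK11_CreateNewObject, which prints one line per matching frame across the detailed snapshots, so growth over time shows up as increasing values.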
Recompiling libcurl with OpenSSL as its TLS provider makes the issue go away, which strongly suggests a problem either in NSS or in how libcurl uses NSS.
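When comparing builds, it helps to confirm which TLS provider a given curl build reports. The sketch below parses the first line of curl --version output; the function name tls_backend is hypothetical, and the pattern only covers the NSS, OpenSSL, and GnuTLS tokens as an assumption about which backends are of interest.

```shell
# tls_backend VERSION_TEXT: print the TLS library token (for example
# "NSS/<version>" or "OpenSSL/<version>") found on the first line of
# `curl --version` output, or nothing if none of the known tokens appears.
tls_backend() {
    printf '%s\n' "$1" | head -n 1 | grep -Eo '(NSS|OpenSSL|GnuTLS)/[^ ]+' | head -n 1
}
```

For example, tls_backend "$(curl --version)" reports the backend of the curl command-line tool. This reflects the libcurl that the curl binary is linked against, which may differ from the one s3backer loads if a rebuilt libcurl is installed elsewhere.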
I am not familiar with the NSS codebase and I am concerned about the possibility of introducing a security hole should I attempt to patch this myself. Red Hat employees in #rhel on freenode informed me that Red Hat employs multiple NSS developers and suggested that I file a report here for them.
Two fixes are needed for this problem: one in nsspem and one in nss. The nss portion needs an upstream patch because it changes the ABI semantics. Moving to 7.5.
We had better fix it for 7.5. I'll dev ack it.
Fixed in nss-3.34.0-0.1.beta1.el7
The new API is PK11_CreateManagedObject.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.