Bug 1122421

Summary:	/lib64/libfreebl3.so causes memleak in snmptrapd
Product:	Red Hat Enterprise Linux 6	Reporter:	Dalibor Pospíšil <dapospis>
Component:	nss-softokn	Assignee:	Elio Maldonado Batiz <emaldona>
Status:	CLOSED NOTABUG	QA Contact:	BaseOS QE Security Team <qe-baseos-security>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	6.6	CC:	emaldona, jsafrane, rrelyea, sforsber
Target Milestone:	rc	Keywords:	Regression, Reopened
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2014-08-07 13:22:09 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Dalibor Pospíšil 2014-07-23 08:18:43 UTC

Description of problem:
/lib64/libfreebl3.so from nss-softokn-freebl-3.14.3-14.el6.x86_64 causes following memleak:
==17500== 40 bytes in 1 blocks are possibly lost in loss record 790 of 1,100
==17500==    at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==17500==    by 0xA2BE60B: ??? (in /lib64/libnspr4.so)
==17500==    by 0xA2D6094: ??? (in /lib64/libnspr4.so)
==17500==    by 0xA2D6468: PR_OpenFile (in /lib64/libnspr4.so)
==17500==    by 0x9A2D9D7: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A2DD6E: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A18338: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A56E55: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A059C2: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x7FEFFFF46: ???
==17500==    by 0x7274706D6E732F6D: ???
==17500==    by 0x64734C2D00647060: ???
==17500==
==17500== 48 bytes in 1 blocks are possibly lost in loss record 794 of 1,100
==17500==    at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==17500==    by 0xA2BE5F9: ??? (in /lib64/libnspr4.so)
==17500==    by 0xA2D6094: ??? (in /lib64/libnspr4.so)
==17500==    by 0xA2D6468: PR_OpenFile (in /lib64/libnspr4.so)
==17500==    by 0x9A2D9D7: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A2DD6E: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A18338: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A56E55: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x9A059C2: ??? (in /lib64/libfreebl3.so)
==17500==    by 0x7FEFFFF46: ???
==17500==    by 0x7274706D6E732F6D: ???
==17500==    by 0x64734C2D00647060: ???

Version-Release number of selected component (if applicable):
nss-softokn-freebl-3.14.3-14.el6.x86_64
net-snmp-5.5-49.el6_5.1.x86_64
valgrind-3.8.1-3.6.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. see linked test

Additional info:
it was ok in nss-softokn-freebl-3.14.3-9.el6.x86_64.rpm

Comment 3 Bob Relyea 2014-07-24 18:54:22 UTC

Is it possible to get a stack traceback with nss-softokn-debuginfo installed so I can see where in freebl the leak is supposedly happening?

bob

Comment 4 Dalibor Pospíšil 2014-07-28 15:44:17 UTC

(In reply to Bob Relyea from comment #3)
> Is it possible to get a stack traceback with nss-softokn-debuginfo installed
> so I can see where in freebl the leak is supposedly happening?
> 
> bob

after installing all related debuginfos I've got following
==10317== 40 bytes in 1 blocks are possibly lost in loss record 790 of 1,100
==10317==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==10317==    by 0xA2B460B: _PR_Getfd (prfdcach.c:112)
==10317==    by 0xA2CC094: pt_SetMethods (ptio.c:3303)
==10317==    by 0xA2CC468: PR_OpenFile (ptio.c:3581)
==10317==    by 0x9A239D7: blapi_SHVerifyFile (shvfy.c:356)
==10317==    by 0x9A23D6E: blapi_SHVerify (shvfy.c:290)
==10317==    by 0x9A0E338: bl_startup_tests (fipsfreebl.c:1541)
==10317==    by 0x9A4CE55: ??? (in /lib64/libfreebl3.so)
==10317==    by 0x99FB9C2: ??? (in /lib64/libfreebl3.so)
==10317==    by 0x7FEFFFF96: ???
==10317==    by 0x7274706D6E732F6D: ???
==10317==    by 0x64734C2D00647060: ???
==10317==
==10317== 48 bytes in 1 blocks are possibly lost in loss record 794 of 1,100
==10317==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==10317==    by 0xA2B45F9: _PR_Getfd (prfdcach.c:109)
==10317==    by 0xA2CC094: pt_SetMethods (ptio.c:3303)
==10317==    by 0xA2CC468: PR_OpenFile (ptio.c:3581)
==10317==    by 0x9A239D7: blapi_SHVerifyFile (shvfy.c:356)
==10317==    by 0x9A23D6E: blapi_SHVerify (shvfy.c:290)
==10317==    by 0x9A0E338: bl_startup_tests (fipsfreebl.c:1541)
==10317==    by 0x9A4CE55: ??? (in /lib64/libfreebl3.so)
==10317==    by 0x99FB9C2: ??? (in /lib64/libfreebl3.so)
==10317==    by 0x7FEFFFF96: ???
==10317==    by 0x7274706D6E732F6D: ???
==10317==    by 0x64734C2D00647060: ???

Comment 5 Bob Relyea 2014-07-28 17:12:45 UTC

OK, this is actually a bug in your test case. If you need to have a clean result from your test case you need to call PR_Cleanup() at the end of your testcase.

What is happening is NSPR caches some of the data it used to create it's IO structures. Even though NSS has closed the file, NSPR, keeps the data it allocated here around for future use. There is a change in NSS in 6.6, which causes it to do the PR_Open every time it's initialized rather than only when you are in FIPS mode. That is why you see the 'leak' show up.

In any case it's a one time innocuous leak (even if it were a 'real' leak, it's a one time leak), which you can make go away by calling PR_Cleanup(). I'm closing this 'not a bug' feel free to reopen if you don't agree.

Comment 6 Dalibor Pospíšil 2014-07-29 07:57:27 UTC

I cannot do anything about that in my test case. It is blackbox testing. As you can see at line 73 in http://pkgs.devel.redhat.com/cgit/tests/net-snmp/tree/Regression/bz676955-snmptrapd-leaks-memory-when-NetSNMP-TrapReceiver/runtest.sh I just execute snmptrapd using valgrind. Nspr is called indirectly so I guess even snmptrapd is not responsible for calling PR_Cleanup().

Honzo, what do you think about this behavior change?

Comment 7 Jan Safranek 2014-07-29 08:41:45 UTC

If a library decides to cache something 'automatically' in an update, it should also automatically free it (I know, it's not possible in a library :-).

I don't see complete stack trace, so I even don't know what exactly allocates the NSPR cache (i.e. who calls PR_Open) - net-snmp does not link with NSS, it must be something in perl or so. Therefore I cannot even say, where to put PR_Cleanup().


Please provide complete stack trace, e.g. by using valgrind --num-callers=50, and install all necessary debuginfo packages so we see something useful instead of ???.

Comment 8 Dalibor Pospíšil 2014-07-29 12:49:30 UTC

I don't know what more to do to get rid of those ??? any other ideas?

# rpm -qa | grep debuginfo
nss-softokn-debuginfo-3.14.3-14.el6.x86_64
nss_compat_ossl-debuginfo-0.9.6-1.el6.x86_64
nss_db-debuginfo-2.2.3-0.5.pre1.el6_5.1.x86_64
nss-debuginfo-3.16.1-9.el6.x86_64
nss-util-debuginfo-3.16.1-1.el6.x86_64
net-snmp-debuginfo-5.5-49.el6_5.1.x86_64
nss-pam-ldapd-debuginfo-0.7.5-18.2.el6_4.x86_64
nspr-debuginfo-4.10.6-1.el6.x86_64


==24784== 40 bytes in 1 blocks are possibly lost in loss record 792 of 1,110
==24784==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==24784==    by 0xA2B460B: _PR_Getfd (prfdcach.c:112)
==24784==    by 0xA2CC094: pt_SetMethods (ptio.c:3303)
==24784==    by 0xA2CC468: PR_OpenFile (ptio.c:3581)
==24784==    by 0x9A239D7: blapi_SHVerifyFile (shvfy.c:356)
==24784==    by 0x9A23D6E: blapi_SHVerify (shvfy.c:290)
==24784==    by 0x9A0E338: bl_startup_tests (fipsfreebl.c:1541)
==24784==    by 0x9A4CE55: ??? (in /lib64/libfreebl3.so)
==24784==    by 0x99FB9C2: ??? (in /lib64/libfreebl3.so)
==24784==    by 0x7FEFFFF96: ???
==24784==    by 0x7274706D6E732F6D: ???
==24784==    by 0x64734C2D00647060: ???
==24784==    by 0x7261762F00702CFF: ???
==24784==    by 0x6D6E732F6E75722E: ???
==24784==    by 0x702E64706172746F: ???
==24784==    by 0x4F4800662D006468: ???
==24784==    by 0x733D454D414E5452: ???
==24784==    by 0x6568722D736F706E: ???
==24784==    by 0x752E7172622D366B: ???
==24784==    by 0x722E737973726572: ???
==24784==    by 0x6F632E7461686464: ???
==24784==    by 0x554E494C4553006C: ???
==24784==    by 0x525F454C4F525F57: ???
==24784==    by 0x4445545345555144: ???
==24784==    by 0x45544E495F5F003C: ???
==24784==    by 0x4645445F4C414E51: ???
==24784==    by 0x5045525F544C5540: ???
==24784==    by 0x555345525F54524E: ???
==24784==    by 0x2F6E69622F3D544B: ???
==24784==    by 0x4548530065757273: ???
==24784==    by 0x2F6E69622F3D4C4B: ???
==24784==    by 0x5245540068736161: ???
==24784==    by 0x6E65657263733D4C: ???
==24784==    by 0x414C46454B414CFF: ???
==24784==    by 0x54534948003D5346: ???
==24784==    by 0x3030313D455A4952: ???
==24784==    by 0x5F4853530030302F: ???
==24784==    by 0x313D544E45494C42: ???
==24784==    by 0x2E34322E34332E2F: ???
==24784==    by 0x3639393320393130: ???
==24784==    by 0x5345540032322033: ???
==24784==    by 0x4E4F495352455653: ???
==24784==    by 0x706F6C657665643C: ???
==24784==    by 0x4E494C4553007264: ???
==24784==    by 0x435F4553555F5854: ???
==24784==    by 0x525F544E45525254: ???
==24784==    by 0x4542003D45474E40: ???
==24784==    by 0x3D42494C52454B40: ???
==24784==    by 0x6168732F7273752E: ???
==24784==    by 0x656B6165622F6571: ???
==24784==
==24784== 48 bytes in 1 blocks are possibly lost in loss record 798 of 1,110
==24784==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==24784==    by 0xA2B45F9: _PR_Getfd (prfdcach.c:109)
==24784==    by 0xA2CC094: pt_SetMethods (ptio.c:3303)
==24784==    by 0xA2CC468: PR_OpenFile (ptio.c:3581)
==24784==    by 0x9A239D7: blapi_SHVerifyFile (shvfy.c:356)
==24784==    by 0x9A23D6E: blapi_SHVerify (shvfy.c:290)
==24784==    by 0x9A0E338: bl_startup_tests (fipsfreebl.c:1541)
==24784==    by 0x9A4CE55: ??? (in /lib64/libfreebl3.so)
==24784==    by 0x99FB9C2: ??? (in /lib64/libfreebl3.so)
==24784==    by 0x7FEFFFF96: ???
==24784==    by 0x7274706D6E732F6D: ???
==24784==    by 0x64734C2D00647060: ???
==24784==    by 0x7261762F00702CFF: ???
==24784==    by 0x6D6E732F6E75722E: ???
==24784==    by 0x702E64706172746F: ???
==24784==    by 0x4F4800662D006468: ???
==24784==    by 0x733D454D414E5452: ???
==24784==    by 0x6568722D736F706E: ???
==24784==    by 0x752E7172622D366B: ???
==24784==    by 0x722E737973726572: ???
==24784==    by 0x6F632E7461686464: ???
==24784==    by 0x554E494C4553006C: ???
==24784==    by 0x525F454C4F525F57: ???
==24784==    by 0x4445545345555144: ???
==24784==    by 0x45544E495F5F003C: ???
==24784==    by 0x4645445F4C414E51: ???
==24784==    by 0x5045525F544C5540: ???
==24784==    by 0x555345525F54524E: ???
==24784==    by 0x2F6E69622F3D544B: ???
==24784==    by 0x4548530065757273: ???
==24784==    by 0x2F6E69622F3D4C4B: ???
==24784==    by 0x5245540068736161: ???
==24784==    by 0x6E65657263733D4C: ???
==24784==    by 0x414C46454B414CFF: ???
==24784==    by 0x54534948003D5346: ???
==24784==    by 0x3030313D455A4952: ???
==24784==    by 0x5F4853530030302F: ???
==24784==    by 0x313D544E45494C42: ???
==24784==    by 0x2E34322E34332E2F: ???
==24784==    by 0x3639393320393130: ???
==24784==    by 0x5345540032322033: ???
==24784==    by 0x4E4F495352455653: ???
==24784==    by 0x706F6C657665643C: ???
==24784==    by 0x4E494C4553007264: ???
==24784==    by 0x435F4553555F5854: ???
==24784==    by 0x525F544E45525254: ???
==24784==    by 0x4542003D45474E40: ???
==24784==    by 0x3D42494C52454B40: ???
==24784==    by 0x6168732F7273752E: ???
==24784==    by 0x656B6165622F6571: ???

Comment 9 Bob Relyea 2014-07-29 16:58:57 UTC

> If a library decides to cache something 'automatically' in an update, it should > also automatically free it (I know, it's not possible in a library :-).

The library didn't suddenly decide to automatically cache freed data structures, the library has always done this. The change is the test case was never run in NSS FIPS mode. Now FIPS requires us to run that code always.

The bug has always been in the application, it was just never visible before (it's latent). It's also not a serious bug. There isn't a real leak here. The memory in question always says in the NSPR free list. There is nothing NSS can do to make the leak go away because the PR_Cleanup() call has to happen when the application is through using the NSS and NSPR libraries.

PR_Cleanup() is also only needed to silence these kinds of false leaks issues, so for most uses, it's not really an error not to call it unless you are doing leak detection.

Upshot either the testcase needs to call PR_Cleanup() (preferable), or white list this 'memory leak'.

bob

Comment 10 Suzanne Forsberg 2014-08-07 13:22:09 UTC

I am closing this again. I don;t need to report on this every week at the RHEL meeting if it is really not a bug in nss-softokn. If this is truly a bug in the nss code, re-open.

Comment 11 Dalibor Pospíšil 2014-09-02 11:54:00 UTC

I have found another leak on s390x only.
==44581== 280 bytes in 1 blocks are definitely lost in loss record 1,056 of 1,103 
==44581==    at 0x4808AC4: calloc (vg_replace_malloc.c:593)
==44581==    by 0x57FBFB7: _PR_InitThreads (ptthread.c:956)
==44581==    by 0x57EB5EF: _PR_InitStuff (prinit.c:180)
==44581==    by 0x57EB885: PR_CallOnce (prinit.c:219)
==44581==    by 0x574C3AD: BL_Init (rsa.c:1505)
==44581==    by 0x5736757: bl_startup_tests (fipsfreebl.c:1721)
==44581==    by 0x576D537: ??? (in /lib64/libfreebl3.so)

Is this the same story?

Comment 12 Red Hat Bugzilla 2023-09-14 02:11:55 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days