Description of problem:
Running rhn_check with a number of scheduled actions pending on RHN will cause rhn_check to loop over each scheduled item. Each iteration leaks memory, and it takes very little time to reach the OOM killer.

Version-Release number of selected component (if applicable):
rhn-check 0.4.20-9.el5

How reproducible:
Always

Steps to Reproduce:
1. Schedule some actions on RHN
2. Run rhn_check
3. Watch memory grow

Additional info:
Didn't see an rhn-check component on Bugzilla. Thanks.
RHEL 5.5 beta fails in the same manner. Processing 44 of 71 scheduled errata was enough to hit the OOM killer on a machine with 650 MB of memory and 1.2 GB of swap. I also created SR# 2000154 on this. Thank you.
Hi, this leak only appears to happen when you have errata scheduled for application. Just scheduling packages for install will not provoke the leak. Thank you!
The memory consumption grows in /usr/share/rhn/actions/packages.py, in the _run_yum_action() routine, when calling yum_base.buildTransaction() and yum_base.doTransaction().
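To pin down which of those calls is responsible, a small stdlib-only harness can track peak-RSS growth per iteration. This is just a sketch (the `measure` and `rss_kb` helpers are invented for illustration, not part of the RHN code); you would pass it a closure wrapping the suspect call, e.g. `lambda: yum_base.doTransaction()`:

```python
import gc
import resource

def rss_kb():
    # Peak RSS of this process; Linux reports kilobytes, macOS reports bytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def measure(fn, iterations=5):
    """Call fn repeatedly, returning the peak-RSS delta for each iteration.

    Because ru_maxrss is a high-water mark, a function that keeps leaking
    shows positive deltas on every pass, while a well-behaved one flattens
    out to zero after the first call.
    """
    deltas = []
    for _ in range(iterations):
        gc.collect()
        before = rss_kb()
        fn()
        deltas.append(rss_kb() - before)
    return deltas
```

Consistently positive deltas across iterations point at unreclaimed growth in that specific call, which matches the per-errata growth described here.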
This problem is also present in RHEL 5.3 and was not introduced in RHEL 5.4 as some of the above comments (or currently attached ITs) suggest.
This is essentially the same problem as the one described in bug #470838.
Greetings James, I'd very much appreciate your advice or a hint on this bug report. Here's a link to the yum-rhn-plugin code that's used by rhn_check when applying errata to a system: http://git.fedorahosted.org/git/?p=spacewalk.git;a=blob_plain;f=client/rhel/yum-rhn-plugin/actions/packages.py;hb=HEAD

For every scheduled erratum, YumAction.doTransaction() is called at some point, which calls YumBase's runTransaction() at the very end. The memory consumption grows significantly when self.ts.run() is called inside runTransaction(). The reason I'm asking you for advice is that I'm not sure whether we're looking at some rpm-python bug or whether the way we're using the yum libraries is plain broken. Thank you.
(In reply to comment #11)
> I think a lot of the RHN code that uses yum APIs is "non-optimal" at
> least, but then it's pretty old.
>
> So I'm not sure which bits you want me to look at in particular.
>
> I don't understand the old code in comment #4, p[0] should traceback with
> KeyError ... no? Looking at getInstalledPackageList closer, this is
> duplicating a bunch of objects in rpmdb, although it is throwing
> the headers away.

Comment #4 is a bit misleading. It shows some changes made in the rhn-client-tools code between RHEL-5.4 and RHEL-5.5, though I don't believe those changes cause the discussed problem (the big memory consumption was present before RHEL-5.4 as well).

> The doTransaction() in that file doesn't look like it is doing much that the
> yum side wouldn't do. In general I'd expect memory usage to grow in
> runTransaction() because the depsolver runs then, and (although I'm not sure)
> you might be hitting a bunch of caching stuff in yum that doesn't get hit
> before that in your call paths. It's really hard to say if this is "bad"
> or not.
>
> Just looking in that file:
>
> getInstalledPkgObject is slow, I guess you should be calling
> rpmdb.searchNevra(). Certainly never parsePackages.
>
> I'm unsure how runTransaction() can work, it's altering tuples ...
> which should give:
>
> TypeError: 'tuple' object does not support item assignment
>
> ...and add_transaction_data() doesn't do any checking. But neither
> of the last two should cause memory leaks.
>
> What do you do after the transaction runs ... do you del the YumBase
> object (does it all go away, if you do)? We've had a couple of circular
> reference bugs in YumBase, over time.

There's only one yum_base object (an instance of the YumAction(YumBase) class) defined at the packages.py module level; no deleting. Nonetheless, the memory leak (or memory consumption) problem can be reproduced without involving any RHN code whatsoever.
Install RHEL-5.5 (latest and greatest), set up a yum repo (for example EPEL-5; no registration to RHN is required) and start yum shell. In yum shell, install a couple of packages, with a single transaction for each package:

> install package1
> ts run
...
> install package2
> ts run
...

Never leave yum shell! Watch the memory of the yum process grow every time you execute a transaction. Sooner or later (depending on how much memory your system has), the OOM killer zooms in and kills your yum.
Ahh, cool, thanks ... I should be able to fix that, although $DEITY knows when it'll get into RHEL :). I'll reassign it to myself for now.
This is interesting: if I do a loop of "remove blah; install blah;" then on RHEL-5 I lose about 13 MB for each op (26 MB for each pass of the loop). On F-13 I lose maybe a couple of hundred kB. Cc'ing David Malcolm. David, I remember you saying something about a leak you'd found out about at PyCon ... could this be it? FYI to the RHN guys: RHEL-5 doesn't leak if I do the "normal" YumBase() create/del test ... how hard would it be to create a new YumBase() for each install set?
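The "new YumBase() for each install set" idea amounts to a create/use/teardown lifecycle per action instead of one module-level object. A minimal sketch of that shape, using a stand-in class (`FakeYumBase` and `apply_action` are invented for illustration; this is not the real yum API and imports nothing from yum):

```python
class FakeYumBase:
    """Stand-in for yum.YumBase, to sketch the per-transaction lifecycle."""

    instances_alive = 0

    def __init__(self):
        FakeYumBase.instances_alive += 1

    def run_transaction(self):
        pass  # depsolving and transaction work would happen here

    def close(self):
        FakeYumBase.instances_alive -= 1


def apply_action(action):
    # Create a fresh base per install set rather than reusing one
    # module-level object, so any per-instance caches are released
    # when the action completes.
    base = FakeYumBase()
    try:
        base.run_transaction()
    finally:
        base.close()
```

The point of the pattern is that nothing from one action's transaction survives into the next; whether it helps in practice depends on whether the growth lives in the YumBase instance at all (the later comments suggest it does not).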
The python 2.4 bug is: https://bugzilla.redhat.com/show_bug.cgi?id=569093 ...and I'd hope that wouldn't be what is hitting us here, but I can't be sure (David ... I don't suppose you have a test python I can use?).
(In reply to comment #15) > The python 2.4 bug is: > > https://bugzilla.redhat.com/show_bug.cgi?id=569093 > > ...and I'd hope that wouldn't be what is hitting us here, but I can't be sure > (David ... I don't suppose you have a test python I can use?). See https://bugzilla.redhat.com/show_bug.cgi?id=569093#c4
I already tried that ... but it seems to have timed out or something. At least I can't see any rpms to download from the build. I was hoping you might have saved them somewhere.
Ok, just checked a rebuild and that didn't fix it.
I keep hitting this memory leak myself. Does anyone know a workaround?
Even after updating rhn_check to rhn-check-0.4.20-33.el5_5.1, the issue of memory growth and eventual crash resurfaces.
Ok, so after many hours of debugging, the problem appears to be this line in runTransaction():

errors = self.ts.run(cb.callback, '')

...my understanding is that this is all rpm. And this happens even if I start a new YumBase() for each transaction. So Panu, any known leaks in ts.run()?
I don't recall any known memory leaks in rpmtsRun() in 4.4.x, but that doesn't mean there aren't any... however, such leaks would've been there forever. Any idea when this problem started occurring? Comment #8 says it was present in RHEL 5.3 already; what about older releases?

What I do remember, though, is a severe memory fragmentation issue when calling ts.run() several times (especially bad from python, for whatever reason), see bug 472507: the first ts.run() call runs in "reasonable" memory, the second one already blows through the roof in some circumstances, and the more ts.run() calls you do, the worse it probably gets. The fragmentation issue was addressed in RHEL 5.4 by using a more reasonable reallocation scheme for the problematic case, but addressed != entirely fixed.

If somebody can reproduce this with valgrind (run those single-item transactions until memory starts ballooning, and exit before it gets killed by the OOM killer), that'd make it easier to see whether it's actually leaking or whether it's something else.
Created attachment 409781 [details] valgrind --tool=memcheck yum shell
Thanks, Milan. Does the problem go away if you boot with SELinux fully disabled, i.e. append 'selinux=0' to the kernel command line in grub? (Note that this will mess up SELinux context labeling; don't try this on production boxes.)
Interestingly, the problem does go away with SELinux fully disabled. The memory grows a little during the transaction execution, but drops back when it finishes (which it did not with SELinux on). You can run more transactions from inside yum shell; the memory always drops back to its previous state.
Good, thanks for confirming. Easy fix, then. This SELinux context initialization leak is about as old as SELinux "support" in rpm: rpm calls matchpathcon_init() at the beginning of every transaction but never calls matchpathcon_fini(), which would free up the memory. In normal rpm/yum usage patterns this doesn't make much of a difference, but with a large number of transactions within a single process lifetime it starts adding up.

(Aside: it's also somewhat dumb behavior on libselinux's part - matchpathcon_init() doesn't return a handle for the caller to free but takes care of the bookkeeping internally, so it could just as well handle repeated matchpathcon_init() calls intelligently, but doesn't.)
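The init-without-fini pattern translates into Python roughly as follows — a toy analogy only (`PathconCache` and the `run_transaction_*` names are invented for illustration; this is not the real libselinux or rpm API):

```python
class PathconCache:
    """Toy stand-in for libselinux's internal matchpathcon state."""

    def __init__(self):
        self._tables = []

    def init(self):
        # Each "transaction" loads a fresh context table into memory ...
        self._tables.append([bytes(1024) for _ in range(64)])

    def fini(self):
        # ... and only an explicit fini() ever releases it.
        self._tables = []


def run_transaction_leaky(cache):
    # rpm's old behavior: init at the start of every transaction, never fini.
    cache.init()


def run_transaction_fixed(cache):
    # The fix: pair every init() with a fini() once the transaction is done.
    cache.init()
    try:
        pass  # ... transaction work would happen here ...
    finally:
        cache.fini()
```

In a long-lived process running many transactions (the rhn_check case), the leaky variant accumulates one table per transaction, while the fixed variant stays flat — which is exactly why ordinary one-transaction yum runs never noticed.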
Hello Red Hat, please consider this for an async errata release. Waiting till RHEL 5.6 will only break more machines, as the fix won't be in place in time for when the bug occurs. Yes, I asked on my GSS support ticket as well. Regardless, thanks for fixing this issue.
Disabling SELinux is not a fix. It's a work around. We need an official fix for this bug.
I didn't suggest disabling SELinux as a fix or a workaround, but to confirm that the leak was indeed related to the SELinux handling within rpm.
*** Bug 470838 has been marked as a duplicate of this bug. ***
Red Hat, any comments on getting this out as an async errata? Again, waiting till RHEL 5.6 defeats the purpose of fixing this bug. daryl
Is there any progress in addressing this bug? It is creating skepticism at my shop towards Red Hat, as upper management comments on how derided MS is when they are slow in releasing bug fixes... and now Red Hat is following suit...? I still got "Faith of the Heart". Red Hat... "Don't Let Me Down!"
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0124.html
*** Bug 651501 has been marked as a duplicate of this bug. ***