Description of problem:
Running rhn_check with a number of scheduled actions pending on RHN will cause rhn_check to loop over each scheduled item. Each iteration leaks memory, and it takes very little time to reach the OOM killer.

Version-Release number of selected component (if applicable):
rhn-check 0.4.20-9.el5

How reproducible:
Always

Steps to Reproduce:
1. Schedule some actions on RHN
2. Run rhn_check
3. Watch memory grow

Additional info:
Didn't see an rhn-check component on Bugzilla. Thanks.
RHEL 5.5 beta fails in the same manner. Processing 44 of 71 scheduled errata was enough to hit the OOM killer on a machine with 650 MB of memory and 1.2 GB of swap. I also created SR# 2000154 on this. Thank you.
Hi, this leak only appears to happen when you have errata scheduled for application. Just scheduling packages for install will not provoke the leak. Thank you!
The memory consumption grows in /usr/share/rhn/actions/packages.py, in the _run_yum_action() routine, when calling yum_base.buildTransaction() and yum_base.doTransaction().
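To pin down which of those calls is responsible, a small stdlib-only harness can track peak-RSS growth per iteration. This is just a sketch (the `measure` and `rss_kb` helpers are invented for illustration, not part of the RHN code); you would pass it a closure wrapping the suspect call, e.g. `lambda: yum_base.doTransaction()`:

```python
import gc
import resource

def rss_kb():
    # Peak RSS of this process; Linux reports kilobytes, macOS reports bytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def measure(fn, iterations=5):
    """Call fn repeatedly, returning the peak-RSS delta for each iteration.

    Because ru_maxrss is a high-water mark, a function that keeps leaking
    shows positive deltas on every pass, while a well-behaved one flattens
    out to zero after the first call.
    """
    deltas = []
    for _ in range(iterations):
        gc.collect()
        before = rss_kb()
        fn()
        deltas.append(rss_kb() - before)
    return deltas
```

Consistently positive deltas across iterations point at unreclaimed growth in that specific call, which matches the per-errata growth described here.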
This problem is also present in RHEL 5.3 and was not introduced in RHEL 5.4 as some of the above comments (or currently attached ITs) suggest.
This is essentially the same problem as the one described in bug #470838.
Greetings James, I'd very much appreciate your advice or a hint on this bug report. Here's a link to the yum-rhn-plugin code that's used by rhn_check when applying errata to a system: http://git.fedorahosted.org/git/?p=spacewalk.git;a=blob_plain;f=client/rhel/yum-rhn-plugin/actions/packages.py;hb=HEAD

For every scheduled erratum, YumAction.doTransaction() is called at some point, which calls YumBase's runTransaction() at the very end. The memory consumption grows significantly when self.ts.run() is called inside runTransaction(). The reason I'm asking you for advice is that I'm not sure whether we're looking at some rpm-python bug or whether the way we're using the yum libraries is plain broken. Thank you.
(In reply to comment #11)
> I think a lot of the RHN code that uses yum APIs is "non-optimal" at
> least, but then it's pretty old.
>
> So I'm not sure which bits you want me to look at in particular.
>
> I don't understand the old code in comment #4, p[0] should traceback with
> KeyError ... no? Looking at getInstalledPackageList closer, this is
> duplicating a bunch of objects in rpmdb, although it is throwing
> the headers away.

Comment #4 is a bit misleading. It shows some changes made in the rhn-client-tools code between RHEL-5.4 and RHEL-5.5, though I don't believe those changes cause the discussed problem (the big memory consumption was present before RHEL-5.4 as well).

> The doTransaction() in that file doesn't look like it is doing much that the
> yum side wouldn't do. In general I'd expect memory usage to grow in
> runTransaction() because the depsolver runs then, and (although I'm not sure)
> you might be hitting a bunch of caching stuff in yum that doesn't get hit
> before that in your call paths. It's really hard to say if this is "bad"
> or not.
>
> Just looking in that file:
>
> getInstalledPkgObject is slow, I guess you should be calling
> rpmdb.searchNevra(). Certainly never parsePackages.
>
> I'm unsure how runTransaction() can work, it's altering tuples ...
> which should give:
>
> TypeError: 'tuple' object does not support item assignment
>
> ...and add_transaction_data() doesn't do any checking. But neither
> of the last two should cause memory leaks.
>
> What do you do after the transaction runs ... do you del the YumBase
> object (does it all go away, if you do)? We've had a couple of circular
> reference bugs in YumBase, over time.

There's only one yum_base object (an instance of the YumAction(YumBase) class) defined at the packages.py module level; no deleting. Nonetheless, the memory leak (or memory consumption) problem can be reproduced without involving any RHN code whatsoever.
Install RHEL-5.5 (latest and greatest), set up a yum repo (for example EPEL-5; no registration to RHN is required) and start yum shell. In yum shell, install a couple of packages, with a single transaction for each package:

> install package1
> ts run
...
> install package2
> ts run
...

Never leave yum shell! Watch the memory of the yum process grow every time you execute a transaction. Sooner or later (depending on how much memory your system has), the OOM killer zooms in and kills your yum.
Ahh, cool, thanks ... I should be able to fix that, although $DEITY knows when it'll get into RHEL :). I'll reassign it to myself for now.
This is interesting: if I do a loop of "remove blah; install blah;" then on RHEL-5 I lose about 13 MB for each op (26 MB for each pass of the loop). On F-13 I lose maybe a couple of hundred kB. Cc'ing David Malcolm. David, I remember you saying something about a leak you'd found out about at PyCon ... could this be it? FYI to the RHN guys: RHEL-5 doesn't leak if I do the "normal" YumBase() create/del test ... how hard would it be to create a new YumBase() for each install set?
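The "new YumBase() for each install set" idea amounts to a create/use/teardown lifecycle per action instead of one module-level object. A minimal sketch of that shape, using a stand-in class (`FakeYumBase` and `apply_action` are invented for illustration; this is not the real yum API and imports nothing from yum):

```python
class FakeYumBase:
    """Stand-in for yum.YumBase, to sketch the per-transaction lifecycle."""

    instances_alive = 0

    def __init__(self):
        FakeYumBase.instances_alive += 1

    def run_transaction(self):
        pass  # depsolving and transaction work would happen here

    def close(self):
        FakeYumBase.instances_alive -= 1


def apply_action(action):
    # Create a fresh base per install set rather than reusing one
    # module-level object, so any per-instance caches are released
    # when the action completes.
    base = FakeYumBase()
    try:
        base.run_transaction()
    finally:
        base.close()
```

The point of the pattern is that nothing from one action's transaction survives into the next; whether it helps in practice depends on whether the growth lives in the YumBase instance at all (the later comments suggest it does not).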
The python 2.4 bug is: https://bugzilla.redhat.com/show_bug.cgi?id=569093 ...and I'd hope that wouldn't be what is hitting us here, but I can't be sure (David ... I don't suppose you have a test python I can use?).
(In reply to comment #15) > The python 2.4 bug is: > > https://bugzilla.redhat.com/show_bug.cgi?id=569093 > > ...and I'd hope that wouldn't be what is hitting us here, but I can't be sure > (David ... I don't suppose you have a test python I can use?). See https://bugzilla.redhat.com/show_bug.cgi?id=569093#c4
I already tried that ... but it seems to have timed out or something. At least I can't see any rpms to download from the build. I was hoping you might have saved them somewhere.
Ok, just checked a rebuild and that didn't fix it.
I keep hitting this memory leak myself. Does anyone know a workaround?
Even after updating rhn_check to rhn-check-0.4.20-33.el5_5.1, the issue of memory growth and eventual crash resurfaces.
Ok, so after many hours of debugging, the problem appears to be this line in runTransaction():

errors = self.ts.run(cb.callback, '')

...my understanding is that this is all rpm. And this happens even if I start a new YumBase() for each transaction. So Panu, any known leaks in ts.run()?
I don't recall any known memory leaks in rpmtsRun() in 4.4.x, but that doesn't mean there aren't any... however, such leaks would've been there forever. Any idea when this problem started occurring? Comment #8 says it was present in RHEL 5.3 already; what about older releases?

What I do remember, though, is a severe memory fragmentation issue when calling ts.run() several times (especially bad from python, for whatever reason), see bug 472507: the first ts.run() call runs in "reasonable" memory, the second one already blows through the roof in some circumstances, and the more ts.run() calls you do, the worse it probably gets. The fragmentation issue was addressed in RHEL 5.4 by using a more reasonable reallocation scheme for the problematic case, but addressed != entirely fixed.

If somebody can reproduce this with valgrind (run those single-item transactions until memory starts ballooning, and exit before it gets killed by the OOM killer), that'd make it easier to see whether it's actually leaking or whether it's something else.
Created attachment 409781 [details] valgrind --tool=memcheck yum shell
Thanks, Milan. Does the problem go away if you boot with SELinux fully disabled, i.e. append 'selinux=0' to the kernel command line in grub? (Note that this will mess up SELinux context labeling; don't try this on production boxes.)
Interestingly, the problem does go away with SELinux fully disabled. The memory grows a little during the transaction execution, but drops back when it finishes (which it did not with SELinux on). You can run more transactions from inside yum shell; the memory always drops back to its previous state.
Good, thanks for confirming. Easy fix, then. This SELinux context initialization leak is about as old as SELinux "support" in rpm: rpm calls matchpathcon_init() at the beginning of every transaction but never calls matchpathcon_fini(), which would free up the memory. In normal rpm/yum usage patterns this doesn't make much of a difference, but with a large number of transactions within a single process lifetime it starts adding up.

(Aside: it's also somewhat dumb behavior on libselinux's part - matchpathcon_init() doesn't return a handle for the caller to free but takes care of the bookkeeping internally, so it could just as well handle repeated matchpathcon_init() calls intelligently, but doesn't.)
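The init-without-fini pattern translates into Python roughly as follows — a toy analogy only (`PathconCache` and the `run_transaction_*` names are invented for illustration; this is not the real libselinux or rpm API):

```python
class PathconCache:
    """Toy stand-in for libselinux's internal matchpathcon state."""

    def __init__(self):
        self._tables = []

    def init(self):
        # Each "transaction" loads a fresh context table into memory ...
        self._tables.append([bytes(1024) for _ in range(64)])

    def fini(self):
        # ... and only an explicit fini() ever releases it.
        self._tables = []


def run_transaction_leaky(cache):
    # rpm's old behavior: init at the start of every transaction, never fini.
    cache.init()


def run_transaction_fixed(cache):
    # The fix: pair every init() with a fini() once the transaction is done.
    cache.init()
    try:
        pass  # ... transaction work would happen here ...
    finally:
        cache.fini()
```

In a long-lived process running many transactions (the rhn_check case), the leaky variant accumulates one table per transaction, while the fixed variant stays flat — which is exactly why ordinary one-transaction yum runs never noticed.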
Hello Red Hat, please consider this for an async errata release. Waiting till RHEL 5.6 will only break more machines, as the fix won't be in place in time for when the bug occurs. Yes, I asked on my GSS support ticket as well. Regardless, thanks for fixing this issue.
Disabling SELinux is not a fix. It's a work around. We need an official fix for this bug.
I didn't suggest disabling SELinux as a fix or a workaround, but to confirm that the leak was indeed related to the SELinux handling within rpm.
*** Bug 470838 has been marked as a duplicate of this bug. ***
Red Hat, any comments on getting this out as an async errata? Again, waiting till RHEL 5.6 defeats the purpose of fixing this bug. daryl
Is there any progress in addressing this bug? It is creating skepticism at my shop towards Red Hat, as upper management comments on how derided MS is when they are slow in releasing bug fixes... and now Red Hat is following suit...? I still got "Faith of the Heart". Red Hat... "Don't Let Me Down!"
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0124.html
*** Bug 651501 has been marked as a duplicate of this bug. ***