Red Hat Bugzilla – Bug 246564
setroubleshoot has large numbers of files open
Last modified: 2008-01-09 11:48:26 EST
Description of problem:
setroubleshootd has /var/lib/rpm/* open many times each, I'm guessing this is a
bug ... but I don't think it's doing anything bad. Eg.
% sudo lsof | egrep setroub | wc -l
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | wc -l
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | tail
setrouble 2141 root 1014r REG 253,0 12288 11261200
setrouble 2141 root 1015rR REG 253,0 53370880 11261194
setrouble 2141 root 1016r REG 253,0 90112 11261198
setrouble 2141 root 1017r REG 253,0 12288 11261200
setrouble 2141 root 1018rR REG 253,0 53370880 11261194
setrouble 2141 root 1019r REG 253,0 10100736 11261199
setrouble 2141 root 1020r REG 253,0 12288 11261200
setrouble 2141 root 1021rR REG 253,0 53370880 11261194
setrouble 2141 root 1022r REG 253,0 90112 11261198
setrouble 2141 root 1023r REG 253,0 12288 11261200
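The lsof counts above can also be gathered from a script. This is a minimal sketch, assuming a Linux /proc filesystem; the function name `count_open_fds` and its parameters are hypothetical and come from neither lsof nor setroubleshoot. Inspecting another user's process needs root, as with the sudo calls above.

```python
import os

def count_open_fds(pid="self", prefix="/var/lib/rpm/"):
    """Return (total, matching): open descriptors of a process, and how many
    of them point at paths under `prefix`. Reads /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    total = matching = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        total += 1
        if target.startswith(prefix):
            matching += 1
    return total, matching
```

For the process shown above, the call would be `count_open_fds(2141)`.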
Version-Release number of selected component (if applicable):
% rpm -q setroubleshoot
Thank you for alerting me to this. Setroubleshoot does call into rpm to look up what
package(s) are involved in an AVC when it knows the path information. It creates
an rpm transaction set to perform the query. A quick examination of the code
reveals there is no explicit close of the transaction set. Perhaps these open
files are a consequence of that. However, we are using the python bindings and
one would expect when the transaction set goes out of scope and its ref count
drops to zero the binding code would clean up. I will investigate further ...
It is "doing something bad" in the sense that it's leaving so many stale locks
behind it prevents rpm from working at all, see bug 245389.
John, are the rpmdb queries done from a reoccurring thread? I think that would
explain the situation:
1) setroubleshoot launches a thread
2) does an rpmdb lookup in the thread
3) tracebacks due to whatever reason in the thread
4) thread exits uncleanly -> stale lock is left behind
5) goto 1)
James, if you can reproduce this, please try running "setroubleshootd -f" to see
if there are tracebacks present.
Oh and btw, the bindings are supposed to clean things up as things go out of
scope, and normally do so (of course it's possible there are bugs wrt that in
the python bindings, one I just recently fixed but the fix is not yet in F7).
However it is a good idea to explicitly delete the ts instance immediately when
no longer needed to avoid unnecessarily holding locks on the db.
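A minimal sketch of that advice, assuming the rpm-python calls `rpm.TransactionSet()`, `ts.dbMatch()` and `ts.closeDB()`. The function name `packages_owning` and the `ts_factory` parameter are hypothetical; they are only there so the pattern can be exercised without a real rpmdb. This is not the setroubleshoot code.

```python
def packages_owning(path, ts_factory=None):
    """Return the names of the packages that own `path`, releasing the
    rpmdb as soon as the query is done."""
    if ts_factory is None:
        import rpm  # rpm-python bindings, imported only when really needed
        ts_factory = rpm.TransactionSet
    ts = ts_factory()
    try:
        # 'basenames' matches headers by the full path of a file they contain.
        return [h['name'] for h in ts.dbMatch('basenames', path)]
    finally:
        # Runs on success and on exceptions alike, so the db handle and its
        # locks do not outlive the query.
        ts.closeDB()
        del ts
```

The try/finally is the point: the close no longer depends on when (or whether) the reference count drops to zero.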
Thank you Panu. A few quick notes. I did do more investigation after this bug
was filed. I looked at the rpm python bindings to make sure things were properly
ref counted and cleaned up. I also constructed a stand alone test and could not
reproduce it. I was not able to reproduce it with a full setroubleshoot either. I do
believe it's happening though.
One thing I did do was to make sure every time we call rpm it's wrapped in a
try/except block. If memory serves me correctly that had not been the case. FWIW
when setroubleshootd does get an exception it's usually logged to
/var/log/setroubleshoot/setroubleshootd.log and syslog. So if it was getting an
exception I would expect it to have been logged, but I've also seen cases where
exceptions were not logged (usually because they were caught by another library).
Panu, you may be onto something with your comment about threads. The calls to
rpm do occur in a new thread each time we run analysis so you might be right.
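The wrapping described here could look roughly like the sketch below. The names `run_guarded` and `start_analysis_thread` and the logger name are hypothetical; this is not the code in mercurial, only an illustration of a thread body that logs its traceback and always runs its cleanup.

```python
import logging
import threading

log = logging.getLogger("analysis")  # hypothetical logger name

def run_guarded(work, cleanup):
    """Call work(); log any exception with its traceback instead of letting
    it kill the thread; call cleanup() in every case."""
    try:
        return work()
    except Exception:
        log.exception("analysis step failed")
        return None
    finally:
        cleanup()

def start_analysis_thread(work, cleanup):
    """Run one analysis in a fresh thread, as setroubleshootd is said to do."""
    t = threading.Thread(target=run_guarded, args=(work, cleanup))
    t.start()
    return t
```

With this shape the thread always terminates through the finally clause, so whatever cleanup releases the rpm resources is not skipped by a traceback.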
James, the version with the try/except wrapping is in mercurial and has not been
pushed. I am leaving now and won't be back till 8/6. I will follow up then.
setroubleshootd has 780 open file descriptors to my rpmdb contents. This is a
very serious bug, as it locks up my rpmdb in just a few hours of running.
n0dalus, can you try rpm 188.8.131.52 from updates-testing and see what happens with
Panu, what does the new rpm package you cite in comment #5 fix? Did you discover
a problem in rpm?
Looks like this issue is affecting others as well, bug #249990 and bug #253679
look like the same issue.
The new rpm package just adds a hook to perform cleanup in case a python process
tracebacks with rpmdb open, bug 245349 comment #26 confirms the fix working at
least for that person. That rpm-python didn't have such a cleanup mechanism in
place could easily be considered as a problem in rpm, yes :)
Those two bugs are certainly at least related, and I have a couple of open bugs
filed against rpm as well.
Re comment #7, I don't think bug 245349 is the right reference. typo?
What did you do in rpm-python?
If a python thread exits due to a traceback are there references left in place?
No matter how a thread exits I would have expected the references it held to
have been decremented on exit just as if it exited normally. Is that not the
case and if so does that imply the CPython code in rpm has to somehow hook into
the exception handling logic?
Is the rpm-python code following the conventions outlined here:
I noticed that rpmts_dealloc is not calling rpmtsCloseDB, is that a problem?
Duh, yes I meant bug 245389 :)
The problem is that when a python process/thread exits with an uncaught exception,
normal refcount-based cleanup doesn't happen. So rpm-python now installs an
exit hook that gets called even in the case of a traceback, so it can clean up
any open BDB iterators and locks. No messing with exception handling as such.
In other words: it's nothing you should worry about, rpm(-python) was just made
a little bit more robust. The tracebacking code should still be fixed of course,
but dying uncleanly isn't that dangerous for rpm anymore.
Oh and btw: do make sure you're not accessing rpmdb across threads (eg creating
a transaction set in one thread and reading from it in another) - rpm is NOT
thread safe. Adding protection against use across threads is in my todo, right
now you just need to be careful.
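Until such protection exists in rpm itself, a caller could guard against accidental cross-thread use on its own side. A minimal sketch follows; the `ThreadConfined` wrapper is hypothetical and not part of rpm-python.

```python
import threading

class ThreadConfined:
    """Wrap an object so that only the thread that created it may use it."""

    def __init__(self, factory):
        self._owner = threading.get_ident()
        self._obj = factory()

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself, i.e.
        # for everything that is forwarded to the wrapped object.
        # Note: thread identifiers can be reused once a thread has exited.
        if threading.get_ident() != self._owner:
            raise RuntimeError("object used outside the thread that created it")
        return getattr(self._obj, name)
```

Assuming rpm-python is installed, `ts = ThreadConfined(rpm.TransactionSet)` would then raise as soon as another thread touched `ts`, instead of quietly sharing the db handle.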
*** Bug 249990 has been marked as a duplicate of this bug. ***
*** Bug 253679 has been marked as a duplicate of this bug. ***
*** Bug 244257 has been marked as a duplicate of this bug. ***
Please note that setroubleshoot-1.10.1-1.fc7 has been pushed to testing. This
version should do a better job of catching any tracebacks in the analysis thread
and report the cause of the traceback. It should also prevent any abnormal
thread termination which is suspected of causing the abnormal rpm resource leakage.
The setroubleshoot in F7 updates-testing has broken deps - it requires a
setroubleshoot-plugins package which doesn't exist.
I can't see an rpm-184.108.40.206 in updates-testing at all.
rpm-220.127.116.11 has already been moved to final updates.
Closing this as the fundamental bug was in rpm and has been fixed. Also, current
versions of setroubleshoot are much more robust at catching exceptions.