Description of problem:
setroubleshootd has /var/lib/rpm/* open many times each. I'm guessing this is a bug ... but I don't think it's doing anything bad. Eg.

% sudo lsof | egrep setroub | wc -l
1095
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | wc -l
1014
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | tail
setrouble 2141 root 1014r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys
setrouble 2141 root 1015rR REG 253,0 53370880 11261194 /var/lib/rpm/Packages
setrouble 2141 root 1016r REG 253,0 90112 11261198 /var/lib/rpm/Name
setrouble 2141 root 1017r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys
setrouble 2141 root 1018rR REG 253,0 53370880 11261194 /var/lib/rpm/Packages
setrouble 2141 root 1019r REG 253,0 10100736 11261199 /var/lib/rpm/Basenames
setrouble 2141 root 1020r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys
setrouble 2141 root 1021rR REG 253,0 53370880 11261194 /var/lib/rpm/Packages
setrouble 2141 root 1022r REG 253,0 90112 11261198 /var/lib/rpm/Name
setrouble 2141 root 1023r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys

Version-Release number of selected component (if applicable):
% rpm -q setroubleshoot
setroubleshoot(0:1.9.4-2.fc7).noarch
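For anyone who wants a per-file breakdown rather than a single total, here is a rough Python sketch that tallies the paths in lsof output. The function name count_open_paths is made up for illustration, and it assumes the path is the last whitespace-separated field of each line, as in the listing above. Unlike the egrep pipeline, it matches only on the command column.

```python
from collections import Counter


def count_open_paths(lsof_text, command_prefix="setroub",
                     path_prefix="/var/lib/rpm/"):
    """Tally how often each path under path_prefix is open for a command."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip blank lines and lines belonging to other commands.
        if not fields or not fields[0].startswith(command_prefix):
            continue
        path = fields[-1]  # assumption: the path is the final field
        if path.startswith(path_prefix):
            counts[path] += 1
    return counts


# A few lines copied from the report above.
sample = """\
setrouble 2141 root 1014r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys
setrouble 2141 root 1015rR REG 253,0 53370880 11261194 /var/lib/rpm/Packages
setrouble 2141 root 1016r REG 253,0 90112 11261198 /var/lib/rpm/Name
setrouble 2141 root 1017r REG 253,0 12288 11261200 /var/lib/rpm/Pubkeys
"""

for path, n in count_open_paths(sample).most_common():
    print(n, path)
```

Feeding it the full output of "sudo lsof" should show which rpmdb files account for most of the leaked descriptors.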
Thank you for alerting me to this. Setroubleshoot does call into rpm to look up what package(s) are involved in an AVC when it knows the path information. It creates an rpm transaction set to perform the query. A quick examination of the code reveals there is no explicit close of the transaction set. Perhaps these open files are a consequence of that. However, we are using the python bindings, and one would expect that when the transaction set goes out of scope and its ref count drops to zero the binding code would clean up. I will investigate further ...
It is "doing something bad" in the sense that it's leaving so many stale locks behind that it prevents rpm from working at all, see bug 245389. John, are the rpmdb queries done from a recurring thread? I think that would explain the situation:
1) setroubleshoot launches a thread
2) does an rpmdb lookup in the thread
3) tracebacks for whatever reason in the thread
4) thread exits uncleanly -> stale lock is left behind
5) goto 1)
James, if you can reproduce this, please try running "setroubleshootd -f" to see if there are tracebacks present. Oh and btw, the bindings are supposed to clean things up as things go out of scope, and normally do so (of course it's possible there are bugs wrt that in the python bindings, one of which I just recently fixed, but the fix is not yet in F7). However, it is a good idea to explicitly delete the ts instance immediately when it is no longer needed, to avoid unnecessarily holding locks on the db.
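The advice about dropping the ts as soon as the query is done could look roughly like the sketch below. This is not setroubleshoot's actual code: packages_owning and the ts_factory parameter are hypothetical names, and the factory argument exists only so the function can be exercised without the rpm module installed. The calls made on the transaction set (dbMatch, closeDB) are the ones the rpm python bindings provide.

```python
def packages_owning(path, ts_factory=None):
    """Return names of packages owning path, releasing the rpmdb promptly."""
    if ts_factory is None:
        # Imported lazily so the sketch loads on systems without rpm-python.
        import rpm
        ts_factory = rpm.TransactionSet
    ts = ts_factory()
    mi = None
    try:
        # Query the Basenames index for the file path.
        mi = ts.dbMatch("basenames", path)
        return [h["name"] for h in mi]
    finally:
        # Drop the iterator first, then close the db and drop the ts,
        # rather than waiting for them to go out of scope.
        del mi
        ts.closeDB()
        del ts
```

Usage would be something like packages_owning("/bin/ls"). The finally block runs whether or not the query raises, so a failed lookup does not leave the db open.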
Thank you Panu. A few quick notes. I did do more investigation after this bug was filed. I looked at the rpm python bindings to make sure things were properly ref counted and cleaned up. I also constructed a standalone test and could not reproduce it. I was not able to reproduce it with a full setroubleshoot either. I do believe it's happening though. One thing I did do was to make sure every time we call rpm it's wrapped in a try/except block. If memory serves me correctly, that had not been the case. FWIW, when setroubleshootd does get an exception it's usually logged to /var/log/setroubleshoot/setroubleshootd.log and syslog. So if it was getting an exception I would expect it to have been logged, but I've also seen cases where exceptions were not logged (usually because they were caught by another library). Panu, you may be onto something with your comment about threads. The calls to rpm do occur in a new thread each time we run analysis, so you might be right. James, the version with the try/except wrapping is in mercurial and has not been pushed. I am leaving now and won't be back till 8/6. I will follow up then.
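A minimal sketch of the try/except wrapping described here, assuming the analysis runs as a plain threading.Thread. The names run_guarded and start_analysis, and the errors list, are hypothetical and not taken from the setroubleshoot source. The point is only that the thread logs the traceback and runs its cleanup instead of dying with an uncaught exception.

```python
import logging
import threading

log = logging.getLogger("analysis")


def run_guarded(work, cleanup, errors):
    """Run work(); log any exception instead of letting it kill the thread."""
    try:
        work()
    except Exception as exc:
        # log.exception records the full traceback along with the message.
        log.exception("analysis failed")
        errors.append(exc)
    finally:
        # Always release resources (e.g. an rpm transaction set).
        cleanup()


def start_analysis(work, cleanup, errors):
    """Start the guarded work in a new thread and return the thread."""
    t = threading.Thread(target=run_guarded, args=(work, cleanup, errors))
    t.start()
    return t
```

With this shape the thread always terminates normally, so whatever the work function opened gets a chance to be released even when the analysis itself fails.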
setroubleshootd has 780 open file descriptors to my rpmdb contents. This is a very serious bug, as it locks up my rpmdb in just a few hours of running.
n0dalus, can you try rpm 4.4.2.1 from updates-testing and see what happens with that?
Panu, what does the new rpm package you cite in comment #5 fix? Did you discover a problem in rpm? Looks like this issue is affecting others as well, bug #249990 and bug #253679 look like the same issue.
The new rpm package just adds a hook to perform cleanup in case a python process tracebacks with rpmdb open; bug 245349 comment #26 confirms the fix working at least for that person. That rpm-python didn't have such a cleanup mechanism in place could easily be considered a problem in rpm, yes :) Those two bugs are certainly at least related, and I have a couple of open bugs filed against rpm as well.
Re comment #7, I don't think bug 245349 is the right reference. typo? What did you do in rpm-python? If a python thread exits due to a traceback are there references left in place? No matter how a thread exits I would have expected the references it held to have been decremented on exit just as if it exited normally. Is that not the case and if so does that imply the CPython code in rpm has to somehow hook into the exception handling logic?
Is the rpm-python code following the conventions outlined here: http://docs.python.org/api/exceptions.html I noticed that rpmts_dealloc is not calling rpmtsCloseDB, is that a problem?
Duh, yes I meant bug 245389 :) The problem is that when a python process/thread exits with an uncaught exception, normal refcount based cleanup doesn't happen. So rpm-python now plants an exit hook that gets called even in the case of a traceback, so it can clean up any open BDB iterators and locks. No messing with exception handling as such. In other words: it's nothing you should worry about, rpm(-python) was just made a little bit more robust. The tracebacking code should still be fixed of course, but dying uncleanly isn't that dangerous for rpm anymore.
Oh and btw: do make sure you're not accessing rpmdb across threads (eg creating a transaction set in one thread and reading from it in another) - rpm is NOT thread safe. Adding protection against use across threads is in my todo, right now you just need to be careful.
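Until such protection exists in rpm itself, a caller could guard against accidental cross-thread use with a small proxy like the one sketched below. ThreadBound is a hypothetical helper, not part of rpm-python. It records the creating thread and refuses attribute access from any other thread.

```python
import threading


class ThreadBound:
    """Proxy that only allows use from the thread that created it."""

    def __init__(self, target):
        self._target = target
        self._owner = threading.get_ident()

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself,
        # i.e. everything that is forwarded to the wrapped object.
        if threading.get_ident() != self._owner:
            raise RuntimeError("object used outside its owning thread")
        return getattr(self._target, name)
```

Wrapping a transaction set as ThreadBound(ts) in the analysis thread would turn a silent cross-thread access into an immediate RuntimeError. It catches mistakes but does not make rpm thread safe.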
*** Bug 249990 has been marked as a duplicate of this bug. ***
*** Bug 253679 has been marked as a duplicate of this bug. ***
*** Bug 244257 has been marked as a duplicate of this bug. ***
Please note that setroubleshoot-1.10.1-1.fc7 has been pushed to testing. This version should do a better job of catching any tracebacks in the analysis thread and reporting the cause of the traceback. It should also prevent any abnormal thread termination, which is suspected of causing the abnormal rpm resource leakage.
The setroubleshoot in F7 updates-testing has broken deps - it requires a setroubleshoot-plugins package which doesn't exist. I can't see an rpm-4.4.2.1 in updates-testing at all.
rpm-4.4.2.1 has already been moved to final updates.
Closing this as the fundamental bug was in rpm and has been fixed. Also, current versions of setroubleshoot are much more robust about catching exceptions.