246564 – setroubleshoot has large numbers of files open

Summary: setroubleshoot has large numbers of files open

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	setroubleshoot
Sub Component:
Version:	7
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	John Dennis
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	244257 249990 253679
Depends On:
Blocks:

Reported:	2007-07-03 03:50 UTC by James Antill
Modified:	2008-01-09 16:48 UTC
CC List:	7 users
Fixed In Version:	rpm-4.4.2.1
Clone Of:
Environment:
Last Closed:	2008-01-09 16:41:07 UTC
Type:	---
Embargoed:
Dependent Products:

Description James Antill 2007-07-03 03:50:08 UTC

Description of problem:
setroubleshootd has each file under /var/lib/rpm/ open many times. I'm guessing
this is a bug... but I don't think it's doing anything bad. E.g.:

% sudo lsof | egrep setroub | wc -l
1095
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | wc -l
1014
% sudo lsof | egrep setroub | egrep /var/lib/rpm/ | tail
setrouble  2141      root 1014r      REG              253,0     12288   11261200 /var/lib/rpm/Pubkeys
setrouble  2141      root 1015rR     REG              253,0  53370880   11261194 /var/lib/rpm/Packages
setrouble  2141      root 1016r      REG              253,0     90112   11261198 /var/lib/rpm/Name
setrouble  2141      root 1017r      REG              253,0     12288   11261200 /var/lib/rpm/Pubkeys
setrouble  2141      root 1018rR     REG              253,0  53370880   11261194 /var/lib/rpm/Packages
setrouble  2141      root 1019r      REG              253,0  10100736   11261199 /var/lib/rpm/Basenames
setrouble  2141      root 1020r      REG              253,0     12288   11261200 /var/lib/rpm/Pubkeys
setrouble  2141      root 1021rR     REG              253,0  53370880   11261194 /var/lib/rpm/Packages
setrouble  2141      root 1022r      REG              253,0     90112   11261198 /var/lib/rpm/Name
setrouble  2141      root 1023r      REG              253,0     12288   11261200 /var/lib/rpm/Pubkeys
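To watch the per-file count grow without piping lsof through egrep, a minimal sketch (Linux only, Python 3) can tally the targets of the /proc/<pid>/fd symlinks, which is roughly what lsof reports in its NAME column. The function name open_files_by_path is hypothetical, and reading another process's descriptors needs root, just like the sudo above.

```python
import os
from collections import Counter


def open_files_by_path(pid):
    """Count how many file descriptors process `pid` holds on each path.

    Linux only: each entry in /proc/<pid>/fd is a symlink to the open file.
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except FileNotFoundError:
            # The descriptor was closed between listdir() and readlink()
            # (this includes the directory fd listdir() itself used).
            continue
        counts[target] += 1
    return counts
```

For example, running open_files_by_path(2141).most_common(5) as root against the PID from the lsof output would list the most frequently opened paths first.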

Version-Release number of selected component (if applicable):
% rpm -q setroubleshoot
setroubleshoot(0:1.9.4-2.fc7).noarch

Comment 1 John Dennis 2007-07-03 04:57:12 UTC

Thank you for alerting me to this. Setroubleshoot does call into rpm to look up
which package(s) are involved in an AVC when it knows the path information. It
creates an rpm transaction set to perform the query. A quick examination of the
code reveals there is no explicit close of the transaction set. Perhaps these
open files are a consequence of that. However, we are using the python bindings,
and one would expect that when the transaction set goes out of scope and its ref
count drops to zero, the binding code would clean up. I will investigate further...
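The expectation above rests on CPython's reference counting: an object is finalized the moment its last reference disappears. A small sketch with a hypothetical FakeTransactionSet (a stand-in, not the real rpm class) illustrates that behaviour on the normal, non-exception path.

```python
class FakeTransactionSet:
    """Hypothetical stand-in for rpm.TransactionSet that records cleanup."""

    open_count = 0  # instances currently "holding" the pretend rpmdb

    def __init__(self):
        type(self).open_count += 1

    def __del__(self):
        # In CPython this runs as soon as the reference count reaches zero.
        type(self).open_count -= 1


def lookup(path):
    ts = FakeTransactionSet()
    # ... real code would query the rpmdb here ...
    return path  # `ts` loses its last reference when this frame returns
```

After lookup("/bin/ls") returns, FakeTransactionSet.open_count is back to 0 without any explicit close; the object only lingers while something else (a list, a frame, a traceback) still refers to it.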

Comment 2 Panu Matilainen 2007-07-23 09:28:23 UTC

It is "doing something bad" in the sense that it's leaving so many stale locks
behind it prevents rpm from working at all, see bug 245389.

John, are the rpmdb queries done from a recurring thread? I think that would
explain the situation:
1) setroubleshoot launches a thread
2) does an rpmdb lookup in the thread
3) tracebacks for whatever reason in the thread
4) thread exits uncleanly -> stale lock is left behind
5) goto 1)
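The five steps above can be simulated with a hypothetical LockTable whose locks must be released explicitly: each failing thread leaves one lock behind, so the count grows every cycle. This only models the failure mode; it is not rpm's locking code. Each failing thread also prints its traceback to stderr through the default threading.excepthook.

```python
import threading


class LockTable:
    """Hypothetical model of rpmdb locks that need an explicit release."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.held = 0

    def acquire(self):
        with self._mutex:
            self.held += 1

    def release(self):
        with self._mutex:
            self.held -= 1


def analysis(locks, fail):
    locks.acquire()                                # 2) rpmdb lookup takes a lock
    if fail:
        raise RuntimeError("analysis failed")      # 3) traceback in the thread
    locks.release()                                # skipped on failure -> 4) stale lock


def run_cycles(n, fail=True):
    """Return how many locks are still held after n analysis threads."""
    locks = LockTable()
    for _ in range(n):                             # 5) goto 1)
        t = threading.Thread(target=analysis, args=(locks, fail))  # 1) new thread
        t.start()
        t.join()
    return locks.held
```

run_cycles(5, fail=False) ends with no locks held, while run_cycles(5) ends with five stale ones.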

James, if you can reproduce this, please try running "setroubleshootd -f" to see
if there are tracebacks present.

Oh and btw, the bindings are supposed to clean things up as things go out of
scope, and normally do so (of course it's possible there are bugs wrt that in
the python bindings, one I just recently fixed but the fix is not yet in F7).
However it is a good idea to explicitly delete the ts instance immediately when
no longer needed to avoid unnecessarily holding locks on the db.
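The advice to drop the transaction set as soon as it is no longer needed can be packaged as a context manager. This is a sketch assuming a hypothetical FakeTransactionSet that mimics the dbMatch()/closeDB() method names of rpm-python's TransactionSet, so it runs without rpm installed; in real code the factory would be rpm.TransactionSet.

```python
from contextlib import contextmanager


class FakeTransactionSet:
    """Hypothetical stand-in for rpm.TransactionSet (no rpm required)."""

    def __init__(self):
        self.db_open = True

    def dbMatch(self, tag, key):
        if not self.db_open:
            raise RuntimeError("rpmdb is closed")
        return [f"pkg-owning-{key}"]

    def closeDB(self):
        self.db_open = False


@contextmanager
def transaction_set(factory=FakeTransactionSet):
    ts = factory()
    try:
        yield ts
    finally:
        # Close the db (releasing its locks) right away, even if the body
        # raised, instead of waiting for the last reference to disappear.
        ts.closeDB()
        del ts
```

A caller then writes `with transaction_set() as ts: ts.dbMatch("basenames", path)`, and the database is closed on both the normal and the exception path.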

Comment 3 John Dennis 2007-07-23 16:55:28 UTC

Thank you Panu. A few quick notes. I did do more investigation after this bug
was filed. I looked at the rpm python bindings to make sure things were properly
ref counted and cleaned up. I also constructed a standalone test and could not
reproduce it, nor was I able to reproduce it with a full setroubleshoot. I do
believe it's happening, though.

One thing I did do was make sure every call into rpm is wrapped in a try/except
block. If memory serves me correctly, that had not been the case. FWIW, when
setroubleshootd does get an exception it's usually logged to
/var/log/setroubleshoot/setroubleshootd.log and syslog. So if it was getting an
exception I would expect it to have been logged, but I've also seen cases where
exceptions were not logged (usually because they were caught by another library).
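A minimal sketch of wrapping each rpm call in try/except and recording the full traceback with the standard logging module. The helper safe_rpm_call and the logger name are hypothetical, not setroubleshoot's actual code.

```python
import logging

log = logging.getLogger("setroubleshootd")  # hypothetical logger name


def safe_rpm_call(func, *args, default=None):
    """Call func(*args); on any exception, log it and return `default`.

    Keeps a failing rpm lookup from taking the analysis thread down with it.
    """
    try:
        return func(*args)
    except Exception:
        # log.exception() logs at ERROR level and appends the traceback.
        log.exception("rpm call %s%r failed",
                      getattr(func, "__name__", func), args)
        return default
```

Returning a caller-chosen default (for example an empty package list) lets the analysis continue with partial information instead of dying.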

Panu, you may be onto something with your comment about threads. The calls to
rpm do occur in a new thread each time we run analysis so you might be right.

James, the version with the try/except wrapping is in mercurial and has not been
pushed. I am leaving now and won't be back till 8/6. I will follow up then.

Comment 4 n0dalus 2007-08-17 12:06:43 UTC

setroubleshootd has 780 open file descriptors to my rpmdb contents. This is a
very serious bug, as it locks up my rpmdb in just a few hours of running.

Comment 5 Panu Matilainen 2007-08-17 13:36:30 UTC

n0dalus, can you try rpm 4.4.2.1 from updates-testing and see what happens with
that?

Comment 6 John Dennis 2007-08-21 16:06:33 UTC

Panu, what does the new rpm package you cite in comment #5 fix? Did you discover
a problem in rpm?

Looks like this issue is affecting others as well, bug #249990 and bug #253679
look like the same issue.

Comment 7 Panu Matilainen 2007-08-21 20:57:57 UTC

The new rpm package just adds a hook to perform cleanup in case a python process
tracebacks with rpmdb open, bug 245349 comment #26 confirms the fix working at
least for that person. That rpm-python didn't have such a cleanup mechanism in
place could easily be considered as a problem in rpm, yes :)

Those two bugs are certainly at least related, and I have a couple of similar
ones open that were filed against rpm as well.

Comment 8 John Dennis 2007-08-21 21:41:05 UTC

Re comment #7, I don't think bug 245349 is the right reference. typo?

What did you do in rpm-python?

If a python thread exits due to a traceback, are there references left in place?
No matter how a thread exits, I would have expected the references it held to
be decremented on exit, just as if it had exited normally. Is that not the case?
If so, does that imply the CPython code in rpm has to somehow hook into the
exception handling logic?

Comment 9 John Dennis 2007-08-21 22:17:45 UTC

Is the rpm-python code following the conventions outlined here:
http://docs.python.org/api/exceptions.html

I noticed that rpmts_dealloc is not calling rpmtsCloseDB, is that a problem?

Comment 10 Panu Matilainen 2007-08-22 05:51:14 UTC

Duh, yes I meant bug 245389 :)

The problem is that when a python process/thread exits with an uncaught
exception, normal refcount-based cleanup doesn't happen. So rpm-python now
installs an exit hook that gets called even in the case of a traceback, so it
can clean up any open BDB iterators and locks. No messing with exception
handling as such.

In other words: it's nothing you should worry about, rpm(-python) was just made
a little bit more robust. The tracebacking code should still be fixed of course,
but dying uncleanly isn't that dangerous for rpm anymore.
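rpm-python's hook lives in its C extension; the sketch below is only a Python-level analogy using the standard atexit module, whose registered functions still run when the interpreter exits after an uncaught exception in the main thread. Note that atexit covers process exit, not a single thread dying, and the cleanup function here is a hypothetical placeholder.

```python
import subprocess
import sys
import textwrap

# Child program: registers a cleanup hook, then dies with an uncaught exception.
CHILD = textwrap.dedent("""
    import atexit

    def cleanup():
        # Placeholder for "close open BDB iterators and release locks".
        print("cleanup ran", flush=True)

    atexit.register(cleanup)
    raise RuntimeError("simulated traceback")
""")


def run_child():
    """Run CHILD in a fresh interpreter; return (exit status, stdout)."""
    proc = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout
```

The child exits with status 1 because of the traceback, yet "cleanup ran" still appears on its stdout.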

Comment 11 Panu Matilainen 2007-08-22 06:01:07 UTC

Oh and btw: do make sure you're not accessing rpmdb across threads (eg creating
a transaction set in one thread and reading from it in another) - rpm is NOT
thread safe. Adding protection against use across threads is in my todo, right
now you just need to be careful.
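One way to be careful is to confine every rpmdb access to a single dedicated thread and have other threads submit work through a queue. This is a sketch assuming a hypothetical make_ts factory (rpm.TransactionSet in real code) and a hypothetical RpmWorker class; the transaction set is created, used and dropped only on the worker thread.

```python
import queue
import threading


class RpmWorker:
    """Serialize all rpmdb access through one dedicated thread."""

    def __init__(self, make_ts):
        self._requests = queue.Queue()
        self._make_ts = make_ts
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        ts = self._make_ts()              # created on the worker thread only
        while True:
            item = self._requests.get()
            if item is None:              # shutdown sentinel
                break
            func, reply = item
            try:
                reply.put((True, func(ts)))
            except Exception as exc:      # hand errors back instead of dying
                reply.put((False, exc))
        del ts                            # dropped on the same thread

    def call(self, func):
        """Run func(ts) on the worker thread and return its result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Callers on any thread use worker.call(lambda ts: ...), so the transaction set never crosses a thread boundary; exceptions are re-raised in the caller rather than killing the worker.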

Comment 12 John Dennis 2007-08-23 16:28:00 UTC

*** Bug 249990 has been marked as a duplicate of this bug. ***

Comment 13 John Dennis 2007-08-23 16:33:59 UTC

*** Bug 253679 has been marked as a duplicate of this bug. ***

Comment 14 Panu Matilainen 2007-08-24 07:07:16 UTC

*** Bug 244257 has been marked as a duplicate of this bug. ***

Comment 15 John Dennis 2007-08-27 17:14:34 UTC

Please note that setroubleshoot-1.10.1-1.fc7 has been pushed to testing. This
version should do a better job of catching any tracebacks in the analysis thread
and reporting the cause of the traceback. It should also prevent any abnormal
thread termination, which is suspected of causing the rpm resource leakage.
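A sketch of the idea: run the analysis in a thread whose top-level function catches every exception and reports it together with its traceback, so the thread always returns normally. run_analysis_thread and the report callback are hypothetical names, not setroubleshoot's API.

```python
import threading
import traceback


def run_analysis_thread(analyze, report, *args):
    """Run analyze(*args) in a thread that never dies with an uncaught exception.

    `report(ok, result)` receives either the result or the formatted traceback.
    """
    def worker():
        try:
            result = analyze(*args)
        except Exception:
            # Report the cause instead of letting the thread die uncleanly.
            report(ok=False, result=traceback.format_exc())
        else:
            report(ok=True, result=result)

    t = threading.Thread(target=worker, name="analysis")
    t.start()
    return t
```

Keeping the report call outside the try block on the success path means a bug in report() itself is not misreported as an analysis failure.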

Comment 16 Joe Orton 2007-08-28 08:48:36 UTC

The setroubleshoot in F7 updates-testing has broken deps: it requires a
setroubleshoot-plugins package which doesn't exist.

I can't see an rpm-4.4.2.1 in updates-testing at all.

Comment 17 Panu Matilainen 2007-08-28 09:41:41 UTC

rpm-4.4.2.1 has already been moved to final updates.

Comment 18 John Dennis 2008-01-09 16:41:07 UTC

Closing this, as the fundamental bug was in rpm and has been fixed. Also, current
versions of setroubleshoot are much more robust at catching exceptions.
