245389 – RPM database runs out of locks

Bug 245389 - RPM database runs out of locks

Summary: RPM database runs out of locks

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rpm
Sub Component:
Version:	7
Hardware:	i386
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Panu Matilainen
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (4):	244178 244243 244738 250041 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-06-22 19:30 UTC by Liam Dennehy
Modified:	2007-11-30 22:12 UTC (History)
CC List:	8 users (show)
Fixed In Version:	4.4.2.1-1.fc7
Clone Of:
Environment:
Last Closed:	2007-08-27 21:48:41 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
RPM query output (615 bytes, application/octet-stream) 2007-06-22 19:30 UTC, Liam Dennehy	no flags	Details
strace of yum rpmdb lock failure (331.14 KB, text/plain) 2007-06-24 20:01 UTC, John McBride	no flags	Details
View All

Description Liam Dennehy 2007-06-22 19:30:32 UTC

Description of problem:
Any attempt to install or upgrade a package fails, and now even querying the
database locks (found out when tring to query the version of RPM I use fo this
bug report). Originally spotted in Pup/Yum but nailed right down to RPM. Error
message:
rpmdb: Lock table is out of available locker entries

This is a fresh installation of F7, less than a week old.

Version-Release number of selected component (if applicable):
RPM v4.4.2, release number will be exactly a shipped in F7

How reproducible:
When problem starts, it won't stop, only a reboot will fix it. Indeterminate
time until it reoccurs, but I think it's related to the number of packages being
churned...

Steps to Reproduce:
1. rpm -qi rpm
2.
3.
  
Actual results:
rpmdb: Lock table is out of available locker entries
and other messages (will attach).

Expected results:
Information on the installed application

Additional info:

Comment 1 Liam Dennehy 2007-06-22 19:30:32 UTC

Created attachment 157641 [details]
RPM query output

Comment 2 Jeff Johnson 2007-06-22 20:40:46 UTC

This is likely a dupe of #244738, certainly has the same symptoms.

Can you try with selinux disabled? You will also need to do
    rm -f /var/lib/rpm/__db*

Comment 3 Liam Dennehy 2007-06-23 00:14:09 UTC

*** Bug 244178 has been marked as a duplicate of this bug. ***

Comment 4 Liam Dennehy 2007-06-23 00:17:40 UTC

(In reply to comment #2)
> This is likely a dupe of #244738, certainly has the same symptoms.
> 
> Can you try with selinux disabled? You will also need to do
>     rm -f /var/lib/rpm/__db*

Indeed, I searched and found nothing looking like my bug, sorry I didn't catch it.

However, #244738 menitions his system s no longer bust, mine is. SELinux disable
fixed it, I haven't yet re-enabled to test if it breaks again, but will be happy
to assist with debugging...

Comment 5 John McBride 2007-06-24 20:01:52 UTC

Created attachment 157722 [details]
strace of yum rpmdb lock failure

Comment 6 George N. White III 2007-07-09 20:22:39 UTC

This has happened to me twice, both times "rm -f /var/lib/rpm/__db*" got it
working.  F7 is installed on a clean partition with selinux in permissive mode.

Comment 7 Răzvan Sandu 2007-07-13 06:56:49 UTC

*** Bug 244243 has been marked as a duplicate of this bug. ***

Comment 8 Panu Matilainen 2007-07-17 14:27:30 UTC

*** Bug 244738 has been marked as a duplicate of this bug. ***

Comment 9 Răzvan Sandu 2007-07-17 22:22:11 UTC

Hello,

Just a quick add, which I hope to be helpful.

I had installed Fedora 6.93 on some machines. Being a development release, it
had the development repos active in /etc/yum.repos.d

After F7 was released (final), I've forgotten to disable the development
repositories on some machines. So they started to bring updates from Fedora 8 at
night (the new development tree after final F7).

Now, on *some* machines I partially have F8 packages (I disabled the development
repositories in the mean time).

I noticed that I only have this "rpm database runs out of locks" problem on the
machines where the above happened.


Regards,
Răzvan

Comment 10 Chris Langlands 2007-07-18 01:11:32 UTC

My F7 is a fresh install on bare metal, and I experience this problem as well. 
I have had to do "rm -f /var/lib/rpm/__db*" on a daily basis ever since I
installed F7.

Comment 11 Panu Matilainen 2007-07-18 06:11:25 UTC

Something is leaving tons of stale locks around. My immediate suspect would be
yum-updatesd, primarily because
1) it runs often so it has the best chance to do damage wrt this
2) it's tracebacks go unnoticed so it's all too easy to not see if something's
wrong 

The thing is, if a python process such as yum-updatesd tracebacks, it can leave
stale rpmdb locks behind. Fix for this is in the works but not there yet.

For example bug #203938 suggests that there is/was a problem with fedorakmod
plugin in combination with updatesd, this could easily cause tracebacks and thus
stale locks.

What I'd like you folks do is:
# service yum-updatesd stop
<ensure no other yum or rpm processes are around>
# rm -f /var/lib/rpm/__*
# yum-updatesd -f

..and leave it running in the terminal. Once it kicks in (you can either
manually run puplet or wait until the next refresh), do you see anything strange
like tracebacks?

Comment 12 Chris Langlands 2007-07-18 07:20:52 UTC

I've been running without yum-updatesd since the last reboot.. and just now I
experienced the locks problem again.

# chkconfig --list yum-updatesd 
yum-updatesd    0:off   1:off   2:off   3:off   4:off   5:off   6:off

# rm -f /var/lib/rpm/__db.00* && rpm --rebuilddb

# reboot

# service yum-updatesd status
yum-updatesd is stopped

# rpm -e <package>
<snip>
rpmdb: Lock table is out of available locker entries
rpmdb: Unknown locker ID: 15e8
error: db4 error(22) from db->close: Invalid argument
error: cannot open Group index using db3 - Cannot allocate memory (12)
rpmdb: Lock table is out of available locker entries
rpmdb: Unknown locker ID: 15e9
error: db4 error(22) from db->close: Invalid argument
rpmdb: Lock table is out of available locker entries
rpmdb: Unknown locker ID: 15ea
<snip>

I think maybe the problem is with the setroubleshoot daemon - I'm disabling that
now.

Comment 13 Panu Matilainen 2007-07-18 19:27:37 UTC

I don't see why setroubleshoot daemon would have anything to do with rpmdb
locks... if not yum-updatesd then I don't quite know what it would be.
Try something like this to see what's accessing the db (leave it running until
it produces something interesting):
# while [ 1 -eq 1 ]; do /sbin/fuser -v /var/lib/rpm/__db.00*; sleep 1; done

Comment 14 Chris Langlands 2007-07-18 22:46:27 UTC

When I first received the rpmdb error I ran lsof | grep rpm and there were
several screenfulls of setroubleshoot processes with the RPM database open. 
Since I performed service setroubleshoot stop, I have not had the RPM locks
problem once.

Comment 15 Chris Langlands 2007-07-18 22:49:02 UTC

[root@studio /]# while [ 1 -eq 1 ]; do /sbin/fuser -v /var/lib/rpm/__db.00*;
sleep 1; done &
[1] 13521
[root@studio /]# service setroubleshoot start
Starting setroubleshootd:                                  [  OK  ]
[root@studio /]#
                     USER        PID ACCESS COMMAND
/var/lib/rpm/__db.001:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.002:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.003:
                     root      13570 ....m setroubleshootd


                     USER        PID ACCESS COMMAND
/var/lib/rpm/__db.001:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.002:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.003:
                     root      13570 ....m setroubleshootd


                     USER        PID ACCESS COMMAND
/var/lib/rpm/__db.001:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.002:
                     root      13570 ....m setroubleshootd

/var/lib/rpm/__db.003:
                     root      13570 ....m setroubleshootd

...ad infinitum (i.e. endless terminal scrolling ensues)

Comment 16 Panu Matilainen 2007-07-19 07:13:14 UTC

Ok, you have me convinced. Looking at setroubleshootd source, sure enough it
does something with rpmdb (I didn't know that). Please check what running
setroubleshootd in non-daemon mode says:
# service setroubleshootd stop
# /usr/sbin/setroubleshootd -f

My guess is that on the problematic systems, it tracebacks in a thread so it
doesn't actually die but keeps on trying (and failing), thus leaving all those
locks around.

Comment 17 Panu Matilainen 2007-07-23 09:03:59 UTC

rpm 4.4.2.1-0.6.rc3 in the development repository has fixes that should prevent
stale rpmdb locks from gathering on python process tracebacks. Please try
updating to that, clear the locks and let me know whether it helps or not. 

Also please please try the steps in comment #16, it's most likely that
setroubleshootd has bug(s) that trigger this, but rpm should be better at
cleaning up the locks, that's what 4.4.2.1-0.6.rc3 tries to address.

Comment 18 Panu Matilainen 2007-07-23 18:27:13 UTC

Hmm, setroubleshoot has it's own log where tracebacks and such are supposed to
go, can people seeing this check out /var/log/setroubleshoot/setroublehootd.log*
for tracebacks / other strangeness?

Comment 19 Chris Langlands 2007-07-23 19:03:24 UTC

nothing strange in the logs, in fact there's hardly anything in the logs.

Comment 20 Jack Tanner 2007-08-05 13:23:22 UTC

I just saw this bug for the first time (F7, selinux in permissive mode). Here's
my setroublehootd.log. Note the rpm-related traceback towards the end.

2007-08-02 09:08:57,027 [email.WARNING] cannot open file /var/lib/setroubleshoot
/email_alert_recipients, No such file or directory
2007-08-03 09:59:12,661 [avc.ERROR] Plugin Exception plugins.home_tmp_bad_labels
 
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/analyze.py", line 259, i
n analyze_avc
    report_receiver.report_problem(report)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/server.py", line 157, in
 report_problem
    _(" For complete SELinux messages. run sealert -l %s" % siginfo.local_id ))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 89: ordinal
 not in range(128)
2007-08-03 09:59:12,692 [avc.ERROR] Plugin Exception plugins.catchall_file 
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/analyze.py", line 259, i
n analyze_avc
    report_receiver.report_problem(report)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/server.py", line 157, in
 report_problem
    _(" For complete SELinux messages. run sealert -l %s" % siginfo.local_id ))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 66: ordinal
 not in range(128)
2007-08-03 09:59:12,705 [avc.ERROR] Plugin Exception plugins.catchall 
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/analyze.py", line 259, i
n analyze_avc
    report_receiver.report_problem(report)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/server.py", line 157, in
 report_problem
    _(" For complete SELinux messages. run sealert -l %s" % siginfo.local_id ))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 66: ordinal
 not in range(128)
2007-08-03 09:59:13,140 [avc.ERROR] Plugin Exception plugins.restorecon 
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/analyze.py", line 259, i
n analyze_avc
    report_receiver.report_problem(report)
  File "/usr/lib/python2.5/site-packages/setroubleshoot/server.py", line 157, in
 report_problem
    _(" For complete SELinux messages. run sealert -l %s" % siginfo.local_id ))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 102: ordina
l not in range(128)
2007-08-04 23:41:18,478 [plugin.ERROR] failed to retrieve rpm info for amazon_ti
me
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/setroubleshoot/util.py", line 228, in g
et_rpm_nvr_by_file_path
    mi = ts.dbMatch(rpm.RPMTAG_BASENAMES, path)
TypeError: rpmdb open failed

Comment 21 Panu Matilainen 2007-08-06 08:32:35 UTC

Yes, the rpm related traceback just says "the damage is already done", it's not
interesting in itself. But ok, setroubleshoot is tracebacking all over the place
repeatedly which kinda proves my theorem how rpm comes to run out of locks.

What I would like to know is: can you reproduce this after upgrading to rpm
4.4.2.1 from rawhide?

Comment 22 Panu Matilainen 2007-08-08 07:50:52 UTC

*** Bug 250041 has been marked as a duplicate of this bug. ***

Comment 23 Jack Tanner 2007-08-08 13:02:52 UTC

Well, I'll upgrade to 4.4.2.1, but I'm not equipped to handle serious problems
caused by "rpm from Rawhide." So if things hit the fan, Panu, you get to help me
troubleshoot. Deal?

Comment 24 Panu Matilainen 2007-08-08 13:21:52 UTC

Sure - it shouldn't eat kittens for breakfast :) Or alternatively you can wait a
bit more for 4.4.2.1 to land on F7 updates-testing. The plan is to get that done
this week, I've just been delayed by other things.

Comment 25 Fedora Update System 2007-08-13 17:03:13 UTC

rpm-4.4.2.1-1.fc7 has been pushed to the Fedora 7 testing repository.  If problems still persist, please make note of it in this bug report.

Comment 26 Răzvan Sandu 2007-08-19 09:09:25 UTC

I've installed and tested it on a few machines and my answer is: works for me !

What I've did was only to enable fedora-update-testing.repo in /etc/yum.repos.d
and do a "yum --obsoletes upgrade rpm"

If others confirm, too, please move this package to stable repository.

Regards,
Răzvan

Comment 27 Fedora Update System 2007-08-27 21:47:16 UTC

rpm-4.4.2.1-1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Panu Matilainen 2007-08-28 05:46:39 UTC

Just FYI, I think I've done what I can from rpm: rpmdb locks get cleaned on
unclean exit from python processes/threads. The root cause of this bug has been
setroubleshoot crashing all the time for some people, following up on that
should happen in bug 246564.

Note You need to log in before you can comment on or make changes to this bug.