458234 – rpm error: db4 error(2) from dbcursor->c_get: No such file or directory

Summary: rpm error: db4 error(2) from dbcursor->c_get: No such file or directory

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mock
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	David Cantrell
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	465724
Depends On:
Blocks:

Reported:	2008-08-07 05:36 UTC by Matt Domsch
Modified:	2013-01-10 04:46 UTC
CC List:	13 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-11-13 03:35:39 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Always clean environment on cached roots (845 bytes, patch) 2008-08-14 08:38 UTC, Panu Matilainen
mock workaround for rpmdb/root_cache interaction weirdness (1.19 KB, patch) 2008-08-15 20:13 UTC, Clark Williams
Fix cleaning up of rpmdb environment (728 bytes, patch) 2008-10-15 07:20 UTC, Panu Matilainen
Patch to clean up rpmdb from source rpm install (550 bytes, application/octet-stream) 2008-12-04 20:25 UTC, Ryan Thomas

Description Matt Domsch 2008-08-07 05:36:13 UTC

Description of problem:
rpm-4.5.90-0.git8426.9.x86_64 in rawhide.

Throws the above error often when I am rebuilding all of rawhide.  Each builder has 4 separate mock builds running simultaneously.

One common mock failure points to rpm as the root cause.

ERROR: Command failed:
 # /usr/bin/yum --installroot /var/lib/mock/fedora-development-i386-CTL-1.4.1-6.fc9.src.rpm/root/  install  'ilmbase-devel'
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 243, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 159, in main
    (result, resultmsgs) = base.buildTransaction()
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 628, in buildTransaction
    (rescode, restring) = self.resolveDeps()
  File "/usr/lib/python2.5/site-packages/yum/depsolve.py", line 670, in resolveDeps
    for conflict in self._checkConflicts():
  File "/usr/lib/python2.5/site-packages/yum/depsolve.py", line 874, in _checkConflicts
    for conflict in po.returnPrco('conflicts'):
  File "/usr/lib/python2.5/site-packages/yum/packages.py", line 770, in returnPrco
    self._populatePrco()
  File "/usr/lib/python2.5/site-packages/yum/packages.py", line 784, in _populatePrco
    hdr = self._get_hdr()
  File "/usr/lib/python2.5/site-packages/yum/rpmsack.py", line 57, in _get_hdr
    return mi.next()
StopIteration


Version-Release number of selected component (if applicable):
rpm-4.5.90-0.git8426.9.x86_64 in rawhide.


How reproducible:
often

Steps to Reproduce:
1. Start 4 mock builds in parallel, each with its own chroot.
2. Wait for the db4 error(2) message, which causes yum to fail, which in turn causes mock to fail.

The actual point within yum moves around a bit, but always involves an rpm transaction trying to read some data from the database or from a package.  No /var/lib/rpm/__db* files are present in the host system when the failure occurs.

  
Actual results:
failure

Expected results:
no failure

Note, this will prevent me from completing a full rawhide rebuild until resolved.

Comment 1 Panu Matilainen 2008-08-08 13:41:15 UTC

The host's /var/lib/rpm/__db* files don't matter for a chroot build/install; the ones in the chroots (e.g. /var/lib/mock/fedora-development-i386-CTL-1.4.1-6.fc9.src.rpm/root/) are the relevant ones.

One possible cause for this kind of issue is accessing the same rpmdb with different versions from inside and outside the chroot without clearing up the environment in between. Is there any pattern to the failing builds, or is it just plain random (i.e. if one build fails like this, does it always fail or occasionally succeed)?

Comment 2 Panu Matilainen 2008-08-08 13:43:53 UTC

Oh, how exactly are you starting these builds - "make mockbuild" from dist-cvs or something else? Just so I don't chase ghosts trying to reproduce...

Comment 3 Matt Domsch 2008-08-08 14:05:13 UTC

I have not yet found a pattern to the failures.

Yes, both rpm outside the chroot and inside are exactly the same in these cases.  I rebuilt all the machines using rawhide, and am rebuilding the packages using the same rawhide.

I run 'mock -r fedora-rawhide-$arch --uniqueext=$something --resultdir=$somewhere --rebuild $somesrpm'.

I run 4 instances of 'mock' on each builder, 2 for each of i386 and x86_64 in parallel, but obviously in separate chroots.

Machines have plenty of RAM and swap.
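
The invocation described in this comment can be sketched in Python, the language mock is written in. This is only an illustration: the helper names `mock_command` and `run_in_parallel` are hypothetical and not part of mock, and only the command-line options shown in the comment are used.

```python
import subprocess


def mock_command(arch, uniqueext, resultdir, srpm):
    """Build the argv for one mock rebuild, using the options quoted in the comment."""
    return [
        "mock",
        "-r", "fedora-rawhide-%s" % arch,
        "--uniqueext=%s" % uniqueext,   # gives each build its own chroot directory
        "--resultdir=%s" % resultdir,
        "--rebuild", srpm,
    ]


def run_in_parallel(commands):
    """Start every command at once and return the exit codes in the same order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

Calling `run_in_parallel` with four such commands, two per architecture, would correspond to the setup described here.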

Comment 4 Jeff Johnson 2008-08-09 18:26:12 UTC

Bzzt! Berkeley DB version conflict is detected on dbenv open and has entirely different
error message than
    error: db4 error(2) from dbcursor->c_get: No such file or directory
Nice guess though ...

I'd suggest running an strace on one/all of the mock builds to pin down
the sequence of events. Pay particular attention to whether mock is
opening, accessing, and immediately closing indices.

Comment 5 Matt Domsch 2008-08-11 15:40:31 UTC

Jeff may be on to something.

$ rpm -qp --provides db4-4.7.25-2.fc10.x86_64.rpm
libdb-4.7.so()(64bit)
db4 = 4.7.25-2.fc10

$ rpm -qp --requires rpm-4.5.90-0.git8426.9.x86_64.rpm
libdb-4.5.so()(64bit)


so if rpm is dlopen()ing libdb, boom...

Comment 6 Panu Matilainen 2008-08-14 08:38:16 UTC

Created attachment 314295 [details]
Always clean environment on cached roots

No, rpm doesn't dlopen() anything. If it were a db environment *version* mismatch you'd indeed get a different message.

AFAICT, this has to do with mock root cache containing the db environment, the fact that rpm opens the db before entering the chroot and limited (at least mostly) to --uniqueext use.

Matt, can you see if the problem goes away if you
a) disable root caching in mock
b) apply the attached hack of a patch to mock (with root caching enabled)

Comment 7 Matt Domsch 2008-08-14 15:49:55 UTC

I disabled the root_cache, and 5% of the jobs are now complete with no errors; previously, failures would have shown up by this point.

I'll see about building mock with your patch.

Comment 8 Matt Domsch 2008-08-14 19:04:28 UTC

The mock patch appears to be working for me too.  No failures in a few hours since using it.  This raises the question though - is this really a mock bug, or is it papering over a problem with rpm?  Mock needing to delete __db* files (created and managed by rpm) to keep rpm from dying seems like the wrong solution.

Comment 9 Jeff Johnson 2008-08-14 19:29:07 UTC

Doing
    rm -f /var/lib/rpm/__db*
is papering over the problem. In fact, removing those files opens
up a lock race, the only reason the effects of the lock race are not
being widely seen is that most accesses of rpmdb tend to be serialized
through other means, like monkey watching a screen.

OTOH, the same "papering" has been "working" with rpmdb's for years, just
not at all the correct fix.

Comment 10 Michael E Brown 2008-08-14 19:41:04 UTC

Matt,
   We upgraded the 'mock' machine from F7 to F9. I think that we may be running into a versioning conflict with using root cache from F7 on F9 and/or vice-versa. Most of the builders have cache dir symlinked to NFS. Can you investigate that?

Comment 11 Jeff Johnson 2008-08-14 19:52:59 UTC

If you want "sanity" with multiple rpmdb's, with different versions
of Berkeley DB everywhere, then the simplest/best solution is
using a common Berkeley DB everywhere. KISS is always better
than vendor brand loyalty ...

FWIW, that's what was always done with the RedHat build systems,
no clue what they do anymore.

Hint w multiple chroot's: You *really* want this macro set to 1:

#       Open all indices before doing chroot(2).
#
%_openall_before_chroot 0


No clue what rpm.org does instead. Have fun!

Comment 12 Michael E Brown 2008-08-14 19:59:42 UTC

After conversation with Matt, the comment above (#10) does not apply. Please ignore.

Comment 13 Clark Williams 2008-08-14 21:28:23 UTC

Weird. We (mock) don't *ever* use the rpm that's installed in the chroot. We do all our package installs outside using --root, then when it's time to build we go into the chroot. 

My only thought here is that we've unpacked a cached root, then are installing specific packages for the SRPM dependencies and the root rpmdb (from the cache) is incompatible with the current RPM. 

Have you tried generating a new root cache and then tried it without deleting the cache? e.g.:

$ sudo rm /var/lib/mock/cache/fedora-9-i386/root_cache/*
$ sudo mock --init -r fedora-9-i386

then try your build again?

Comment 14 Matt Domsch 2008-08-14 21:31:55 UTC

Clark: yes, my buildruns start by erasing everything under /var/lib/mock, and let the root cache get created fresh with the first mock --rebuild.  In my case, I had the same RPM both outside the chroot and inside; I had re-installed all my builders with the same rawhide tree I was about to rebuild, which is the same tree used inside the buildroots.  So there was no version incompatibility, unless somehow it existed within rawhide.

Comment 15 Clark Williams 2008-08-14 21:59:46 UTC

Well that blows my first theory (trudges dejectedly back to the dugout). 

Does this happen if you serialize the builds (i.e. only one going at a time)?

I wonder if we're not doing something right when we build the root cache? I init'ed a fedora-9-i386 chroot on my laptop (rawhide), then ran some rpm commands using --root to specify the chroot location; -qa and --rebuilddb worked as expected. 

Panu, got any super-secret rpmdb-verify command that stomps through the db and ensures that it's correct?

Comment 16 Jeff Johnson 2008-08-14 22:12:56 UTC

(repeated) Hint: try strace, verify that rpmdb within chroot is all that is opened.
There's a reopen during db->close() that can hit the outer, not the chroot, dbenv.
The issue reappears every other Berkeley DB release or so ...

And if you have rpmdb on NFS all bets are off. Dunno what is in mock "cache".

Comment 17 Clark Williams 2008-08-15 01:06:13 UTC

The mock root cache is a tarball that contains the initial contents of a chroot, before all the dependent packages for an SRPM are installed into it. We've found that it's *much* faster to unpack a tarball into a chroot than doing a yum transaction to build the chroot.  The cache tarball is built after the chroot is initialized and the base packages are installed. 

I'm not sure how you'd strace this one. You've got mock calling yum and I believe yum makes direct calls into librpm, so you'd probably have to strace yum. 

I did a quick try  of editing /etc/mock/fedora-9-i386.cfg and adding this:

     config_opt['yum_path'] = '/usr/bin/strace -o /tmp/yum.trace /usr/bin/yum'

but that didn't do what I thought it would.

Comment 18 Jeff Johnson 2008-08-15 01:21:50 UTC

k. Sure a prestaged tarball will beat any other means of content copying
short of a loopback mount image with COW wrapper.

 Hmmm, is there a /var/lib/rpm in your tarball?

strace -e used to get open/chroot calls is likely sufficient to pick out
whether the outer /var/lib/rpm dbenv is being opened. All open's should
either include chroot prefix, or (if lazily open'd) be within chroot enter/exit.

An attempt to open the outer rpmdb path is consistent with original report of ENOENT
return from dbcursor->c_get() if chroot(2) changes path. Note that a missing
page in cache might also return ENOENT, I fergit, but the Berkeley DB doco is
quite complete if necessary.

(aside) There's another possible cause, taking rpmdb join keys outside of a locking
context, by closing a dbenv, but I'm not hearing indications that is a problem so far.
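
As a rough illustration of the check suggested here, the sketch below scans strace output for rpmdb opens that happen outside the chroot. It assumes a trace captured with something like `strace -f -e trace=open,chroot -o /tmp/yum.trace ...`, treats any successful `chroot()` to a path other than the build root (for example `chroot(".")`) as leaving the chroot, and uses a hypothetical function name; it is a diagnostic aid, not something rpm or mock provides.

```python
import re

# Matches open()/openat()/chroot() lines as strace prints them, e.g.
#   open("/var/lib/rpm/Packages", O_RDONLY) = 3
# An optional leading PID (from strace -f) is tolerated.
SYSCALL_RE = re.compile(
    r'^(?:\d+\s+)?(open|openat|chroot)\((?:AT_FDCWD, )?"([^"]*)"[^)]*\)\s+=\s+(-?\d+)'
)


def suspicious_rpmdb_opens(trace_lines, chroot_prefix, dbpath="/var/lib/rpm"):
    """Return rpmdb paths opened (or attempted) while NOT inside the chroot."""
    inside = False
    hits = []
    for line in trace_lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue
        call, path, ret = match.group(1), match.group(2), int(match.group(3))
        if call == "chroot":
            if ret == 0:
                # Assumption: only a chroot to the build root counts as "inside".
                inside = path.rstrip("/") == chroot_prefix.rstrip("/")
            continue
        # Paths carrying the chroot prefix do not start with dbpath, so they pass.
        if path.startswith(dbpath + "/") and not inside:
            hits.append(path)
    return hits
```

An empty result would be consistent with only the chroot's rpmdb being touched; any hit is a candidate for the outer dbenv being opened.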

Comment 19 Panu Matilainen 2008-08-15 06:13:48 UTC

Jeff, we don't need your guesswork here, thank you very much.

The root issue here is that rpm opens up the db before entering the chroot, and so the environment ends up containing paths like /var/lib/mock/root/fedora-rawhide-x86_64/root/var/lib/rpm/yadda. Currently, the environment gets included in the tarball that mock builds if root caching is enabled. That's still "ok", but once you start using --uniqueext=<something> with root caching enabled, the paths that the db environment in the root cache tarball point to might no longer exist, and certainly point to wrong files even if they do. That's where it blows up.

The easy fix is to make mock not tar up the /var/lib/rpm/__* files from the chroot, i.e. "rm -f /var/lib/rpm/__*" before tarring up the root contents. My patch to mock in comment #6 was just a proof-of-theory thing that works the wrong way around (removing the environment after unpacking the tarball, instead of not tarring it up in the first place), but the same thing is accomplished: a cached root won't contain bogus paths.

The "real" fix would be rpm never ever opening the rpmdb from outside chroot in the first place, but Berkeley DB throws some curveballs into the picture. As to comment #8 - it's a bit of both: filtering out the db environment from the chroot cache tarball in mock is the right thing to do anyway, but rpm is to blame too.
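
A minimal sketch of the "easy fix" described in this comment, assuming the cache is an ordinary gzip-compressed tarball of the chroot. It uses Python's standard `tarfile` module and is not mock's actual root_cache plugin code; the function names are hypothetical, and the narrower `__db*` pattern is used rather than `__*`.

```python
import fnmatch
import os
import posixpath
import tarfile


def skip_db_environment(tarinfo):
    """tarfile filter: drop Berkeley DB environment files (var/lib/rpm/__db*)."""
    name = posixpath.normpath(tarinfo.name)  # "./var/lib/rpm/x" -> "var/lib/rpm/x"
    if fnmatch.fnmatch(name, "var/lib/rpm/__db*"):
        return None          # returning None excludes the member from the archive
    return tarinfo


def build_root_cache(chroot_dir, cache_path):
    """Tar up the chroot contents, leaving the db environment out."""
    with tarfile.open(cache_path, "w:gz") as tar:
        tar.add(chroot_dir, arcname=".", filter=skip_db_environment)


def cached_db_environment(cache_path):
    """List any __db* members that still made it into the cache tarball."""
    with tarfile.open(cache_path, "r:gz") as tar:
        return [n for n in tar.getnames() if os.path.basename(n).startswith("__db")]
```

`cached_db_environment` returning an empty list is the condition the fix aims for: the cached root then carries the rpmdb itself but no environment files with stale paths.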

Comment 20 Jeff Johnson 2008-08-15 12:58:01 UTC

Hardly guesswork ...

But You da man now, Dude. Have fun!

Comment 21 Clark Williams 2008-08-15 13:48:29 UTC

Panu,

Ah, I didn't think about the paths being hosed due to opening before the chroot. 

It's easy enough for the root_cache plugin to --exclude /var/lib/rpm when we create the cache, but I presume that means we'd need to create it and then 'rpm --rebuilddb' after we unpacked the cache, correct?

Comment 22 Panu Matilainen 2008-08-15 19:49:48 UTC

Don't skip the entire /var/lib/rpm, otherwise there'd be no db to rebuild :) Just exclude /var/lib/rpm/__* from the cache tarball, that's all you need to do. Or to be exact, glob.glob("%s/__db*" % rpm.expandMacro("%{_dbpath}")) is what needs excluding from the cache. And no, you don't need to rebuild the db.

Comment 23 Clark Williams 2008-08-15 20:13:58 UTC

Created attachment 314406 [details]
mock workaround for rpmdb/root_cache interaction weirdness

heh, yeah I figured that out when I was hacking the root_cache plugin :)

I fooled around with a couple of ways to exclude the __db* files and eventually realized the simplest was to always do rm -f <chroot-path>/var/lib/rpm/__db* in the post-hook of the root_cache plugin. Here's a patch to try:

Comment 24 Matt Domsch 2008-09-03 13:23:33 UTC

I'm waiting on starting another rawhide rebuild until this bug and #455387 are resolved.

Comment 25 Matt Domsch 2008-09-13 05:31:19 UTC

mock-0.9.11-1.fc10.src.rpm and rpm-4.5.90-0.git8461.7.src.rpm together do not resolve this problem.  I re-ran all my failed builds with this combination (in fact with machines upgraded to today's rawhide), and those builds still failed.

Comment 26 Matt Domsch 2008-10-10 04:05:34 UTC

For the record, I've stopped using the mock root cache until this is resolved.  This increases my full rawhide rebuild time from about 30 hours with root cache enabled to 75.5 hours with root cache disabled.

Comment 27 Panu Matilainen 2008-10-15 07:20:02 UTC

Created attachment 320397 [details]
Fix cleaning up of rpmdb environment

The problem is that the patch from comment #23 that went into mock doesn't actually do what it's supposed to do, despite looking basically correct: running "tar tzf cache.tar.gz ./var/lib/rpm/" on a root cache generated by mock 0.9.11 still shows the environment there.

I didn't track it further, but I guess the glob doesn't get expanded when passed to mock.util.do(). The attached patch, which globs and cleans the path "manually", makes it actually do something.
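
The behaviour guessed at here can be reproduced in isolation. The sketch below does not use mock.util.do() (its internals are not shown in this report); it only assumes that the command is executed without a shell, in which case a `*` pattern reaches `rm` as a literal file name and `-f` silently hides the miss. Both function names are hypothetical.

```python
import glob
import os
import subprocess


def rm_without_shell(pattern):
    """Run rm -f with the pattern passed straight through as one argument.

    No shell is involved, so "__db*" is never expanded; rm -f finds no file
    with that literal name, removes nothing, and still exits 0.
    """
    return subprocess.call(["rm", "-f", pattern])


def rm_with_glob(pattern):
    """Expand the pattern in Python and remove each match explicitly."""
    removed = sorted(glob.glob(pattern))
    for path in removed:
        os.remove(path)
    return removed
```

The first function looks successful while leaving the environment files in place, which matches the symptom of the cache tarball still containing them; the second corresponds to the "glob and clean manually" approach.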

Comment 28 Panu Matilainen 2008-10-15 15:09:19 UTC

*** Bug 465724 has been marked as a duplicate of this bug. ***

Comment 29 Clark Williams 2008-10-15 15:14:06 UTC

ugh, how embarrassing. 

I'll pick up the patch from #27 and spin a new mock today.

Comment 30 Clark Williams 2008-10-20 19:41:11 UTC

mock-0.9.12 is out with Panu's corrected patch.

Comment 31 Fedora Update System 2008-10-20 19:46:00 UTC

mock-0.9.12-1.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/mock-0.9.12-1.fc9

Comment 32 Fedora Update System 2008-10-20 19:49:10 UTC

mock-0.9.12-1.fc8 has been submitted as an update for Fedora 8.
http://admin.fedoraproject.org/updates/mock-0.9.12-1.fc8

Comment 33 Fedora Update System 2008-10-23 16:39:52 UTC

mock-0.9.12-1.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mock'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9074

Comment 34 Fedora Update System 2008-10-23 16:40:48 UTC

mock-0.9.12-1.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mock'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-9085

Comment 35 Fedora Update System 2008-11-06 22:34:33 UTC

mock-0.9.13-1.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/mock-0.9.13-1.fc9

Comment 36 Fedora Update System 2008-11-06 22:34:44 UTC

mock-0.9.13-1.fc8 has been submitted as an update for Fedora 8.
http://admin.fedoraproject.org/updates/mock-0.9.13-1.fc8

Comment 37 Fedora Update System 2008-11-08 02:10:43 UTC

mock-0.9.13-1.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mock'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9499

Comment 38 Fedora Update System 2008-11-08 02:11:43 UTC

mock-0.9.13-1.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mock'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-9512

Comment 39 Fedora Update System 2008-11-13 03:35:28 UTC

mock-0.9.13-1.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 40 Fedora Update System 2008-11-13 03:37:19 UTC

mock-0.9.13-1.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Ryan Thomas 2008-12-04 20:25:58 UTC

Created attachment 325743 [details]
Patch to clean up rpmdb from source rpm install

Comment 42 Ryan Thomas 2008-12-04 20:30:10 UTC

This bug isn't quite fixed.  Mock does use the rpm in the chroot and apparently newer versions of RPM cause a problem with this.

I noticed that I was unable to build fedora 10 rpms from a fedora 9 host.  I traced it to the fact that apparently fedora 10's rpm generates the __db* files when installing a source rpm.  And because installing the source rpm is done in the chroot you end up with conflicting rpmdb versions. 

My fix was to clean up the __db* files after installing the source rpm.  It would probably be more cleanly solved by having the source rpm be installed outside of the chroot.

Comment 43 Jeff Johnson 2008-12-04 20:34:19 UTC

Note that "cleaning up" after installing a source rpm opens a lock race window
with other installs. You can't just blow away locks after installing a source rpm,
there may be other processes running concurrently.

But have fun with mock!

Comment 44 Jesse Keating 2008-12-04 21:16:13 UTC

(In reply to comment #42)
> This bug isn't quite fixed.  Mock does use the rpm in the chroot and apparently
> newer versions of RPM cause a problem with this.
> 
> I noticed that I was unable to build fedora 10 rpms from a fedora 9 host.  I
> traced it to the fact that apparently fedora 10's rpm generates the __db* files
> when installing a source rpm.  And because installing the source rpm is done in
> the chroot you end up with conflicting rpmdb versions. 
> 
> My fix was to clean up the __db* files after installing the source rpm.  It
> would probably be more cleanly solved by having the source rpm be installed
> outside of the chroot.

We actually want to use the inchroot rpm more and more, as it will be gaining features that are not compatible with the host rpm.

Comment 45 Jesse Keating 2008-12-04 21:16:53 UTC

(In reply to comment #43)
> Note that "cleaning up" after installing a source rpm opens a lock race window
> with other installs. You can't just blow away locks after installing a source
> rpm,
> there may be other processes running concurrently.
> 
> But have fun with mock!

In the mock case this is safe(r) as mock is ultimately in control of what's happening in the chroot, and thus it can decide to clean before continuing on to the next action.

Comment 46 Jeff Johnson 2008-12-04 21:31:55 UTC

Agreed safe(r). I trust condom's and progesterone more than mock however.

Removing rpmdb concurrency locks can never be done safe(r)ly w/o introducing
a race unless there is an  additional locking guarantee or one is prepared to
deal with the consequences.
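
One possible reading of an "additional locking guarantee" is an advisory lock that every cooperating process takes before touching the rpmdb environment. The sketch below is only that: it assumes all participants agree to use the same lock file (a hypothetical path chosen by the caller), it does nothing against processes that ignore the lock, and it is not something rpm or mock is stated to do in this report.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other holder remains
        yield
    finally:
        os.close(fd)                     # closing the descriptor releases the lock


def run_exclusively(lock_path, action):
    """Run action() while holding the lock and return its result."""
    with exclusive_lock(lock_path):
        return action()
```

Under this scheme, removing the `__db*` files would be passed in as `action`, and any other cooperating process would have to take the same lock before opening the database.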

Comment 47 Nathan G. Grennan 2009-01-14 23:17:30 UTC

I am seeing what seems to be the same bug with Fedora 10 x86_64 when trying to compile Fedora 10 packages for i386.
