Description of problem:
rpm-4.5.90-0.git8426.9.x86_64 in rawhide throws the db4 error shown below often when I am rebuilding all of rawhide. Each builder has 4 separate mock builds running simultaneously. One common mock failure points at rpm being the root cause:

ERROR: Command failed:
# /usr/bin/yum --installroot /var/lib/mock/fedora-development-i386-CTL-1.4.1-6.fc9.src.rpm/root/ install 'ilmbase-devel'
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
error: db4 error(2) from dbcursor->c_get: No such file or directory
Traceback (most recent call last):
  File "/usr/bin/yum", line 29, in <module>
    yummain.user_main(sys.argv[1:], exit_code=True)
  File "/usr/share/yum-cli/yummain.py", line 243, in user_main
    errcode = main(args)
  File "/usr/share/yum-cli/yummain.py", line 159, in main
    (result, resultmsgs) = base.buildTransaction()
  File "/usr/lib/python2.5/site-packages/yum/__init__.py", line 628, in buildTransaction
    (rescode, restring) = self.resolveDeps()
  File "/usr/lib/python2.5/site-packages/yum/depsolve.py", line 670, in resolveDeps
    for conflict in self._checkConflicts():
  File "/usr/lib/python2.5/site-packages/yum/depsolve.py", line 874, in _checkConflicts
    for conflict in po.returnPrco('conflicts'):
  File "/usr/lib/python2.5/site-packages/yum/packages.py", line 770, in returnPrco
    self._populatePrco()
  File "/usr/lib/python2.5/site-packages/yum/packages.py", line 784, in _populatePrco
    hdr = self._get_hdr()
  File "/usr/lib/python2.5/site-packages/yum/rpmsack.py", line 57, in _get_hdr
    return mi.next()
StopIteration

Version-Release number of selected component (if applicable):
rpm-4.5.90-0.git8426.9.x86_64 in rawhide.

How reproducible:
often

Steps to Reproduce:
1. start 4 mock builds in parallel, each with their own chroot
2. wait for the db4 error(2) message, which causes yum to fail, which causes mock to fail.

The actual point within yum moves around a bit, but it always involves an rpm transaction trying to read some data from the database or from a package. No /var/lib/rpm/__db* files are present on the host system when the failure occurs.

Actual results:
failure

Expected results:
no failure

Note: this will prevent me from completing a full rawhide rebuild until resolved.
The host's /var/lib/rpm/__db* files don't matter for a chroot build/install; the ones in the chroots (e.g. /var/lib/mock/fedora-development-i386-CTL-1.4.1-6.fc9.src.rpm/root/) are the relevant ones. One possible cause for this kind of issue is accessing the same rpmdb with different rpm versions from inside and outside the chroot without clearing up the environment in between. Is there any pattern to the failing builds, or is it just plain random (i.e. if one build fails like this, does it always fail or does it occasionally succeed)?
Oh, how exactly are you starting these builds: "make mockbuild" from dist-cvs, or something else? Just so I don't chase ghosts trying to reproduce...
I have not yet found a pattern to the failures. Yes, both rpm outside the chroot and inside are exactly the same in these cases. I rebuilt all the machines using rawhide, and am rebuilding the packages using the same rawhide. I run 'mock -r fedora-rawhide-$arch --uniqueext=$something --resultdir=$somewhere --rebuild $somesrpm'. I run 4 instances of 'mock' on each builder, 2 for each of i386 and x86_64 in parallel, but obviously in separate chroots. Machines have plenty of RAM and swap.
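For anyone trying to reproduce this, the setup above can be sketched in Python as below. This is a hypothetical launcher, not Matt's actual script: the slot names (slot0..slot3), the result-directory layout and the helper names are assumptions; only the mock options themselves (-r, --uniqueext, --resultdir, --rebuild) come from the comment. Each build gets its own --uniqueext, so each runs in its own chroot.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def mock_command(arch, uniqueext, resultdir, srpm):
    """Build the argument list for one mock rebuild, mirroring the command in the comment."""
    return [
        "mock",
        "-r", f"fedora-rawhide-{arch}",
        f"--uniqueext={uniqueext}",
        f"--resultdir={resultdir}",
        "--rebuild", srpm,
    ]


def plan_builds(srpms, resultroot):
    """Pair up to four SRPMs with two i386 and two x86_64 slots (hypothetical layout)."""
    slots = ["i386", "i386", "x86_64", "x86_64"]
    cmds = []
    for i, (arch, srpm) in enumerate(zip(slots, srpms)):
        ext = f"slot{i}"  # distinct uniqueext, so every build gets a separate chroot
        cmds.append(mock_command(arch, ext, f"{resultroot}/{arch}/{ext}", srpm))
    return cmds


def run_parallel(cmds):
    """Run all mock commands concurrently and return their exit codes in order."""
    if not cmds:
        return []
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, cmds))
```

Calling run_parallel(plan_builds([...], "/srv/results")) on a builder starts the four builds at once; only plan_builds is pure, which makes it easy to check the generated command lines before launching anything.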
Bzzt! A Berkeley DB version conflict is detected on dbenv open and has an entirely different error message than "error: db4 error(2) from dbcursor->c_get: No such file or directory". Nice guess though ... I'd suggest running strace on one or all of the mock builds to pin down the sequence of events. Pay particular attention to whether mock is opening, accessing, and immediately closing indices.
Jeff may be on to something.

$ rpm -qp --provides db4-4.7.25-2.fc10.x86_64.rpm
libdb-4.7.so()(64bit)
db4 = 4.7.25-2.fc10
$ rpm -qp --requires rpm-4.5.90-0.git8426.9.x86_64.rpm
libdb-4.5.so()(64bit)

so if rpm is dlopen()ing libdb, boom...
Created attachment 314295: Always clean environment on cached roots

No, rpm doesn't dlopen() anything. If it were a db environment *version* mismatch, you'd indeed get a different message. AFAICT, this has to do with the mock root cache containing the db environment and the fact that rpm opens the db before entering the chroot, and it is limited (mostly, at least) to --uniqueext use. Matt, can you see if the problem goes away if you:
a) disable root caching in mock, or
b) apply the attached hack of a patch to mock (with root caching enabled)?
I disabled the root_cache, and 5% of the jobs are complete now with no errors; previously it would have failed by now. I'll see about building mock with your patch.
The mock patch appears to be working for me too. No failures in a few hours since using it. This raises the question, though: is this really a mock bug, or is it papering over a problem with rpm? Mock needing to delete __db* files (created and managed by rpm) to keep rpm from dying seems like the wrong solution.
Doing rm -f /var/lib/rpm/__db* is papering over the problem. In fact, removing those files opens up a lock race; the only reason the effects of the lock race are not being widely seen is that most accesses of rpmdb tend to be serialized through other means, like a monkey watching a screen. OTOH, the same "papering" has been "working" with rpmdbs for years; it's just not the correct fix at all.
Matt, we upgraded the 'mock' machine from F7 to F9. I think that we may be running into a versioning conflict with using a root cache from F7 on F9 and/or vice versa. Most of the builders have the cache dir symlinked to NFS. Can you investigate that?
If you want "sanity" with multiple rpmdbs, with different versions of Berkeley DB everywhere, then the simplest/best solution is using a common Berkeley DB everywhere. KISS is always better than vendor brand loyalty ... FWIW, that's what was always done with the Red Hat build systems; no clue what they do anymore.

Hint with multiple chroots: you *really* want this macro set to 1:

# Open all indices before doing chroot(2).
# %_openall_before_chroot 0

No clue what rpm.org does instead. Have fun!
After conversation with Matt, the comment above (#10) does not apply. Please ignore.
Weird. We (mock) don't *ever* use the rpm that's installed in the chroot. We do all our package installs from outside using --root, then when it's time to build we go into the chroot. My only thought here is that we've unpacked a cached root, then are installing specific packages for the SRPM dependencies, and the root rpmdb (from the cache) is incompatible with the current RPM. Have you tried generating a new root cache and then trying the build without deleting the cache? E.g.:

$ sudo rm /var/lib/mock/cache/fedora-9-i386/root_cache/*
$ sudo mock --init -r fedora-9-i386

then try your build again?
Clark: yes, my build runs start by erasing everything under /var/lib/mock, and let the root cache get created fresh with the first mock --rebuild. In my case, I had the same RPM both outside the chroot and inside; I had re-installed all my builders with the same rawhide tree I was about to rebuild, which is the same tree used inside the buildroots. So there was no version incompatibility, unless somehow it existed within rawhide.
Well that blows my first theory (trudges dejectedly back to the dugout). Does this happen if you serialize the builds (i.e. only one going at a time)? I wonder if we're not doing something right when we build the root cache? I init'ed a fedora-9-i386 chroot on my laptop (rawhide), then ran some rpm commands using --root to specify the chroot location; -qa and --rebuilddb worked as expected. Panu, got any super-secret rpmdb-verify command that stomps through the db and ensures that it's correct?
(repeated) Hint: try strace, verify that rpmdb within chroot is all that is opened. There's a reopen during db->close() that can hit the outer, not the chroot, dbenv. The issue reappears every other Berkeley DB release or so ... And if you have rpmdb on NFS all bets are off. Dunno what is in mock "cache".
The mock root cache is a tarball that contains the initial contents of a chroot, before all the dependent packages for an SRPM are installed into it. We've found that it's *much* faster to unpack a tarball into a chroot than doing a yum transaction to build the chroot. The cache tarball is built after the chroot is initialized and the base packages are installed. I'm not sure how you'd strace this one. You've got mock calling yum, and I believe yum makes direct calls into librpm, so you'd probably have to strace yum. I did a quick try of editing /etc/mock/fedora-9-i386.cfg and adding this:

config_opt['yum_path'] = '/usr/bin/strace -o /tmp/yum.trace /usr/bin/yum'

but that didn't do what I thought it would.
k. Sure, a prestaged tarball will beat any other means of content copying short of a loopback mount image with a COW wrapper. Hmmm, is there a /var/lib/rpm in your tarball? strace -e to catch open/chroot calls is likely sufficient to pick out whether the outer /var/lib/rpm dbenv is being opened. All opens should either include the chroot prefix or (if lazily opened) be within chroot enter/exit. An attempt to open the outer rpmdb path is consistent with the original report of an ENOENT return from dbcursor->c_get() if chroot(2) changes the path. Note that a missing page in the cache might also return ENOENT, I fergit, but the Berkeley DB doco is quite complete if necessary. (aside) There's another possible cause, taking rpmdb join keys outside of a locking context by closing a dbenv, but I'm not hearing indications that this is a problem so far.
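A rough way to apply that check to a trace: capture yum with something like "strace -f -o yum.trace -e trace=open,openat,chroot ..." and scan for rpmdb opens made while the process is not inside the build chroot. The sketch below is a minimal Python filter under simplifying assumptions: it ignores PIDs (traces of several processes are not separated), it treats a successful chroot() to any path other than the build root (for example ".") as leaving the chroot, it skips openat() calls relative to a directory fd, and /var/lib/rpm is assumed to be the dbpath.

```python
import re

# Matches open()/openat()/chroot() lines from strace, capturing call name, path and return value.
CALL_RE = re.compile(r'\b(open|openat|chroot)\((?:AT_FDCWD, )?"([^"]*)".*\)\s+=\s+(-?\d+)')


def suspicious_rpmdb_opens(lines, chroot_prefix, dbpath="/var/lib/rpm"):
    """Return rpmdb paths opened (or attempted) while outside the build chroot."""
    prefix = chroot_prefix.rstrip("/")
    inside = False
    hits = []
    for line in lines:
        m = CALL_RE.search(line)
        if not m:
            continue
        call, path, ret = m.group(1), m.group(2), int(m.group(3))
        if call == "chroot":
            if ret == 0:
                # Assumption: chroot into the build root means "entered", anything else means "left".
                inside = path.rstrip("/") == prefix
            continue
        touches_db = path == dbpath or path.startswith(dbpath + "/")
        if touches_db and not inside:
            hits.append(path)  # failed attempts (e.g. ENOENT) are reported too
    return hits
```

Opens that carry the chroot prefix before chroot(2) are not flagged, because they do not start with the bare dbpath; that matches the "all opens should include the chroot prefix or be within chroot enter/exit" rule.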
Jeff, we don't need your guesswork here, thank you very much. The root issue here is that rpm opens up the db before entering the chroot, and so the environment ends up containing paths like /var/lib/mock/root/fedora-rawhide-x86_64/root/var/lib/rpm/yadda. Currently, the environment gets included in the tarball that mock builds if root caching is enabled. That's still "ok", but once you start using --uniqueext=<something> with root caching enabled, the paths that the db environment in the root cache tarball points to might no longer exist, and they certainly point to the wrong files even if they do. That's where it blows up. The easy fix is to make mock not tar up the /var/lib/rpm/__* files from the chroot, i.e. "rm -f /var/lib/rpm/__*" before tarring up the root contents. My patch to mock in comment #6 was just a proof-of-theory thing that works the wrong way around (removing the environment after unpacking the tarball, instead of not tarring it up in the first place), but the same thing is accomplished: a cached root won't contain bogus paths. The "real" fix would be rpm never opening the rpmdb from outside the chroot in the first place, but Berkeley DB throws some curveballs into the picture. As to comment #8: it's a bit of both. Filtering out the db environment from the chroot cache tarball in mock is the right thing to do anyway, but rpm is to blame too.
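The "easy fix" of not tarring up the environment in the first place could look roughly like this with Python's tarfile module. This is only a sketch, not what mock's root_cache plugin actually does: make_root_cache, skip_rpmdb_env and the hard-coded var/lib/rpm dbpath are assumptions for illustration. The rpmdb files themselves (Packages and the indices) stay in the tarball; only the __db* environment files are filtered out.

```python
import os
import tarfile

DBPATH = "var/lib/rpm"  # assumed %{_dbpath}, relative to the chroot root


def skip_rpmdb_env(tarinfo):
    """tarfile filter: drop Berkeley DB environment files (__db*) from the rpmdb directory."""
    name = tarinfo.name[2:] if tarinfo.name.startswith("./") else tarinfo.name
    if os.path.dirname(name) == DBPATH and os.path.basename(name).startswith("__db"):
        return None  # returning None tells tarfile to leave this member out
    return tarinfo


def make_root_cache(chroot_dir, tarball):
    """Write a gzipped cache of chroot_dir without the rpmdb environment."""
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(chroot_dir, arcname=".", filter=skip_rpmdb_env)
```

Filtering at creation time means every unpacked copy starts without stale paths, regardless of which --uniqueext directory it lands in.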
Hardly guesswork ... But You da man now, Dude. Have fun!
Panu, Ah, I didn't think about the paths being hosed due to opening before the chroot. It's easy enough for the root_cache plugin to --exclude /var/lib/rpm when we create the cache, but I presume that means we'd need to create it and then 'rpm --rebuilddb' after we unpacked the cache, correct?
Don't skip the entire /var/lib/rpm, otherwise there'd be no db to rebuild :) Just exclude /var/lib/rpm/__* from the cache tarball, that's all you need to do. Or to be exact, glob.glob("%s/__db*" % rpm.expandMacro("%{_dbpath}")) is what needs excluding from the cache. And no, you don't need to rebuild the db.
Created attachment 314406: mock workaround for rpmdb/root_cache interaction weirdness

heh, yeah, I figured that out when I was hacking the root_cache plugin :) I fooled around with a couple of ways to exclude the __db* files and eventually realized the simplest was to always do rm -f <chroot-path>/var/lib/rpm/__db* in the post-hook of the root_cache plugin. Here's a patch to try:
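The post-hook cleanup amounts to the glob from the previous comment evaluated against the chroot and then unlinked. A minimal Python sketch, assuming the default /var/lib/rpm dbpath (real code would take it from rpm.expandMacro("%{_dbpath}") as suggested above); clean_rpmdb_env is a hypothetical name. The glob is expanded in Python itself, so nothing depends on a shell interpreting the '*'.

```python
import glob
import os


def clean_rpmdb_env(chroot_dir, dbpath="/var/lib/rpm"):
    """Remove the Berkeley DB environment files (__db*) from the rpmdb inside a chroot."""
    pattern = os.path.join(chroot_dir, dbpath.lstrip("/"), "__db*")
    removed = []
    for path in glob.glob(pattern):  # expand the wildcard here, no shell involved
        try:
            os.unlink(path)
        except FileNotFoundError:
            continue  # already gone, same as rm -f
        removed.append(os.path.basename(path))
    return sorted(removed)
```

Returning the removed names makes it easy to log what the hook actually deleted, which helps catch a hook that silently does nothing.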
I'm waiting on starting another rawhide rebuild until this bug and #455387 are resolved.
mock-0.9.11-1.fc10.src.rpm and rpm-4.5.90-0.git8461.7.src.rpm together do not resolve this problem. I re-ran all my failed builds with this combination (in fact with machines upgraded to today's rawhide), and those builds failed.
for the record, I've stopped using the mock root cache until this is resolved. This increases my full rawhide rebuild time from about 30 hours with root cache enabled to 75.5 hours with root cache disabled.
Created attachment 320397: Fix cleaning up of rpmdb environment

The problem is that the patch from comment #23 that went into mock doesn't actually do what it's supposed to do, despite looking basically correct. Running "tar tzf cache.tar.gz ./var/lib/rpm/" on a root cache generated by mock 0.9.11 still shows the environment there. I didn't track it further, but I guess the glob doesn't get expanded when passed to mock.util.do(); the attached patch, which globs and cleans the path "manually", makes it actually do something.
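A scripted version of the "tar tzf" check, for confirming that a freshly generated root cache really is free of the environment files. This is a hypothetical helper using Python's tarfile module; it assumes the dbpath is var/lib/rpm relative to the archive root and that member names may or may not carry a leading "./".

```python
import tarfile


def rpmdb_env_members(tarball, dbpath="var/lib/rpm"):
    """List __db* members directly under dbpath in a root cache tarball."""
    hits = []
    with tarfile.open(tarball, "r:*") as tar:  # "r:*" picks the compression automatically
        for name in tar.getnames():
            norm = name[2:] if name.startswith("./") else name
            head, _, base = norm.rpartition("/")
            if head == dbpath and base.startswith("__db"):
                hits.append(norm)
    return sorted(hits)
```

An empty list means the cache is clean; a non-empty one reproduces the symptom described above, where the cleanup looked correct but left the environment in place.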
*** Bug 465724 has been marked as a duplicate of this bug. ***
ugh, how embarrassing. I'll pick up the patch from #27 and spin a new mock today.
mock-0.9.12 is out with Panu's corrected patch.
mock-0.9.12-1.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/mock-0.9.12-1.fc9
mock-0.9.12-1.fc8 has been submitted as an update for Fedora 8. http://admin.fedoraproject.org/updates/mock-0.9.12-1.fc8
mock-0.9.12-1.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mock'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9074
mock-0.9.12-1.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mock'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-9085
mock-0.9.13-1.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/mock-0.9.13-1.fc9
mock-0.9.13-1.fc8 has been submitted as an update for Fedora 8. http://admin.fedoraproject.org/updates/mock-0.9.13-1.fc8
mock-0.9.13-1.fc9 has been pushed to the Fedora 9 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mock'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9499
mock-0.9.13-1.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update mock'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-9512
mock-0.9.13-1.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.
mock-0.9.13-1.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 325743: Patch to clean up rpmdb from source rpm install
This bug isn't quite fixed. Mock does use the rpm in the chroot, and apparently newer versions of RPM cause a problem with this. I noticed that I was unable to build Fedora 10 rpms from a Fedora 9 host. I traced it to the fact that Fedora 10's rpm apparently generates the __db* files when installing a source rpm. And because installing the source rpm is done in the chroot, you end up with conflicting rpmdb versions. My fix was to clean up the __db* files after installing the source rpm. It would probably be more cleanly solved by having the source rpm be installed outside of the chroot.
Note that "cleaning up" after installing a source rpm opens a lock race window with other installs. You can't just blow away locks after installing a source rpm, there may be other processes running concurrently. But have fun with mock!
(In reply to comment #42)
> This bug isn't quite fixed. Mock does use the rpm in the chroot and apparently
> newer versions of RPM cause a problem with this.
>
> I noticed that I was unable to build fedora 10 rpms from a fedora 9 host. I
> traced it to the fact that apparently fedora 10's rpm generates the __db* files
> when installing a source rpm. And because installing the source rpm is done in
> the chroot you end up with conflicting rpmdb versions.
>
> My fix was to clean up the __db* files after installing the source rpm. It
> would probably be more cleanly solved by having the source rpm be installed
> outside of the chroot.

We actually want to use the in-chroot rpm more and more, as it will be gaining features that are not compatible with the host rpm.
(In reply to comment #43)
> Note that "cleaning up" after installing a source rpm opens a lock race window
> with other installs. You can't just blow away locks after installing a source
> rpm,
> there may be other processes running concurrently.
>
> But have fun with mock!

In the mock case this is safe(r) as mock is ultimately in control of what's happening in the chroot, and thus it can decide to clean before continuing on to the next action.
Agreed safe(r). I trust condoms and progesterone more than mock, however. Removing rpmdb concurrency locks can never be done safe(r)ly without introducing a race unless there is an additional locking guarantee or one is prepared to deal with the consequences.
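One form the "additional locking guarantee" could take inside mock is an advisory lock that every mock step touching the chroot's rpmdb takes before running rpm, with the __db* cleanup done before the lock is released. This is a sketch under loud assumptions: the lock file name and both helpers are hypothetical, and flock() only serializes processes that take the same lock, so rpm invocations outside mock's control are not protected and the race described above remains for them.

```python
import fcntl
import glob
import os
from contextlib import contextmanager


@contextmanager
def chroot_rpm_lock(chroot_dir, lockname=".mock-rpmdb.lock"):
    """Hold an exclusive advisory lock on a hypothetical per-chroot lock file.

    Only processes that take this same lock are serialized; rpm itself ignores it.
    """
    path = os.path.join(chroot_dir, lockname)
    with open(path, "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def install_then_clean(chroot_dir, install_step, dbpath="/var/lib/rpm"):
    """Run an install step, then remove the __db* environment, all under the lock."""
    with chroot_rpm_lock(chroot_dir):
        install_step()  # e.g. the source rpm install inside the chroot
        removed = 0
        for path in glob.glob(os.path.join(chroot_dir, dbpath.lstrip("/"), "__db*")):
            try:
                os.unlink(path)
                removed += 1
            except FileNotFoundError:
                pass
        return removed
```

Because the cleanup runs before the lock is dropped, no other cooperating mock step can open the rpmdb between the install and the removal of its environment.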
I am seeing what seems to be the same bug with Fedora 10 x86_64 when trying to compile Fedora 10 packages for i386.