Bug 88720
Summary: | Updated RH box reports DB_PAGE_NOTFOUND accessing rpmdb | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Alan Cox <alan> |
Component: | rpm | Assignee: | Jeff Johnson <jbj> |
Status: | CLOSED WONTFIX | QA Contact: | Mike McLean <mikem> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 9 | CC: | barryn, herrold, jtate, mike.dorman, peterbaitz, redhat.com, thomas, wtogami |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2004-12-13 12:59:10 UTC | Type: | --- |
Description
Alan Cox
2003-04-12 15:39:46 UTC
Yup, there's some sort of cache problem; I have no reproducer. Doing rm -f /var/lib/rpm/__db* should fix it, and is cheaper than --rebuilddb.

Happens regularly on this box with up2date/rpm. VIA Cyrix III 533, single processor, non-NPTL kernel. I've seen some similar db3 funnies with other applications on RH9, notably phpwiki. May be coincidence, of course. phpwiki is fine on RH8, but on RH9 it gets random db3 corruptions.

Does phpwiki use a dbenv? If so, the problems are likelier to be related. If you can characterize "regularly", I can try to reproduce. I haven't yet gotten anything coherent enough to attempt reproducing. I've never seen the error myself, but then I don't use up2date very much.

phpwiki doesn't appear to use a dbenv; it sees corruption (page allocated twice) in the main db instead. With 8.0 it was solid. It's a bit of a pain, because live updating to RH9 is OK, but live backing down to RH8 is going to be painful. With the RPM stuff we've seen it 5 or 6 times so far on this box. That may in part be because of the way we abuse up2date a lot. Basically we take the live box for ftp*.linux.org.uk, link all the RPM packages from the ftp archive into /var/spool/up2date so that it thinks they are downloaded, then run up2date -u between releases. This seems to quite reliably break it at least once.

Thanks, that's coherent enough to attempt to reproduce. Does phpwiki use db-4.1.25 or db-4.0.14? In fact, there's a bug in db-4.0.14: when deleting and then re-adding an item from the same exec, a cache page was not marked dirty. Fixed in rpm, and (I believe) in db-4.1.25.

Whatever PEAR is loading for it. It's PHP, which means it's "magic" 8)

Same here: during up2date I see these 2 messages (DB_PAGE_NOTFOUND); it started after using a non-NPTL kernel.
In addition, on another machine (no non-standard RPMs, normal kernel) I see rpm get stuck uninstalling multiple packages:

rpm -e glibc-devel-2.3.2-27.9 ncurses-devel-5.3-4 python-devel-2.2.2-26 compat-libstdc++-devel-7.3-2.96.118 compat-gcc-7.3-2.96.118 gcc-3.2.2-5 compat-gcc-c++-7.3-2.96.118

Attaching strace to the process shows it's waiting on a futex:

[root@w6 root]# strace -p 5966
futex(0x405b3f20, FUTEX_WAIT, 0, NULL

Ctrl-C doesn't work, only kill -9. After that the __db.00<x> files are left in /var/lib/rpm. Normal, latest kernel:

[root@w6 rpm]# rpm -qa |grep kernel
kernel-2.4.20-9
kernel-smp-2.4.20-9

C'mon, rpm problems? A distro is not a distro if it doesn't do package management well! rpm 4.0.4 was fine; have you heard the phrase "if it ain't broke, don't fix it"?

This was after installing a new package from RPM. I got this error when doing an rpm -qa:

error: db4 error(-30989) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found

After doing an rpm --rebuilddb (which the first time gave me the above error plus "error: db4 error(16) from dbenv->remove: Device or resource busy"), 127 RPMs are no longer in my rpm database. Big important RPMs like coreutils, glibc, bash, info, zlib, etc. Thus all dependencies fail when trying to install new packages:

[root@cheetah RPMS]# rpm -ivh nptl-devel-2.3.2-11.9.i686.rpm
warning: nptl-devel-2.3.2-11.9.i686.rpm: V3 DSA signature: NOKEY, key ID db42a60e
error: Failed dependencies:
        glibc-devel = 2.3.2-11.9 is needed by nptl-devel-2.3.2-11.9
[root@cheetah RPMS]# rpm -i glibc-devel-2.3.2-11.9.i386.rpm
warning: glibc-devel-2.3.2-11.9.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
error: Failed dependencies:
        /bin/sh is needed by glibc-devel-2.3.2-11.9
        /sbin/install-info is needed by glibc-devel-2.3.2-11.9
        glibc = 2.3.2 is needed by glibc-devel-2.3.2-11.9
        kernel-headers is needed by glibc-devel-2.3.2-11.9
        kernel-headers >= 2.2.1 is needed by glibc-devel-2.3.2-11.9

Not good.
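A minimal sketch of how one might check, after an rpm --rebuilddb, whether core packages like the ones listed above have dropped out of the rpmdb. It assumes Python 3.9+ and that `rpm -qa --queryformat '%{NAME}\n'` prints one package name per line; `missing_essentials`, `query_installed_names` and the `ESSENTIAL` tuple are hypothetical names, with the package list taken from the comment above.

```python
"""Hypothetical post-rebuild sanity check: are core packages still in the rpmdb?"""
import subprocess

# Package names taken from the report above (hypothetical constant; extend as needed).
ESSENTIAL = ("bash", "coreutils", "glibc", "info", "zlib")


def missing_essentials(listing: str, essential=ESSENTIAL) -> list[str]:
    """Return the essential names absent from a newline-separated NAME listing."""
    # A set also absorbs duplicate names (e.g. multilib glibc entries).
    installed = {line.strip() for line in listing.splitlines() if line.strip()}
    return sorted(name for name in essential if name not in installed)


def query_installed_names() -> str:
    """Ask rpm for one package NAME per line (requires rpm on PATH)."""
    result = subprocess.run(
        ["rpm", "-qa", "--queryformat", "%{NAME}\\n"],  # rpm expands the \n escape itself
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Usage would be `missing_essentials(query_installed_names())`: an empty list means those packages are still registered, while a non-empty one points at the kind of damage described above.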
I'm using a stock RH 9 with all updates installed. Dell PowerEdge 2550 on a PERC3 RAID array. This is the second time I have seen this problem on this server with RH 9. Had to "upgrade" the server to get it back into operation. Recommend upping priority and severity to HIGH.

Sorry, typo. The HW is a PowerEdge 2650 with a PERC3 RAID array.

This isn't limited to Red Hat 9: with the test-4.1.1 RPMs from ftp.rpm.org, I'm also seeing this on Red Hat 8.0 when using up2date to install or upgrade packages (albeit with a Current 1.4.3 server). (By "this" I mean exactly the type of thing that Alan Cox mentioned in the first post in this bug. It does not happen with RPM 4.1 as shipped with Red Hat 8.0, however.) I can provide more info (and possibly a highly reproducible test case) if needed.

Barry: I'm looking for a reliable reproducer of the DB_PAGE_NOTFOUND problem; so far, nada.

I'm not 100% sure this procedure will reproduce the problem, but it roughly resembles how I've made the problem happen on two 8.0 boxes. I'll probably try to verify this exact procedure on a third box later today.

1. Do a fresh RH 8.0 install (or take a box with an existing 8.0 installation).
2. Install the test-4.1.1 RPMs as well as all RH 8.0 errata. (The RH 8.0 errata may not be necessary, but that's what I've tested with so far. It doesn't seem to make a difference whether the test-4.1.1 RPMs are installed before or after the 8.0 errata, or even if they're installed after some errata packages and before others.)
3. Register the machine with RHN or set it up as a client of a Current server. (I've only tried the latter so far.) Optionally, reboot into the 2.4.18-27.7.x errata kernel if you have not done so already.
4. rpm -e kernel-utils
5. rpm --rebuilddb
6. up2date kernel-utils
7. If step 6 did not trigger the problem, repeat steps 4-6. If that still doesn't do it, try 4-6 one or two more times; if that still isn't sufficient, then I must have forgotten some detail somewhere.
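The loop in steps 4-7 can be scripted. This is a sketch, not the reporter's tooling: it assumes Python 3.7+, root on a disposable test box, and that the db4 error text shows up on stdout or stderr. The command runner is injectable so the loop logic can be exercised without touching a real rpmdb; `reproduce`, `run_command` and `STEPS` are hypothetical names.

```python
"""Sketch of steps 4-7 above, with an injectable command runner."""
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]

# Steps 4-6 from the procedure above; kernel-utils is the package the reporter used.
STEPS = (
    ["rpm", "-e", "kernel-utils"],
    ["rpm", "--rebuilddb"],
    ["up2date", "kernel-utils"],
)


def run_command(argv: Sequence[str]) -> str:
    """Run a command and return its combined stdout and stderr."""
    proc = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return proc.stdout


def reproduce(run: Runner = run_command, attempts: int = 4) -> Optional[int]:
    """Repeat steps 4-6 until some output mentions DB_PAGE_NOTFOUND.

    Returns the 1-based attempt number that triggered the error, or None.
    """
    for attempt in range(1, attempts + 1):
        for argv in STEPS:
            if "DB_PAGE_NOTFOUND" in run(argv):
                return attempt
    return None
```

The default of four attempts mirrors the "repeat, then try one or two more times" instruction in step 7; the function stops at the first command whose output contains the error string.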
Some notes: If steps 4 and 5 are swapped (i.e., step 5 is done before step 4 each time), the problem does not happen. Similarly, if step 5 is omitted, the problem does not happen. If rpm is used to install the package instead of up2date, the problem does not happen. AFAIK this procedure makes the problem happen with any package, but I know for a fact that kernel-utils does it. AFAICT, bug 89477 also claims to have a 100% reproducible way to trigger this bug on Red Hat 9, for what that's worth. I haven't tried that procedure, however.

Here's a slightly different, more specific set of instructions. I just performed these.

1. Perform a Workstation install of Red Hat 8.0 that boots into a graphical login by default. I did not modify or customize the default package selection. I used an IDE hard disk for this, although the first system I experienced this bug on used SCSI.
2. Go through the firstboot wizard. Register the system with RHN (Basic entitlement in my case, FWIW). Go through the screens until you reach the long list of errata packages, and click Cancel. (I would also expect it to be OK to cancel at the Channels screen, but I didn't try that.)
3. Once the gdm login prompt appears, press Control-Alt-F1 to switch to a text console. (The first time the problem happened on one of our boxes we were logged into X, but I used a text console this time.)
4. Log in as a normal user and bring over the test-4.1.1 packages (I scp'd them over from another machine).
5. Log into another text console as root (you could probably just su, but I did a separate login straight into root without logging out as the normal user).
6. "up2date glibc": the test-4.1.1 packages won't install without the glibc errata. (If you'd prefer, you could probably go ahead and install all the errata at this step. That would come closer to reflecting the setup of the other two machines I saw this problem on. On this machine I only updated glibc to save time, though.)
7. "rpm -Fvh ~luser/test-4.1.1/*.8x.i386.rpm" (replace "luser" with the real username of the normal user account from step 4, of course).
8. "rpm --rebuilddb"
9. "up2date kernel-utils" *BOOM!*
10. In the unlikely event that step 9 didn't exhibit the error, run "rpm -e kernel-utils" and repeat steps 8 and 9.

I'll go try Red Hat 9 now and see what I can come up with in terms of a reproducible test case there.

I'm seeing what sounds like the same thing on a "stock" (non-updated) system. We're just starting to test the RH 9 kickstart, using a boot CD and an NFS-mounted source tree. On two of the first three installs, the system came up with rpm in this "corrupted" state (rpm -qa only lists a subset of the installed RPMs, and these DB errors are reported). Even though the third attempt came up clean, it's not giving a good feeling about RH9.

An addendum to my previous posts for reproducing this bug: I think if you do not remove "kernel-*" from the package skip list, you might need to run "up2date -f kernel-utils" instead of "up2date kernel-utils". (At least on RH 9, without the -f, if kernel-* is in the package skip list, up2date will just say everything's up to date and will not install the kernel-utils RPM. I didn't try this again on RH 8.0, but it could be the same.)

The steps I performed on RH 8.0 are mostly what's needed to reproduce the bug on RH 9 as well. Here are the main differences:
+ The glibc errata isn't needed to install the test-4.2 RPMs, AFAICT.
+ (Obviously) install test-4.2 instead of test-4.1.1.
+ Run "LD_ASSUME_KERNEL=2.2.5 up2date -f kernel-utils" instead of "up2date -f kernel-utils". Without the LD_ASSUME_KERNEL, the bug does not happen (unless maybe I boot with nosysinfo, but I didn't try that).

The presence or absence of LD_ASSUME_KERNEL for rpm --rebuilddb and rpm -e kernel-utils makes no user-discernible difference, save for the harmless message from the database rebuild without LD_ASSUME_KERNEL.
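In the RH 9 variant only the up2date step runs with LD_ASSUME_KERNEL=2.2.5. A sketch, assuming Python 3, that builds each command together with its environment overrides instead of exporting the variable for the whole session; `rh9_steps` and `merged_env` are hypothetical helper names.

```python
"""RH 9 variant of the loop: only the up2date step gets LD_ASSUME_KERNEL=2.2.5."""
import os
from typing import Dict, List, Tuple

Step = Tuple[List[str], Dict[str, str]]


def rh9_steps(package: str = "kernel-utils", assume_kernel: str = "2.2.5") -> List[Step]:
    """Return (argv, extra_env) pairs for the rpm -e / --rebuilddb / up2date -f cycle."""
    return [
        (["rpm", "-e", package], {}),
        (["rpm", "--rebuilddb"], {}),
        # Per the report, the bug only appeared with LD_ASSUME_KERNEL set on this step;
        # -f is needed when kernel-* is in up2date's package skip list.
        (["up2date", "-f", package], {"LD_ASSUME_KERNEL": assume_kernel}),
    ]


def merged_env(extra: Dict[str, str]) -> Dict[str, str]:
    """Copy the current environment and overlay the step-specific variables."""
    env = dict(os.environ)
    env.update(extra)
    return env
```

Each pair would then be run as `subprocess.run(argv, env=merged_env(extra))` on the RH 9 test box. Scoping the variable to one step keeps rpm -e and rpm --rebuilddb in the default environment, matching the observation that LD_ASSUME_KERNEL made no visible difference for those two commands.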
I have no idea whether these steps make the bug happen without the test-4.2 RPMs; I didn't try that. Soon (hopefully later tonight) I'll try torturing the test-4.0.5 RPMs to see if they happen to have this bug or anything similar.

*** Bug 89477 has been marked as a duplicate of this bug. ***

*** Bug 89726 has been marked as a duplicate of this bug. ***

I'm now getting this on the beta, btw, and --rebuilddb isn't fixing it.

Do you need any input from me? My Red Hat 9.0 box still has the RPM issues I reported in my bug... I usually have to reboot to get RPM working again. What happened to the unofficial fix out on ftp://people.redhat.com/jbj ? What am I supposed to do now? Just let RPM blow up once a week and rebuild the DB?
Tweeks

The packages are now at ftp://ftp.rpm.org/pub/rpm/dist/rpm-4.2.x (for Red Hat 9) or, for Red Hat 8.0, at ftp://ftp.rpm.org/pub/rpm/dist/rpm-4.1.x

*** Bug 102223 has been marked as a duplicate of this bug. ***

Has anyone seen this happen on RHEL 3/FC1 or later? (IOW, is anyone still seeing this bug in the wild on releases that haven't hit end-of-life yet?)

The problem still exists, which is why the bug is still open. I personally have still never seen the problem, even after attempting to narrow down a reproducer in order to attempt a fix.

Talk to your RHCE trainers... They see it all the time in their classes. I myself have seen them see it several times. :) ... and have seen it around 20 times myself while training others.
Tweeks

Yes, I have experienced this problem on both FC1 and RHEL 3. Installing RPMs leaves the __db.00x cache files behind and leads to the DB_PAGE_NOTFOUND errors. Removing the cache files makes it go away, as described by others.

DB_PAGE_NOTFOUND correlates with using an rpm built for an NPTL environment in a non-NPTL environment. I know of no other problems. The fix is either to build rpm for a non-NPTL environment, or to use rpm in an NPTL environment.
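Since the reported correlation is with the threading library rpm runs under, a quick check is which pthread implementation glibc reports to a process. A sketch, assuming Python 3 on a glibc system: `os.confstr("CS_GNU_LIBPTHREAD_VERSION")` is the counterpart of `getconf GNU_LIBPTHREAD_VERSION`, and the version strings are assumed to start with `NPTL` or `linuxthreads`. `classify_pthread` and `current_pthread_version` are hypothetical helpers.

```python
"""Report which threading library glibc provides to this process (NPTL vs LinuxThreads)."""
import os
from typing import Optional


def classify_pthread(version: Optional[str]) -> str:
    """Map a GNU_LIBPTHREAD_VERSION string to 'nptl', 'linuxthreads' or 'unknown'."""
    if not version:
        return "unknown"
    lowered = version.lower()
    if lowered.startswith("nptl"):
        return "nptl"
    if lowered.startswith("linuxthreads"):
        return "linuxthreads"
    return "unknown"


def current_pthread_version() -> Optional[str]:
    """Counterpart of `getconf GNU_LIBPTHREAD_VERSION`, if this platform exposes it."""
    try:
        # Returns None when the value is not defined on this system.
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        # No os.confstr (non-Unix) or the name is unknown to this libc.
        return None
```

Running `getconf GNU_LIBPTHREAD_VERSION` in the same environment as the failing up2date/rpm command (for example with and without LD_ASSUME_KERNEL on RH 9) would be one way to see which side of the NPTL / non-NPTL split that command actually ran on.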
The version of rpm built by Red Hat expects an NPTL environment. The error DB_PAGE_NOTFOUND is an indication of cache incoherency, usually fixed by doing rm -f /var/lib/rpm/__db* when the error is seen. WONTFIX because I know of no way to build a single version of rpm that accommodates both NPTL and non-NPTL environments.
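The workaround above, sketched in Python as an equivalent of `rm -f /var/lib/rpm/__db*`. It assumes no rpm or up2date process is still running (an assumption of this sketch; the hung-rpm report earlier shows these files can be left behind after kill -9). `clear_rpmdb_env` is a hypothetical name; only the `__db*` files are removed, so Packages and the index databases are left alone.

```python
"""Python equivalent of `rm -f /var/lib/rpm/__db*`: remove only the __db* cache files."""
import glob
import os
from typing import List


def clear_rpmdb_env(dbpath: str = "/var/lib/rpm") -> List[str]:
    """Delete the __db* files under dbpath and return the basenames removed.

    Assumes no rpm/up2date process is still using the database environment.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(dbpath, "__db*"))):
        try:
            os.remove(path)
        except FileNotFoundError:
            continue  # like rm -f: a file that vanished meanwhile is not an error
        removed.append(os.path.basename(path))
    return removed
```

Calling it with no argument, as root, targets /var/lib/rpm just like the shell one-liner; passing another directory makes it easy to try on a copy of the database first.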