Bug 88720

Summary: Updated RH box reports DB_PAGE_NOTFOUND accessing rpmdb
Product: [Retired] Red Hat Linux
Component: rpm
Version: 9
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Alan Cox <alan>
Assignee: Jeff Johnson <jbj>
QA Contact: Mike McLean <mikem>
CC: barryn, herrold, jtate, mike.dorman, peterbaitz, redhat.com, thomas, wtogami
Doc Type: Bug Fix
Last Closed: 2004-12-13 12:59:10 UTC

Description Alan Cox 2003-04-12 15:39:46 UTC
Description of problem:


   1:perl-DBI               ########################################### [100%]
   2:libstdc++              ########################################### [100%]
   3:compat-libstdc++       ########################################### [100%]
   4:mysql                  ########################################### [100%]
   5:perl-DBD-MySQL         ########################################### [100%]
error: db4 error(-30989) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page
not found
error: db4 error(-30989) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page
not found
The following packages were added to your selection to satisfy dependencies:

Version-Release number of selected component (if applicable):

Red Hat 9, but with an older non-NPTL kernel

How reproducible:

Unknown, sorry.

rpm --rebuilddb seems to have made things happy

Comment 1 Jeff Johnson 2003-04-14 14:35:23 UTC
Yup, there's some sort of cache problem, but I have no reproducer.

Doing rm -f /var/lib/rpm/__db* should fix it, and is cheaper than --rebuilddb.
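
For anyone scripting this workaround, a minimal sketch follows. The helper name clear_rpm_dbenv and its optional directory argument are hypothetical (not part of rpm); the only command taken from this report is the rm -f on the __db* files. It is an assumption here that no rpm or up2date process should be running while the files are removed.

```shell
#!/bin/sh
# Sketch: remove the Berkeley DB environment (cache) files and leave the
# actual database files (Packages, Name, ...) untouched.
# clear_rpm_dbenv is a hypothetical helper name; the directory argument
# defaults to /var/lib/rpm.
clear_rpm_dbenv() {
    dbdir="${1:-/var/lib/rpm}"
    # Only files whose names start with __db are removed.
    rm -f "$dbdir"/__db*
}

# Usage (as root, assuming no rpm or up2date process is running):
#   clear_rpm_dbenv
```

Passing a directory explicitly makes it possible to try the helper on a scratch copy of the database first.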

Comment 2 Alan Cox 2003-04-14 14:41:05 UTC
Happens regularly on this box with up2date/rpm.

VIA Cyrix III 533
Single processor
non-NPTL kernel

I've seen some similar db3 funnies with other applications on RH9, notably
phpwiki. May be coincidence, of course. phpwiki is fine on RH8, but on RH9 it
gets random db3 corruptions.



Comment 3 Jeff Johnson 2003-04-14 15:18:49 UTC
Does phpwiki use a dbenv? If so, the problems are likelier
to be related.

If you can characterize "regularly", I can try to reproduce.
I haven't yet gotten anything coherent enough to attempt reproducing.
I've never seen the error myself, but then I don't use up2date very much.

Comment 4 Alan Cox 2003-04-14 15:21:59 UTC
phpwiki doesn't appear to use a dbenv; it sees corruption (page allocated twice)
in the main db instead. With 8.0 it was solid. It's a bit of a pain, because live
updating to RH9 is OK but live backing down to RH8 is going to be painful.

With the RPM stuff we've seen it 5 or 6 times so far on this box. That may in
part be because of the way we abuse up2date a lot. Basically we take the live
box for ftp*.linux.org.uk, link all the RPM packages from the ftp archive into
/var/spool/up2date so that it thinks they are downloaded, then up2date -u
between releases. This seems to quite reliably break it at least once.
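
The setup described above could be sketched roughly as below. This is an illustration, not the exact script used on the box: the helper name seed_up2date_spool is hypothetical, the archive path is a placeholder, and symbolic links are an assumption (the description only says "link").

```shell
#!/bin/sh
# Sketch: make up2date believe the packages are already downloaded by
# linking every .rpm from a local archive into its spool directory.
# seed_up2date_spool is a hypothetical helper. The archive path should be
# absolute so the symlinks resolve from inside the spool directory.
seed_up2date_spool() {
    archive="$1"
    spool="${2:-/var/spool/up2date}"
    for pkg in "$archive"/*.rpm; do
        [ -e "$pkg" ] || continue    # the glob matched nothing
        ln -sf "$pkg" "$spool/"
    done
}

# Usage (the archive path is a placeholder):
#   seed_up2date_spool /path/to/ftp/archive && up2date -u
```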



Comment 5 Jeff Johnson 2003-04-14 15:33:51 UTC
Thanks, that's coherent enough to attempt to reproduce.

phpwiki uses db-4.1.25 or db-4.0.14?

Comment 6 Jeff Johnson 2003-04-14 15:36:13 UTC
In fact, there's a bug in db-4.0.14: when deleting and then re-adding
an item from the same exec, a cache page was not marked dirty.

Fixed in rpm, and (I believe) db-4.1.25.

Comment 7 Alan Cox 2003-04-14 15:46:08 UTC
Whatever PEAR is loading for it. It's PHP, that means it's "magic" 8)


Comment 8 alex kramarov 2003-04-19 19:09:04 UTC
Same here: during up2date I see these 2 messages (DB_PAGE_NOTFOUND), which
started after using a non-NPTL kernel. In addition, on another machine (no
non-standard rpms, normal kernel) I see rpm stuck when uninstalling multiple
packages:

rpm -e glibc-devel-2.3.2-27.9 ncurses-devel-5.3-4 python-devel-2.2.2-26 \
    compat-libstdc++-devel-7.3-2.96.118 compat-gcc-7.3-2.96.118 gcc-3.2.2-5 \
    compat-gcc-c++-7.3-2.96.118

Attaching strace to the process shows it's waiting on a futex:

[root@w6 root]# strace -p 5966
futex(0x405b3f20, FUTEX_WAIT, 0, NULL

Ctrl-C doesn't work, only kill -9. After that the __db.00<x> files are left
in /var/lib/rpm.

Normal, latest kernel:

[root@w6 rpm]# rpm -qa |grep kernel
kernel-2.4.20-9
kernel-smp-2.4.20-9


C'mon, rpm problems? A distro is not a distro if it doesn't do package
management well! rpm 4.0.4 was fine; have you heard of the phrase "if it ain't
broken, don't fix it"?

Comment 9 Joseph Tate 2003-04-21 19:50:44 UTC
This was after installing a new package from RPM.

I got this error when doing an rpm -qa:
error: db4 error(-30989) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page
not found

After doing an rpm --rebuilddb (which the first time gave me the above error plus
"error: db4 error(16) from dbenv->remove: Device or resource busy"), 127 RPMs are
no longer in my rpm database: big, important rpms like coreutils, glibc, bash,
info, zlib, etc. Thus all dependencies fail when trying to install new packages.

[root@cheetah RPMS]# rpm -ivh nptl-devel-2.3.2-11.9.i686.rpm
warning: nptl-devel-2.3.2-11.9.i686.rpm: V3 DSA signature: NOKEY, key ID db42a60e
error: Failed dependencies:
        glibc-devel = 2.3.2-11.9 is needed by nptl-devel-2.3.2-11.9
[root@cheetah RPMS]# rpm -i glibc-devel-2.3.2-11.9.i386.rpm
warning: glibc-devel-2.3.2-11.9.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
error: Failed dependencies:
        /bin/sh is needed by glibc-devel-2.3.2-11.9
        /sbin/install-info is needed by glibc-devel-2.3.2-11.9
        glibc = 2.3.2 is needed by glibc-devel-2.3.2-11.9
        kernel-headers is needed by glibc-devel-2.3.2-11.9
        kernel-headers >= 2.2.1 is needed by glibc-devel-2.3.2-11.9

Not good.

I'm using a stock RH 9 with all updates installed.
Dell PowerEdge 2550 on PERC3 raid array.  This is the second time I have seen
this problem on this server with RH 9.  Had to "upgrade" the server to get it
back to operation.

Recommend upping priority and severity to HIGH.

Comment 10 Joseph Tate 2003-04-21 20:18:28 UTC
Sorry, typo. HW is a PowerEdge 2650 with PERC3 raid array.

Comment 11 Barry K. Nathan 2003-04-24 20:52:48 UTC
This isn't limited to Red Hat 9 -- with the test-4.1.1 RPMs from ftp.rpm.org,
I'm also seeing this on Red Hat 8.0, when using up2date to install or upgrade
packages (albeit with a Current 1.4.3 server). (By "this" I mean exactly the
type of thing that Alan Cox mentioned in the first post in this bug. It does not
happen with RPM 4.1 as shipped with Red Hat 8.0, however.)

I can provide more info (and possibly a highly reproducible test case) if needed.

Comment 12 Jeff Johnson 2003-04-24 21:12:27 UTC
Barry: I'm looking for reliable reproducer of the DB_PAGE_NOTFOUND problem,
so far nada.

Comment 13 Barry K. Nathan 2003-04-24 21:54:09 UTC
I'm not 100% sure this procedure will reproduce the problem, but this procedure
roughly resembles how I've made the problem happen on two 8.0 boxes. I'll
probably try to verify this exact procedure on a third box later today.

1. Do a fresh RH 8.0 install (or take a box with an existing 8.0 installation).

2. Install the test-4.1.1 RPMs as well as all RH 8.0 errata. (The RH 8.0 errata
may not be necessary, but that's what I've tested with so far. It doesn't seem
to make a difference whether the test-4.1.1 RPMs are installed before or after
the 8.0 errata, or even if it's installed after some errata packages and before
others.)

3. Register the machine with RHN or set it up to be a client with a Current
server. (I've only tried the latter so far.) Optionally, reboot into the
2.4.18-27.7.x errata kernel if you have not done so already.

4. rpm -e kernel-utils

5. rpm --rebuilddb

6. up2date kernel-utils

7. If step 6 did not trigger the problem, repeat steps 4-6 again. If that still
doesn't do it, try 4-6 again one or two more times, and if that still isn't
sufficient, then I must have forgotten some detail somewhere.

Some notes:

If steps 4 and 5 are swapped (i.e., do step 5 before step 4 each time), the
problem does not happen. Similarly, if step 5 is omitted, the problem does not
happen. If rpm is used to install the package, instead of up2date, the problem
does not happen. AFAIK this procedure makes the problem happen with any package,
but I know for a fact that kernel-utils does it.

AFAICT, bug 89477 also claims to have a 100% reproducible way to trigger this
bug on Red Hat 9, for what that's worth. I haven't tried that procedure however.
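
Steps 4-7 above could be wrapped in a loop along these lines. This is only a sketch under assumptions: the helper name try_reproduce and the RPM/UP2DATE override variables are hypothetical (they exist so the loop can be dry-run with stand-in commands), and the default of 4 attempts is one reading of "one or two more times" in step 7.

```shell
#!/bin/sh
# Sketch of steps 4-6, repeated as step 7 suggests, stopping as soon as the
# up2date output contains DB_PAGE_NOTFOUND.
# try_reproduce, RPM and UP2DATE are hypothetical names, not rpm features.
try_reproduce() {
    pkg="${1:-kernel-utils}"
    attempts="${2:-4}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        ${RPM:-rpm} -e "$pkg"                      # step 4
        ${RPM:-rpm} --rebuilddb                    # step 5
        out=$(${UP2DATE:-up2date} "$pkg" 2>&1)     # step 6
        case "$out" in
            *DB_PAGE_NOTFOUND*)
                echo "reproduced on attempt $i"
                return 0 ;;
        esac
        i=$((i + 1))
    done
    echo "not reproduced after $attempts attempts"
    return 1
}

# Usage (as root, on a machine set up as in steps 1-3):
#   try_reproduce kernel-utils
```

The order of steps 4 and 5 is kept exactly as listed, since the notes above say that swapping them makes the problem disappear.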

Comment 14 Barry K. Nathan 2003-04-25 00:17:14 UTC
Here's a slightly different/more specific set of instructions. I just performed
these.

1. Perform a Workstation install of Red Hat 8.0 that boots into graphical login
by default. I did not modify or customize the default package selection. I used
an IDE hard disk for this, although the first system I experienced this bug on
used SCSI.

2. Go through the firstboot wizard. Register the system with RHN (Basic
entitlement in my case, FWIW). Go through the screens until you reach the long
list of errata packages, and click Cancel. (I would also expect it to be OK to
cancel at the Channels screen, but I didn't try that.)

3. Once the gdm login prompt appears, Control-Alt-F1 over to a text terminal.
(The first time the problem happened on one of our boxes we were logged into X,
but I used a text console this time.)

4. Log in as a normal user and bring over the test-4.1.1 packages (I scp'd them
over from another machine).

5. Log into another text console as root (you could probably just su, but I did
a separate login straight into root without logging out as a normal user).

6. "up2date glibc" -- the test-4.1.1 packages won't install without the glibc
errata. (If you'd prefer, you could probably go ahead and install all the errata
at this step. That would come closer to reflecting the setup of the other two
machines I saw this problem on. On this machine I only updated glibc for the
sake of saving time, though.)

7. "rpm -Fvh ~luser/test-4.1.1/*.8x.i386.rpm" -- replace "luser" with the real
username of the normal user account from step 4, of course.

8. "rpm --rebuilddb"

9. "up2date kernel-utils"  *BOOM!*

10. In the unlikely event that step 9 didn't exhibit the error, "rpm -e
kernel-utils" and repeat steps 8 and 9.

I'll go try Red Hat 9 now, and see what I can come up with in terms of a
reproducible test case there.

Comment 15 Peter Fales 2003-04-25 13:47:39 UTC
I'm seeing what sounds like the same thing on a "stock" (non-updated) system.
We're just starting to test the RH 9 kickstart, using a boot CD and an
NFS-mounted source tree. On two of the first three installs, the system came up
with rpm in this "corrupted" state (rpm -qa only lists a subset of the installed
rpms, and these DB errors are reported). Even though the third attempt came up
clean, it's not giving me a good feeling about RH9.

Comment 16 Barry K. Nathan 2003-04-26 00:22:02 UTC
An addendum to my previous posts for reproducing this bug:

I think if you do not remove "kernel-*" from the package skip list, you might
need to run "up2date -f kernel-utils" instead of "up2date kernel-utils". (At
least on RH 9, without the -f, if kernel-* is in the pkg. skip list, up2date
will just say everything's up to date and will not install the kernel-utils RPM.
I didn't try this again on RH 8.0 but it could be the same way.)

Comment 17 Barry K. Nathan 2003-04-26 00:43:32 UTC
The steps I performed on RH 8.0 are mostly what's needed to reproduce the bug on
RH 9 as well. Here are the main differences:

+ The glibc errata isn't needed to install the test-4.2 RPMs, AFAICT
+ (Obviously) install test-4.2 instead of test-4.1.1
+ "LD_ASSUME_KERNEL=2.2.5 up2date -f kernel-utils" instead of "up2date -f
kernel-utils" -- without the LD_ASSUME_KERNEL, the bug does not happen (unless
maybe I boot with nosysinfo, but I didn't try that). The presence or absence
of LD_ASSUME_KERNEL for rpm --rebuilddb and rpm -e kernel-utils makes no
user-discernible difference, save for the harmless message from the database
rebuild without LD_ASSUME_KERNEL.

I have no idea if these steps make the bug happen without the test-4.2 RPMs. I
didn't try that.

Soon (hopefully later tonight) I'll try torturing the test-4.0.5 RPMs to see if
they happen to have this bug or anything similar.

Comment 18 Jeff Johnson 2003-06-25 15:24:44 UTC
*** Bug 89477 has been marked as a duplicate of this bug. ***

Comment 19 Jeff Johnson 2003-06-25 16:53:39 UTC
*** Bug 89726 has been marked as a duplicate of this bug. ***

Comment 20 Alan Cox 2003-08-14 09:56:46 UTC
I'm now getting this on the beta btw, and --rebuilddb isn't fixing it.


Comment 21 Peter Baitz 2003-08-14 14:05:40 UTC
Do you need any input from me?  My Red Hat 9.0 box does still have the RPM
issues I reported in my bug incident... usually have to reboot to get RPM
working again...

Comment 22 Tom Weeks 2003-08-25 23:08:30 UTC
What happened to the unofficial fix out on ftp://people.redhat.com/jbj ?  What
am I supposed to do now?  Just let RPM blow up once a week and rebuild the DB?

Tweeks

Comment 23 Barry K. Nathan 2003-08-26 18:24:51 UTC
The packages are now at
ftp://ftp.rpm.org/pub/rpm/dist/rpm-4.2.x
(for Red Hat 9)
 
or for Red Hat 8.0:
ftp://ftp.rpm.org/pub/rpm/dist/rpm-4.1.x

Comment 24 Alan Cox 2003-11-12 00:21:52 UTC
*** Bug 102223 has been marked as a duplicate of this bug. ***

Comment 25 Barry K. Nathan 2004-07-14 14:38:52 UTC
Has anyone seen this happen on RHEL 3/FC1 or later? (IOW, is anyone
still seeing this bug in the wild, on releases that haven't hit
end-of-life yet?)

Comment 26 Jeff Johnson 2004-07-14 14:50:29 UTC
The problem still exists, which is why the bug is still open.

I personally have still never seen the problem, even after
attempting to narrow down a reproducer in order to attempt a fix.

Comment 27 Tom Weeks 2004-08-30 23:16:19 UTC
Talk to your RHCE trainers... They see it all the time in their 
classes.  I myself have seen them see it several times. :) ... and 
have seen it around 20 times myself while training others.

Tweeks

Comment 28 Mike Dorman 2004-11-12 20:45:55 UTC
Yes, I have experienced this problem both in FC1 and RHEL 3.  The 
installation of RPMs leaves the __db.00x cache files and leads to the 
DB_PAGE_NOTFOUND errors.  Removing the cache files makes it go away, 
as described by others.


Comment 29 Jeff Johnson 2004-12-13 12:59:10 UTC
DB_PAGE_NOTFOUND correlates with using rpm built
for an NPTL environment being used in a non-NPTL
environment. I know of no other problems.

The fix is either to build rpm for a non-NPTL environment,
or to use rpm in an NPTL environment. The version of rpm
built by Red Hat expects an NPTL environment.

The error DB_PAGE_NOTFOUND is an indication of cache
incoherency, usually fixed by doing
    rm -f /var/lib/rpm/__db*
when the error is seen.

WONTFIX because I know of no way to build a single
version of rpm that accommodates both NPTL and non-NPTL
environments.
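
One way to see which environment a given shell will hand to rpm is to ask glibc for its thread library. The sketch below assumes a glibc that supports getconf GNU_LIBPTHREAD_VERSION and that the reported string starts with "NPTL" or "linuxthreads"; the helper name threading_kind is hypothetical and only classifies that string.

```shell
#!/bin/sh
# Sketch: classify the string printed by `getconf GNU_LIBPTHREAD_VERSION`
# so a non-NPTL environment can be spotted before running an NPTL-built rpm.
# threading_kind is a hypothetical helper name.
threading_kind() {
    case "$1" in
        NPTL*)          echo nptl ;;
        linuxthreads*)  echo linuxthreads ;;
        *)              echo unknown ;;
    esac
}

# Usage:
#   threading_kind "$(getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
# To check what a process started with LD_ASSUME_KERNEL would see
# (as in comment 17):
#   threading_kind "$(LD_ASSUME_KERNEL=2.2.5 getconf GNU_LIBPTHREAD_VERSION 2>/dev/null)"
```

If the result is anything other than nptl, the rm -f /var/lib/rpm/__db* workaround above may be needed after running rpm in that environment.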