Bug 230362
| Field | Value |
|---|---|
| Summary | rpmdb: page 50: illegal page type or format |
| Product | [Fedora] Fedora |
| Component | rpm |
| Version | rawhide |
| Hardware | All |
| OS | Linux |
| Status | CLOSED RAWHIDE |
| Severity | medium |
| Priority | medium |
| Reporter | Orion Poplawski <orion> |
| Assignee | Panu Matilainen <pmatilai> |
| CC | herrold, pterjan, triage |
| Keywords | Reopened |
| Whiteboard | bzcl34nup |
| Doc Type | Bug Fix |
| Last Closed | 2008-04-03 19:27:51 UTC |
Description
Orion Poplawski
2007-02-28 16:10:15 UTC
This is likely an inconsistent cache. Doing

cd /var/lib/rpm
rm -f __db*
rpm -qa -vv

should fix (and verify) the problem.

Did following the recovery procedure work for you?

(In reply to comment #2)
> Did following the recovery procedure work for you?

It did. Recovery fixed.

Fresh install from 20070425, then:

Running Transaction
Installing: glib2-devel ####################### [ 1/34]
Installing: dbus-devel ####################### [ 2/34]
Installing: elfutils-libelf-devel ####################### [ 3/34]
Installing: cairo-devel ####################### [ 4/34]
Installing: pango-devel ####################### [ 5/34]
Installing: elfutils-devel ####################### [ 6/34]
Installing: hal-devel ####################### [ 7/34]
Installing: dbus-glib-devel ####################### [ 8/34]
Installing: atk-devel ####################### [ 9/34]
Installing: libIDL-devel ####################### [10/34]
Installing: libxml2-devel ####################### [11/34]
Installing: libXi-devel ####################### [12/34]
Installing: tcp_wrappers-devel ####################### [13/34]
Installing: sqlite-devel ####################### [14/34]
Installing: e2fsprogs-devel ####################### [15/34]
Installing: krb5-devel ####################### [16/34]
Installing: openssl-devel ####################### [17/34]
Installing: indent ####################### [18/34]
Installing: ORBit2-devel ####################### [19/34]
Installing: GConf2-devel ####################### [20/34]
Installing: libsepol-devel ####################### [21/34]
Installing: libselinux-devel ####################### [22/34]
Installing: expat-devel ####################### [23/34]
Installing: neon-devel ####################### [24/34]
rpmdb: page 417: illegal page type or format
rpmdb: PANIC: Invalid argument
error: db4 error(-30977) from dbcursor->c_get: DB_RUNRECOVERY: Fatal error, run database recovery
error: error(-30977) getting "neon" records from Requirename index
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbcursor->c_get: DB_RUNRECOVERY: Fatal error, run database recovery
error: error(-30977) getting "openssl-devel" records from Requirename index
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from dbcursor->c_get: DB_RUNRECOVERY: Fatal error, run database recovery
....

and so on. Removing the __db* files and running rpm -qa -vv recovers somewhat. But if I re-run the yum command:

[root@cynosure devel]# yum -y install net-snmp-devel gettext-devel ncurses-devel tcp_wrappers-devel gtk2-devel gnome-vfs2-devel
Loading "installonlyn" plugin
Setting up Install Process
Parsing package install arguments
primary.xml.gz 100% |=========================| 820 kB 00:00
developmen: ################################################## 2312/2312
primary.xml.gz 100% |=========================| 1.1 kB 00:00
macromedia: ################################################## 3/3
primary.sqlite.bz2 100% |=========================| 2.2 MB 00:00
Resolving Dependencies
--> Running transaction check
---> Package gnome-vfs2-devel.i386 0:2.18.1-2.fc7 set to be updated
Checking deps for gnome-vfs2-devel.i386 0-2.18.1-2.fc7 - u
---> Package ncurses-devel.i386 0:5.6-6.20070303.fc7 set to be updated
Checking deps for ncurses-devel.i386 0-5.6-6.20070303.fc7 - u
---> Package gettext-devel.i386 0:0.16.1-7.fc7 set to be updated
Checking deps for gettext-devel.i386 0-0.16.1-7.fc7 - u
---> Package net-snmp-devel.i386 1:5.4-13.fc7 set to be updated
Checking deps for net-snmp-devel.i386 1-5.4-13.fc7 - u
---> Package gtk2-devel.i386 0:2.10.11-5.fc7 set to be updated
Checking deps for gtk2-devel.i386 0-2.10.11-5.fc7 - u
--> Processing Dependency: net-snmp = 1:5.4 for package: net-snmp-devel
--> Processing Dependency: beecrypt-devel for package: net-snmp-devel
--> Processing Dependency: rpm-devel for package: net-snmp-devel
--> Processing Dependency: libXinerama-devel for package: gtk2-devel
--> Processing Dependency: lm_sensors-devel for package: net-snmp-devel
--> Restarting Dependency Resolution with new changes.
--> Running transaction check
---> Package libXinerama-devel.i386 0:1.0.2-1.fc7 set to be updated
Checking deps for libXinerama-devel.i386 0-1.0.2-1.fc7 - u
---> Package net-snmp.i386 1:5.4-13.fc7 set to be updated
Checking deps for net-snmp.i386 1-5.4-13.fc7 - u
---> Package rpm-devel.i386 0:4.4.2-40.fc7 set to be updated
Checking deps for rpm-devel.i386 0-4.4.2-40.fc7 - u
---> Package lm_sensors-devel.i386 0:2.10.3-2.fc7 set to be updated
Checking deps for lm_sensors-devel.i386 0-2.10.3-2.fc7 - u
---> Package beecrypt-devel.i386 0:4.1.2-12 set to be updated
Checking deps for beecrypt-devel.i386 0-4.1.2-12 - u
--> Processing Dependency: neon-devel for package: rpm-devel
--> Finished Dependency Resolution
Error: Missing Dependency: neon-devel is needed by package rpm-devel
[root@cynosure devel]# rpm -q neon-devel
neon-devel-0.25.5-6

So I'm in some kind of bad state. Also, aren't we worried more about preventing this corruption in the first place? "WORKSFORME" seems like sticking our heads in the sand. I've got the old /var/lib/rpm directory from after the crash and before the recovery if interested.

The easiest fix is

rpm -e rpm-devel

The rpm-devel package is not needed to use yum (or /bin/rpm). The mmap problem in 2.6.18->2.6.19 kernels is almost certainly the cause of all the problems. Yet more forensic root cause analysis will not help.

(In reply to comment #6)
> The mmap problem in 2.6.18->2.6.19 kernels is almost certainly the cause of all the problems.

The mmap problem supposedly fixed in 2.6.20-rc3? Then why am I seeing it with 2.6.20-1.3104.fc7, which is supposedly based on 2.6.21-rc7-git5? Is it a different issue, or did it not really get fixed?

Things are pretty hosed for me now. I can do a few things after a "rpm -qa -vv" but I quickly get corruption again. I removed neon-devel (which triggered a corruption) then I could re-run the yum transaction (again corruption).
Finally did "rpm --rebuilddb" and we'll see if that makes me more stable. So far so good... Dunno whether your problem is same or different than the mmap problem fixed in the kernel. The number of broken database reports has declined dramatically since the mmap problem was fixed in January. That's usually indicative of a fix, since rpmdb's are accessed daily everywhere. rpmdb corruption usually is related to NPTL locking being broken. So check your glibc installation and try a couple of kernels to see if the problem tracks with a kernel. Note that rpmdb problems can be persistent, so I'd suggest removing /var/lib/rpm/__db* files when you switch kernels. Just got this again after another fresh install on 20070508. No one else is having these problems. Have you tried rpm on a different box? Another install (Rawhide from 20070511) resulted in rpm corruption during the anaconda install. Then did another and got: Running Transaction Installing: elfutils-libs ######################### [1/5] error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: 
DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 
error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index error: db4 error(-30987) from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found error: error(-30987) getting "" records from Provideversion index Installing: elfutils ######################### [2/5] Installing: rpm-build ######################### [3/5] Installing: redhat-rpm-config ######################### [4/5] Installing: rpmdevtools ######################### [5/5] Installed: rpmdevtools.noarch 0:5.3-1.fc6 Dependency Installed: elfutils.i386 0:0.127-1.fc7 elfutils-libs.i386 0:0.127-1.fc7 redhat-rpm-config.noarch 0:8.0.45-15.fc7 rpm-build.i386 0:4.4.2-46.fc7 Complete! during the first yum install after system came up. Ran memtest86+ overnight with no errors. I'm going to try to do more installs on other machines, but I don't have a lot available. Just did an install on a fairly different machine (Dual Athlon AMD-760/8 based machine using LVM vs Dell Inspiron 4150 no LVM) and got the same RPM corruption during my first yum install on the new system. Install method is the same - pxe boot, nfs install + multiple http repositories. Are all the machines x86_64? What is different about what you are using than others? There's a patch in bugzilla that will use fcntl rather than NPTL locking. 
I doubt you will be able to detect any difference:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141614

but DB_PAGE_NOTFOUND is usually an inconsistent cache because posix_mutexes through NPTL is fubar'd somehow. Switching to fcntl locking will avoid the fubar'age. The patch likely applies with little or no problem; little has changed.

FWIW, I just saw a CentOS5/x86_64 failure that is not dissimilar to your failures. Too early to tell hardware vs software failure, but the box was just fine last week, so ... butt ugly failure, idn't it?

Well, I thought things had gotten better for a while, but I've just reproduced with F7 final. We'll see how many other machines/people experience this...

Orion - have you run memtest86+ on this box at all? What fs are you running? If this has just occurred, can you back up /var/lib/rpm and put it on a website somewhere for analysis? When you say you are using multiple http repositories, is anything in your package set potentially messing with /var/lib/rpm in scriptlets or doing rpm operations in scriptlets? Can you reproduce on an http install of just f7 with no additional repos?

memtest86+ checks out clean. I'm running ext3 on all my filesystems. I've got an older copy of /var/lib/rpm here: http://www.cora.nwra.com/~orion/rpm.tar.gz This is from a failure on 4/25 and is from immediately after the failure. rpm -qa --scripts | grep rpm does not turn up anything suspicious. Nothing in my kickstart scripts does anything with rpm. I'll try to reproduce with just F7.

Okay, just did an install with just the F7 Fedora http repo. The /var/lib/rpm from just after install is at http://www.cora.nwra.com/~orion/rpm-orig.tar.gz Then I installed cfengine via yum and set up my extra repositories. Then I tried to install via yum all of the packages that I use that weren't in the Fedora repo. This failed during package download due to space on /var.
I removed some packages to make room in /var and then did:

rpm -Uvh AdobeReader_enu-7.0.9-1.i386.rpm fsplit-5.5-1.fc7.i386.rpm msttcorefonts-1.3-4.noarch.rpm RealPlayer-10.0.8.805-20060718.i586.rpm

and removed those to make more room. Then I re-ran the yum install and that failed. /var/lib/rpm from after that is here: http://www.cora.nwra.com/~orion/rpm-bad.tar.gz

Some of the errors:

rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->get: DB_RUNRECOVERY: Fatal error, run database recovery
rpmdb: PANIC: fatal region error detected; run recovery
error: db4 error(-30977) from db->cursor: DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Basenames index using db3 - (-30977)
xargs: yum: terminated by signal 11

The start of the errors has scrolled past my console buffer.

I'm still seeing rpm database corruption fairly frequently during installs, both of current rawhide and "respun" Fedora 7 + updates. I have not seen corruption occurring after install for a while. If you can think of any kind of useful debugging instrumentation that could be put into the installer, I'd be happy to try it out.

User pnasrat's account has been closed

Reassigning to owner after bugzilla made a mess, sorry about the noise...

I'm still seeing rpm database corruption fairly frequently during installs of current rawhide. If you can think of any kind of useful debugging instrumentation that could be put into the installer, I'd be happy to try it out as I seem to be able to reproduce pretty easily and it's causing me much grief. In fact, the last 3 times I've tried to install rawhide on my test laptop, it has failed with rpmdb corruption all 3 times.

centos look in per jbj request /./ looks like much RawHide activity; possibly one of the background rpmdb accessing processes is long running and causing issues? We are just not seeing this on very similar hardware, and rather heavy RPMdb transactions.
Also, are there any local archive RPMs in the mix which may be in play?

-- Russ Herrold

Random idea of the day, inspired by a somewhat similar-looking bug (#375931): are you by any chance installing the i386 version on x86_64 hardware? Just trying to figure out *some* common factor in these cases...

Nope, these are generally i386 (i686) installs. I can get the corruption without any local repos as well.

Ok, so much for that idea... One thing you could try is building rpm with this patch: http://fedora-arm.wantstofly.org/diffs-f/rpm-4.4.2.1-10.fc8/rpm-4.4.2-always-mlock.patch I'm told it helps to prevent db env corruption on ARM; if it does so on other archs too, it'd at least be a good datapoint.

I tried the above rpm patch without success. YMMV.

Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.

I have not seen this recently. I'll close.
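
For anyone landing on this report with the same errors, the recovery steps suggested in the comments (remove the __db* files, re-run rpm -qa -vv, and run rpm --rebuilddb if that is not enough) can be sketched as a small shell helper. This is only a sketch: the function name clear_rpmdb_env, the backup path, and the decision to take a tarball first are assumptions added here, loosely following the request in the thread to keep a copy of /var/lib/rpm for analysis.

```shell
#!/bin/sh
# Hypothetical helper, not part of rpm: back up an rpm database directory,
# then remove the Berkeley DB environment files (__db*) as suggested above.
clear_rpmdb_env() {
    dbdir=${1:-/var/lib/rpm}
    backup=${2:-/var/tmp/rpmdb-backup.tar.gz}   # assumed location

    [ -d "$dbdir" ] || { echo "no such directory: $dbdir" >&2; return 1; }

    # Keep a copy of the pre-recovery state for later analysis.
    tar -czf "$backup" -C "$dbdir" . || return 1

    # Only the __db* files are removed; Packages and the index files
    # (Requirename, Provideversion, Basenames, ...) stay in place.
    rm -f "$dbdir"/__db*
}

# On an affected system one would then run, as root:
#   clear_rpmdb_env /var/lib/rpm
#   rpm -qa -vv        # reopen the database and list every package
#   rpm --rebuilddb    # only if queries still fail
```

The rpm commands are left as comments so that the destructive step can be tried against a copy of the directory first. As the comments above show, this clears the symptom but did not stop the corruption from coming back.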