Bug 509605

Summary: preupgrade results in broken system (glibc, kernel, etc. not upgraded)
Product: [Fedora] Fedora
Reporter: Kevin R. Page <redhat-bugzilla>
Component: preupgrade
Assignee: Seth Vidal <skvidal>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: low
Version: 10
CC: wwoods
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-12-18 04:36:46 EST

Attachments:
upgrade.log (flags: none)
anaconda debug dump (flags: none)

Description Kevin R. Page 2009-07-03 21:04:01 EDT
Created attachment 350466: upgrade.log

I have just tried upgrading from F10 to F11 with preupgrade (which I've completed successfully on 2 other machines).

preupgrade seemed to run fine, and the upgrade itself (viewed remotely over vnc) also seemed to go fine. After the post-upgrade reboot, however, I couldn't reach the machine.

After gaining access to the machine and rebooting into runlevel 3, I first noticed that the grub menu seemed to contain all the same F10 kernels as prior to the upgrade, and no F11 kernel. On boot several services failed to start.

More worrying, when I tried to call rpm or yum to work out what kernel packages I had installed, rpm would fail with messages like:
rpmdb page 1 illegal page type or format
rpmdb PANIC: fatal region error detected

Looking at upgrade.log shows a worrying lack of upgraded packages (see attachment). No glibc upgrade, no new kernel.

I can gather more debug by running the install DVD in rescue mode, with direction, but sshd won't start so it can be a bit time consuming.

The only thing I can think of that was slightly unusual was that I'd been testing kernel 2.6.29.1-15.fc10.i686 before upgrade (which I guess has a relatively high version compared to F11?).

Preupgrade seems to think it finished its job and cleaned up after itself (nothing in cache, grub entry removed).

Help.

1) would the lack of kernel and glibc upgrade be the likely problem?
2) searches suggest a corrupt rpm db might be an issue. Should I try rebuilding by dumping out the Packages list and re-importing (presuming those rpm tools will run)? I'm wary of trying this, given 1), and the possibility of making things worse.
Comment 1 Kevin R. Page 2009-07-03 21:06:42 EDT
Created attachment 350467: anaconda debug dump

After the preupgrade failure I thought I'd try using an install DVD to upgrade.

The DVD recognised the system as an F10 system(!), the option to update the bootloader was greyed out, and then the installer crashed. Attached is the debug.
Comment 2 Will Woods 2009-07-05 12:59:07 EDT
First off - sorry your system's in a bad state. There are tools for repairing the RPM database in circumstances as dire as these, but I'm not an expert at the repair process (Seth? Any suggestions here?). You might want to try making a backup copy of /var/lib/rpm and running rpm --rebuilddb. There's also yum-complete-transaction, which you might run just to see if it thinks there's an incomplete transaction in progress.
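
A minimal sketch of the "back up first" step, assuming a modern Python 3 is available (for example on a separate machine or live system with the damaged root mounted). The function names, the /root backup location and the timestamped directory name are illustrative choices, not part of rpm or preupgrade:

```python
import shutil
import time
from pathlib import Path


def backup_rpmdb(rpm_dir="/var/lib/rpm", backup_root="/root"):
    """Copy the rpm database directory to a timestamped backup; return its path."""
    src = Path(rpm_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"rpm-backup-{stamp}"
    # copytree refuses to write into an existing directory, so an earlier
    # backup is never overwritten by a later (possibly more damaged) copy.
    shutil.copytree(src, dest)
    return dest


def packages_size(rpm_dir="/var/lib/rpm"):
    """Size in bytes of the main Packages file, or None if it is missing."""
    packages = Path(rpm_dir) / "Packages"
    return packages.stat().st_size if packages.is_file() else None
```

Calling backup_rpmdb() before any rpm --rebuilddb attempt keeps the original files around for later inspection. A Packages file of only a few kilobytes on a system with many packages installed is probably a sign that the database was already damaged before the rebuild.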

Second - the DVD installer recognizing the system as F10 makes sense; nowhere in your upgrade.log is fedora-release mentioned. So your system still looks like an F10 system, at least to anaconda.

Finally, let's try to figure out how you got in this mess. The symptoms sound like an incomplete/interrupted upgrade - except the "FINISHED INSTALLING PACKAGES" message is present in the log file and it sounds like the post-install cleanup script ran, and the system rebooted itself, as expected at the end of a normal upgrade. So that's not it.

The only other explanation I can think of is that preupgrade somehow forgot to download (or was interrupted in the process of getting) a bunch of packages - fedora-release, glibc, etc. Do you happen to recall how many packages preupgrade downloaded? Did it seem unusually quick? 

Looking closer I note that yum-plugin-protectbase is installed. How is it configured? Was it installed on the other machines that ran preupgrade successfully? I've never tested with that plugin and it's possible that it interferes with preupgrade's depsolving process - that would explain why preupgrade never bothered trying to get replacement packages for glibc, for instance.

Were there any other extra yum plugins installed?
Comment 3 Kevin R. Page 2009-07-05 20:34:27 EDT
Hi, many thanks for the reply.

(In reply to comment #2)
> You might want to try making
> a backup copy of /var/lib/rpm and running rpm --rebuilddb..

So while rpm --rebuilddb seems to work, queries to rpm following this are still broken with:
rpmdb: page 9: illegal page type or format
rpmdb: PANIC: Invalid argument
rpmdb: /var/lib/rpm/Name: pgin failed for page 9
etc.

--rebuilddb returned a little too fast for my liking - pretty much instantly.

I'd also seen it suggested to try:
/usr/lib/rpm/rpmdb_dump Packages.orig | /usr/lib/rpm/rpmdb_load Packages

but no luck here either:
rpmdb_dump: PANIC: fatal region error detected; run recovery
rpmdb_load: PANIC: fatal region error detected; run recovery

(and Packages looks too small ~12K).

> there's also
> yum-complete-transaction, which you might run just to see if it thinks there's
> an incomplete transaction in progress.

Any yum or derivative commands fail with a python traceback ending:
ImportError: No module named rpm


I'm not entirely sure whether my rpm dbs are corrupt, rpm is failing to run because of shared lib issues from the incomplete upgrade, or both. Opinions welcome. rpm still seems to bail out when running from a rescue DVD.

And of course, if there's any advice about resurrecting my rpm db...


> Second - the DVD installer recognizing the system as F10 makes sense; nowhere
> in your upgrade.log is fedora-release mentioned. So your system still looks
> like a F10 system, at least to anaconda.

Yes, that makes sense. However, the DVD can't then upgrade the system - is that an indicator that the rpm db is corrupt?


> Finally, let's try to figure out how you got in this mess. The symptoms sound
> like an incomplete/interrupted upgrade - except the "FINISHED INSTALLING
> PACKAGES" message is present in the log file and it sounds like the
> post-install cleanup script ran, and the system rebooted itself, as expected at
> the end of a normal upgrade. So that's not it.

Yes, it does seem like too few packages were upgraded.

 
> The only other explanation I can think of is that preupgrade somehow forgot to
> download (or was interrupted in the process of getting) a bunch of packages -
> fedora-release, glibc, etc. Do you happen to recall how many packages
> preupgrade downloaded? Did it seem unusually quick? 


Hmmm, so the number 150 sticks in my mind, but I didn't explicitly check for the download number. And of course I've been seeing smaller downloads with preupgrade, so I wouldn't say I was calibrated to look for an unusually small number (in hindsight...).

preupgrade definitely seemed happy when it was finished. I think there was an option to reboot now or later.


> Looking closer I note that yum-plugin-protectbase is installed. How is it
> configured?

enabled=0 in /etc/yum/pluginconf.d/protectbase.conf and there are no protect= settings in /etc/yum.repos.d/

I stopped using the plugin some time ago, though it's an oversight that it wasn't uninstalled completely.


> Was it installed on the other machines that ran preupgrade
> successfully?

No, it wasn't installed on the other machines.


> I've never tested with that plugin and it's possible that it
> interferes with preupgrade's depsolving process - that would explain why
> preupgrade never bothered trying to get replacement packages for glibc, for
> instance.
> 
> Were there any other extra yum plugins installed?  

Not that I remember.

/etc/yum/pluginconf.d/ contains blacklist.conf protectbase.conf refresh-packagekit.conf and whiteout.conf, but it's tricky to ask rpm who those belong to :-/
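
Since rpm can't be queried here, the plugin configuration can at least be read directly. A rough sketch, assuming each file in /etc/yum/pluginconf.d/ is an INI-style file with an enabled= setting under [main] (as protectbase.conf is above); the function name is made up for illustration:

```python
import configparser
from pathlib import Path


def enabled_yum_plugins(conf_dir="/etc/yum/pluginconf.d"):
    """Return names of plugin configs that have enabled=1 (or yes/true/on) in [main]."""
    enabled = []
    for conf in sorted(Path(conf_dir).glob("*.conf")):
        parser = configparser.ConfigParser(strict=False, interpolation=None)
        try:
            parser.read(str(conf))
            # A missing [main] section or enabled= option counts as "not enabled".
            if parser.getboolean("main", "enabled", fallback=False):
                enabled.append(conf.stem)
        except (configparser.Error, ValueError):
            # Unparseable file or non-boolean value: skip it rather than guess.
            continue
    return enabled
```

This only reports what the config files say. It does not show whether the plugin package itself is still installed, or whether plugins are switched on globally in yum.conf.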
Comment 4 Kevin R. Page 2009-07-06 04:53:24 EDT
I should have mentioned, my tests in comment #3 were from the bloodied F10 I can boot to in runlevel 3.

I've just tried again using the F11 rescue disk and the newly discovered --root=/mnt/sysimage option to rpm.

If I --rebuilddb this way I can then query rpm without error. But (from an rpm -qa) the rpmdb would seem to be empty.

I don't think I did anything rash on discovery of the problem, but the backup I made of /var/lib/rpm has the same empty package list, so I suppose I can't rule out doing something stupid (like maybe an initial --rebuilddb, could that trash it? I knew of the option previously).

From a recovery point of view, I'm thinking:
1) is it safe to use rpm from the rescue DVD? (though extra utilities in /usr/lib/rpm aren't there, so perhaps a live CD)
2) with a safe rpm can I, and how do I, go about saving or rebuilding an rpmdb?
3) would a DVD upgrade be able to cope from there?

I'm a little surprised that preupgrade and anaconda don't check for critical packages like fedora-release (glibc, kernel?) and will go ahead without them.
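
The kind of sanity check meant here could look roughly like the sketch below. It is not how preupgrade or anaconda work; the list of critical names is just the three mentioned above, and the caller is assumed to have already extracted name-version-release strings from the transaction or from upgrade.log, whose exact format is not assumed:

```python
# Names taken from the comment above; a real check would need a fuller list
# (for instance, kernel variants such as kernel-PAE are not covered).
CRITICAL = ("fedora-release", "glibc", "kernel")


def package_name(nvr):
    """Strip the trailing -version-release(.arch) from a package string."""
    parts = nvr.rsplit("-", 2)
    return parts[0] if len(parts) == 3 else nvr


def missing_critical(upgraded, critical=CRITICAL):
    """Return the critical package names that do not appear among the upgraded packages."""
    names = {package_name(p.strip()) for p in upgraded if p.strip()}
    return [c for c in critical if c not in names]
```

An upgrade tool could refuse to continue, or at least warn, whenever missing_critical() returns a non-empty list.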

Final thing: I would hope it wouldn't have an effect, but the install run after preupgrade was over VNC (appended vnc to the grub.conf line preupgrade had created). I'm pretty sure the install completed ok (I remember seeing it reach the end of the progress bar), but being on a vnc client all I really saw was the client disconnect on reboot.
Comment 5 seth vidal 2009-07-06 11:05:31 EDT
Your rpmdb appears to be completely broken, and worse yet, no rpm-python is available. I think Will might be on the right track about protectbase, though. It is probably going to be hard to track down the sequence of events which left things in THIS bad of a place. I'm not even sure how/why anaconda ran.

At this point I'd suggest you backup your data and reinstall.
Comment 6 Kevin R. Page 2009-07-10 12:23:56 EDT
(In reply to comment #5)
> I think Will might be on the right track about protectbase, though.

Which is still a concern as far as working out what went wrong, because although it was installed, it was explicitly disabled.

Also of likely interest: all the RPMs in upgrade.log were from updates. None were from the base install.

So it looks like preupgrade failed to get any packages that weren't from the updates repo.


I'm going to attempt to re-generate my rpmdb [1] using /var/log/rpmpkgs (I'm investigating some kind of --justdb --noscripts --notriggers --nodeps horror). Anyway, as part of this I've had some wget scripts fetching packages from repositories for Fedora versions and updates as appropriate. And when fetching the last batch that were installed as part of the incomplete upgrade I noticed that none of the RPMs from upgrade.log could be fetched from releases/10/Everything/

[1] Yeah, I appreciate why you suggest a clean reinstall, but I'd really like to salvage this system if I can. The hardware is likely to be replaced with x86_64 in 6-12 months and I'm not sure I can face a reinstall now given I'll have to do one then whatever.
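
A sketch of how the rpm command for that approach might be assembled, without running anything. It assumes /var/log/rpmpkgs holds one package file name per line (name-version-release.arch.rpm), that the fetched RPMs sit in one directory under exactly those names, and that the damaged system is mounted at /mnt/sysimage as in comment 4. The function name and return shape are illustrative only:

```python
from pathlib import Path

# Flags quoted in the comment above, plus -i to "install" into the database only.
JUSTDB_FLAGS = ["-i", "--justdb", "--noscripts", "--notriggers", "--nodeps"]


def justdb_command(pkg_list_lines, rpm_dir, root="/mnt/sysimage"):
    """Build (but do not run) an rpm command that only records packages in the db.

    Returns (command, missing): the argument list, and the file names from the
    package list that were not found in rpm_dir.
    """
    found, missing = [], []
    for line in pkg_list_lines:
        filename = line.strip()
        if not filename:
            continue
        path = Path(rpm_dir) / filename
        if path.is_file():
            found.append(str(path))
        else:
            missing.append(filename)
    command = ["rpm", "--root", root] + JUSTDB_FLAGS + found
    return command, missing
```

Checking that the missing list is empty before running the command seems wise, since any package left out would be absent from the rebuilt database even though its files are on disk.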
Comment 7 Kevin R. Page 2009-09-18 07:05:53 EDT
Question: I've just installed yum-plugin-priorities, which seems similar to protectbase (now uninstalled) on another machine (for jpackage) - is this something I should be concerned about wrt preupgrade in the future?

Update: Though I intend to do a complete re-install in the future, I managed to rebuild the rpmdb as outlined above and get going with a F11 DVD upgrade. I think there was something screwy going on with yum metadata though - after the upgrade I was having problems getting the latest updates with yum... lots of dependency problems.

This was solved with a yum clean metadata.

What was a bit worrying was that the packages that then cleanly updated were those from the main release repo. This was the same set of packages that preupgrade failed to fetch. i.e. both preupgrade, and yum after a DVD upgrade, weren't getting anything that wasn't in updates.

Seems like it might be sensible for preupgrade to clear yum metadata before it gets going?
Comment 8 Bug Zapper 2009-11-18 04:31:08 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 9 Bug Zapper 2009-12-18 04:36:46 EST
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.