RH 7.1 RC 1 partially bad media recovery

----- Message Text -----
Anaconda is having a problem with just PART of the MM CD:

The file /mnt/source/RedHat/RPMS/ypbind-1.7-5.i386.rpm cannot be opened -- This is due to a missing file, a bad package, or bad media. Press <return> to try again

<OK>
---------------------------

RFE: Needs a (SKIP this file) option

Rationale: As long as we identify an error condition of this type, offer to SKIP over it -- perhaps a "GIVE UP" option or other way to gracefully terminate the partial install with a proper reboot to umount cleanly the partial system -- although this is more questionable, because the system may well be non-bootable (I am mid-upgrade [this is on a 486/66 and has been running over an hour, so I am a bit distressed, and would like to SKIP at this point] -- we'll see ...)
This really is not an option - it can definitely lead to a system with lurking disasters waiting to happen. Bad media just happens :(
Lots of present options also can lead to messed up systems -- the 'proceed without satisfying dependencies' options during resolution phase in TUI and GUI installs; the decision NOT to make a recovery floppy; ... I mentioned this on testers list but had not appended it in here ...
PLEASE reconsider ... there is just NO good recovery option available for a failure mid-upgrade as things are now with media discovered to be defective mid-process:

Date: Thu, 12 Jul 2001 00:10:07 -0400 (EDT)
From: R P Herrold <herrold>
Cc: Red Hat Beta Team <testers-list>
Subject: [testers] My first beta install failed

David L. Gehrt wrote
> The pause while awaiting the "running anaconda" seem excessively wrong
> not to have a progress indicator.

> XFree86-libs-4.0.3 failed due to missing file,bad package..." hit return
> and it succeeded
> XFree86-100dpi-fonts-4.0.3 failed due to missing file,bad package..." hit
> return twice and it succeeded

See: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=30029
which was closed 'WONT' from the last test cycle -- Also on well-used 7.1 'official' CD's today, I got a media error. The ONLY option is 'retry' -- there just HAS to be a 'skip' option to get PAST defective media.

> These last two comments SOUND like the CD problems I am reading about as
> I catch up on my testers-list email. The first CD booted OK, but then
> these two transient CD reading problems were not particularly serious.

but there is no graceful recovery option to allow getting PAST this in a 3/4 complete upgrade or install situation ...

I'll reopen 30029 and post this in that Bugzilla

-- Russ
I'm not sure what type of graceful recovery is possible when you have errors mid-way through an install/upgrade. I'm guessing you mean there should be a button to push to shut down the machine cleanly?
Well, yes, but why not make it more useful? Understand that, particularly in the case of an upgrade, a partial update, and also a non-graceful shutdown, is very painful to recover from ... almost without exception (the exception being a glibc package failure), it is better to limp through to the end, and then audit with rpm -Va and patch.

Suggestions:
====================
1. Offer to SKIP the unreadable package, and dump a note in the install/upgrade log -- This alerts the sysadm that there was a problem, which may need manual intervention.

2. At a minimum, in the case of INSTALLS only, offer a shutdown which properly umounts the (then currently mounted) installation partitions. At this point, sometimes, an 'upgrade' can be used to complete the install. AGAIN, log the fact to the install/upgrade log -- so that one can do a post-mortem and figure out what went wrong.

[This actually provokes an idea on my part -- Why not 'tee' in all instances anaconda tracebacks, in the case of install failures, into /tmp/anaconda-traceback.txt, and/or the install/upgrade log, so that beta testers, and indeed customers finding errors, can give better reports, even if a proper floppy, with proper vfat modules, is not at hand. -- will RFE separately]
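To make suggestion 1 concrete, here is a minimal sketch of a retry-then-skip loop that records every skipped package in a log. It is an illustration only: `open_with_skip` and its parameters are hypothetical names, not anaconda code, and a real installer would install the package rather than just probe that the file can be read.

```python
import logging

def open_with_skip(paths, log, max_retries=2):
    """Probe each package file; skip and log the ones that stay unreadable.

    Hypothetical helper for illustration, not part of anaconda.
    Returns a pair of lists: (readable paths, skipped paths).
    """
    readable, skipped = [], []
    for path in paths:
        for attempt in range(1, max_retries + 1):
            try:
                with open(path, "rb") as fh:
                    fh.read(1)  # touching the file is enough to surface an open/read error
                readable.append(path)
                break
            except OSError as err:
                log.warning("attempt %d: cannot open %s: %s", attempt, path, err)
        else:
            # Every retry failed: leave a note for the post-mortem and move on.
            log.error("SKIPPED %s (audit later with rpm -Va)", path)
            skipped.append(path)
    return readable, skipped
```

Called with a `logging.Logger` that writes to the install/upgrade log, the `skipped` list plus the log entries give the sysadmin the starting point for the manual `rpm -Va` audit described above.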
I think Suggestion #1 is a bad idea, because we don't have a way of knowing the importance of a given package at that time. What if that package is the kernel or glibc? If you skip that package, the system is hosed, and you've wasted your time with the rest of the install. I mean, knowing that the install media has some corruptions and wanting to go on with the install anyway seems pretty crazy to me. We get a fair number of requests for this, and I don't really understand why it's such a big deal. It's *much* easier to verify that your install media is good before you start the install than it is to catch the countless number of ways the installer can fail on bad media. If the user wants to go through the trouble of downloading and burning their own cd, then I think they bear some responsibility to make sure the download is good before they start the install. If there are defects with the retail cd's, the right thing to do is to take it back to the store and exchange it, not go on installing with bad media.
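As an illustration of verifying a download before starting, the sketch below computes the MD5 digest of an image file and compares it with a published value. It assumes a checksum is available for the image; the function names are hypothetical. Note that it checks the downloaded file, not the burned disc or the drive reading it.

```python
import hashlib

def image_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large image never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_matches(path, expected_hex):
    """True if the file's MD5 equals the published value (case-insensitive)."""
    return image_md5(path) == expected_hex.strip().lower()
```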
Re-opened after 7.2 release (taking out of Deferred state)
============================

The point is being missed: I had this occur in an ISP environment doing an upgrade; the upgrade, from Official RH CD's, failed half way through an UC -- GLIBC had overwritten the prior version, but the new kernel (required by that glibc) was not yet in place. The ypbind which was a part of the BASE in the upgrade was destined to be removed just as soon as the install was complete.

Sure -- I can envision failure modes which are not recoverable -- but if any but about 5 or 6 packages fail, it is not fatal, just possibly uncomfortable -- Why increase the pain by omitting a Skip, or even a graceful shutdown? Isn't the pain of a powerswitch shutdown much more than a possibly missing package dependency?
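The distinction drawn here, a handful of packages whose failure is fatal versus everything else, could be sketched as a small policy function. This is a hedged illustration: the `CRITICAL` set contains only the two packages named in this thread (glibc and the kernel), the full list of "5 or 6" is not given here, and all names are hypothetical rather than anaconda's.

```python
# Packages named in this thread as fatal if they fail; the complete list is an assumption.
CRITICAL = {"glibc", "kernel"}

def package_name(rpm_path):
    """Extract the package name from a 'name-version-release.arch.rpm' path."""
    base = rpm_path.rsplit("/", 1)[-1]
    # The name is everything before the last two hyphens (version and release).
    return base.rsplit("-", 2)[0]

def options_on_failure(rpm_path):
    """Choices to offer when a package cannot be read from the media."""
    if package_name(rpm_path) in CRITICAL:
        # Skipping would leave the system unusable: offer only retry or a clean abandon.
        return ["retry", "abandon"]
    return ["retry", "skip", "abandon"]
```

For the file in the original report, `/mnt/source/RedHat/RPMS/ypbind-1.7-5.i386.rpm`, the name resolves to `ypbind`, so a skip would be offered.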
Just got bitten again on my personal workstation with an AIC-7xxx and SCSI CD drive combo -- with CD's which work consistently on IDE drives. It's not the hardware; it's not the disks; it is SCSI controller resets this time-- and because there is no graceful umount option, or skip option, I'm screwed. If the MBR changes had happened, but 'not enough' to be able to get booted with a new kernel, I'd REALLY be in hot water. This (error recovery) _is_ a big deal, if one wants to be taken seriously in a marketplace. The Adaptec SCSI driver issues are most likely not going away.
Uggh ... actually, as it turned out, in trying to go from RH 7.1 to RH 7.2 and choosing to convert to ext3, it edited the fstab and made a LABEL=/mountpoint change to the partition information on the HD in question. Also, it changed the fs type to ext3 and committed the change to the /etc/fstab, and the remaining RH 7.1 kernel, NOT having been updated yet, was unable to mount ext3 FS type ... Took about 45 minutes to recover it. (attachment in a second)
Created attachment 37663
snapshot of fstab and fdisk as alluded to ...
Deferred to future release.
Honest, I had nothing to do with this -- this guy is in the local LUG

=============================================================
Date: Fri, 10 May 2002 14:03:50 -0400 (EDT)
From: Scott Merrill <skippy>
Reply-To: colug
To: colug
Subject: [COLUG] RH 7.3 installation

I upgraded a 7.2 system, and elected to add a few extra packages. Since I only used the first two CDs for my 7.2 installation, I foolishly assumed I could get by with just the first two discs from 7.3. What a mistake that was. I ended up rebooting since the installer did not provide a convenient way to stop, and my CD burner was in the system I was upgrading! Thankfully the system rebooted successfully, and everything appears to be in order (sans those packages from disc 3).

Who would I talk to in order to suggest that the Red Hat installer indicate which CD a particular package is on? Barring that, I'd like to suggest a way to cleanly abort the request for the next CD, if possible.

_______________________________________________
colug mailing list
colug
=========================

Re-opening for the next cycle.
*** Bug 72481 has been marked as a duplicate of this bug. ***
new datapoint FYI -- Upgrade scenario: with post-limbo (Psyche boxed set image) CD's passing mediacheck, disk 2 on current production IDE based Dell hardware came up not reading. I infer the read pattern of mediacheck is sustained and linear; the read pattern of an install is sporadic, with spindowns and random media location seeks, and by definition cannot be fully tested with all the possible custom install combinations; upgrades would be even MORE random. And so another piece of media needed to be pulled and burned on a different burner (which worked, the host being offline for several hours during the process).

A clean umount 'Abandon' option alone for a graceful shutdown when errors are detected (if not both the Skip and Abandon) sure makes sense.
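The difference between the two read patterns can be illustrated with a short sketch, taking the inference above at face value (how mediacheck actually reads the disc is not confirmed in this report). The hypothetical `read_blocks` helper reads every block of a file once, either in order or in shuffled order, so a file on mounted media can be exercised both ways; a read failure surfaces as an `OSError`.

```python
import os
import random

def read_blocks(path, block_size=2048, shuffle=False, seed=0):
    """Read every block of `path` once and return the total bytes read.

    shuffle=False reads sequentially, like a linear media check.
    shuffle=True visits the blocks in random order, forcing a seek per read,
    which roughly imitates an installer pulling packages in arbitrary order.
    """
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    if shuffle:
        random.Random(seed).shuffle(offsets)
    total = 0
    with open(path, "rb") as fh:
        for offset in offsets:
            fh.seek(offset)
            total += len(fh.read(block_size))
    return total
```

Both modes touch the same bytes; only the access order differs, which is the property that a purely sequential check would leave untested.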
*** This bug has been marked as a duplicate of 68376 ***
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.