Bug 174306
Summary: 'Unconventional' disk druid installation (RAID/non-RAID) scratches MBR on first disk

Product: Red Hat Enterprise Linux 4
Component: anaconda
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: 4.0
Hardware: i686
OS: Linux
Reporter: David Tonhofer <bughunt>
Assignee: Peter Jones <pjones>
QA Contact: Mike McLean <mikem>
Doc Type: Bug Fix
Last Closed: 2008-07-24 17:56:48 UTC
Description
David Tonhofer
2005-11-27 17:04:52 UTC
Created attachment 121519: Dumps of the MBRs of /dev/hde (bad) and /dev/hdg (good)
Created attachment 121520: Bad MBR on /dev/hde
Created attachment 121521: Good MBR on /dev/hdg
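For reference, dumps like the attached ones can be produced with dd and sanity-checked for the boot signature. This is a minimal sketch, not the reporter's exact procedure; dump_mbr is a hypothetical helper, and the device path (/dev/hde or /dev/hdg) must be substituted by the caller.

```shell
# Sketch: dump the first sector (the MBR) of a disk and check whether it
# still ends in the 55 aa boot signature. dump_mbr is a hypothetical
# helper name, not a tool from the report.
dump_mbr() {
    disk="$1"
    out="$2"
    # The MBR is the first 512-byte sector of the disk.
    dd if="$disk" of="$out" bs=512 count=1 2>/dev/null
    # A valid MBR ends with the two signature bytes 55 aa (offsets 510-511).
    sig=$(dd if="$out" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' ')
    if [ "$sig" = "55aa" ]; then
        echo "$disk: valid boot signature"
    else
        echo "$disk: missing or damaged boot signature ($sig)"
    fi
}
```

Run against the 'bad' disk after the failed reboot, the second branch would fire, matching what the attached /dev/hde dump shows.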
Uhhh... there might be something wrong with the disk. It's gone from the disk druid menu now. Maybe one should add SMART diagnostics to Anaconda? Will check the hardware now.

This problem is getting uncanny. I did a few (7) tests for which the result was as described above, but now I have had a run of installations in which the problem has mysteriously gone away. I have, however, checked that it is not a *fault* in the hardware (it could be a consistent error in the Promise RAID controller, for example, which, even if RAID mode is disabled, still controls the hard disks, which is why they are named /dev/hde and /dev/hdg): I checked both disks and moved them to another machine of the same type; same problematic behaviour until a few hours ago. I know there *is* something up with the hardware, as I have encountered some problems re-reading the MBR after writing it in fdisk.

To reiterate on the case where the MBR on /dev/hde has been destroyed:

* Install as described above, using the RAID cloning feature to set up software RAID partitions on /dev/hde and /dev/hdg. Modify the partition type of the first two RAID partitions on each disk to 'ext3' to get a place to put /boot, bind the remaining partitions into RAID mirrors, format, etc. Later repeats show that it is unimportant whether there is a swap partition or not; the problem occurs regardless.
* Proceed with the installation. No problem until reboot.
* After reboot, nothing bootable can be found, i.e. /dev/hde no longer has a valid MBR.
* Reboot the machine using the RH ES 4 installation CD. Disk druid now consistently says that /dev/hde is gone (it looks like it isn't installed). /dev/hdg is visible and correctly partitioned.

This "disk invisibility" problem may or may not be related to the MBR problem, but I suspect some hardware weirdness:

* On the console, fdisk *can* find /dev/hde. The kernel can also find it: in /tmp/syslog, the last message about that drive is "hde: unknown partition table".
* I tried to set up a new partitioning on /dev/hde with fdisk, but when writing the partition table I get "error 5: I/O error. Kernel still uses the old table; the new table will be used at the next reboot".
* I zeroed the disk using 'autoclave' from the Ultimate Boot CD, but after a reboot into the RH ES 4 installation CD, disk druid could still not see the disk.
* I wrote some binary garbage into the MBR of that disk using PTS Disk Editor to check whether the MBR could be properly written. It works. (However, these DOS-based programs cannot be reliably started on the present hardware; as said, there is something funny with it.) After a reboot into the RH ES 4 installation CD, disk druid could still not see the disk.
* However, an installation aborted before 'package installation' reveals that the MBRs are correctly written on both disks.

Other things tried: installation as above, with md device cloning, but keeping /boot installed on a software RAID mirror. After reboot, both disks are good, but the machine can't actually boot (i.e. you get the grub command line). No real surprise. I will do more tests, but this is beginning to look like some unfathomable thing.

It's the hardware. There is probably something wrong with the on-board Promise RAID controller (Promise Technology, Inc. PDC20265 (FastTrak100 Lite/Ultra100) (rev 02)). If I could solder it off, I would. The problem boils down to the fact that a simple 'reboot' of the machine won't work. After a reboot, the hard disks cannot be properly read, i.e. /dev/hde seems to have no valid boot sector, and even though /dev/hdg at least shows a valid partitioning, booting from it results in GRUB printing unconventional characters and then stopping after printing "stage2". You actually have to power-cycle the machine. After that, /dev/hde and /dev/hdg are both visible, booting works (though not off an md device), and the system comes up nicely.
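As a side note, two of the steps above can be done without the reboot or the full disk wipe. This is a hedged sketch, not the reporter's exact commands: blockdev --rereadpt (util-linux) asks the running kernel to re-read the partition table after fdisk's "error 5", and zeroing only the first 446 bytes wipes the boot loader code while keeping the partition table and signature intact.

```shell
# Ask the running kernel to re-read a disk's partition table without a
# reboot (blockdev is part of util-linux).
reread_partitions() {
    blockdev --rereadpt "$1"
}

# Zero only the 446-byte boot-code area of the first sector. Bytes
# 446-511 (the partition table and the 55 aa signature) are left
# untouched, so the partitioning survives while the boot code is wiped.
wipe_bootcode() {
    dd if=/dev/zero of="$1" bs=446 count=1 conv=notrunc 2>/dev/null
}

# Example (destructive -- only on the disk under test):
#   wipe_bootcode /dev/hde && reread_partitions /dev/hde
```

Whether the Promise controller honours the re-read is another matter; on this hardware the I/O error may well recur.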
The disks have been set up as described on 2005-11-27, with /boot on /dev/hdg instead of /dev/hde to make sure a /boot is available. Sigh. It looks like the only interesting problem is: why does disk druid declare that the first hard disk does not exist, even though it does? Well, I guess you may close this bug.

Repeat query for resolving bug (NOTABUG)

Per the final comment, and at the reporter's suggestion, we're finally closing this non-bug.