Bug 174306

Summary: 'Unconventional' disk druid installation (RAID/non-RAID) scratches MBR on first disk
Product: Red Hat Enterprise Linux 4
Reporter: David Tonhofer <bughunt>
Component: anaconda
Assignee: Peter Jones <pjones>
Status: CLOSED NOTABUG
QA Contact: Mike McLean <mikem>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.0
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 17:56:48 UTC
Attachments:
* Dumps of the MBRs of /dev/hde (bad) and /dev/hdg (good) (flags: none)
* Bad MBR on /dev/hde (flags: none)
* Good MBR on /dev/hdg (flags: none)

Description David Tonhofer 2005-11-27 17:04:52 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

Description of problem:
I have a Primergy L100 with two hard disks, /dev/hde and /dev/hdg
(these are the only disks on that system, and these are the factory settings).

(I have a setup similar to the one below on a Scaleo, but that one was
not set up through the installation procedure but 'after the fact' once
the /boot RAID broke.)

I want to install these disks as follows:

                /dev/hde1  /dev/hde2    /dev/hde3    /dev/hde4
               +----------+------------+------------+--------------+
/dev/hde --->  | /boot    |  md-RAID1  |  md-RAID1  | md-RAID1     |
               +----------+------------+------------+--------------+

                /dev/hdg1  /dev/hdg2    /dev/hdg3    /dev/hdg4
               +----------+------------+------------+--------------+
/dev/hdg --->  | /boot2   |  md-RAID1  |  md-RAID1  | md-RAID1     |
               +----------+------------+------------+--------------+

                          +------------+------------+--------------+
forming:                  |  /dev/md0  |  /dev/md1  | /dev/md2     |
                          +------------+------------+--------------+

                          +------------+------------+--------------+
onto which I put:         |   swap 1   |    swap 2  |   LVM        |
                          +------------+------------+--------------+

                                                    +--------------+
and onto that:                                      | root fs etc. |
                                                    +--------------+

That is, I do not want to have the /boot partition on an md
device (I have had a few problems with that approach) but the
rest I *do* want to have on a mirrored md device (incl. the swap
area).

How do I set this up?

* When disk druid comes up, create the four partitions above on /dev/hde
  as 'software RAID' partitions. Just 'force' the first one to be
  a primary partition.

* Clone /dev/hde to /dev/hdg.

* Modify the filesystem type of /dev/hde1 and /dev/hdg1 from 'software RAID'
  to 'ext3' and set the mountpoints (/boot and /boot2).

* Bind the other partitions into RAID1 devices and set up the
  filesystems (a rough command-line equivalent is sketched below).
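
Only a sketch, assuming the layout above: the real setup is done through
disk druid, and the volume group / logical volume names and sizes here are
made up for illustration.

  # /boot and /boot2 stay plain ext3 partitions, one per disk
  mkfs.ext3 /dev/hde1
  mkfs.ext3 /dev/hdg1

  # mirror the remaining partition pairs into md RAID1 devices
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hde2 /dev/hdg2
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hde3 /dev/hdg3
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/hde4 /dev/hdg4

  # swap on the first two mirrors
  mkswap /dev/md0
  mkswap /dev/md1

  # LVM on the third mirror, with the root fs etc. on top
  pvcreate /dev/md2
  vgcreate vg0 /dev/md2
  lvcreate -L 8G -n root vg0
  mkfs.ext3 /dev/vg0/root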

With that setup done, installation proceeds without any problem. However, on
the first reboot, the partition table on /dev/hde is gone. /dev/hdg still
has a valid partition table. So I suppose something messed up the MBR on
/dev/hde. Grub?

'dd' of the MBR on both disks shows that the /dev/hde MBR is shifted by
16 NUL bytes. Attached is a dump (blocks obtained with dd if=/dev/hde bs=512 count=1).
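
For completeness, this is roughly how the dumps can be taken and compared
(a sketch; the output file names are arbitrary):

  # dump the first sector (the 512-byte MBR) of each disk
  dd if=/dev/hde of=/tmp/mbr-hde.bin bs=512 count=1
  dd if=/dev/hdg of=/tmp/mbr-hdg.bin bs=512 count=1

  # list differing bytes; a 16-byte shift shows up as a long run of mismatches
  cmp -l /tmp/mbr-hde.bin /tmp/mbr-hdg.bin | head

  # or eyeball the two hex dumps side by side
  diff <(hexdump -C /tmp/mbr-hde.bin) <(hexdump -C /tmp/mbr-hdg.bin)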

I have tried the installation twice with the same result. Installing on
/dev/hde only, without any RAID devices, works without a problem.


Version-Release number of selected component (if applicable):
Red Hat ES 4.0 Update 2 installation ISO images

How reproducible:
Always

Steps to Reproduce:
1. Install as described above
2. Reboot

  

Actual Results:  /dev/hde no longer has a valid partition table.

Expected Results:  /dev/hde should have a valid partition table.

Additional info:

Comment 1 David Tonhofer 2005-11-27 17:07:03 UTC
Created attachment 121519 [details]
Dumps of the MBRs of /dev/hde (bad) and /dev/hdg (good)

Comment 2 David Tonhofer 2005-11-27 17:09:32 UTC
Created attachment 121520 [details]
Bad MBR on /dev/hde

Comment 3 David Tonhofer 2005-11-27 17:10:46 UTC
Created attachment 121521 [details]
Good MBR on /dev/hdg

Comment 4 David Tonhofer 2005-11-27 18:13:07 UTC
Uhhh...there might be something wrong with the disk. It's gone from the disk
druid menu now. Maybe one should add SMART diagnostics to Anaconda? Will check
hardware now.

Comment 5 David Tonhofer 2005-11-29 20:33:49 UTC
This problem is getting uncanny. I did a few (7) tests in which the
result was as described above, but now I have had a run of installations
in which the problem has mysteriously gone away.

I have, however, checked that it is not a *fault* in the hardware (it could
be a consistent error in the Promise RAID controller, for example, which still
controls the hard disks even when its RAID mode is disabled, which is why they
are named /dev/hde and /dev/hdg): I checked both disks and moved them to another
machine of the same type, and the problematic behaviour persisted until a few hours ago.

I know there *is* something up with the hardware, as I have encountered some
problems re-reading the MBR after writing it with fdisk.

To reiterate the case where the MBR on /dev/hde has been destroyed:

* Install as described above, using the RAID cloning feature to set
  up software RAID partitions on "/dev/hde" and "/dev/hdg", modify the
  partition type of the first RAID partition on each disk to 'ext3'
  to get a place to put /boot onto, bind the remaining partitions into
  RAID mirrors, format, etc. Later repeats show that it does not matter
  whether there is a swap partition or not; the problem occurs regardless.
* Proceed with installation. No problem until reboot.
* After reboot, nothing bootable can be found, i.e. /dev/hde no longer has
  a valid MBR.
* Reboot the machine using the RH ES 4 installation CD. Disk druid now consistently
  says that "/dev/hde" is gone (it looks as if it isn't installed). "/dev/hdg"
  is visible and correctly partitioned. This "disk invisibility" problem
  may or may not be related to the MBR problem, but I suspect some hardware
  weirdness:

  * On the console, fdisk *can* find "/dev/hde". The kernel can also find it:
    In /tmp/syslog, the last message about that drive is 
    "hde: unknown partition table". 
  * I tried to set up a new partitioning on "/dev/hde"
    with "fdisk", but when writing the partition table, I get "error 5: 
    I/O error. Kernel still uses the old table, the new table will be used
    at the next reboot".
  * I zeroed the disk using 'autoclave' from the Ultimate Boot CD but after
    a reboot into RH ES 4 installation CD, disk druid could still not see
    the disk.
  * I wrote some binary garbage into the MBR of that disk using PTS Disk
    Editor to check whether the MBR could be properly written. It works.
    (However, these DOS-based programs cannot be reliably started on the
    present hardware; as said, there is something funny with it.)
    After a reboot into the RH ES 4 installation CD, disk druid could still not see
    the disk.
  * However, an installation aborted before 'package installation' reveals
    that the MBRs are correctly written on both disks. 
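
For reference, a quick way to check whether a disk still carries a
valid-looking MBR (the standard boot signature is the two bytes 55 aa at
offset 510 of the first sector):

  # a valid MBR ends in "55 aa"; all zeroes means the table is gone
  dd if=/dev/hde bs=512 count=1 2>/dev/null | hexdump -C | tail -2

  # after rewriting a partition table with fdisk, the kernel can be asked
  # to re-read it without a reboot (relevant to the "error 5" case above)
  blockdev --rereadpt /dev/hde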

Other things tried:

Installation as above, with md device cloning, but with "/boot" kept on a
software RAID mirror. After reboot, both disks are good, but the machine can't
actually boot (i.e. you get the grub command line). No real surprise.
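
For what it's worth, when /boot does live on a RAID1 mirror, the usual
workaround is to put GRUB into the MBR of *both* disks so that either one
can boot. A sketch, assuming grub legacy and the device names above:

  # install the boot loader into the MBR of each disk; the interactive
  # grub shell's device/root/setup sequence is an alternative
  grub-install /dev/hde
  grub-install /dev/hdg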

I will do more tests, but this is beginning to look like something unfathomable.

Comment 6 David Tonhofer 2005-12-08 12:34:51 UTC
It's the hardware.

There is probably something wrong with the on-board Promise RAID controller.
(Promise Technology, Inc. PDC20265 (FastTrak100 Lite/Ultra100) (rev 02))
If I could solder it off, I would.

The problem boils down to the fact that a simple 'reboot' of the machine won't
work. After a reboot, the hard disks cannot be properly read, i.e. /dev/hde seems
to have no valid boot sector, and even though /dev/hdg at least shows a valid
partition table, booting from it results in GRUB printing some unconventional
characters and then stopping after "stage2".

You actually have to "power cycle" the machine. After that, /dev/hde and
/dev/hdg are both visible, booting works (though not off an md device) and
the system comes up nicely. The disks have been set up as described on
2005-11-27, with /boot on /dev/hdg instead of /dev/hde to make sure
a /boot is available.

Sigh.

It looks like the only interesting problem left is: why does disk druid declare
that the first hard disk does not exist, even though it does?

Well, I guess you may close this bug.



Comment 7 David Tonhofer 2007-10-27 20:49:58 UTC
Repeating my request to resolve this bug (NOTABUG).

Comment 8 Denise Dumas 2008-07-24 17:56:48 UTC
Per the final comment, and at the reporter's suggestion, we're finally closing this non-bug.