Bug 37281

Summary: Dropping Disk Geometry of RAID Drives on Reboot
Product: [Retired] Red Hat Linux Reporter: Calvin Webster <cwebster>
Component: util-linux Assignee: Elliot Lee <sopwith>
Status: CLOSED NOTABUG QA Contact: David Lawrence <dkl>
Severity: high Docs Contact:
Priority: high    
Version: 6.2   
Target Milestone: ---   
Target Release: ---   
Hardware: sparc   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2001-07-16 13:32:41 UTC

Description Calvin Webster 2001-04-23 23:49:55 UTC
Successfully installed and configured RAID level 5 on a 6-drive array.
Successfully mounted the array and performed read-write operations on it.
No errors during configuration or use of the RAID.

On reboot, the RAID failed to initialize. "cat /proc/mdstat" shows that some of
the drives are not available.
Stopped the RAID.
Attempting to examine the drives with "fdisk" results in the error message:
"Device contains neither a valid DOS partition table, nor Sun or SGI disklabel
Building a new sun disklabel..."
Rebuilt the partition tables from the drive specs (see "Disk Geometry"
in "Additional Information").
Reconfigured the RAID and tested as above without errors (see "Contents
of /etc/raidtab" in "Additional Information").

The next reboot resulted in more corrupted geometry blocks. This happens on
every reboot, but not necessarily with the same drives each time. Each
time, I rebuild the disk geometry, write it to disk, re-create the RAID,
and test it without errors.

See "Additional Information" for platform data and details.

Reproducible: Always
Steps to Reproduce:
1. Delete the Sun default partitions and create a single partition of
type "Linux raid autodetect" (ID: fd) on each drive, using the disk geometry
settings in "Additional Information".
2. Set up the Level 5 RAID config by creating "/etc/raidtab" as shown in
"Additional Information".
3. Create the RAID using "mkraid --really-force /dev/md0".
4. Check the RAID status using "cat /proc/mdstat".
5. Create a filesystem on the RAID with striping using "mke2fs -b 4096 -R
stride=32 /dev/md0".
6. Mount the RAID in the local filesystem using "mount -t
ext2 /dev/md0 /usr/local/archive".
7. Write data to the RAID and retrieve it reliably multiple times over an
extended period.
8. Unmount and re-mount the RAID device and repeat the read/write
operations.
9. Reboot (maintaining power on the system).
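
The stride value passed to mke2fs in step 5 follows from the RAID chunk size
and the filesystem block size. Below is a minimal sketch of that arithmetic,
assuming the "chunk-size" in /etc/raidtab is given in KiB and that stride
counts filesystem blocks per chunk. The function name is illustrative only.

```python
def ext2_stride(chunk_size_kib: int, block_size_bytes: int) -> int:
    """Return the number of filesystem blocks that fit in one RAID chunk."""
    chunk_bytes = chunk_size_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size is not a multiple of the block size")
    return chunk_bytes // block_size_bytes


# Values from this report: chunk-size 128 (raidtab), block size 4096 (mke2fs -b).
stride = ext2_stride(chunk_size_kib=128, block_size_bytes=4096)
print(stride)  # 32
```

With a 128 KiB chunk and 4096-byte blocks this gives 32, which agrees with the
"stride=32" used above.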

Actual Results:  RAID failed to initialize.
/var/log/messages and /proc/mdstat show failed drive(s).


Expected Results:  RAID initializes, mounts successfully, and retains data.

Vendor (Radiant Resources) says their Sun Tech Support denies it could be 
hardware related.
Vendor does not support Linux OS.

-----------------------------------------------------------
Hardware Platform (purchased from "Radiant Resources"):
-----------------------------------------------------------

Motherboard: SPARCengine Ultra AXe
Expansion: Dual Channel SE Ultra SCSI, PCI
RAM: 256 MB, DRAM, 168-pin DIMM, EDO, ECC
Storage:
Internal: 13 GB IDE
External: SE, Ultra SCSI: 109 GB Array (6-18GB Fujitsu)

CPU is a 1U rack-mount chassis with dual external Ultra SCSI.
Array is a 4U stand-alone chassis with 6 - 18GB Fujitsu drives.
-----------------------------------------------------------

-----------------------------------------------------------
# uname -a
Linux winggear 2.2.14-5.0 #1 Tue Mar 7 21:50:41 EST 2000 sparc64 unknown
-----------------------------------------------------------
raidtools Version 0.90.0 Release 6
-----------------------------------------------------------

-------------
Disk Geometry:
-------------
Heads:          19
Sectors/track:  248
Cylinders:      7506
Alt Cyls:       2    (default)
Phys Cyls:      7508 (default)
Rotation Spd:   7200
Interleave:     1    (default)
Extra sec/cyl:  0    (default)
-------------
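
As a rough cross-check of the geometry above, the following sketch derives the
sectors per cylinder and the capacity the label describes. It assumes 512-byte
sectors (the boot log in this report prints "hdwr sector= 512 bytes"). The
variable names are illustrative.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors, as the boot log reports

heads = 19
sectors_per_track = 248
data_cylinders = 7506
physical_cylinders = 7508  # data cylinders plus 2 alternate cylinders

sectors_per_cylinder = heads * sectors_per_track
data_sectors = sectors_per_cylinder * data_cylinders
physical_sectors = sectors_per_cylinder * physical_cylinders

print(sectors_per_cylinder)         # 4712
print(data_sectors)                 # 35368272
print(data_sectors * SECTOR_BYTES)  # 18108555264
print(physical_sectors)             # 35377696
```

The kernel reports 35378533 sectors for each drive, slightly more than the
35377696 sectors covered by the 7508 physical cylinders, so the label geometry
appears to be a rounded-down description of the disk.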

------------------------
Contents of /etc/raidtab:
------------------------
#
# 'persistent' RAID5 setup, with one spare disk:
#
raiddev /dev/md0
    raid-level                5
    nr-raid-disks             5
    nr-spare-disks            1
    persistent-superblock     1
    chunk-size                128
 
    device                    /dev/sda1
    raid-disk                 0
    device                    /dev/sdb1
    raid-disk                 1
    device                    /dev/sdc1
    raid-disk                 2
    device                    /dev/sdd1
    raid-disk                 3
    device                    /dev/sde1
    raid-disk                 4
    device                    /dev/sdf1
    spare-disk                0
------------------------
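
A small consistency check for the raidtab above, written as a sketch: it parses
the "key value" lines and confirms that the declared disk counts match the
listed members. The parser is a simplification written for this report, not
part of raidtools, and the function names are hypothetical.

```python
RAIDTAB = """\
raiddev /dev/md0
    raid-level                5
    nr-raid-disks             5
    nr-spare-disks            1
    persistent-superblock     1
    chunk-size                128

    device                    /dev/sda1
    raid-disk                 0
    device                    /dev/sdb1
    raid-disk                 1
    device                    /dev/sdc1
    raid-disk                 2
    device                    /dev/sdd1
    raid-disk                 3
    device                    /dev/sde1
    raid-disk                 4
    device                    /dev/sdf1
    spare-disk                0
"""


def parse_raidtab(text):
    """Return (settings, members); members are (device, role, index) tuples."""
    settings = {}
    members = []
    current_device = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, value = line.split(None, 1)
        if key == "device":
            current_device = value
        elif key in ("raid-disk", "spare-disk"):
            members.append((current_device, key, int(value)))
        else:
            settings[key] = value
    return settings, members


def counts_match(settings, members):
    """Check the declared disk counts against the listed members."""
    raid = [m for m in members if m[1] == "raid-disk"]
    spare = [m for m in members if m[1] == "spare-disk"]
    return (len(raid) == int(settings["nr-raid-disks"])
            and len(spare) == int(settings["nr-spare-disks"]))


settings, members = parse_raidtab(RAIDTAB)
print(counts_match(settings, members))      # True
print(int(settings["nr-raid-disks"]) - 1)   # 4 data disks in a 5-disk RAID5
```

The configuration is internally consistent: five active members plus one spare,
which leaves four disks' worth of data capacity at RAID level 5.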

--------------------------------------------------------
Contents of "/etc/sysconfig/hwconfig" (hardware profile):
--------------------------------------------------------
-
class: OTHER
bus: PCI
detached: 0
driver: unknown
desc: "Sun|Ultra IIi"
vendorId: 108e
deviceId: a000
pciType: 1
-
class: OTHER
bus: PCI
detached: 0
driver: ignore
desc: "Sun|Simba Advanced PCI Bridge"
vendorId: 108e
deviceId: 5000
pciType: 1
-
class: OTHER
bus: PCI
detached: 0
driver: ignore
desc: "Sun|Simba Advanced PCI Bridge"
vendorId: 108e
deviceId: 5000
pciType: 1
-
class: OTHER
bus: PCI
detached: 0
driver: unknown
desc: "DEC|DECchip 21152"
vendorId: 1011
deviceId: 0024
pciType: 1
-
class: OTHER
bus: PCI
detached: 0
driver: unknown
desc: "Sun|EBUS"
vendorId: 108e
deviceId: 1000
pciType: 1
-
class: OTHER
bus: PCI
detached: 0
driver: unknown
desc: "CMD Technology Inc|PCI0646"
vendorId: 1095
deviceId: 0646
pciType: 1
-
class: NETWORK
bus: PCI
detached: 0
device: eth
driver: sunhme
desc: "Sun|Happy Meal"
vendorId: 108e
deviceId: 1001
pciType: 1
-
class: SCSI
bus: PCI
detached: 0
driver: sym53c8xx
desc: "Symbios|53c875"
vendorId: 1000
deviceId: 000f
pciType: 1
-
class: SCSI
bus: PCI
detached: 0
driver: sym53c8xx
desc: "Symbios|53c875"
vendorId: 1000
deviceId: 000f
pciType: 1
-
class: VIDEO
bus: PCI
detached: 0
device: fb0
driver: Server:Mach64
desc: "ATI|3D Rage Pro 215GP"
vendorId: 1002
deviceId: 4750
pciType: 1
-
class: AUDIO
bus: SBUS
detached: 0
driver: cs4231
desc: "CS4231 EB2 DMA (PCI)"
width: 0
height: 0
freq: 0
monitor: 0
-
class: MOUSE
bus: PSAUX
detached: 0
device: psaux
driver: genericps/2
desc: "Generic PS/2 Mouse"
-
class: CDROM
bus: IDE
detached: 0
device: hdc
driver: ignore
desc: "CD-224E"
-
class: HD
bus: IDE
detached: 0
device: hda
driver: ignore
desc: "IBM-DTLA-307020"
physical: 16383/15/63
logical: 42528/15/63
-
class: HD
bus: SCSI
detached: 0
device: sda
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 0
channel: 0
lun: 0
-
class: HD
bus: SCSI
detached: 0
device: sdb
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 1
channel: 0
lun: 0
-
class: HD
bus: SCSI
detached: 0
device: sdc
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 2
channel: 0
lun: 0
-
class: HD
bus: SCSI
detached: 0
device: sdd
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 3
channel: 0
lun: 0
-
class: HD
bus: SCSI
detached: 0
device: sde
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 4
channel: 0
lun: 0
-
class: HD
bus: SCSI
detached: 0
device: sdf
driver: ignore
desc: "Fujitsu MAA3182S SUN18G"
host: 0
id: 5
channel: 0
lun: 0
-
class: KEYBOARD
bus: KEYBOARD
detached: 0
driver: ignore
desc: "Generic PS/2 Keyboard"
-------------------------------------

Comment 1 Calvin Webster 2001-04-24 20:27:55 UTC
[04-24-2001]

1. Found this possibly related FAQ on the European redhat site:

http://www.europe.redhat.com/documentation/HOWTO/Software-RAID-0.4x-HOWTO-5.php3

Q: I can't make md work with partitions on our latest SPARCstation 5. I suspect 
that this has something to do with disk-labels. 

A: Sun disk-labels sit in the first 1K of a partition. For RAID-1, the Sun
disk-label is not an issue since ext2fs will skip the label on every mirror. For
other raid levels (0, linear and 4/5), this appears to be a problem; it has not
yet (Dec 97) been addressed.

2. Upgraded the kernel to 2.2.19-6.2.1, then repartitioned with an additional
3rd (whole) partition. Re-created the RAID, re-made the filesystem, and
rebooted. The disk geometry was dropped on sdb, sdc, and sdd. Rebuilt the RAID
and rebooted three times, each time losing sdb, sdc, and sdd.

Here are relevant entries in /var/log/messages:

#######################
# Begin Boot Messages #
#######################

Apr 24 18:25:06 winggear kernel: SCSI device sda: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sda: sda1
Apr 24 18:25:06 winggear kernel: sym53c875-0-<1,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sdb: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdc: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sdc: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<3,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdd: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sdd: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<4,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sde: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sde: sde1
Apr 24 18:25:06 winggear kernel: sym53c875-0-<5,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdf: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel:  sdf: sdf1
Apr 24 18:25:06 winggear kernel: (read) sda1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdb1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdb1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdc1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdc1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdd1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdd1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: (read) sde1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: (read) sdf1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: autorun ...
Apr 24 18:25:06 winggear kernel: considering sdf1 ...
Apr 24 18:25:06 winggear kernel:   adding sdf1 ...
Apr 24 18:25:06 winggear kernel:   adding sde1 ...
Apr 24 18:25:06 winggear kernel:   adding sda1 ...
Apr 24 18:25:06 winggear kernel: created md0
Apr 24 18:25:06 winggear kernel: bind<sda1,1>
Apr 24 18:25:06 winggear kernel: bind<sde1,2>
Apr 24 18:25:06 winggear kernel: bind<sdf1,3>
Apr 24 18:25:06 winggear kernel: running: <sdf1><sde1><sda1>
Apr 24 18:25:06 winggear kernel: now!
Apr 24 18:25:06 winggear kernel: sdf1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: sde1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: sda1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: md0: former device sdb1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: md0: former device sdc1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: md0: former device sdd1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: raid5 personality registered
Apr 24 18:25:06 winggear kernel: md0: max total readahead window set to 2048k
Apr 24 18:25:06 winggear kernel: md0: 4 data-disks, max readahead per data-disk: 512k
Apr 24 18:25:06 winggear kernel: raid5: spare disk sdf1
Apr 24 18:25:06 winggear kernel: raid5: device sde1 operational as raid disk 4
Apr 24 18:25:06 winggear kernel: raid5: device sda1 operational as raid disk 0
Apr 24 18:25:06 winggear kernel: raid5: not enough operational devices for md0 (3/5 failed)
Apr 24 18:25:06 winggear kernel: RAID5 conf printout:
Apr 24 18:25:06 winggear kernel:  --- rd:5 wd:2 fd:3
Apr 24 18:25:06 winggear kernel:  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Apr 24 18:25:06 winggear kernel:  disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 3, s:0, o:0, n:3 rd:3 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sde1
Apr 24 18:25:06 winggear kernel:  disk 5, s:1, o:0, n:5 rd:5 us:1 dev:sdf1
Apr 24 18:25:06 winggear kernel:  disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel:  disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: raid5: failed to run raid set md0
Apr 24 18:25:06 winggear kernel: pers->run() failed ...
Apr 24 18:25:06 winggear kernel: do_md_run() returned -22
Apr 24 18:25:06 winggear kernel: unbind<sdf1,2>
Apr 24 18:25:06 winggear kernel: export_rdev(sdf1)
Apr 24 18:25:06 winggear kernel: unbind<sde1,1>
Apr 24 18:25:06 winggear kernel: export_rdev(sde1)
Apr 24 18:25:06 winggear kernel: unbind<sda1,0>
Apr 24 18:25:06 winggear kernel: export_rdev(sda1)
Apr 24 18:25:06 winggear kernel: md0 stopped.
Apr 24 18:25:06 winggear kernel: ... autorun DONE.
#####################
# End Boot Messages #
#####################
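
The boot messages can be summarized mechanically. The sketch below works on a
short excerpt of the log (syslog prefix stripped), lists the disks whose
partition table was not recognized, and checks whether a RAID5 set could start.
It assumes that "rd", "wd" and "fd" in the conf printout stand for raid disks,
working disks and failed disks. The helper names are illustrative.

```python
import re

# Excerpt of the kernel messages above, without the syslog prefix.
LOG = """\
 sda: sda1
 sdb: unknown partition table
 sdc: unknown partition table
 sdd: unknown partition table
 sde: sde1
 sdf: sdf1
 --- rd:5 wd:2 fd:3
"""


def unlabeled_disks(log):
    """Disks for which the kernel reported an unknown partition table."""
    pattern = r"^[ \t]*(sd[a-z]+): unknown partition table$"
    return re.findall(pattern, log, re.MULTILINE)


def raid5_counts(log):
    """Return (raid_disks, working_disks, failed_disks) from the printout."""
    match = re.search(r"rd:(\d+) wd:(\d+) fd:(\d+)", log)
    if match is None:
        raise ValueError("no RAID5 conf printout found")
    return tuple(int(group) for group in match.groups())


def raid5_can_run(raid_disks, working_disks):
    """RAID5 tolerates at most one missing member."""
    return working_disks >= raid_disks - 1


print(unlabeled_disks(LOG))           # ['sdb', 'sdc', 'sdd']
raid_disks, working, failed = raid5_counts(LOG)
print(raid_disks, working, failed)    # 5 2 3
print(raid5_can_run(raid_disks, working))  # False
```

Three of the five active members have no readable label, so their partitions
cannot be opened and the array has too few members to start, as the
"not enough operational devices" message reports.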


Comment 2 Calvin Webster 2001-04-25 14:58:28 UTC
Bug is not in "raidtools", but in "fdisk" built-in defaults.

Found this comment in the man page for "fdisk":

"Do not start a partition that actually uses its first sector (like a
swap partition) at cylinder 0, since that will destroy the disklabel."

Apparently, "fdisk" doesn't follow its own advice when creating default 
partitions. The 1st partition's default starting cylinder is always "0".

Overriding the default, I re-configured the 1st partition to use cylinders
1-7506 instead of 0-7506. The disk label, which contains the geometry
information and partition tables, no longer gets corrupted on restart.
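
To illustrate why the starting cylinder matters, here is a sketch using the
geometry from this report. It rests on two assumptions that are not stated in
the report itself: that the Sun disklabel occupies the first sector of the
disk, and that the md 0.90 persistent superblock sits in the last 64 KiB-aligned
64 KiB block of each member partition. All names are illustrative.

```python
SECTOR_BYTES = 512
SECTORS_PER_CYLINDER = 19 * 248  # heads * sectors/track from "Disk Geometry"
LABEL_SECTOR = 0  # assumption: the Sun disklabel lives in sector 0 of the disk


def partition_start_sector(start_cylinder):
    """First absolute sector of a partition starting at the given cylinder."""
    return start_cylinder * SECTORS_PER_CYLINDER


def covers_label(start_cylinder):
    """True if the partition includes the sector holding the disklabel."""
    return partition_start_sector(start_cylinder) <= LABEL_SECTOR


def md_superblock_offset_kib(partition_kib):
    """Assumed md 0.90 rule: round down to 64 KiB, then step back 64 KiB."""
    reserved_kib = 64
    return (partition_kib & ~(reserved_kib - 1)) - reserved_kib


print(covers_label(0))  # True: data written at the partition start hits the label
print(covers_label(1))  # False
print(partition_start_sector(1) * SECTOR_BYTES)  # 2412544 bytes skipped

# A partition spanning 7506 whole cylinders, expressed in KiB.
partition_kib = 7506 * SECTORS_PER_CYLINDER * SECTOR_BYTES // 1024
print(md_superblock_offset_kib(partition_kib))  # 17684032
```

Starting at cylinder 1 costs about 2.3 MiB per drive. The computed superblock
offset equals the "sb offset: 17684032" in the boot messages, which is
consistent with the superblock sitting at the end of the partition. If so, the
label damage at the start of the disk would come from array data written to the
beginning of a partition that starts at cylinder 0, not from the superblock.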

Comment 3 Elliot Lee 2001-07-18 18:10:23 UTC
Sorry, AFAICS this behaviour can't be forced by fdisk since it is highly
dependent on the type of partition table being used. It's basically a case of
just reading the docs properly. :(