From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)

Successfully installed and configured raid level 5 on a 6-drive array. Successfully mounted and performed read-write operations on the raid. No errors during configuration or use of the raid.

On reboot, the raid failed to initialize. "cat /proc/mdstat" reveals that some of the drives are not available. Stopped the raid. An attempt to examine the drives with "fdisk" results in this error message:

"Device contains neither a valid DOS partition table, nor Sun or SGI disklabel
Building a new sun disklabel..."

Rebuilt the partition tables from the drive specs (see "Disk Geometry" in "Additional Information"). Reconfigured the raid and tested as above without errors (see "Contents of /etc/raidtab" in "Additional Information"). The next reboot resulted in more corrupted geometry blocks.

This happens on every reboot, but not necessarily with the same drives each time. Each time, the disk geometry is rebuilt and written to disk, and the raid is re-created and tested without errors.

See "Additional Information" for platform data and details.

Reproducible: Always

Steps to Reproduce:
1. Deleted the Sun default partitions and created a single partition of type "Linux raid autodetect" (ID: fd) on each drive, using the disk geometry settings in "Additional Information".
2. Set up the Level 5 RAID config by creating the "/etc/raidtab" shown in "Additional Information".
3. Created the RAID using "mkraid --really-force /dev/md0".
4. Checked the RAID status using "cat /proc/mdstat".
5. Created a filesystem for the RAID with striping using "mke2fs -b 4096 -R stride=32 /dev/md0".
6. Mounted the RAID in the local filesystem using "mount -t ext2 /dev/md0 /usr/local/archive".
7. Wrote data to the RAID and retrieved it reliably multiple times over an extended period.
8. Un-mounted and re-mounted the RAID device and repeated the read/write operations.
9. Rebooted (maintaining power on system).

Actual Results: Raid failed to initialize. /var/log/messages and /proc/mdstat show failed drive(s).
Expected Results: Raid initializes, successfully mounts and retains data.

Vendor (Radiant Resources) says their Sun Tech Support denies it could be hardware related. Vendor does not support Linux OS.

-----------------------------------------------------------
Hardware Platform (purchased from "Radiant Resources"):
-----------------------------------------------------------
Motherboard: SPARCengine Ultra AXe
Expansion:   Dual Channel SE Ultra SCSI, PCI
RAM:         256 MB, DRAM, 168-pin DIMM, EDO, ECC
Storage:     Internal: 13 GB IDE
             External: SE, Ultra SCSI: 109 GB Array (6-18GB Fujitsu)

CPU is a 1U rack-mount chassis with dual external Ultra SCSI.
Array is a 4U stand-alone chassis with 6 - 18GB Fujitsu drives.
-----------------------------------------------------------

-----------------------------------------------------------
# uname -a
Linux winggear 2.2.14-5.0 #1 Tue Mar 7 21:50:41 EST 2000 sparc64 unknown
-----------------------------------------------------------
raidtools Version 0.90.0 Release 6
-----------------------------------------------------------

-------------
Disk Geometry:
-------------
Heads:          19
Sectors/track:  248
Cylinders:      7506
Alt Cyls:       2 (default)
Phys Cyls:      7508 (default)
Rotation Spd:   7200
Interleave:     1 (default)
Extra sec/cyl:  0 (default)
-------------

------------------------
Contents of /etc/raidtab:
------------------------
#
# 'persistent' RAID5 setup, with one spare disk:
#
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           5
    nr-spare-disks          1
    persistent-superblock   1
    chunk-size              128

    device                  /dev/sda1
    raid-disk               0
    device                  /dev/sdb1
    raid-disk               1
    device                  /dev/sdc1
    raid-disk               2
    device                  /dev/sdd1
    raid-disk               3
    device                  /dev/sde1
    raid-disk               4
    device                  /dev/sdf1
    spare-disk              0
------------------------

--------------------------------------------------------
Contents of "/etc/sysconfig/hwconfig" (hardware profile):
--------------------------------------------------------
- class: OTHER bus: PCI detached: 0 driver: unknown desc: "Sun|Ultra IIi" vendorId: 108e deviceId: a000 pciType: 1
- class: OTHER bus: PCI detached: 0 driver: ignore desc: "Sun|Simba Advanced PCI Bridge" vendorId: 108e deviceId: 5000 pciType: 1
- class: OTHER bus: PCI detached: 0 driver: ignore desc: "Sun|Simba Advanced PCI Bridge" vendorId: 108e deviceId: 5000 pciType: 1
- class: OTHER bus: PCI detached: 0 driver: unknown desc: "DEC|DECchip 21152" vendorId: 1011 deviceId: 0024 pciType: 1
- class: OTHER bus: PCI detached: 0 driver: unknown desc: "Sun|EBUS" vendorId: 108e deviceId: 1000 pciType: 1
- class: OTHER bus: PCI detached: 0 driver: unknown desc: "CMD Technology Inc|PCI0646" vendorId: 1095 deviceId: 0646 pciType: 1
- class: NETWORK bus: PCI detached: 0 device: eth driver: sunhme desc: "Sun|Happy Meal" vendorId: 108e deviceId: 1001 pciType: 1
- class: SCSI bus: PCI detached: 0 driver: sym53c8xx desc: "Symbios|53c875" vendorId: 1000 deviceId: 000f pciType: 1
- class: SCSI bus: PCI detached: 0 driver: sym53c8xx desc: "Symbios|53c875" vendorId: 1000 deviceId: 000f pciType: 1
- class: VIDEO bus: PCI detached: 0 device: fb0 driver: Server:Mach64 desc: "ATI|3D Rage Pro 215GP" vendorId: 1002 deviceId: 4750 pciType: 1
- class: AUDIO bus: SBUS detached: 0 driver: cs4231 desc: "CS4231 EB2 DMA (PCI)" width: 0 height: 0 freq: 0 monitor: 0
- class: MOUSE bus: PSAUX detached: 0 device: psaux driver: genericps/2 desc: "Generic PS/2 Mouse"
- class: CDROM bus: IDE detached: 0 device: hdc driver: ignore desc: "CD-224E"
- class: HD bus: IDE detached: 0 device: hda driver: ignore desc: "IBM-DTLA-307020" physical: 16383/15/63 logical: 42528/15/63
- class: HD bus: SCSI detached: 0 device: sda driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 0 channel: 0 lun: 0
- class: HD bus: SCSI detached: 0 device: sdb driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 1 channel: 0 lun: 0
- class: HD bus: SCSI detached: 0 device: sdc driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 2 channel: 0 lun: 0
- class: HD bus: SCSI detached: 0 device: sdd driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 3 channel: 0 lun: 0
- class: HD bus: SCSI detached: 0 device: sde driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 4 channel: 0 lun: 0
- class: HD bus: SCSI detached: 0 device: sdf driver: ignore desc: "Fujitsu MAA3182S SUN18G" host: 0 id: 5 channel: 0 lun: 0
- class: KEYBOARD bus: KEYBOARD detached: 0 driver: ignore desc: "Generic PS/2 Keyboard"
-------------------------------------
[04-24-2001]

1. Found this possibly related FAQ on the European redhat site:

http://www.europe.redhat.com/documentation/HOWTO/Software-RAID-0.4x-HOWTO-5.php3

Q: I can't make md work with partitions on our latest SPARCstation 5. I suspect that this has something to do with disk-labels.

A: Sun disk-labels sit in the first 1K of a partition. For RAID-1, the Sun disk-label is not an issue since ext2fs will skip the label on every mirror. For other raid levels (0, linear and 4/5), this appears to be a problem; it has not yet (Dec 97) been addressed.

2. Upgraded kernel to 2.2.19-6.2.1 then repartitioned with additional 3rd (whole) partition. Re-created raid, re-made filesystem and rebooted. Drops disk geometry on sdb, sdc, and sdd. Rebuilt raid and rebooted three times, each losing sdb, sdc, and sdd. Here are relevant entries in /var/log/messages:

#######################
# Begin Boot Messages #
#######################
Apr 24 18:25:06 winggear kernel: SCSI device sda: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sda: sda1
Apr 24 18:25:06 winggear kernel: sym53c875-0-<1,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdb: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sdb: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<2,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdc: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sdc: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<3,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdd: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sdd: unknown partition table
Apr 24 18:25:06 winggear kernel: sym53c875-0-<4,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sde: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sde: sde1
Apr 24 18:25:06 winggear kernel: sym53c875-0-<5,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 16)
Apr 24 18:25:06 winggear kernel: SCSI device sdf: hdwr sector= 512 bytes. Sectors= 35378533 [17274 MB] [17.3 GB]
Apr 24 18:25:06 winggear kernel: sdf: sdf1
Apr 24 18:25:06 winggear kernel: (read) sda1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdb1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdb1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdc1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdc1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: blkdev_open() failed: -6
Apr 24 18:25:06 winggear kernel: md: could not lock sdd1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: could not import sdd1, trying to run array nevertheless.
Apr 24 18:25:06 winggear kernel: (read) sde1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: (read) sdf1's sb offset: 17684032 [events: 00000002]
Apr 24 18:25:06 winggear kernel: autorun ...
Apr 24 18:25:06 winggear kernel: considering sdf1 ...
Apr 24 18:25:06 winggear kernel: adding sdf1 ...
Apr 24 18:25:06 winggear kernel: adding sde1 ...
Apr 24 18:25:06 winggear kernel: adding sda1 ...
Apr 24 18:25:06 winggear kernel: created md0
Apr 24 18:25:06 winggear kernel: bind<sda1,1>
Apr 24 18:25:06 winggear kernel: bind<sde1,2>
Apr 24 18:25:06 winggear kernel: bind<sdf1,3>
Apr 24 18:25:06 winggear kernel: running: <sdf1><sde1><sda1>
Apr 24 18:25:06 winggear kernel: now!
Apr 24 18:25:06 winggear kernel: sdf1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: sde1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: sda1's event counter: 00000002
Apr 24 18:25:06 winggear kernel: md0: former device sdb1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: md0: former device sdc1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: md0: former device sdd1 is unavailable, removing from array!
Apr 24 18:25:06 winggear kernel: raid5 personality registered
Apr 24 18:25:06 winggear kernel: md0: max total readahead window set to 2048k
Apr 24 18:25:06 winggear kernel: md0: 4 data-disks, max readahead per data-disk: 512k
Apr 24 18:25:06 winggear kernel: raid5: spare disk sdf1
Apr 24 18:25:06 winggear kernel: raid5: device sde1 operational as raid disk 4
Apr 24 18:25:06 winggear kernel: raid5: device sda1 operational as raid disk 0
Apr 24 18:25:06 winggear kernel: raid5: not enough operational devices for md0 (3/5 failed)
Apr 24 18:25:06 winggear kernel: RAID5 conf printout:
Apr 24 18:25:06 winggear kernel: --- rd:5 wd:2 fd:3
Apr 24 18:25:06 winggear kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda1
Apr 24 18:25:06 winggear kernel: disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 2, s:0, o:0, n:2 rd:2 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 3, s:0, o:0, n:3 rd:3 us:1 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sde1
Apr 24 18:25:06 winggear kernel: disk 5, s:1, o:0, n:5 rd:5 us:1 dev:sdf1
Apr 24 18:25:06 winggear kernel: disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Apr 24 18:25:06 winggear kernel: raid5: failed to run raid set md0
Apr 24 18:25:06 winggear kernel: pers->run() failed ...
Apr 24 18:25:06 winggear kernel: do_md_run() returned -22
Apr 24 18:25:06 winggear kernel: unbind<sdf1,2>
Apr 24 18:25:06 winggear kernel: export_rdev(sdf1)
Apr 24 18:25:06 winggear kernel: unbind<sde1,1>
Apr 24 18:25:06 winggear kernel: export_rdev(sde1)
Apr 24 18:25:06 winggear kernel: unbind<sda1,0>
Apr 24 18:25:06 winggear kernel: export_rdev(sda1)
Apr 24 18:25:06 winggear kernel: md0 stopped.
Apr 24 18:25:06 winggear kernel: ... autorun DONE.
#####################
# End Boot Messages #
#####################
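For anyone triaging a similar boot log, the check done by eye above can be sketched in a few lines of Python. This is only an illustration, not part of the original report: the function names and the SAMPLE excerpt are hypothetical, and the regular expressions assume the exact message wording seen in these 2.2.x kernel messages.

```python
import re

# Patterns assume the message wording shown in the boot log of this report.
UNKNOWN_TABLE = re.compile(r"kernel:\s+(sd[a-z]+): unknown partition table")
MARKED_FAULTY = re.compile(r"md: could not lock (sd[a-z]+\d+), zero-size\?")


def summarize_boot_log(log_text):
    """List disks with no readable partition table and members md marked faulty."""
    return {
        "unknown_table": UNKNOWN_TABLE.findall(log_text),
        "faulty": MARKED_FAULTY.findall(log_text),
    }


def raid5_can_start(failed_members):
    """RAID 5 stores one disk's worth of parity, so it tolerates at most one failure."""
    return failed_members <= 1


# Hypothetical excerpt in the same format as the messages in this report.
SAMPLE = """\
Apr 24 18:25:06 winggear kernel: sda: sda1
Apr 24 18:25:06 winggear kernel: sdb: unknown partition table
Apr 24 18:25:06 winggear kernel: md: could not lock sdb1, zero-size? Marking faulty.
Apr 24 18:25:06 winggear kernel: sdc: unknown partition table
Apr 24 18:25:06 winggear kernel: md: could not lock sdc1, zero-size? Marking faulty.
"""

summary = summarize_boot_log(SAMPLE)
print(summary)
print("array can start:", raid5_can_start(len(summary["faulty"])))
```

With three members marked faulty, as in the log above, raid5_can_start(3) is False, which agrees with the kernel's "not enough operational devices for md0 (3/5 failed)".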
The bug is not in "raidtools", but in the built-in defaults of "fdisk".

Found this comment in the man page for "fdisk":

"Do not start a partition that actually uses its first sector (like a swap partition) at cylinder 0, since that will destroy the disklabel."

Apparently, "fdisk" doesn't follow its own advice when creating default partitions. The 1st partition's default starting cylinder is always "0".

Overriding the default, I re-configured the 1st partition to use cylinders 1-7506 instead of 0-7506. The disk label containing the geometry information and partition tables no longer gets corrupted on restart.
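The effect of moving the start cylinder can be worked out from the reported geometry. The sketch below is illustrative only: it assumes the Sun disklabel occupies sector 0 of the disk (consistent with the fdisk man page quote above) and 512-byte sectors, and the function names are made up for this example.

```python
HEADS = 19                # from "Disk Geometry" in this report
SECTORS_PER_TRACK = 248   # from "Disk Geometry" in this report
SECTOR_BYTES = 512        # assumption: 512-byte hardware sectors
LABEL_SECTOR = 0          # assumption: the Sun disklabel lives in sector 0


def sectors_per_cylinder(heads=HEADS, sectors_per_track=SECTORS_PER_TRACK):
    return heads * sectors_per_track


def first_sector(start_cylinder):
    """Absolute sector at which a partition starting on this cylinder begins."""
    return start_cylinder * sectors_per_cylinder()


def overlaps_disklabel(start_cylinder):
    """True if the partition's first sector is the sector holding the disklabel."""
    return first_sector(start_cylinder) <= LABEL_SECTOR


def bytes_given_up(start_cylinder):
    """Space left unused in front of the partition, per disk."""
    return first_sector(start_cylinder) * SECTOR_BYTES


for cyl in (0, 1):
    print(cyl, first_sector(cyl), overlaps_disklabel(cyl), bytes_given_up(cyl))
```

Under these assumptions, a partition starting at cylinder 0 begins on the label's own sector, while one starting at cylinder 1 begins at sector 4712, at a cost of one cylinder (about 2.3 MiB) per drive.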
Sorry, AFAICS this behaviour can't be forced by fdisk, since it is highly dependent on the type of partition table being used. It's basically a case of just reading the docs properly. :(