Description of problem:

Trying to set up a new system using the F10 live CD. Two SATA drives are to be arranged as follows:

sda ----+---->sda1----->/boot
        +---->sda2----->software RAID mirror part 1 (md1)

sdb ----+---->sdb1----->/bootspare
        +---->sdb2----->software RAID mirror part 2 (md1)

The software RAID mirror will be given over to LVM, which will then provide swap, root, etc. This works; however, I had some trouble with other setups (see bug #478751), so I decided on the present one. It should be straightforward, but surprises await.

Anaconda sets up the partitions and starts formatting, but then breaks off with the message:

"An error occurred trying to format sdb1. This problem is serious, and the install cannot continue. Press <Enter> to exit the installer."

This is not very informative (could a perror() call help?), but looking at anaconda's log file, the last command seems to be:

21:53:23 INFO : formatting /bootspare as ext3
21:53:23 INFO : Format command: ['mke2fs', '/dev/sdb1', '-t', 'ext3']

Trying mke2fs /dev/sdb1 -t ext3 by hand:

[root@localhost ~]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

The first time I saw this, I thought there might be some RAID magic left over from an earlier installation, so I tried this:

dd if=/dev/zero of=/dev/sdb1 count=200

But it didn't help. The status of the software RAID md1 (there is no md0) is:

---------------------
[root@localhost ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sun Jan  4 18:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Jan  4 21:56:58 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.12

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2
---------------------

Version-Release number of selected component (if applicable):
anaconda-11.4.1.62-1.x86_64

How reproducible:
Encountered twice in a row, with a reboot in between.
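(For anyone hitting the same refusal: "apparently in use by the system" usually means some kernel subsystem has already claimed the device. Two checks that would have shortened the hunt here, offered as a sketch rather than as part of the original session:

cat /proc/mdstat
mdadm --examine /dev/sdb1

/proc/mdstat shows whether sdb1 is currently bound into any md array; mdadm --examine reads the member superblock straight off the partition, so it reveals leftover RAID metadata even when no array is assembled.)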
Created attachment 328153 [details] Anaconda's log
Created attachment 328154 [details] Anaconda's not-very-helpful message ("Computer says no")
Tried again, leaving /dev/sdb1 alone, with no reboot in between. The installation seems to have worked this time.
Installation completed, but the resulting RAID array is degraded from the very first boot, for unknown reasons:

Jan  4 23:48:56 gefjun kernel: md: raid6 personality registered for level 6
Jan  4 23:48:56 gefjun kernel: md: raid5 personality registered for level 5
Jan  4 23:48:56 gefjun kernel: md: raid4 personality registered for level 4
Jan  4 23:48:56 gefjun kernel: md: md1 stopped.
Jan  4 23:48:56 gefjun kernel: md: bind<sda2>
Jan  4 23:48:56 gefjun kernel: md: bind<sdb2>
Jan  4 23:48:56 gefjun kernel: md: kicking non-fresh sda2 from array!
Jan  4 23:48:56 gefjun kernel: md: unbind<sda2>
Jan  4 23:48:56 gefjun kernel: md: export_rdev(sda2)
Jan  4 23:48:56 gefjun kernel: raid1: raid set md1 active with 1 out of 2 mirrors

Ok, so that is ANOTHER problem. On the other hand, the system still holds on to /dev/sdb1 (how exactly, I don't know; it certainly isn't mounted):

[root@gefjun log]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

[root@gefjun log]# fdisk -l

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000c622d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          25      200781   83  Linux
/dev/sda2              26       19457   156087540   fd  Linux raid autodetect

Disk /dev/sdb: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x9cd19cd1

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          25      200781   83  Linux
/dev/sdb2              26       19457   156087540   fd  Linux raid autodetect
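(An aside on "kicking non-fresh sda2": it means the two members' superblocks disagree on their event counters, so the kernel drops the member that missed updates. A quick way to confirm, as a sketch:

mdadm --examine /dev/sda2 | grep -i events
mdadm --examine /dev/sdb2 | grep -i events

The member with the lower event count was not written at the last clean array update and will be refused at assembly time.)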
Looking further, one half of the RAID seems to have somehow been shanghaied into a RAID array that no-one ordered. I wonder how that is possible:

[root@gefjun log]# mdadm --detail /dev/md_d1
mdadm: Unknown keyword mdadm:
/dev/md_d1:
        Version : 0.90
  Creation Time : Mon Jan  5 00:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jan  5 03:36:04 2009
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.10

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       0        0        1      removed

But enough of this!

[root@gefjun log]# mdadm --manage --stop /dev/md_d1
mdadm: Unknown keyword mdadm:
mdadm: stopped /dev/md_d1

[root@gefjun log]# mdadm --manage /dev/md1 --add /dev/sda2
mdadm: Unknown keyword mdadm:
mdadm: re-added /dev/sda2

[root@gefjun log]# mdadm --detail /dev/md1
mdadm: Unknown keyword mdadm:
/dev/md1:
        Version : 0.90
  Creation Time : Mon Jan  5 00:49:07 2009
     Raid Level : raid1
     Array Size : 156087424 (148.86 GiB 159.83 GB)
  Used Dev Size : 156087424 (148.86 GiB 159.83 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Jan  4 23:39:08 2009
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 0% complete

           UUID : 52de844c:d5bac52d:7e2a17b6:9576b63d
         Events : 0.2633

    Number   Major   Minor   RaidDevice State
       2       8        2        0      spare rebuilding   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
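(Two side notes, not from the original session: the recurring "mdadm: Unknown keyword" line means /etc/mdadm.conf contains a line mdadm cannot parse; it is cosmetic here, but worth cleaning up. And the rebuild started above can be watched with:

cat /proc/mdstat
mdadm --detail /dev/md1 | grep -i 'rebuild status'

Both report the resync progress as a percentage until the mirror is whole again.)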
I wonder if this error message from the log file has something to do with it:

21:50:57 ERROR : raid set inconsistency for md1: all drives in this raid set do not agree on raid parameters.  Skipping raid device
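(One way to test that hypothesis, as a sketch: dump both member superblocks and compare them directly.

mdadm --examine /dev/sda2
mdadm --examine /dev/sdb2

If the two superblocks disagree on fields such as UUID, Raid Level, or Array Size, the set is inconsistent, which would match the "do not agree on raid parameters" message.)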
Update on this issue. Even though the machine is now happily running, /dev/sdb1 still cannot be formatted:

[root@gefjun ~]# mke2fs /dev/sdb1 -t ext3
mke2fs 1.41.3 (12-Oct-2008)
/dev/sdb1 is apparently in use by the system; will not make a filesystem here!

A hexdump shows zeros up to 0x19000, followed by leftover data from an earlier installation (it matches the old /boot partition):

[root@gefjun ~]# hexdump -C /dev/sdb1
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019000  64 20 00 00 64 60 00 00  64 a0 00 00 64 e0 00 00  |d ..d`..d...d...|
00019010  64 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |d ..............|
00019020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019400  65 20 00 00 65 60 00 00  65 a0 00 00 65 e0 00 00  |e ..e`..e...e...|
00019410  65 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |e ..............|
00019420  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019800  66 20 00 00 66 60 00 00  66 a0 00 00 66 e0 00 00  |f ..f`..f...f...|
00019810  66 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |f ..............|
00019820  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00019c00  67 20 00 00 67 60 00 00  67 a0 00 00 67 e0 00 00  |g ..g`..g...g...|
00019c10  67 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |g ..............|
00019c20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0001a000  68 20 00 00 68 60 00 00  68 a0 00 00 68 e0 00 00  |h ..h`..h...h...|
0001a010  68 20 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |h ..............|
0001a020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

Looking at /var/log/messages:

Jan 11 14:16:35 gefjun kernel: sdb: sdb1 sdb2
Jan 11 14:16:35 gefjun kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jan 11 14:16:35 gefjun kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
....
Jan 11 14:16:35 gefjun kernel: md: bind<sdb1>
...

So the md devices seem to like sdb1. Grepping just the messages containing "md":

Jan 11 14:16:35 gefjun kernel: md: raid1 personality registered for level 1
Jan 11 14:16:35 gefjun kernel: md: raid6 personality registered for level 6
Jan 11 14:16:35 gefjun kernel: md: raid5 personality registered for level 5
Jan 11 14:16:35 gefjun kernel: md: raid4 personality registered for level 4
Jan 11 14:16:35 gefjun kernel: md: md1 stopped.
Jan 11 14:16:35 gefjun kernel: md: bind<sdb2>
Jan 11 14:16:35 gefjun kernel: md: bind<sda2>
Jan 11 14:16:35 gefjun kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jan 11 14:16:35 gefjun kernel: scsi4 : pata_amd
Jan 11 14:16:35 gefjun kernel: scsi5 : pata_amd
Jan 11 14:16:35 gefjun kernel: ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
Jan 11 14:16:35 gefjun kernel: ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
Jan 11 14:16:35 gefjun kernel: md: bind<sdb1>
Jan 11 14:16:35 gefjun kernel: md: md_d1 stopped.

I will try to zero the whole partition, not only the start, then reboot.
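(In hindsight, mdadm has a tool aimed at exactly this situation, which would be gentler than zeroing the whole partition. A sketch, assuming the stale metadata is an md superblock and that whatever array has claimed the partition is stopped first:

mdadm --stop /dev/md_d1
mdadm --zero-superblock /dev/sdb1

--zero-superblock erases only the md superblock, after which the kernel should no longer bind the partition into an array at boot.)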
[root@gefjun ~]# dd if=/dev/zero of=/dev/sdb1
dd: writing to `/dev/sdb1': No space left on device
401563+0 records in
401562+0 records out
205599744 bytes (206 MB) copied, 1.2918 s, 159 MB/s

[root@gefjun ~]# hexdump -C /dev/sdb1
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0c413400
Ok -- that did it. "mke2fs /dev/sdb1 -t ext3" now works. Does the software RAID depend on what's at the _end_ of the partition?

The "md" messages from /var/log/messages:

Jan 11 15:43:24 gefjun kernel: md: raid1 personality registered for level 1
Jan 11 15:43:24 gefjun kernel: md: raid6 personality registered for level 6
Jan 11 15:43:24 gefjun kernel: md: raid5 personality registered for level 5
Jan 11 15:43:24 gefjun kernel: md: raid4 personality registered for level 4
Jan 11 15:43:24 gefjun kernel: md: md1 stopped.
Jan 11 15:43:24 gefjun kernel: md: bind<sdb2>
Jan 11 15:43:24 gefjun kernel: md: bind<sda2>
Jan 11 15:43:24 gefjun kernel: raid1: raid set md1 active with 2 out of 2 mirrors
Jan 11 15:43:24 gefjun kernel: scsi4 : pata_amd
Jan 11 15:43:24 gefjun kernel: scsi5 : pata_amd
Jan 11 15:43:24 gefjun kernel: ata5: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
Jan 11 15:43:24 gefjun kernel: ata6: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
Jan 11 15:43:24 gefjun kernel: md: md_d1 stopped.

Note that "md: bind<sdb1>" no longer appears.
Apparently, RAID metadata is indeed stored towards the end of the partition.

To conclude: perhaps anaconda should offer a red-button "partition destruction" option that completely erases a partition, as a last resort.
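(Two editorial notes, from general md knowledge rather than from this report: version 0.90 md superblocks live in the last 64 KiB of the member device, at an offset rounded down to a 64 KiB boundary, which is exactly why zeroing only the first sectors changed nothing. A sketch for locating the superblock by hand, assuming the classic 0.90 layout:

SIZE=$(blockdev --getsize64 /dev/sdb1)        # partition size in bytes
OFFSET=$(( SIZE / 65536 * 65536 - 65536 ))    # 64 KiB back from the last 64 KiB boundary
# the md magic 0xa92b4efc should appear as "fc 4e 2b a9" on a little-endian machine
dd if=/dev/sdb1 bs=512 skip=$(( OFFSET / 512 )) count=1 2>/dev/null | hexdump -C | head -2

As for a "partition destruction" button: newer util-linux releases ship wipefs(8), which finds and erases filesystem and RAID signatures wherever they are stored, e.g. "wipefs -a /dev/sdb1"; I do not believe it was available in F10.)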
We have made extensive changes to the partitioning code for the F11 beta, to the point that it is very difficult to tell whether your bug is still relevant. Please test with either the latest Rawhide you have access to, or with F11, and let us know whether you are still seeing this problem. In particular, I know I have seen patches go in to address the metadata persistence problem, and there has been a whole lot of RAID work recently, so I suspect this problem should be fixed. Thanks for the bug report.