Description of problem:

I have a machine with two identical disks onto which I am installing an mdraid RAID 1 setup via a network-install kickstart like this:

zerombr
clearpart --all --initlabel
ignoredisk --only-use=sda,sdb
bootloader --location=mbr --driveorder=sda,sdb
# /boot
part raid.01 --asprimary --size=1024 --ondisk=sda
part raid.02 --asprimary --size=1024 --ondisk=sdb
# /
# Note that we add --grow here. We'd need to remove this if the two disks weren't the same size!
part raid.11 --asprimary --size=61440 --ondisk=sda --grow
part raid.12 --asprimary --size=61440 --ondisk=sdb --grow
# <swap>
part raid.21 --asprimary --size=4096 --ondisk=sda
part raid.22 --asprimary --size=4096 --ondisk=sdb
# Format /boot and /.
raid /boot --fstype=ext4 --level=1 --device=md0 raid.01 raid.02
raid / --fstype=ext4 --level=1 --device=md1 raid.11 raid.12
raid swap --fstype=swap --level=1 --device=md2 raid.21 raid.22

Consistently, the first time I try to reinstall this machine, anaconda bails with the error message "Could not commit to disk /dev/sdb". If I then reboot the machine and restart the installation, it installs fine.

I thought at first this was some issue with a persistent RAID superblock on the disks somehow confusing things, so I've gone to some lengths in the kickstart to try to obliterate the superblock; since that didn't help, I don't think the superblock alone is the root cause. I am now scratching my head.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 502798 [details] Anaconda log
Created attachment 502799 [details] Kickstart
Created attachment 502800 [details] ks-pre log
Created attachment 502801 [details] storage.log
Created attachment 502802 [details] syslog
Seems that this has been seen elsewhere too: https://www.redhat.com/archives/rhelv6-beta-list/2010-May/msg00177.html
Created attachment 502817 [details] traceback
From your ks-pre log:

find /dev -name md[0-9]* -exec umount '{}' \;
+ find /dev -name md0.dmfhJB md1.vGDHcZ md2.KU252P -exec umount '{}' ';'
+ sleep 10
find: paths must precede expression: md1.vGDHcZ
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]

find /dev -name md[0-9]* -exec mdadm -S '{}' \;
+ find /dev -name md0.dmfhJB md1.vGDHcZ md2.KU252P -exec mdadm -S '{}' ';'
find: paths must precede expression: md1.vGDHcZ
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]

You should escape the '*' in your find expression -- otherwise the shell expands it as a glob against the current directory before find ever sees it. For example, if the current directory contains fooX and fooY and you run:

find /somedir -name foo* -exec <whatever>

the command you are actually running is:

find /somedir -name fooX fooY -exec <whatever>

regardless of what is in /somedir. That is probably not what you want, and if /somedir happens to contain fooA and fooZ instead, find fails outright because of the shell expansion. To fix it, just escape the glob:

find /somedir -name foo\* -exec <whatever>
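Applied to the %pre commands in your log, that means quoting (or escaping) the pattern so find receives it literally. A sketch of the corrected lines, assuming you still want to unmount and then stop every assembled array:

# Quote the pattern so the shell passes md[0-9]* to find unexpanded.
find /dev -name 'md[0-9]*' -exec umount '{}' \;
# mdadm -S (--stop) deactivates each array so it no longer holds the old partitions open.
find /dev -name 'md[0-9]*' -exec mdadm -S '{}' \;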
The end result of this is that you are never deactivating the RAID arrays and thus are putting the system into an inconsistent state before running anaconda. Parted thinks the disks have no partitions but the kernel thinks otherwise because those preexisting partitions have been held open by md.
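If you want to see that mismatch on an affected system before rerunning anaconda, a quick check from a shell is (a sketch only; the commands are standard tools and the device name is taken from your report):

# Partitions the kernel still knows about (held open by md):
cat /proc/partitions
# Arrays currently assembled:
cat /proc/mdstat
# What parted reads from the on-disk label of the second disk:
parted -s /dev/sdb print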
Hello David,

Yes, you're right - fixing that fixed the problem. It seems I was on the right track in suspecting that the presence of the superblock was the problem; all that stuff in %pre was really just an attempt to work around it.

Shouldn't anaconda take care of properly removing RAID superblocks etc. by virtue of "clearpart --all"? This is probably more of an RFE than a bug report, but still worth considering, I think.

Thanks again for the pointer.
clearpart should remove the raid superblocks. If you'd like to attach logs showing what happens when you omit the %pre I would be happy to take a look at them to see what's going on.
OK, looking a bit closer, here is what I find: clearpart does successfully remove the superblock if nothing in %pre has caused md to activate the RAID array. Good.

Previously, I was activating the RAID array(s) in %pre in order to save the old ssh host keys prior to installation and, importantly, not subsequently shutting the array(s) down. This causes anaconda to fall over in a rather odd way: it gets quite a long way towards making the new RAID array before failing (in the manner I originally reported).

One could of course argue that I'm an idiot for not deactivating the RAID arrays at the end of %pre, and that anaconda can't protect against %pre lunacy. That said, I could imagine that either of these anaconda behaviours would be better in such situations:

1) clearpart deactivates any RAID arrays that are active but are part of the set of devices about to be (re)partitioned; or
2) clearpart checks first whether any of the devices (RAID or otherwise) that are about to be partitioned are activated/mounted, and errors out at that point, before writing anything to the disks.

Option 2 is probably the safer bet, as option 1 is probably a good bullet to shoot yourself in the foot with :). I'll attach new logs in a second.
Created attachment 502865 [details] logs, kickstart etc demonstrating failure mode
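For what it's worth, the belt-and-braces ending for a %pre like mine might look like this (my own sketch, not anaconda behaviour; it assumes the arrays were mounted on a temporary mount point while the old ssh host keys were copied off, and that stopping every assembled array is acceptable):

# Unmount whatever %pre mounted, then stop every assembled array so the
# disks are released before anaconda starts partitioning.
umount /tmp/oldroot 2>/dev/null   # /tmp/oldroot is a hypothetical mount point
mdadm --stop --scan
# Sanity check: /proc/mdstat should no longer list any mdN devices.
cat /proc/mdstat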
If you remove --initlabel from your clearpart command you might get the behavior you were hoping for. I'm not sure, though, since there isn't any valid case for testing already-active mdraid or lvm in RHEL.
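Concretely, that would mean changing the clearpart line as follows (untested here, so treat it as a sketch rather than a verified fix):

# before
clearpart --all --initlabel
# after: per the suggestion above, dropping --initlabel may make anaconda
# stop instead of proceeding when the disks are in an unexpected state
clearpart --all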