Bug 710447 - Anaconda fails with "Could not commit to disk /dev/sdb" with software raid
Summary: Anaconda fails with "Could not commit to disk /dev/sdb" with software raid
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Anaconda Maintenance Team
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-06-03 12:32 UTC by Jonathan Underwood
Modified: 2018-11-28 20:45 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-03 15:29:37 UTC
Target Upstream Version:


Attachments (Terms of Use)
Anaconda log (7.19 KB, text/plain) - 2011-06-03 12:33 UTC, Jonathan Underwood
Kickstart (12.42 KB, text/plain) - 2011-06-03 12:34 UTC, Jonathan Underwood
ks-pre log (15.63 KB, text/plain) - 2011-06-03 12:34 UTC, Jonathan Underwood
storage.log (152.40 KB, text/plain) - 2011-06-03 12:35 UTC, Jonathan Underwood
syslog (48.48 KB, text/plain) - 2011-06-03 12:36 UTC, Jonathan Underwood
traceback (252.58 KB, text/plain) - 2011-06-03 14:42 UTC, Jonathan Underwood
logs, kickstart etc demonstrating failure mode (63.11 KB, application/x-gzip) - 2011-06-03 17:07 UTC, Jonathan Underwood

Description Jonathan Underwood 2011-06-03 12:32:51 UTC
Description of problem:
I have a machine with two identical disks onto which I am installing with mdraid RAID 1 via a network kickstart install, like this:

zerombr
clearpart --all --initlabel 
ignoredisk --only-use=sda,sdb
bootloader  --location=mbr --driveorder=sda,sdb
# /boot
part raid.01 --asprimary --size=1024 --ondisk=sda
part raid.02 --asprimary --size=1024 --ondisk=sdb
# /
# Note that we add --grow here. We'd need to remove this if the two disks weren't the same size!
part raid.11 --asprimary --size=61440 --ondisk=sda --grow 
part raid.12 --asprimary --size=61440 --ondisk=sdb --grow
# <swap>
part raid.21 --asprimary --size=4096 --ondisk=sda
part raid.22 --asprimary --size=4096 --ondisk=sdb
# Format /boot and /.
raid /boot --fstype=ext4 --level=1 --device=md0 raid.01 raid.02
raid /     --fstype=ext4 --level=1 --device=md1 raid.11 raid.12
raid swap  --fstype=swap --level=1 --device=md2 raid.21 raid.22

Consistently, the first time I try to reinstall this machine, anaconda bails out with the error message "Could not commit to disk /dev/sdb". If I then reboot and restart the installation, it installs fine. At first I thought this was some issue with a persistent RAID superblock being found on the disks and confusing things, so I've gone to some lengths in the kickstart to try to obliterate the superblock (sketched below); since that hasn't helped, I don't think it is the root cause, and I am now scratching my head.
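Conceptually, the cleanup I'm attempting in %pre is something like this (an illustrative, untested sketch with example device names, not the literal %pre; the exact commands are in the attached kickstart and ks-pre log):

  # Illustrative only: stop any leftover md arrays and wipe the member
  # superblocks so a previous RAID install does not linger on the disks.
  for md in /dev/md[0-9]*; do
      [ -e "$md" ] || continue    # skip if the glob matched nothing
      mdadm --stop "$md"
  done
  mdadm --zero-superblock /dev/sda1 /dev/sda2 /dev/sda3 \
                          /dev/sdb1 /dev/sdb2 /dev/sdb3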





Comment 1 Jonathan Underwood 2011-06-03 12:33:36 UTC
Created attachment 502798 [details]
Anaconda log

Comment 2 Jonathan Underwood 2011-06-03 12:34:23 UTC
Created attachment 502799 [details]
Kickstart

Comment 3 Jonathan Underwood 2011-06-03 12:34:59 UTC
Created attachment 502800 [details]
ks-pre log

Comment 4 Jonathan Underwood 2011-06-03 12:35:29 UTC
Created attachment 502801 [details]
storage.log

Comment 5 Jonathan Underwood 2011-06-03 12:36:01 UTC
Created attachment 502802 [details]
syslog

Comment 6 Jonathan Underwood 2011-06-03 12:37:57 UTC
Seems that this has been seen elsewhere too:

https://www.redhat.com/archives/rhelv6-beta-list/2010-May/msg00177.html

Comment 8 Jonathan Underwood 2011-06-03 14:42:36 UTC
Created attachment 502817 [details]
traceback

Comment 9 David Lehman 2011-06-03 15:27:12 UTC
From your ks-pre log:

find /dev -name md[0-9]* -exec umount '{}' \;
+ find /dev -name md0.dmfhJB md1.vGDHcZ md2.KU252P -exec umount '{}' ';'
+ sleep 10
find: paths must precede expression: md1.vGDHcZ
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
find /dev -name md[0-9]* -exec mdadm -S '{}' \;
+ find /dev -name md0.dmfhJB md1.vGDHcZ md2.KU252P -exec mdadm -S '{}' ';'
find: paths must precede expression: md1.vGDHcZ
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]


You should escape the '*' in your find expression -- otherwise the shell will expand it as a glob against the current directory, e.g.:

If the current directory contains fooX and fooY and you run this command:

  find /somedir -name foo* -exec <whatever>

the command you are actually running is this:

  find /somedir -name fooX fooY -exec <whatever>

regardless of what is in /somedir. This is probably not what you want. If /somedir contains fooA and fooZ, your find will fail because of the shell expansion. To fix it, just use this instead:

  find /somedir -name foo\* -exec <whatever>
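Applied to the commands in your ks-pre log, that just means quoting (or escaping) the pattern, roughly:

  find /dev -name 'md[0-9]*' -exec umount '{}' \;
  find /dev -name 'md[0-9]*' -exec mdadm -S '{}' \;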

Comment 10 David Lehman 2011-06-03 15:29:37 UTC
The end result of this is that you are never deactivating the RAID arrays and thus are putting the system into an inconsistent state before running anaconda. Parted thinks the disks have no partitions but the kernel thinks otherwise because those preexisting partitions have been held open by md.
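One way to see the mismatch for yourself from a shell during the install is to compare what the kernel and md report with what parted reports, along these lines (illustrative commands, not anything anaconda runs for you):

  cat /proc/mdstat          # arrays md is still holding open
  cat /proc/partitions      # partitions the kernel still knows about
  parted -s /dev/sdb print  # the empty layout parted sees on the disk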

Comment 11 Jonathan Underwood 2011-06-03 15:42:56 UTC
Hello David,

Yes, you're right - fixing that fixed the problem. It seems I was on the right track in suspecting that the presence of the old superblock was the cause; all that stuff in %pre was really just an attempt to work around it. Shouldn't anaconda take care of properly removing RAID superblocks etc. by virtue of "clearpart --all"? This is probably more of an RFE than a bug report, but still worth considering, I think.

Thanks again for the pointer.

Comment 12 David Lehman 2011-06-03 15:57:35 UTC
clearpart should remove the raid superblocks. If you'd like to attach logs showing what happens when you omit the %pre I would be happy to take a look at them to see what's going on.

Comment 13 Jonathan Underwood 2011-06-03 16:55:56 UTC
OK, looking a bit closer here is what I find:

clearpart does successfully remove the superblock if nothing in %pre has caused md to activate the RAID array. Good.

Previously, I was activating the RAID array(s) in %pre in order to save the old ssh host keys prior to installation and then, importantly, not shutting the array(s) back down afterwards. That causes anaconda to fall over in a rather odd way: it gets quite a long way towards building the new RAID array before failing (in the manner I originally reported).
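For the record, the safe version of that %pre would presumably look something like this (an untested sketch; the md and partition names are illustrative, not the exact ones from my kickstart):

  # Assemble the old root array read-only, rescue the ssh host keys,
  # then tear everything back down so md is not holding the partitions
  # open when anaconda repartitions the disks.
  mdadm --assemble /dev/md1 /dev/sda2 /dev/sdb2
  mkdir -p /mnt/oldroot
  mount -o ro /dev/md1 /mnt/oldroot
  cp -a /mnt/oldroot/etc/ssh/ssh_host_* /tmp/
  umount /mnt/oldroot
  mdadm --stop /dev/md1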

One could of course argue that I'm an idiot for not deactivating the raid arrays at the end of %pre, and that anaconda can't protect against %pre lunacy. That said, I could imagine that either of these anaconda behaviours would be better in such situations:

1) clearpart deactivates any raid arrays that are active but are part of the set of devices about to be (re)partitioned; or

2) clearpart checks first to see if any of the devices (raid or otherwise) that are about to be partitioned are activated/mounted before proceeding and errors out at that point before writing anything to the disks.

Option 2 is probably the safer bet, as option 1 sounds like a good bullet to shoot yourself in the foot with :).

I'll attach new logs in a second.

Comment 14 Jonathan Underwood 2011-06-03 17:07:25 UTC
Created attachment 502865 [details]
logs, kickstart etc demonstrating failure mode

Comment 15 David Lehman 2011-06-03 17:27:37 UTC
If you remove --initlabel from your clearpart command you might get the behavior you were hoping for. I'm not sure, though, since there isn't any valid case for testing already-active mdraid or lvm in RHEL.
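That is, something along the lines of:

  clearpart --all

rather than

  clearpart --all --initlabel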

