Bug 444237 - mdadm failed to RUN_ARRAY invalid argument
Summary: mdadm failed to RUN_ARRAY invalid argument
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 9
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-04-25 22:20 UTC by Duarte Diogo
Modified: 2018-04-11 14:56 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-11-19 14:47:43 UTC
Type: ---
Embargoed:


Attachments: none

Description Duarte Diogo 2008-04-25 22:20:47 UTC
Description of problem:
Config: md0 raid1  /boot
        md1 raid1  swap
        md2 raid10 /
        md3 raid10 /home
After a fresh install with four 500 GB disks, mdadm gives "mdadm: failed to
run_array /dev/md3: Invalid argument" on md3 (raid10 /home) and md1 (raid1
swap), and stops. md0 (raid1 /boot) and md2 (raid10 /) work fine.
If I start only the console, all RAID devices work.
In the repair console, I can run mdadm -A --run /dev/md1, start the array,
and perform a rebuild.
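For reference, a sketch of the manual recovery sequence described above, as run
from the repair console (the member device /dev/sdX3 below is a placeholder,
not a name taken from this report):

 # assemble and start the array even if it is incomplete
 /sbin/mdadm -A --run /dev/md1
 # re-add the missing member so the rebuild starts (illustrative device name)
 /sbin/mdadm /dev/md1 --add /dev/sdX3
 # watch the rebuild progress
 cat /proc/mdstat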

If I reboot, even after the array rebuild, I get the same error.

Version-Release number of selected component (if applicable):
Fedora 9 Preview Release (I don't remember the kernel version)

How reproducible:
Installing RAID arrays

Steps to Reproduce:
1. Start a fresh install
2. Create RAID 10 arrays (with or without LVM)
3. Reboot
  
Actual results:
System stops at boot time and opens a recovery console

Expected results:
A running system

Additional info:

Comment 1 Doug Ledford 2008-04-25 22:40:37 UTC
Can you tell me the exact version of mdadm in use on your system?  The latest is
mdadm-2.6.4-4, which was updated to solve some issues related to incremental
assembly; that may be the problem you are seeing.
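A quick way to capture the exact version, for the record (standard commands,
nothing specific to this bug):

 rpm -q mdadm
 /sbin/mdadm --version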

Comment 2 Duarte Diogo 2008-04-28 14:22:48 UTC
Yes, this is mdadm-2.6.4-3.
OK, I will try the newer one.
Thanks

Comment 3 Bug Zapper 2008-05-14 10:11:19 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Duarte Diogo 2008-05-15 23:59:13 UTC
I made a Fedora 8 fresh install with the same RAID configuration, and all the
arrays work well. I can also disconnect or remove one drive from the array,
mount another, and perform a recovery.
Today I upgraded to Fedora 9 with a Fedora 9 x86_64 DVD and I get the same problem.
I have already started the system with a rescue disc and run a yum update.
Out of curiosity: with the rescue disc, all the arrays mount fine. cat
/proc/mdstat gives:

Personalities: [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear]
md2: active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
51198976 blocks 256K chunks 2 near-copies [4/4] [UUUU]

md1: active raid1 sdc3[0] sdd3[1]
2048192 blocks [2/2] [UU]

md0: active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
104320 blocks [4/4] [UUUU]

md3: active raid10 sda3[0] sdd5[3] sdc5[2] sdb3[1]
921263104 blocks 256K chunks 2 near-copies [4/4] [UUUU]

My whole system is updated, with the latest kernel, 2.6.25-14.fc9.x86_64.

I already tried disabling SELinux, but the result is the same.

dmesg shows:
md: bind<sda3>
md: bind<sda1>
mdadm[1246]: segfault at 0 ip 408b5c sp 7fff18743850 error 4 in mdadm[400000+2f000]
md: bind<sdd1>
md: bind<sdd5>
mdadm[1247]: segfault at 0 ip 408b5c sp 7fff3cc429b0 error 4 in mdadm[400000+2f000]
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdc5>
mdadm[1282]: segfault at 0 ip 408b5c sp 7fff87bc4820 error 4 in mdadm[400000+2f000]
md: raid1 personality registered for level 1
raid1: raid set md0 active with 4 out of 4 mirrors
md: bind<sdb3>
mdadm[1300]: segfault at 0 ip 408b5c sp 7fff7ab747d0 error 4 in mdadm[400000+2f000]

Comment 5 Dave Pickerill 2008-06-09 02:29:29 UTC
I'm having the same problem with the 32-bit version.  I've had it on 2 different
machines.  One machine has been running Fedora 7 in this configuration since a
few weeks after Fedora 7 came out: 4 SATA drives, 250 GB each.  /dev/md0 is 102 MB
RAID1 across all 4 drives.  /dev/md1 is 500 MB RAID0 across all drives as
swap.  /dev/md2 is about 200 GB RAID10 across all 4 drives.  /dev/md3 is the
remainder, about 250 GB RAID10 across all 4 drives.  When I did an upgrade
install, it behaved the same way and I had to re-install Fedora 7 to get my data
back.  The second machine I built to try to figure out what I did wrong.  It
has 4 IDE 60 GB drives with the same type of RAID devices.  It exhibits the same
behavior on a 7-to-9 upgrade, or a fresh 9 install.  The odd thing is, all
drives mount up just fine in the rescue mode of Fedora 9; they just won't mount on a
normal boot.  I get the same segfault messages in dmesg, but with different numbers.

Comment 6 Duarte Diogo 2008-06-09 02:53:24 UTC
Yes, this bug is also reported in
https://bugzilla.redhat.com/show_bug.cgi?id=444237 and
https://bugzilla.redhat.com/show_bug.cgi?id=447818
I made a workaround by removing the /etc/udev/rules.d/70-mdadm.rules file and
adding this to rc.sysinit at line 321, before the "# Device mapper & related
initialization" section:
#RAID SETUP
update_boot_stage RCraid
if [ -f /etc/mdadm.conf ]; then
    /sbin/mdadm -A -s --auto=yes
fi

After that, the system boots the same way as in previous versions.
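For anyone repeating the workaround, a sketch of the steps (paths are the ones
named above; moving the rule aside instead of deleting it is just a suggestion,
not part of the original report):

 # disable the udev-driven incremental assembly
 mv /etc/udev/rules.d/70-mdadm.rules /root/70-mdadm.rules.disabled
 # then add the block quoted above to rc.sysinit, before the
 # "# Device mapper & related initialization" section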

Comment 7 Doug Ledford 2008-06-26 23:57:59 UTC
I've made two changes, either one of which could solve your problem.

First, I updated mdadm to 2.6.7 (latest upstream, multiple bug fixes, could
solve your problem by itself).

Second, I updated the udev rule that calls mdadm to include the --scan and --run
options on the mdadm command, which likewise had been causing problems for
people and may solve your problem.

However, since I couldn't reproduce your problem here, I'm not sure if your
problem is fixed.  If you could test the packages that will show up in the Fedora 9
updates-testing repo shortly, I would appreciate your feedback on whether or not
it solves your problem.

Comment 8 Fedora Update System 2008-06-27 00:07:25 UTC
mdadm-2.6.7-1.fc9 has been submitted as an update for Fedora 9

Comment 9 Fedora Update System 2008-06-28 22:15:38 UTC
mdadm-2.6.7-1.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mdadm'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-5804

Comment 10 Duarte Diogo 2008-06-29 03:21:41 UTC
I made the update, and I get 3 error messages like this: Starting udev:
udev-event [1033]: run_program: '/sbin/mdadm' abnormal exit

Now I have the 70-mdadm.rules file back, but to get the system up I still need to add:
#RAID SETUP
update_boot_stage RCraid
if [ -f /etc/mdadm.conf ]; then
    /sbin/mdadm -A -s --auto=yes
fi

to rc.sysinit.

The errors are related to the RAID10 partitions, I think.
My mdadm.conf:

DEVICE partitions
MAILADDR root
ARRAY /dev/md3 level=raid10 num-devices=4 UUID=829e719d:221d9329:7e352b4e:af73acce
ARRAY /dev/md0 level=raid1 num-devices=4 UUID=17aea36e:835d634e:6227f0b9:c59154f4
ARRAY /dev/md2 level=raid10 num-devices=4 UUID=c5aa9f9d:dc152bf5:d924f7ac:4fc8274b

At this time I have degraded arrays because one HDD is down and has been removed
from mdadm.conf, but that situation doesn't affect the problem.
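Incidentally, a standard way to confirm the state of an array mentioned here
(generic mdadm usage, not specific to this report):

 /sbin/mdadm --detail /dev/md3
 cat /proc/mdstat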

Comment 11 Matěj Cepl 2008-07-02 07:00:02 UTC
Just a note from your friendly bug triager -- we don't use the FAILS_QA state in Fedora
bugs. I think that the correct status of this bug according to
https://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow is ASSIGNED. Please
correct this bug to the right state if I am wrong.


Comment 12 Dimitri Maziuk 2008-09-18 22:00:23 UTC
yum --enablerepo=updates-testing update mdadm : No packages marked for update.
With mdadm-2.6.4-4.fc9.x86_64 the problem is still here:
 udevd-event[1234]: run-program: '/sbin/mdadm' abnormal exit
one for each disk in the RAID, it looks like (it would help if these were logged to a file somewhere). It happens on two machines, both with RAID-10s.
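For what it's worth, both the udev failure and the mdadm segfaults do end up in
the kernel log and syslog, so they can be pulled out after boot with standard
commands (nothing specific to this bug):

 dmesg | grep mdadm
 grep mdadm /var/log/messages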

Comment 13 Matěj Cepl 2008-09-19 10:25:35 UTC
Dimitri, take a look at http://koji.fedoraproject.org/koji/buildinfo?buildID=53941

Comment 14 Dimitri Maziuk 2008-10-06 20:05:28 UTC
(In reply to comment #13)
> Dimitri, take a look at
> http://koji.fedoraproject.org/koji/buildinfo?buildID=53941

Didn't help. Same error, plus it's now rejecting one of the disks (why TF is it "non-fresh"? -- re-adding it afterwards doesn't produce any error messages).

Oct  6 14:55:31 octopus kernel: sd 3:0:0:0: [sdc] Attached SCSI disk
Oct  6 14:55:31 octopus kernel: sd 3:0:0:0: Attached scsi generic sg2 type 0
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt 0000:00:05.1[B] -> Link [LSA1] -> GSI 23 (level, low) -> IRQ 23
Oct  6 14:55:31 octopus kernel: sata_nv 0000:00:05.1: Using SWNCQ mode
Oct  6 14:55:31 octopus kernel: scsi4 : sata_nv
Oct  6 14:55:31 octopus kernel: scsi5 : sata_nv
Oct  6 14:55:31 octopus kernel: ata5: SATA max UDMA/133 cmd 0xc880 ctl 0xc800 bmdma 0xc080 irq 23
Oct  6 14:55:31 octopus kernel: ata6: SATA max UDMA/133 cmd 0xc480 ctl 0xc400 bmdma 0xc088 irq 23
Oct  6 14:55:31 octopus kernel: md: bind<sdb1>
Oct  6 14:55:31 octopus kernel: mdadm[1562]: segfault at 0 ip 407f8b sp 7fff0d556470 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: md: bind<sdc1>
Oct  6 14:55:31 octopus kernel: mdadm[1609]: segfault at 0 ip 407f8b sp 7fff7b26bd70 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct  6 14:55:31 octopus kernel: ata5.00: ATA-8: ST31000340AS, SD15, max UDMA/133
Oct  6 14:55:31 octopus kernel: ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
Oct  6 14:55:31 octopus kernel: ata5.00: configured for UDMA/133
Oct  6 14:55:31 octopus kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct  6 14:55:31 octopus kernel: ata6.00: ATA-8: ST31000340AS, SD15, max UDMA/133
Oct  6 14:55:31 octopus kernel: ata6.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
Oct  6 14:55:31 octopus kernel: ata6.00: configured for UDMA/133
Oct  6 14:55:31 octopus kernel: scsi 4:0:0:0: Direct-Access     ATA      ST31000340AS     SD15 PQ: 0 ANSI: 5
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sdd: sdd1
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: [sdd] Attached SCSI disk
Oct  6 14:55:31 octopus kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
Oct  6 14:55:31 octopus kernel: scsi 5:0:0:0: Direct-Access     ATA      ST31000340AS     SD15 PQ: 0 ANSI: 5
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sde: sde1
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: [sde] Attached SCSI disk
Oct  6 14:55:31 octopus kernel: sd 5:0:0:0: Attached scsi generic sg4 type 0
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt 0000:00:05.2[C] -> Link [LSA2] -> GSI 22 (level, low) -> IRQ 22
Oct  6 14:55:31 octopus kernel: sata_nv 0000:00:05.2: Using SWNCQ mode
Oct  6 14:55:31 octopus kernel: scsi6 : sata_nv
Oct  6 14:55:31 octopus kernel: scsi7 : sata_nv
Oct  6 14:55:31 octopus kernel: ata7: SATA max UDMA/133 cmd 0xc000 ctl 0xbc00 bmdma 0xb480 irq 22
Oct  6 14:55:31 octopus kernel: ata8: SATA max UDMA/133 cmd 0xb880 ctl 0xb800 bmdma 0xb488 irq 22
Oct  6 14:55:31 octopus kernel: md: bind<sdd1>
Oct  6 14:55:31 octopus kernel: mdadm[1674]: segfault at 0 ip 407f8b sp 7fff79dbb8c0 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: md: bind<sde1>
Oct  6 14:55:31 octopus kernel: mdadm[1675]: segfault at 0 ip 407f8b sp 7fffabd22dc0 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct  6 14:55:31 octopus kernel: ata7.00: ATA-8: ST31000340AS, SD15, max UDMA/133
Oct  6 14:55:31 octopus kernel: ata7.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
Oct  6 14:55:31 octopus kernel: ata7.00: configured for UDMA/133
Oct  6 14:55:31 octopus kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct  6 14:55:31 octopus kernel: ata8.00: ATA-8: ST31000340AS, SD15, max UDMA/133
Oct  6 14:55:31 octopus kernel: ata8.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 31/32)
Oct  6 14:55:31 octopus kernel: ata8.00: configured for UDMA/133
Oct  6 14:55:31 octopus kernel: scsi 6:0:0:0: Direct-Access     ATA      ST31000340AS     SD15 PQ: 0 ANSI: 5
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sdf: sdf1
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: [sdf] Attached SCSI disk
Oct  6 14:55:31 octopus kernel: sd 6:0:0:0: Attached scsi generic sg5 type 0
Oct  6 14:55:31 octopus kernel: scsi 7:0:0:0: Direct-Access     ATA      ST31000340AS     SD15 PQ: 0 ANSI: 5
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] 1953525168 512-byte hardware sectors (1000205 MB)
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] Write Protect is off
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct  6 14:55:31 octopus kernel: sdg: sdg1
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: [sdg] Attached SCSI disk
Oct  6 14:55:31 octopus kernel: sd 7:0:0:0: Attached scsi generic sg6 type 0
Oct  6 14:55:31 octopus kernel: i2c-adapter i2c-0: nForce2 SMBus adapter at 0x2d00
Oct  6 14:55:31 octopus kernel: i2c-adapter i2c-1: nForce2 SMBus adapter at 0x2e00
Oct  6 14:55:31 octopus kernel: forcedeth: Reverse Engineered nForce ethernet driver. Version 0.61.
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 20
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LMAC] -> GSI 20 (level, low) -> IRQ 20
Oct  6 14:55:31 octopus kernel: forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 2, addr 00:30:48:c3:d2:52
Oct  6 14:55:31 octopus kernel: forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt timirq gbit lnktim msi desc-v3
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt Link [LMAD] enabled at IRQ 20
Oct  6 14:55:31 octopus kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMAD] -> GSI 20 (level, low) -> IRQ 20
Oct  6 14:55:31 octopus kernel: forcedeth 0000:00:09.0: ifname eth1, PHY OUI 0x5043 @ 3, addr 00:30:48:c3:d2:53
Oct  6 14:55:31 octopus kernel: forcedeth 0000:00:09.0: highdma csum vlan pwrctl mgmt timirq gbit lnktim msi desc-v3
Oct  6 14:55:31 octopus kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Oct  6 14:55:31 octopus kernel: md: bind<sdf1>
Oct  6 14:55:31 octopus kernel: mdadm[1758]: segfault at 0 ip 407f8b sp 7fff18132c40 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: md: bind<sdg1>
Oct  6 14:55:31 octopus kernel: mdadm[1759]: segfault at 0 ip 407f8b sp 7fffa531fe20 error 4 in mdadm[400000+2a000]
Oct  6 14:55:31 octopus kernel: device-mapper: uevent: version 1.0.3
Oct  6 14:55:31 octopus kernel: device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: dm-devel
Oct  6 14:55:31 octopus kernel: device-mapper: multipath: version 1.0.5 loaded
Oct  6 14:55:31 octopus kernel: md: md0 stopped.
Oct  6 14:55:31 octopus kernel: md: unbind<sdg1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdg1)
Oct  6 14:55:31 octopus kernel: md: unbind<sdf1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdf1)
Oct  6 14:55:31 octopus kernel: md: unbind<sde1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sde1)
Oct  6 14:55:31 octopus kernel: md: unbind<sdd1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdd1)
Oct  6 14:55:31 octopus kernel: md: unbind<sdc1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdc1)
Oct  6 14:55:31 octopus kernel: md: unbind<sdb1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdb1)
Oct  6 14:55:31 octopus kernel: md: bind<sdc1>
Oct  6 14:55:31 octopus kernel: md: bind<sdd1>
Oct  6 14:55:31 octopus kernel: md: bind<sde1>
Oct  6 14:55:31 octopus kernel: md: bind<sdf1>
Oct  6 14:55:31 octopus kernel: md: bind<sdg1>
Oct  6 14:55:31 octopus kernel: md: bind<sdb1>
Oct  6 14:55:31 octopus kernel: md: kicking non-fresh sdc1 from array!
Oct  6 14:55:31 octopus kernel: md: unbind<sdc1>
Oct  6 14:55:31 octopus kernel: md: export_rdev(sdc1)
Oct  6 14:55:31 octopus kernel: md: raid10 personality registered for level 10
Oct  6 14:55:31 octopus kernel: raid10: raid set md0 active with 5 out of 6 devices

Comment 15 Jan "Yenya" Kasprzak 2008-10-21 12:40:44 UTC
I have just run into the same problem - one md volume (out of four) is not
assembled correctly during the system boot (with eight "udevd-event[...]: run-program: '/sbin/mdadm' abnormal exit" messages).

I have four volumes:

md0 (/, raid10 of 8 partitions)
md1 (swap, raid10 of 8 partitions)
md2 (/export, raid10 of 8 partitions)
md3 (/boot, raid1 of 8 partitions)

md2 is the one that fails to start during boot; the other arrays are set up correctly. md2 is partly configured in /proc/mdstat:

md2: inactive sdf3[1](S) sdd3[1](S) ...
     xxxxx blocks

mdadm --assemble finds it correctly. Even upgrading to mdadm-2.6.7-1.fc9.x86_64 from Koji (as mentioned in comment #13) did not help.

This is Fedora 9 installed from DVD, and yum updated.
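A sketch of the manual assembly implied above: stop the half-assembled
(inactive) md2, then let mdadm assemble it from mdadm.conf (md2 is the device
from this comment; run as root):

 /sbin/mdadm --stop /dev/md2
 /sbin/mdadm --assemble /dev/md2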

Comment 16 josip 2008-10-25 16:05:31 UTC
I ran into the same problem after the latest yum update (the same system used to work fine using last month's software).  BTW, the latest mdadm in Fedora 9 updates is mdadm-2.6.4-4.fc9.x86_64 at this time.

The problem is triggered when this udev rule fires:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
        RUN+="/sbin/mdadm --incremental $root/%k"

According to bug #447818, "--incremental" doesn't work properly, while "--assemble" does.  However, this udev rule belongs to the mdadm package, which hasn't changed since June, i.e. it used to work on my machine until today.  In summary:

Sep. 20: F9+updates works fine (no mdadm segfaults)
Oct. 25: (yum update applied 303 updates including new kernel)
Oct. 25: F9+updates broke udev startup (mdadm segfaults)

Interestingly enough, even though mdadm segfaults when invoked via udev rules, my RAID partitions come up and work normally:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [raid10] 
md4 : active raid6 sdd4[3] sde4[4] sdb4[1] sdf4[5] sdc4[2] sda4[0]
      1884970240 blocks level 6, 32k chunk, algorithm 2 [6/6] [UUUUUU]
      
md3 : active raid10 sdc3[0] sdf3[3] sde3[2] sdd3[1]
      31262336 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      
md2 : active raid10 sdc1[0] sdf1[3] sde1[2] sdd1[1]
      1011840 blocks 64K chunks 2 near-copies [4/4] [UUUU]
      
md0 : active raid1 sdb1[1] sda1[0]
      505920 blocks [2/2] [UU]
      
md1 : active raid1 sda3[0] sdb3[1]
      15631168 blocks [2/2] [UU]
      
unused devices: <none>
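In case it helps narrow this down, a sketch of exercising the two code paths by
hand (member device names are placeholders; stop the array first so the test
starts clean):

 # the path the udev rule takes: feed members in one at a time
 /sbin/mdadm --stop /dev/md2
 /sbin/mdadm --incremental /dev/sdc1
 /sbin/mdadm --incremental /dev/sdd1
 # the path reported to work: scan mdadm.conf and assemble everything
 /sbin/mdadm --assemble --scan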

Comment 17 josip 2008-10-25 16:47:22 UTC
One more thing: the new initscripts-8.76.4-1.x86_64 (built on Oct. 14) already includes a change to rc.sysinit equivalent to comment #6 above:

# Start any MD RAID arrays that haven't been started yet
[ -f /etc/mdadm.conf -a -x /sbin/mdadm ] && /sbin/mdadm -As --auto=yes --run

Could this change be messing something up?  The original rc.sysinit without those lines used to work fine before initscripts was upgraded.  BTW, neither the udev nor the mdadm package has changed since things last worked.  The mdadm package is still the original F9 version built in April, and udev dates back to August.

I'll try commenting out those new lines in rc.sysinit and hope for the best...

Comment 18 josip 2008-10-25 16:57:35 UTC
...and the conclusion is that those new lines in rc.sysinit are essential for the system to boot, i.e. don't comment them out.  Those mdadm segfaults must be due to something else (new kernel?)...

Comment 19 Dimitri Maziuk 2008-10-27 18:49:27 UTC
(In reply to comment #18)
> ...and the conclusion is that those new lines in rc.sysinit are essential for
> the system to boot, i.e. don't comment them out.  Those mdadm segfaults must be
> due to something else (new kernel?)...

I set up my machines in September and they have had this problem since day one.

The change with the latest initscripts and kernel is that the arrays now get started despite the mdadm segfaults; previously, boot dropped to the "enter root password" prompt while trying to fsck /dev/md0.

Comment 20 Fedora Update System 2008-10-30 13:54:58 UTC
mdadm-2.6.7.1-1.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/mdadm-2.6.7.1-1.fc9

Comment 21 Fedora Update System 2008-10-31 10:26:00 UTC
mdadm-2.6.7.1-1.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update mdadm'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9325

Comment 22 Fedora Update System 2008-11-19 14:47:22 UTC
mdadm-2.6.7.1-1.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 23 Kevin Monroe 2008-11-27 16:22:11 UTC
FYI, I hit the "run_program: '/sbin/mdadm' abnormal exit" problem as soon as I upgraded mdadm to 2.6.7.1-1.fc9. The details of my situation are in comment 29 of bug 447818:

https://bugzilla.redhat.com/show_bug.cgi?id=447818

The short version is that I had to rename my array from /dev/md0 to /dev/md_d0 in /etc/mdadm.conf and /etc/fstab. It seems as though mdadm cannot correctly assemble an array of partitions if the device is called /dev/mdX. It needs to be called /dev/md_dX.
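To illustrate the rename, a sketch of the edits (the level, device count, and
UUID below are placeholders, not values from this report):

 # /etc/mdadm.conf: old line commented out, renamed line added
 #ARRAY /dev/md0   level=raid1 num-devices=2 UUID=<array-uuid>
 ARRAY /dev/md_d0  level=raid1 num-devices=2 UUID=<array-uuid>
 # /etc/fstab: change the device field on the matching line the same way,
 # e.g. /dev/md0 -> /dev/md_d0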

