Bug 523862 - mdadm craps at boot
Summary: mdadm craps at boot
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: rawhide
Hardware: All
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 524381
Depends On:
Blocks: F12Blocker, F12FinalBlocker F12Beta, F12BetaBlocker
 
Reported: 2009-09-16 21:56 UTC by Nicolas Mailhot
Modified: 2009-10-04 17:01 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-10-03 17:10:12 UTC
Type: ---
Embargoed:


Attachments
dmesg (55.41 KB, text/plain)
2009-09-16 23:23 UTC, Nicolas Mailhot
Backtrace (9.08 KB, text/plain)
2009-09-22 20:50 UTC, Alexey Torkhov

Description Nicolas Mailhot 2009-09-16 21:56:31 UTC
Description of problem:

see https://bugzilla.redhat.com/show_bug.cgi?id=521959#c13


Version-Release number of selected component (if applicable):

http://koji.fedoraproject.org/koji/buildinfo?buildID=132143

Comment 1 Nicolas Mailhot 2009-09-16 23:23:35 UTC
Created attachment 361397
dmesg

Comment 2 Adam Williamson 2009-09-18 16:58:19 UTC
could you be somewhat more specific? I really don't understand what's going on here.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Nicolas Mailhot 2009-09-20 17:29:30 UTC
mdadm[800]: segfault at 0 ip 00007fe511dd71f2 sp 00007fff7e693e68 error 4 in libc-2.10.90.so[7fe511d56000+176000]

It's all in the log
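
For readers decoding the kernel line above, here is a minimal Python sketch (not part of the original report) that splits the message into its fields. It assumes the x86 message layout shown in that line and the usual x86 page-fault error-code bits (bit 0 = protection fault, bit 1 = write, bit 2 = user mode). The names SEGFAULT_RE and decode_segfault are hypothetical, chosen only for illustration.

```python
import re

# Layout of the x86 kernel "segfault at ..." message, as in the log line above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the interesting fields of a kernel segfault line as a dict."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "process": m["comm"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # x86 page-fault error-code bits (assumed, see note above).
        "protection_fault": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


line = ("mdadm[800]: segfault at 0 ip 00007fe511dd71f2 sp 00007fff7e693e68 "
        "error 4 in libc-2.10.90.so[7fe511d56000+176000]")
info = decode_segfault(line)
print(hex(info["offset_in_object"]), info["fault_address"], info["write"])
```

For the line in this comment it reports a user-mode read at address 0, i.e. a NULL pointer dereference, at offset 0x811f2 into libc-2.10.90.so.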

Comment 4 Nicolas Mailhot 2009-09-20 17:33:06 UTC
On the console you see something like:

udev: '/sbin/mdadm --detail --export /dev/md127' unexpected exit with status 0x000b
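
As a side note, the status 0x000b can be decoded with a few lines of Python. This is a sketch that assumes udev prints the raw wait(2) status of the child process, which the report does not confirm. The helper name describe_wait_status is hypothetical.

```python
import os
import signal


def describe_wait_status(status):
    """Describe a raw wait(2) status word (assumed to be what udev printed)."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        return "killed by signal %d (%s)" % (signum, signal.Signals(signum).name)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unrecognised status 0x%04x" % status


print(describe_wait_status(0x000B))
```

On Linux this prints "killed by signal 11 (SIGSEGV)", which fits the segfault seen in the kernel log.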

Comment 5 Alexey Torkhov 2009-09-22 20:50:29 UTC
Created attachment 362137
Backtrace

mdadm crashes for me when simply running "mdadm --detail --scan" with the default mdadm.conf and md arrays present on disk.
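
For anyone who wants to check for the crash from a script rather than by eye, here is a rough Python sketch. It relies on subprocess reporting a negative return code when the child is killed by a signal on POSIX systems. The helper name run_and_report is hypothetical, and running mdadm itself will normally need root.

```python
import signal
import subprocess


def run_and_report(argv):
    """Run a command and say whether it exited normally or died from a signal."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return "%s killed by %s" % (argv[0], signal.Signals(-proc.returncode).name)
    return "%s exited with code %d" % (argv[0], proc.returncode)


# Example call for the command from this comment (not executed here):
# print(run_and_report(["mdadm", "--detail", "--scan"]))
```

With an affected mdadm build the call in the final comment would be expected to report SIGSEGV; with a fixed build it should report a normal exit.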

Comment 6 Alexey Torkhov 2009-09-22 20:54:57 UTC
*** Bug 524381 has been marked as a duplicate of this bug. ***

Comment 7 Nicolas Mailhot 2009-09-30 18:45:14 UTC
Anything after mdadm-3.0-2.fc12.x86_64 still makes this system crash at boot and drop into the maintenance console.

Comment 8 Sander Hoentjen 2009-10-01 18:46:42 UTC
Another backtrace

# gdb --args mdadm -A /dev/md0
GNU gdb (GDB) Fedora (6.8.91.20090930-2.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /sbin/mdadm...Reading symbols from /usr/lib/debug/sbin/mdadm.debug...done.
done.
(gdb) run
Starting program: /sbin/mdadm -A /dev/md0

Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/strlen.S:31
31		pcmpeqb	(%rdi), %xmm2
Current language:  auto
The current source language is "auto; currently asm"
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/strlen.S:31
#1  0x0000000000435df9 in set_member_info (st=0x85af00, ent=0x85d1a0) at mapfile.c:306
#2  0x00000000004362f8 in RebuildMap () at mapfile.c:369
#3  0x0000000000436861 in map_read (melp=<value optimized out>) at mapfile.c:166
#4  0x0000000000436c2d in map_update (mpp=0x0, devnum=<value optimized out>, metadata=0x7fffffffdfb4 "0.90", uuid=0x7fffffffdf28, path=<value optimized out>) at mapfile.c:206
#5  0x000000000040f52a in Assemble (st=<value optimized out>, mddev=<value optimized out>, ident=<value optimized out>, devlist=<value optimized out>, 
    backup_file=<value optimized out>, readonly=<value optimized out>, runstop=<value optimized out>, update=<value optimized out>, homehost=<value optimized out>, 
    require_homehost=<value optimized out>, verbose=<value optimized out>, force=<value optimized out>) at Assemble.c:1004
#6  0x000000000040547d in main (argc=<value optimized out>, argv=<value optimized out>) at mdadm.c:1055

Comment 9 Hans de Goede 2009-10-02 19:59:08 UTC
The mdadm crash is fixed by mdadm-3.0.2-1.fc12.
A tag request for including this in F-12 is here:
https://fedorahosted.org/rel-eng/ticket/2294

Comment 10 Adam Williamson 2009-10-03 02:36:46 UTC
If people could test this quickly over the weekend that'd be great; we have a go/no-go meeting on Monday. You can get the fixed build here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=134892

Comment 11 Bruno Wolff III 2009-10-03 05:03:32 UTC
I tried it out and the system boots successfully. There were no segfault or unexpected-status messages. However, I did see some warnings that I don't get with mkinitrd images, which suggests there is still some minor problem.
For instance:
Buffer I/O error on device dm-0, logical block 64
Buffer I/O error on device dm-0, logical block 65
Buffer I/O error on device dm-0, logical block 66
Buffer I/O error on device dm-0, logical block 67
Buffer I/O error on device dm-0, logical block 68
Buffer I/O error on device dm-0, logical block 69
Buffer I/O error on device dm-0, logical block 70
Buffer I/O error on device dm-0, logical block 71
device-mapper: ioctl: unable to remove open device temporary-cryptsetup-930
Buffer I/O error on device dm-0, logical block 72
Buffer I/O error on device dm-0, logical block 73

Comment 12 Hans de Goede 2009-10-03 08:38:07 UTC
(In reply to comment #11)

At least the:
device-mapper: ioctl: unable to remove open device temporary-cryptsetup-930

is a different issue. For some reason dmcrypt creates a temporary device-mapper device, and dracut's udev rules should not probe it; otherwise you get that "unable to remove" error, because the device is busy due to the probing. We recently hit the same issue in anaconda.

I'm not at the computer that has the IRC logs of our discussion of this right now; when I'm back at that machine I'll add another comment with more info.

I think the other errors are related to, or caused by, this same issue. Either way, I believe this is unrelated to mdraid / mdadm.

Comment 13 Sander Hoentjen 2009-10-03 10:40:48 UTC
The new mdadm fixes it for me too.

Comment 14 Bruno Wolff III 2009-10-03 16:42:11 UTC
For the other issue I'd like to be added to whatever bug is tracking it. If there isn't one, I can start one?

Comment 15 Adam Williamson 2009-10-03 17:10:12 UTC
Closing this one, anyway, as it seems to be clearly fixed. Yes, Bruno, if there's no hardware issue behind your other problem, look for a dupe or file a new bug (I'd search for "Buffer I/O error on device dm-0" to find dupes).

Comment 16 Hans de Goede 2009-10-04 08:57:48 UTC
(In reply to comment #14)
> For the other issue I'd like to be added to whatever bug is tracking it. If
> there isn't one, I can start one?  

I don't think there is a bug tracking the temp dmcrypt node probing from dracut; please file a bug against dracut for this. Also, please include a note there pointing to the following comment for more info:
https://bugzilla.redhat.com/show_bug.cgi?id=526699#c5

Comment 17 Bruno Wolff III 2009-10-04 16:49:05 UTC
I opened bug 527056 and made the suggested reference.

Comment 18 Hans de Goede 2009-10-04 17:01:28 UTC
(In reply to comment #17)
> I opened bug 527056 and made the suggested reference.  

Thanks!

