592030 – mdadm crash (and dangerous incorrect mounting) on startup

Summary: mdadm crash (and dangerous incorrect mounting) on startup

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	mdadm
Sub Component:
Version:	13
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Doug Ledford
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-05-13 17:54 UTC by Andy Lutomirski
Modified:	2010-07-20 22:57 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-07-20 22:57:37 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
mdadm backtrace (1.89 KB, text/plain) 2010-05-13 17:54 UTC, Andy Lutomirski	no flags
core dump (29.32 KB, application/gzip) 2010-05-13 17:55 UTC, Andy Lutomirski	no flags
mdadm --examine /dev/sda /dev/sdb output (2.23 KB, text/plain) 2010-05-13 17:56 UTC, Andy Lutomirski	no flags
dmraid -n output (5.27 KB, text/plain) 2010-05-13 17:56 UTC, Andy Lutomirski	no flags
Patch to show other slot states (918 bytes, patch) 2010-05-15 00:52 UTC, Dan Williams	no flags
Last 32K of RAID element in fubarred state (32.00 KB, application/octet-stream) 2010-05-18 18:31 UTC, Andy Lutomirski	no flags

Description Andy Lutomirski 2010-05-13 17:54:21 UTC

Created attachment 413838 [details]
mdadm backtrace

Hi-

I have an imsm RAID.  I corrupted it (separate bug, to be filed soon) and the BIOS refused to recognize it, instead showing both members as "Offline Device" (IIRC).  So I reset metadata on one (sda, I think) and asked the BIOS to recover the array onto that device.

I rebooted and mdadm segfaulted.  Then dracut did something *really* bad: it mounted /dev/sda1.  (The latter is probably just fallout from the mdadm crash.)

Attachments coming: dmraid -n output, mdadm --examine /dev/sda /dev/sdb output, a core file, and a backtrace.

This is mdadm-3.1.2-10.fc13.x86_64.

Comment 1 Andy Lutomirski 2010-05-13 17:55:32 UTC

Created attachment 413839 [details]
core dump

Clarification: I got the backtrace (gdb.txt) from this core dump, and I got this core dump from mdadm -IRs, after /dev/sda1 was already mounted.

Comment 2 Andy Lutomirski 2010-05-13 17:56:14 UTC

Created attachment 413840 [details]
mdadm --examine /dev/sda /dev/sdb output

Comment 3 Andy Lutomirski 2010-05-13 17:56:33 UTC

Created attachment 413841 [details]
dmraid -n output

Comment 4 Dan Williams 2010-05-15 00:52:31 UTC

Created attachment 414194 [details]
Patch to show other slot states

Can you get the output of /proc/mdstat at the time of the failure?  I'm trying to see how we ended up in update_recovery_start() with all the disks in the list having a non-zero recovery_start.  sda is marked out-of-sync, but maybe we are only picking up sdb and it does not have slot0 marked failed?  The output from examine does not show the state of the other disks, but the attached patch does.

Comment 5 Andy Lutomirski 2010-05-15 04:14:41 UTC

I'll try to reproduce on Monday -- the machine in question is at work and off right now, so it's hard to do remotely.

IIRC /proc/mdstat showed the inactive container device but not the actual array.  I also saw a segfault early in /var/log/boot.log, though that one might have been a different crash.  Unless dracut did something first, that segfault would have started from an empty /proc/mdstat, but I wouldn't be at all surprised if dracut had already started the incremental assembly.
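
As a rough sketch of how the "inactive container, no array" state could be picked out of /proc/mdstat: the helper name inactive_md_devices is hypothetical, and the sketch assumes the usual "mdNNN : inactive ..." line layout.

```shell
#!/bin/sh
# Hypothetical helper: print the names of md devices that are listed
# as inactive. Reads /proc/mdstat-formatted text on stdin and matches
# lines of the form "<name> : inactive ...".
inactive_md_devices() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage on a live system (only meaningful where /proc/mdstat exists):
#   inactive_md_devices < /proc/mdstat
```

An empty result together with a crash would suggest that assembly started from an empty /proc/mdstat; a listed container would suggest that something (possibly dracut) had already begun incremental assembly.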


P.S. The incorrect mount on startup is #592059, so don't worry about it here.
P.P.S. While I have your attention, how am I supposed to boot off a degraded array?  Fedora's initscripts (and in general anything using incremental assembly) seem to ignore this possibility, but as far as I'm concerned, ease of booting off a RAID1 is the main reason to use firmware RAID.

Comment 6 Dan Williams 2010-05-16 17:56:30 UTC

(In reply to comment #5)
> P.P.S. While I have your attention, how am I supposed to boot off a degraded
> array?  Fedora's initscripts (and in general anything using incremental
> assembly) seem to ignore this possibility, but as far as I'm concerned, ease of
> booting off a RAID1 is the main reason to use firmware RAID.    

I haven't tried f13 yet, but the f12 scripts handle degraded assembly just fine.  Perhaps you are seeing calls with the --no-degraded flag set?  The expectation is that flag is only used until all the initial udev events have settled.  At that point a final call to -I without the --no-degraded option is performed to activate imsm containers.

Comment 7 Andy Lutomirski 2010-05-17 22:01:07 UTC

I tried to reproduce today and had a different problem. Here's what happened:

As I left the box last week, I had a degraded array on sdb and a non-RAID volume on sda. I also messed up my initramfs -- it didn't have mdadm. So I booted the F13 beta livecd, and it automatically started dmraid. It also did a swapon on the swap partition on the array (wtf?!?).

Since I'm trying to test mdadm, I did dmraid -an (which failed due to swap), did swapoff, did dmraid -an again, and had to manually dmsetup remove the array's main disk and the extended partition (again, wtf -- I'll eventually file a bug against kpartx or dmraid depending on which one is at fault).

Then I did mdadm -A --scan, which happily started my array.

I mounted stuff, chrooted, and reran dracut. Then I unmounted everything, forgot to mdadm --stop the array, and rebooted (gracefully, from the GNOME menu).

Now I have a non-RAID disk and an "Offline member" shown in IMSM's OROM and my system doesn't recognize the array at all. I can't create an array (no space) or delete an array (nothing to delete), and the only option is to reset the "Offline member" to non-RAID.

Now what do I do? I don't have any data on this array, and I need to have this machine working by Thursday, which means that any debugging you want me to do is a lot more likely to happen quickly if it's before Thursday. (On Thursday, or maybe Wednesday, if this thing isn't working, I'll switch to standard software RAID, or maybe leave one imsm volume for testing and use the other by itself to boot from.)

Comment 8 Dan Williams 2010-05-17 22:18:55 UTC

(In reply to comment #7)
> Now I have a non-RAID disk and an "Offline member" shown in IMSM's OROM and my
> system doesn't recognize the array at all.  I can't create an array (no space),
> delete an array (nothing to delete), and the only option is to reset the
> "Offline member" to non-RAID.

Does the status change if you pull out sda and just boot off of sdb?  The "offline" state may persist until you can re-add sda because it sounds like there is now conflicting metadata on the drives and the orom is unable to determine an authoritative answer.

Do you have any details on how this divergence started?

For next steps I would remove sda, which hopefully allows you to boot on sdb.  Then run mdadm --add /dev/mdX /dev/sdY to start the rebuild (where /dev/mdX is the imsm container device and /dev/sdY is whatever sda becomes after being hotplugged).

Comment 9 Andy Lutomirski 2010-05-17 23:57:53 UTC

I'm pretty sure that sda has no RAID metadata -- I did a mdadm --zero-superblock on it.

The divergence may have started when anaconda told me that my firmware RAID device was inconsistent or inaccessible or something and asked me to reinitialize it, but it kept working for a while after that.

After a couple of boots (on a working system using md raid), the system (after a clean reboot) decided that *both* members were parts of an offline array. (If you like, I can email you dmraid -n, etc. logs from after that happened.) Following some online advice, I asked OROM to reset metadata on sda and then to use sda to recover the array. At that point, mdadm started crashing.

I then (from the emergency shell, I think) did a mdadm --zero-superblock /dev/sda, which was around the time that mdadm started crashing (except that dracut and/or initscripts started mounting /dev/sdbX instead of the RAID partitions -- see #592059). After working around that bug with a udev script that effectively did a partx -d /dev/sdb on boot (to prevent the wrong thing from mounting), my system was unbootable (obviously).

The current state of affairs (RAID fubarred) started after booting off the F13 beta live CD, which has noiswmd in its command line parameters, so dmraid *could* be the culprit, except that I'm reasonably confident that only md raid was involved in the first explosion.

I'm sure I can fix everything by wiping *both* superblocks, esp. since I don't care about my data, but I'd rather help debug. I can try tomorrow, though.

Comment 10 Andy Lutomirski 2010-05-18 18:31:38 UTC

Created attachment 414935 [details]
Last 32K of RAID element in fubarred state

Current fubarred state looks like this (with your patch).  This is the state in which the OROM can't do anything.

% /mdadm-patched/mdadm --examine /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : fe42c1cf
         Family : b89e8c85
     Generation : 000004a5
           UUID : a81543df:035da07e:a333390d:42e54a6d
       Checksum : be5cd83e correct
    MPB Sectors : 1
          Disks : 2
   RAID Devices : 1

  Disk01 Serial : 9VMEMGST
          State : active
             Id : 00010000
    Usable Size : 976768654 (465.76 GiB 500.11 GB)

[Volume0]:
           UUID : fdf41e5b:6e41e174:3ce49e1b:eef5332e
     RAID Level : 1
        Members : 2
          Slots : [_U]
      This Slot : 1
     Array Size : 976766976 (465.76 GiB 500.10 GB)
   Per Dev Size : 976767240 (465.76 GiB 500.10 GB)
  Sector Offset : 0
    Num Stripes : 3815496
     Chunk Size : 64 KiB
       Reserved : 0
  Migrate State : idle
      Map State : degraded
    Dirty State : clean

  Disk00 Serial : 9VMEPHJS:1
          State : failed
             Id : ffffffff
    Usable Size : 976768654 (465.76 GiB 500.11 GB)

% dmraid -n /dev/sdb 
/dev/sdb (isw):
0x000 sig: "  Intel Raid ISM Cfg Sig. 1.1.00"
0x020 check_sum: 3193755710
0x024 mpb_size: 480
0x028 family_num: 3097398405
0x02c generation_num: 1189
0x030 error_log_size: 4080
0x034 attributes: 2147483648
0x038 num_disks: 2
0x039 num_raid_devs: 1
0x03a error_log_pos: 0
0x03c cache_size: 0
0x040 orig_family_num: 4265787855
0x044 power_cycle_count: 0
0x048 bbm_log_size: 0
0x0d8 disk[0].serial: "      9VMEPHJS:1"
0x0e8 disk[0].totalBlocks: 976773168
0x0ec disk[0].scsiId: 0xffffffff
0x0f0 disk[0].status: 0x4
0x0f4 disk[0].owner_cfg_num: 0x0
0x108 disk[1].serial: "        9VMEMGST"
0x118 disk[1].totalBlocks: 976773168
0x11c disk[1].scsiId: 0x10000
0x120 disk[1].status: 0x53a
0x124 disk[1].owner_cfg_num: 0x0
0x138 isw_dev[0].volume: "         Volume0"
0x14c isw_dev[0].SizeHigh: 0
0x148 isw_dev[0].SizeLow: 976766976
0x150 isw_dev[0].status: 0x1c
0x154 isw_dev[0].reserved_blocks: 0
0x158 isw_dev[0].migr_priority: 0
0x159 isw_dev[0].num_sub_vol: 0
0x15a isw_dev[0].tid: 0
0x15b isw_dev[0].cng_master_disk: 0
0x15c isw_dev[0].cache_policy: 0
0x15e isw_dev[0].cng_state: 0
0x15f isw_dev[0].cng_sub_state: 0
0x188 isw_dev[0].vol.curr_migr_unit: 0
0x18c isw_dev[0].vol.check_point_id: 0
0x190 isw_dev[0].vol.migr_state: 0
0x191 isw_dev[0].vol.migr_type: 1
0x192 isw_dev[0].vol.dirty: 0
0x193 isw_dev[0].vol.fs_state: 255
0x194 isw_dev[0].vol.verify_errors: 0
0x196 isw_dev[0].vol.verify_bad_blocks: 0
0x1a8 isw_dev[0].vol.map[0].pba_of_lba0: 0
0x1ac isw_dev[0].vol.map[0].blocks_per_member: 976767240
0x1b0 isw_dev[0].vol.map[0].num_data_stripes: 3815496
0x1b4 isw_dev[0].vol.map[0].blocks_per_strip: 128
0x1b6 isw_dev[0].vol.map[0].map_state: 2
0x1b7 isw_dev[0].vol.map[0].raid_level: 1
0x1b8 isw_dev[0].vol.map[0].num_members: 2
0x1b9 isw_dev[0].vol.map[0].num_domains: 2
0x1ba isw_dev[0].vol.map[0].failed_disk_num: 0
0x1bb isw_dev[0].vol.map[0].ddf: 1
0x1d8 isw_dev[0].vol.map[0].disk_ord_tbl[0]: 0x1000000
0x1dc isw_dev[0].vol.map[0].disk_ord_tbl[1]: 0x1
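
The two dumps above print the same metadata fields in different bases (mdadm --examine in hex, dmraid -n in decimal). A minimal sketch for cross-checking them, assuming a POSIX shell; the helper name hex_to_dec is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert a bare hex string (as printed by
# mdadm --examine) to decimal (as printed by dmraid -n).
hex_to_dec() {
    echo "$((0x$1))"
}

# Values copied from the mdadm --examine output above; each result can
# be compared by eye with the corresponding dmraid -n field.
hex_to_dec b89e8c85   # Family      -> compare with family_num
hex_to_dec fe42c1cf   # Orig Family -> compare with orig_family_num
hex_to_dec 000004a5   # Generation  -> compare with generation_num
hex_to_dec be5cd83e   # Checksum    -> compare with check_sum
```

If the two tools are reading the same metadata block, each converted value should equal the decimal field in the dmraid -n dump.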

Comment 11 Dan Williams 2010-05-18 18:52:39 UTC

(In reply to comment #10)
> Created an attachment (id=414935) [details]
> Last 32K of RAID element in fubarred state
> Current fubarred state looks like this (with your patch).  This is the state in
> which the OROM can't do anything.

Is this with or without sda plugged in?  What does (patched) mdadm -E /dev/sda show currently?

In isolation this looks like a valid metadata record.  You should be able to get things back together from this state either by booting with sda removed, or by leaving it plugged in, booting from a rescue CD, and...

# make sure nothing is active
dmraid -an
mdadm -Ss

# start the container with only the 'good' drive
mdadm -A /dev/md0 /dev/sdb
mdadm -I /dev/md0

# add back in sda to start the rebuild
mdadm --add /dev/md0 /dev/sda

...at this point you can reboot and the orom 'should' be happy.
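
The sequence above can be sketched as a dry-run helper that only prints the commands for a given set of devices, so the plan can be reviewed before anything touches the disks. The function name imsm_rebuild_plan and its argument order are hypothetical; the commands themselves are the ones listed in this comment.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the recovery commands from this
# comment for a given container, surviving disk, and disk to rebuild.
# It only echoes the commands; nothing is executed.
imsm_rebuild_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: imsm_rebuild_plan CONTAINER GOOD_DISK NEW_DISK" >&2
        return 1
    fi
    container=$1   # imsm container device, e.g. /dev/md0
    good=$2        # disk with valid metadata, e.g. /dev/sdb
    new=$3         # disk to rebuild onto, e.g. /dev/sda

    # make sure nothing is active
    echo "dmraid -an"
    echo "mdadm -Ss"
    # start the container with only the 'good' drive
    echo "mdadm -A $container $good"
    echo "mdadm -I $container"
    # add the other disk back in to start the rebuild
    echo "mdadm --add $container $new"
}

imsm_rebuild_plan /dev/md0 /dev/sdb /dev/sda
```

Printing rather than executing is a deliberate choice here: the device names may differ after a hotplug or on a rescue CD, so the plan should be checked against the actual devices first.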

Comment 12 Doug Ledford 2010-07-20 22:57:37 UTC

This looks more like a fubar array than an mdadm issue.  In addition, it's quite stale.  There will be a new mdadm landing soon (mdadm-3.1.3-0.git07202010.2 or later) and it does have some mdmon-related fixes.  I'm going to close this as notabug and assume that if there was a problem, it's fixed already.  If that turns out not to be the case, please reopen the bug.
