Bug 973643 - kernel-3.10.0-0.rc4.git0.2.fc20.x86_64: mdraid not working
Status: CLOSED CURRENTRELEASE
Product: Fedora
Component: kernel
Version: rawhide
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-06-12 08:11 EDT by Harald Hoyer
Modified: 2013-08-14 07:34 EDT
Fixed In Version: kernel-3.10.0-0.rc6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-08-14 07:34:42 EDT

Attachments:
kernel crash after md raid activation (7.22 KB, text/plain), 2013-06-12 08:11 EDT, Harald Hoyer
Description Harald Hoyer 2013-06-12 08:11:56 EDT
Created attachment 760118
kernel crash after md raid activation

# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/sda2 /dev/sda3 /dev/sda4

[    1.020928] md0: WARNING: sda3 appears to be on the same physical disk as sda4.
[    1.021835] md0: WARNING: sda3 appears to be on the same physical disk as sda2.
[    1.022635] md0: WARNING: sda2 appears to be on the same physical disk as sda4.
[    1.023414] True protection against single-disk failure might be compromised.
[    1.024195] md/raid:md0: device sda3 operational as raid disk 1
[    1.024823] md/raid:md0: device sda2 operational as raid disk 0
[    1.025719] md/raid:md0: allocated 3282kB
[    1.026241] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[    1.027080] md0: detected capacity change from 0 to 25165824
[    1.029317] md: recovery of RAID array md0
[    1.030412] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[    1.031915] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[    1.034298] md: using 128k window, over a total of 12288k.

Then the kernel crashes with the attached dump. Note that the log reports "raid level 5 active with 2 out of 3 devices" even though three devices were given.

Doing the same with loop devices gives:

# dd if=/dev/zero of=sda1 bs=1024 count=$((40*1024))  
# dd if=/dev/zero of=sda2 bs=1024 count=$((40*1024))  
# dd if=/dev/zero of=sda3 bs=1024 count=$((40*1024))  
# losetup -f /sda1
# losetup -f /sda2
# losetup -f /sda3
# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mdadm: Defaulting to version 1.2 metadata
[   45.320350] md: bind<loop0>
[   45.321234] md: bind<loop1>
[   45.321900] md: bind<loop2>
[   45.342068] raid6: sse2x1    7554 MB/s
[   45.359058] raid6: sse2x2    8800 MB/s
[   45.376028] raid6: sse2x4   11363 MB/s
[   45.376534] raid6: using algorithm sse2x4 (11363 MB/s)
[   45.377131] raid6: using intx1 recovery algorithm
[   45.378707] async_tx: api initialized (async)
[   45.380277] xor: measuring software checksum speed
[   45.390047]    prefetch64-sse:   216.000 MB/sec
[   45.400026]    generic_sse:   220.000 MB/sec
[   45.400594] xor: using function: generic_sse (220.000 MB/sec)
[   45.407082] md: raid6 personality registered for level 6
[   45.407756] md: raid5 personality registered for level 5
[   45.408371] md: raid4 personality registered for level 4
[   45.409366] md/raid:md0: device loop1 operational as raid disk 1
[   45.410080] md/raid:md0: device loop0 operational as raid disk 0
[   45.411270] md/raid:md0: allocated 3282kB
[   45.411879] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[   45.412919] md0: detected capacity change from 0 to 82837504
mdadm: array /dev/md0 started.
initqueue:/# [   45.414517] md: recovery of RAID array md0
[   45.414924] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   45.415580] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[   45.417027] md: using 128k window, over a total of 40448k.
[   45.419806]  md0: unknown partition table
[   45.833435] md: md0: recovery done.

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 loop2[3] loop1[1] loop0[0]
      80896 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
unused devices: <none>

Note the odd device index in "loop2[3]", and again "raid level 5 active with 2 out of 3 devices".


Kernels earlier than 3.10 work fine.
Comment 1 Jes Sorensen 2013-08-14 07:34:42 EDT
This should be fixed in kernel-3.10.0-0.rc6 by the inclusion of the
following patch:

commit 4997b72ee62930cb841d185398ea547d979789f4
Author: Kent Overstreet <koverstreet@google.com>
Date:   Thu May 30 08:44:39 2013 +0200

    raid5: Initialize bi_vcnt
    
    The patch that converted raid5 to use bio_reset() forgot to initialize
    bi_vcnt.
    
    Signed-off-by: Kent Overstreet <koverstreet@google.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: linux-raid@vger.kernel.org
    Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

I cannot reproduce the problem with kernel-3.10.0-1.fc20. If you see this again,
please open a fresh Bugzilla report and drop it on me.

Thanks,
Jes
