Bug 973643 - kernel-3.10.0-0.rc4.git0.2.fc20.x86_64: mdraid not working
Status: CLOSED CURRENTRELEASE
Product: Fedora
Component: kernel
Version: rawhide
Assigned To: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-06-12 08:11 EDT by Harald Hoyer
Modified: 2013-08-14 07:34 EDT
Fixed In Version: kernel-3.10.0-0.rc6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-08-14 07:34:42 EDT

Attachments:
kernel crash after md raid activation (7.22 KB, text/plain), 2013-06-12 08:11 EDT, Harald Hoyer
Description Harald Hoyer 2013-06-12 08:11:56 EDT
Created attachment 760118
kernel crash after md raid activation

# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/sda2 /dev/sda3 /dev/sda4

[    1.020928] md0: WARNING: sda3 appears to be on the same physical disk as sda4.
[    1.021835] md0: WARNING: sda3 appears to be on the same physical disk as sda2.
[    1.022635] md0: WARNING: sda2 appears to be on the same physical disk as sda4.
[    1.023414] True protection against single-disk failure might be compromised.
[    1.024195] md/raid:md0: device sda3 operational as raid disk 1
[    1.024823] md/raid:md0: device sda2 operational as raid disk 0
[    1.025719] md/raid:md0: allocated 3282kB
[    1.026241] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[    1.027080] md0: detected capacity change from 0 to 25165824
[    1.029317] md: recovery of RAID array md0
[    1.030412] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[    1.031915] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[    1.034298] md: using 128k window, over a total of 12288k.

Then the kernel crashes with the attached dump. Note that the log reports "raid level 5 active with 2 out of 3 devices" even though three devices were given.

Doing the same with loop devices gives:

# dd if=/dev/zero of=sda1 bs=1024 count=$((40*1024))  
# dd if=/dev/zero of=sda2 bs=1024 count=$((40*1024))  
# dd if=/dev/zero of=sda3 bs=1024 count=$((40*1024))  
# losetup -f /sda1
# losetup -f /sda2
# losetup -f /sda3
# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mdadm: Defaulting to version 1.2 metadata
[   45.320350] md: bind<loop0>
[   45.321234] md: bind<loop1>
[   45.321900] md: bind<loop2>
[   45.342068] raid6: sse2x1    7554 MB/s
[   45.359058] raid6: sse2x2    8800 MB/s
[   45.376028] raid6: sse2x4   11363 MB/s
[   45.376534] raid6: using algorithm sse2x4 (11363 MB/s)
[   45.377131] raid6: using intx1 recovery algorithm
[   45.378707] async_tx: api initialized (async)
[   45.380277] xor: measuring software checksum speed
[   45.390047]    prefetch64-sse:   216.000 MB/sec
[   45.400026]    generic_sse:   220.000 MB/sec
[   45.400594] xor: using function: generic_sse (220.000 MB/sec)
[   45.407082] md: raid6 personality registered for level 6
[   45.407756] md: raid5 personality registered for level 5
[   45.408371] md: raid4 personality registered for level 4
[   45.409366] md/raid:md0: device loop1 operational as raid disk 1
[   45.410080] md/raid:md0: device loop0 operational as raid disk 0
[   45.411270] md/raid:md0: allocated 3282kB
[   45.411879] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[   45.412919] md0: detected capacity change from 0 to 82837504
mdadm: array /dev/md0 started.
initqueue:/# [   45.414517] md: recovery of RAID array md0
[   45.414924] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   45.415580] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[   45.417027] md: using 128k window, over a total of 40448k.
[   45.419806]  md0: unknown partition table
[   45.833435] md: md0: recovery done.

# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 loop2[3] loop1[1] loop0[0]
      80896 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
unused devices: <none>

Note the odd device index in "loop2[3]", and again "raid level 5 active with 2 out of 3 devices".


Kernels earlier than 3.10 work fine.
Comment 1 Jes Sorensen 2013-08-14 07:34:42 EDT
This should be fixed in kernel-3.10.0-0.rc6 by the inclusion of the
following patch:

commit 4997b72ee62930cb841d185398ea547d979789f4
Author: Kent Overstreet <koverstreet@google.com>
Date:   Thu May 30 08:44:39 2013 +0200

    raid5: Initialize bi_vcnt
    
    The patch that converted raid5 to use bio_reset() forgot to initialize
    bi_vcnt.
    
    Signed-off-by: Kent Overstreet <koverstreet@google.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: linux-raid@vger.kernel.org
    Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

I cannot reproduce the problem with kernel-3.10.0-1.fc20. If you see this again,
please open a fresh Bugzilla report and drop it on me.

Thanks,
Jes
