Created attachment 760118 [details]
kernel crash after md raid activation

# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/sda2 /dev/sda3 /dev/sda4
[    1.020928] md0: WARNING: sda3 appears to be on the same physical disk as sda4.
[    1.021835] md0: WARNING: sda3 appears to be on the same physical disk as sda2.
[    1.022635] md0: WARNING: sda2 appears to be on the same physical disk as sda4.
[    1.023414] True protection against single-disk failure might be compromised.
[    1.024195] md/raid:md0: device sda3 operational as raid disk 1
[    1.024823] md/raid:md0: device sda2 operational as raid disk 0
[    1.025719] md/raid:md0: allocated 3282kB
[    1.026241] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[    1.027080] md0: detected capacity change from 0 to 25165824
[    1.029317] md: recovery of RAID array md0
[    1.030412] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[    1.031915] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[    1.034298] md: using 128k window, over a total of 12288k.

Then the kernel crashes with the attached dump. Note that it reports "raid level 5 active with 2 out of 3 devices" although 3 devices were given.

Doing the same with loop devices gives:

# dd if=/dev/zero of=sda1 bs=1024 count=$((40*1024))
# dd if=/dev/zero of=sda2 bs=1024 count=$((40*1024))
# dd if=/dev/zero of=sda3 bs=1024 count=$((40*1024))
# losetup -f /sda1
# losetup -f /sda2
# losetup -f /sda3
# mdadm --create /dev/md0 --run --auto=yes --level=5 --raid-devices=3 /dev/loop0 /dev/loop1 /dev/loop2
mdadm: Defaulting to version 1.2 metadata
[   45.320350] md: bind<loop0>
[   45.321234] md: bind<loop1>
[   45.321900] md: bind<loop2>
[   45.342068] raid6: sse2x1 7554 MB/s
[   45.359058] raid6: sse2x2 8800 MB/s
[   45.376028] raid6: sse2x4 11363 MB/s
[   45.376534] raid6: using algorithm sse2x4 (11363 MB/s)
[   45.377131] raid6: using intx1 recovery algorithm
[   45.378707] async_tx: api initialized (async)
[   45.380277] xor: measuring software checksum speed
[   45.390047]   prefetch64-sse: 216.000 MB/sec
[   45.400026]   generic_sse: 220.000 MB/sec
[   45.400594] xor: using function: generic_sse (220.000 MB/sec)
[   45.407082] md: raid6 personality registered for level 6
[   45.407756] md: raid5 personality registered for level 5
[   45.408371] md: raid4 personality registered for level 4
[   45.409366] md/raid:md0: device loop1 operational as raid disk 1
[   45.410080] md/raid:md0: device loop0 operational as raid disk 0
[   45.411270] md/raid:md0: allocated 3282kB
[   45.411879] md/raid:md0: raid level 5 active with 2 out of 3 devices, algorithm 2
[   45.412919] md0: detected capacity change from 0 to 82837504
mdadm: array /dev/md0 started.
initqueue:/# [   45.414517] md: recovery of RAID array md0
[   45.414924] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[   45.415580] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[   45.417027] md: using 128k window, over a total of 40448k.
[   45.419806]  md0: unknown partition table
[   45.833435] md: md0: recovery done.

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 loop2[3] loop1[1] loop0[0]
      80896 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

Note the "loop2[3]" and, again, "raid level 5 active with 2 out of 3 devices". Kernels < 3.10 work fine.
This should be fixed in kernel-3.10.0-0.rc6 by the inclusion of the following patch:

commit 4997b72ee62930cb841d185398ea547d979789f4
Author: Kent Overstreet <koverstreet>
Date:   Thu May 30 08:44:39 2013 +0200

    raid5: Initialize bi_vcnt

    The patch that converted raid5 to use bio_reset() forgot to
    initialize bi_vcnt.

    Signed-off-by: Kent Overstreet <koverstreet>
    Cc: NeilBrown <neilb>
    Cc: linux-raid.org
    Tested-by: Ilia Mirkin <imirkin.edu>
    Signed-off-by: Jens Axboe <axboe>

I cannot reproduce the problem with kernel-3.10.0-1.fc20. If you see this again, please open up a new bugzilla and drop it on me.

Thanks,
Jes
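For anyone hitting a similar bio_reset() pitfall, here is a minimal userspace sketch of the pattern the commit above fixes. The struct layouts, the STRIPE_SIZE value, and the bio_reset() body below are simplified stand-ins, not the real kernel definitions; the point is only that bio_reset() zeroes the bio (including bi_vcnt), so code that rebuilds the bio for reuse must restore the segment count along with the other fields:

/*
 * Illustrative sketch only -- simplified stand-ins for the kernel's
 * bio structures, not the actual drivers/md/raid5.c code.
 */
#include <stdio.h>
#include <string.h>

#define STRIPE_SIZE 4096   /* placeholder stripe size for the sketch */

struct bio_vec {
        unsigned int bv_len;
        unsigned int bv_offset;
};

struct bio {
        struct bio_vec *bi_io_vec;   /* segment array */
        unsigned short  bi_vcnt;     /* number of segments in use */
        unsigned int    bi_size;     /* total data size in bytes */
};

/* Stand-in for the kernel's bio_reset(): wipe the bio for reuse. */
static void bio_reset(struct bio *bio)
{
        memset(bio, 0, sizeof(*bio));
}

int main(void)
{
        struct bio_vec vec;
        struct bio bio;

        bio_reset(&bio);

        /* The caller then rebuilds the bio for the next I/O ... */
        bio.bi_io_vec = &vec;
        bio.bi_io_vec[0].bv_len = STRIPE_SIZE;
        bio.bi_io_vec[0].bv_offset = 0;
        bio.bi_size = STRIPE_SIZE;
        bio.bi_vcnt = 1;   /* the kind of one-line fix the patch adds:
                            * without it the bio claims STRIPE_SIZE bytes
                            * of data but zero segments */

        printf("bi_vcnt=%u bi_size=%u\n", bio.bi_vcnt, bio.bi_size);
        return 0;
}

With the bi_vcnt assignment left out, the rebuilt bio would advertise data but no segments, an inconsistency consistent with the crash during array recovery reported above.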