Description of problem: The kernel reliably panics when creating a RAID 5 array using the command: mdadm --create --run r5t1 --level 5 --raid-devices 4 --spare-devices 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 missing Version-Release number of selected component (if applicable): kernel 0:3.10.0-0.rc0.git24.1.fc20 mdadm-3.2.6-19.fc20.x86_64 (Version of other components can be found here: http://koji.fedoraproject.org/koji/getfile?taskID=5360743&name=root.log ) How reproducible: 100% Steps to Reproduce: 1. Run the test tests/md/test-mdadm.sh in the libguestfs test suite. Additional information: mdadm --create --run r5t1 --level 5 --raid-devices 4 --spare-devices 1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 missing mdadm: Defaulting to version 1.2 metadata [ 185.041140] md: bind<sda2> [ 185.062137] md: bind<sdb2> [ 185.086133] md: bind<sdc2> [ 185.095132] md: bind<sdd2> [ 185.947002] async_tx: api initialized (async) [ 186.013992] xor: measuring software checksum speed [ 186.024991] prefetch64-sse: 960.000 MB/sec [ 186.035989] generic_sse: 964.000 MB/sec [ 186.035989] xor: using function: generic_sse (964.000 MB/sec) [ 186.207963] raid6: sse2x1 195 MB/s [ 186.225960] raid6: sse2x2 417 MB/s [ 186.243957] raid6: sse2x4 277 MB/s [ 186.243957] raid6: using algorithm sse2x2 (417 MB/s) [ 186.245957] raid6: using intx1 recovery algorithm [ 186.610902] md: raid6 personality registered for level 6 [ 186.611901] md: raid5 personality registered for level 5 [ 186.612901] md: raid4 personality registered for level 4 [ 186.670892] md/raid:md125: device sdc2 operational as raid disk 2 [ 186.671892] md/raid:md125: device sdb2 operational as raid disk 1 [ 186.673892] md/raid:md125: device sda2 operational as raid disk 0 [ 186.791874] md/raid:md125: allocated 4356kB [ 186.797873] md/raid:md125: raid level 5 active with 3 out of 4 devices, algorithm 2 [ 186.810871] md125: detected capacity change from 0 to 4718592 [ 186.826869] md: recovery of RAID array md125 [ 186.827869] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [ 186.828868] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. [ 186.829868] md: using 128k window, over a total of 1536k. mdadm: array /dev/md/r5t1 started. [ 186.878861] ------------[ cut here ]------------ [ 186.878861] kernel BUG at drivers/scsi/scsi_lib.c:1197! [ 186.878861] invalid opcode: 0000 [#1] SMP [ 186.878861] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 kvm_amd kvm snd_pcsp snd_pcm snd_page_alloc snd_timer snd soundcore virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t libcrc32c [ 186.878861] CPU: 0 PID: 50 Comm: kworker/0:1H Tainted: G W 3.10.0-0.rc0.git24.1.fc20.x86_64 #1 [ 186.878861] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 186.878861] Workqueue: kblockd blk_delay_work [ 186.878861] task: ffff88001a10cdc0 ti: ffff88001a178000 task.ti: ffff88001a178000 [ 186.878861] RIP: 0010:[<ffffffff8149d192>] [<ffffffff8149d192>] scsi_setup_fs_cmnd+0x92/0xa0 [ 186.878861] RSP: 0018:ffff88001a179c18 EFLAGS: 00000046 [ 186.878861] RAX: 0000000000000000 RBX: ffff88001a020000 RCX: 000000000000002f [ 186.878861] RDX: 0000000000000000 RSI: ffff88001a040000 RDI: ffff88001a020000 [ 186.878861] RBP: ffff88001a179c28 R08: 0000000000019000 R09: 0000000000000000 [ 186.878861] R10: 0000000000002018 R11: 00000000000000c8 R12: ffff88001a040000 [ 186.878861] R13: ffff88001a179fd8 R14: ffff88001a020000 R15: ffff88001a086f60 [ 186.878861] FS: 0000000000000000(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000 [ 186.878861] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 186.878861] CR2: 00007fff2ebf6788 CR3: 0000000019daf000 CR4: 00000000000006f0 [ 186.878861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 186.878861] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 [ 186.878861] Stack: [ 186.878861] ffff88001a040000 ffff88001a028000 ffff88001a179c90 ffffffff814ab2b0 [ 186.878861] 0000000000000001 ffffffff8258a970 0000000000000046 ffffffff000000c8 [ 186.878861] 0000000000002018 0000000000000000 ffff88001a028000 ffff88001a179fd8 [ 186.878861] Call Trace: [ 186.878861] [<ffffffff814ab2b0>] sd_prep_fn+0x340/0xf90 [ 186.878861] [<ffffffff8132ee3d>] blk_peek_request+0x13d/0x300 [ 186.878861] [<ffffffff8149d492>] scsi_request_fn+0x52/0x540 [ 186.878861] [<ffffffff8132af33>] __blk_run_queue+0x33/0x40 [ 186.878861] [<ffffffff8132b125>] blk_delay_work+0x25/0x40 [ 186.878861] [<ffffffff8108d251>] process_one_work+0x211/0x6c0 [ 186.878861] [<ffffffff8108d1e5>] ? process_one_work+0x1a5/0x6c0 [ 186.878861] [<ffffffff8108d81d>] worker_thread+0x11d/0x3a0 [ 186.878861] [<ffffffff8108d700>] ? process_one_work+0x6c0/0x6c0 [ 186.878861] [<ffffffff81095cbd>] kthread+0xed/0x100 [ 186.878861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.878861] [<ffffffff8172af2c>] ret_from_fork+0x7c/0xb0 [ 186.878861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.878861] Code: c3 48 8b 00 48 85 c0 74 b7 48 8b 40 48 48 85 c0 74 ae 4c 89 e6 48 89 df ff d0 85 c0 74 a2 eb dc b8 02 00 00 00 0f 1f 40 00 eb d1 <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 [ 186.878861] RIP [<ffffffff8149d192>] scsi_setup_fs_cmnd+0x92/0xa0 [ 186.878861] RSP <ffff88001a179c18> [ 186.878861] ---[ end trace 6ed7a728fbe3e4df ]--- [ 186.878861] BUG: sleeping function called from invalid context at kernel/rwsem.c:20 [ 186.878861] in_atomic(): 1, irqs_disabled(): 1, pid: 50, name: kworker/0:1H [ 186.878861] INFO: lockdep is turned off. [ 186.878861] irq event stamp: 68212 [ 186.878861] hardirqs last enabled at (68211): [<ffffffff817213dc>] _raw_spin_unlock_irq+0x2c/0x50 [ 186.878861] hardirqs last disabled at (68212): [<ffffffff8172123f>] _raw_spin_lock_irq+0x1f/0x90 [ 186.878861] softirqs last enabled at (68198): [<ffffffff81071ae3>] __do_softirq+0x193/0x400 [ 186.878861] softirqs last disabled at (68155): [<ffffffff81071f25>] irq_exit+0xc5/0xd0 [ 186.878861] CPU: 0 PID: 50 Comm: kworker/0:1H Tainted: G D W 3.10.0-0.rc0.git24.1.fc20.x86_64 #1 [ 186.878861] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 186.878861] Workqueue: kblockd blk_delay_work [ 186.878861] ffffffff81a1c635 ffff88001a179918 ffffffff81719788 ffff88001a179940 [ 186.878861] ffffffff810a3309 ffff88001a06d268 ffff88001a06d2c8 ffff88001a10cdc0 [ 186.878861] ffff88001a179968 ffffffff8171e4ba ffff88001a179978 ffff88001a10cdc0 [ 186.878861] Call Trace: [ 186.878861] [<ffffffff81719788>] dump_stack+0x19/0x1b [ 186.878861] [<ffffffff810a3309>] __might_sleep+0x179/0x230 [ 186.878861] [<ffffffff8171e4ba>] down_read+0x2a/0xa0 [ 186.878861] [<ffffffff81081694>] exit_signals+0x24/0x130 [ 186.878861] [<ffffffff8106ddbc>] do_exit+0xbc/0xcc0 [ 186.878861] [<ffffffff8106b0b8>] ? kmsg_dump+0x1b8/0x240 [ 186.878861] [<ffffffff8106af25>] ? kmsg_dump+0x25/0x240 [ 186.878861] [<ffffffff81722ff2>] oops_end+0xa2/0xe0 [ 186.878861] [<ffffffff8101bb4b>] die+0x4b/0x70 [ 186.878861] [<ffffffff81722880>] do_trap+0x60/0x170 [ 186.878861] [<ffffffff81018ff5>] do_invalid_op+0x95/0xb0 [ 186.878861] [<ffffffff8149d192>] ? scsi_setup_fs_cmnd+0x92/0xa0 [ 186.878861] [<ffffffff81326b2e>] ? elv_dispatch_sort+0xce/0xe0 [ 186.878861] [<ffffffff8134bf68>] ? cfq_dispatch_insert+0x158/0x280 [ 186.878861] [<ffffffff813615cd>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 186.878861] [<ffffffff817220e3>] ? restore_args+0x30/0x30 [ 186.878861] [<ffffffff8172c22e>] invalid_op+0x1e/0x30 [ 186.878861] [<ffffffff8149d192>] ? scsi_setup_fs_cmnd+0x92/0xa0 [ 186.878861] [<ffffffff8149d117>] ? scsi_setup_fs_cmnd+0x17/0xa0 [ 186.878861] [<ffffffff814ab2b0>] sd_prep_fn+0x340/0xf90 [ 186.878861] [<ffffffff8132ee3d>] blk_peek_request+0x13d/0x300 [ 186.878861] [<ffffffff8149d492>] scsi_request_fn+0x52/0x540 [ 186.878861] [<ffffffff8132af33>] __blk_run_queue+0x33/0x40 [ 186.878861] [<ffffffff8132b125>] blk_delay_work+0x25/0x40 [ 186.878861] [<ffffffff8108d251>] process_one_work+0x211/0x6c0 [ 186.878861] [<ffffffff8108d1e5>] ? process_one_work+0x1a5/0x6c0 [ 186.878861] [<ffffffff8108d81d>] worker_thread+0x11d/0x3a0 [ 186.878861] [<ffffffff8108d700>] ? process_one_work+0x6c0/0x6c0 [ 186.878861] [<ffffffff81095cbd>] kthread+0xed/0x100 [ 186.878861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.878861] [<ffffffff8172af2c>] ret_from_fork+0x7c/0xb0 [ 186.878861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.878861] note: kworker/0:1H[50] exited with preempt_count 1 [ 186.879861] BUG: unable to handle kernel paging request at ffffffffffffff98 [ 186.879861] IP: [<ffffffff81096630>] kthread_data+0x10/0x20 [ 186.879861] PGD 1c0f067 PUD 1c11067 PMD 0 [ 186.879861] Oops: 0000 [#2] SMP [ 186.879861] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 kvm_amd kvm snd_pcsp snd_pcm snd_page_alloc snd_timer snd soundcore virtio_net virtio_scsi virtio_blk virtio_rng virtio_balloon virtio_mmio sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t libcrc32c [ 186.879861] CPU: 0 PID: 50 Comm: kworker/0:1H Tainted: G D W 3.10.0-0.rc0.git24.1.fc20.x86_64 #1 [ 186.879861] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 186.879861] task: ffff88001a10cdc0 ti: ffff88001a178000 task.ti: ffff88001a178000 [ 186.879861] RIP: 0010:[<ffffffff81096630>] [<ffffffff81096630>] kthread_data+0x10/0x20 [ 186.879861] RSP: 0018:ffff88001a179900 EFLAGS: 00000002 [ 186.879861] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000000f [ 186.879861] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff88001a10cdc0 [ 186.879861] RBP: ffff88001a179900 R08: ffff88001a10ce30 R09: 0000002b82e86108 [ 186.879861] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff88001f1d50c0 [ 186.879861] R13: 0000000000000000 R14: ffff88001a1794f8 R15: 0000000000000000 [ 186.879861] FS: 0000000000000000(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000 [ 186.879861] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 186.879861] CR2: 0000000000000028 CR3: 0000000019daf000 CR4: 00000000000006f0 [ 186.879861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 186.879861] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 [ 186.879861] Stack: [ 186.879861] ffff88001a179918 ffffffff8108e751 ffff88001a10cdc0 ffff88001a179978 [ 186.879861] ffffffff8171f0ba 00000000001d50c0 ffff88001a179fd8 ffff88001a179fd8 [ 186.879861] 00000000001d50c0 ffff88001a10cdc0 ffff88001a10d5a8 ffff88001a1794f8 [ 186.879861] Call Trace: [ 186.879861] [<ffffffff8108e751>] wq_worker_sleeping+0x11/0x90 [ 186.879861] [<ffffffff8171f0ba>] __schedule+0x76a/0xa20 [ 186.879861] [<ffffffff8171f399>] schedule+0x29/0x70 [ 186.879861] [<ffffffff8106e498>] do_exit+0x798/0xcc0 [ 186.879861] [<ffffffff81722ff2>] oops_end+0xa2/0xe0 [ 186.879861] [<ffffffff8101bb4b>] die+0x4b/0x70 [ 186.879861] [<ffffffff81722880>] do_trap+0x60/0x170 [ 186.879861] [<ffffffff81018ff5>] do_invalid_op+0x95/0xb0 [ 186.879861] [<ffffffff8149d192>] ? scsi_setup_fs_cmnd+0x92/0xa0 [ 186.879861] [<ffffffff81326b2e>] ? elv_dispatch_sort+0xce/0xe0 [ 186.879861] [<ffffffff8134bf68>] ? cfq_dispatch_insert+0x158/0x280 [ 186.879861] [<ffffffff813615cd>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 186.879861] [<ffffffff817220e3>] ? restore_args+0x30/0x30 [ 186.879861] [<ffffffff8172c22e>] invalid_op+0x1e/0x30 [ 186.879861] [<ffffffff8149d192>] ? scsi_setup_fs_cmnd+0x92/0xa0 [ 186.879861] [<ffffffff8149d117>] ? scsi_setup_fs_cmnd+0x17/0xa0 [ 186.879861] [<ffffffff814ab2b0>] sd_prep_fn+0x340/0xf90 [ 186.879861] [<ffffffff8132ee3d>] blk_peek_request+0x13d/0x300 [ 186.879861] [<ffffffff8149d492>] scsi_request_fn+0x52/0x540 [ 186.879861] [<ffffffff8132af33>] __blk_run_queue+0x33/0x40 [ 186.879861] [<ffffffff8132b125>] blk_delay_work+0x25/0x40 [ 186.879861] [<ffffffff8108d251>] process_one_work+0x211/0x6c0 [ 186.879861] [<ffffffff8108d1e5>] ? process_one_work+0x1a5/0x6c0 [ 186.879861] [<ffffffff8108d81d>] worker_thread+0x11d/0x3a0 [ 186.879861] [<ffffffff8108d700>] ? process_one_work+0x6c0/0x6c0 [ 186.879861] [<ffffffff81095cbd>] kthread+0xed/0x100 [ 186.879861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.879861] [<ffffffff8172af2c>] ret_from_fork+0x7c/0xb0 [ 186.879861] [<ffffffff81095bd0>] ? insert_kthread_work+0x80/0x80 [ 186.879861] Code: 00 48 89 e5 5d 48 8b 40 88 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8b 87 70 05 00 00 55 48 89 e5 <48> 8b 40 98 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 [ 186.879861] RIP [<ffffffff81096630>] kthread_data+0x10/0x20 [ 186.879861] RSP <ffff88001a179900> [ 186.879861] CR2: ffffffffffffff98 [ 186.879861] ---[ end trace 6ed7a728fbe3e4e0 ]--- [ 186.879861] Fixing recursive fault but reboot is needed!
CC-ing Paolo since the stack trace indicates that virtio-scsi might be involved in this.
Could you please try this against real physical disks and see if it does the same thing?
I don't want to disappoint, so basically no. I don't have a physical machine that runs Rawhide. Any test by necessity would be running in a VM, hence using virtio at some point.
No, this is not virtio-scsi (doesn't seem like, at least), and yes, you should test using another driver. megasas can work or, with QEMU 1.5, pvscsi too. megasas is supported by libvirt (it calls it lsisas1078).
(In reply to comment #4) > No, this is not virtio-scsi (doesn't seem like, at least), and yes, you > should test using another driver. megasas can work or, with QEMU 1.5, > pvscsi too. megasas is supported by libvirt (it calls it lsisas1078). I'll see if I can work out the libvirt voodoo for this tomorrow.
Created attachment 747775 [details] Complete messages (using lsisas1078 driver) I tried lsisas1078 as suggested by Paolo Bonzini, and to the untrained eye it appears to be failing in exactly the same way. I attached the complete libguestfs debug + kernel messages in case they are helpful.
By the way the actual error is: /* * Filesystem requests must transfer data. */ BUG_ON(!req->nr_phys_segments);
This is on the way out to QEMU, so it seems likely to be a kernel problem rather than a virt problem. Can you try with an external USB disk and USB passthrough? That would be the smoking gun that it is a kernel problem. :)
This seems to have "fixed itself" with the new kernel (3.10.0-0.rc2.git0.3.fc20). I'm just running a build to confirm this.
Dammit unfortunately it's not fixed. It has, however, changed so that instead of just hanging the kernel, it now panics allowing us to catch the error and report it. The stack trace is the same as before (http://kojipkgs.fedoraproject.org//work/tasks/8778/5408778/build.log)
Created attachment 752953 [details] BUG of existing RAID5 Attached is a very similar BUG from an existing level 5 array with physical disks. The bug happened when trying to read a file from it. This is from vanilla 3.10-rc2 on an ArchLinux host, for whatever it's worth.. could underline the "it's a kernel bug" assumption. The box also has a level 1 array - no issues with that one.
Tom, Your bug report does not seem to resemble the previous bug report at all, there are traces of plugging in your output and also drive errors coming from the ATA driver: 2013-05-24T17:59:42+02:00 sirius kernel: ata7: lost interrupt (Status 0x58) 2013-05-24T17:59:42+02:00 sirius kernel: ata7: drained 65536 bytes to clear DRQ 2013-05-24T17:59:42+02:00 sirius kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen 2013-05-24T17:59:42+02:00 sirius kernel: sr 6:0:0:0: CDB: 2013-05-24T17:59:42+02:00 sirius kernel: cdb[0]=0x0: 00 00 00 00 00 00 2013-05-24T17:59:42+02:00 sirius kernel: ata7.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0 2013-05-24T17:59:42+02:00 sirius kernel: res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout) 2013-05-24T17:59:42+02:00 sirius kernel: ata7.00: status: { DRDY } 2013-05-24T17:59:42+02:00 sirius kernel: ata7: soft resetting link 2013-05-24T17:59:42+02:00 sirius kernel: ata7.00: configured for UDMA/33 2013-05-24T17:59:42+02:00 sirius kernel: ata7: EH complete What driver are you running? Jes
Jes, the "very similar" part is just the first trace (cut here.. end trace), indeed. The ATA errors could be caused by me unplugging a USB connected mobile while the box was slowly locking up, so please ignore those. IIRC they didn't happen when I tried vanilla -rc1. Anyways, the driver is PATA_ATIIXP. The mdraid error was almost the same on -rc1, beside the fact that -rc1 went to death immediately, while with -rc2 it took 2-3 minutes until the system got completely unresponsive. The disks are good, and back on 2.9.1 mdraid is working fine. If this is fully unrelated, I'm sorry. This bug is what I found crawling google and lkml about the bug headline "drivers/scsi/scsi_lib.c:1196"
Tom, The ATA errors almost certainly came from the harddrives, not the USB mobile. A USB disk would not normally show up with the name 'ata<X>'. A USB disk will normally show up as 'sd<Y>'. It is certainly possible that the problem is due to interrupts not getting through to the ATA driver, or that the newer kernel introduced a bug in said driver. Jes
This thread appeared a couple of weeks ago on LKML. There is a suggested patch in the third comment. http://comments.gmane.org/gmane.linux.kernel/1492771
Here is a similar-but-a-bit-different patch claiming to fix the same bug: http://permalink.gmane.org/gmane.linux.raid/42949
Neither of those is upstream in Linus's tree.
I tested http://thread.gmane.org/gmane.linux.kernel/1492771/focus=42953 on the basis that it was the simpler of the two patches. Here is my scratch-build: http://koji.fedoraproject.org/koji/taskinfo?taskID=5437990 This fixes the problem. My test which previously failed now works. I asked on LKML for this patch to be included.
Fix is now in Linus' tree: commit 4997b72ee62930cb841d185398ea547d979789f4 Author: Kent Overstreet <koverstreet> Date: Thu May 30 08:44:39 2013 +0200 raid5: Initialize bi_vcnt The patch that converted raid5 to use bio_reset() forgot to initialize bi_vcnt. Signed-off-by: Kent Overstreet <koverstreet> Cc: NeilBrown <neilb> Cc: linux-raid.org Tested-by: Ilia Mirkin <imirkin.edu> Signed-off-by: Jens Axboe <axboe> Should show up in the next resync I presume.
fixed in rc6 builds and newer.