Description of problem:
mke2fs -t ext2 -F -b 4096 /dev/VG/LV1
mke2fs 1.42.9 (28-Dec-2013)
[ 44.142483] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 44.142483] IP: [<ffffffff8122040a>] bio_trim+0x1a/0x40
[ 44.142483] PGD 1d193067 PUD 1d1c1067 PMD 0
[ 44.142483] Oops: 0000 [#1] SMP
[ 44.142483] Modules linked in: raid1 kvm_amd snd_pcsp snd_pcm kvm snd_timer snd soundcore serio_raw ata_generic pata_acpi virtio_balloon virtio_pci virtio_mmio virtio_net virtio_scsi virtio_blk virtio_console virtio_rng virtio_ring virtio ideapad_laptop sparse_keymap rfkill sym53c8xx scsi_transport_spi crc8 crc_ccitt crc32 crc_itu_t libcrc32c megaraid megaraid_sas megaraid_mbox megaraid_mm
[ 44.142483] CPU: 0 PID: 229 Comm: mke2fs Tainted: G W 3.14.0-0.rc1.git0.1.fc21.x86_64 #1
[ 44.142483] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 44.142483] task: ffff88001c100000 ti: ffff88001c0e4000 task.ti: ffff88001c0e4000
[ 44.142483] RIP: 0010:[<ffffffff8122040a>] [<ffffffff8122040a>] bio_trim+0x1a/0x40
[ 44.142483] RSP: 0018:ffff88001c0e5b88 EFLAGS: 00000246
[ 44.142483] RAX: ffff88001d13f020 RBX: 0000000000000000 RCX: 000000000000b690
[ 44.142483] RDX: 0000000000008000 RSI: 0000000000000000 RDI: 0000000000000000
[ 44.142483] RBP: ffff88001c0e5b98 R08: 00000000000174a0 R09: ffff88001f0174a0
[ 44.142483] R10: 0000000000000000 R11: ffffea0000744fc0 R12: 0000000001000000
[ 44.142483] R13: 0000000000000000 R14: ffff88001c0bfe80 R15: ffff88001d16df00
[ 44.142483] FS: 00007fe89c7817c0(0000) GS:ffff88001f000000(0000) knlGS:0000000000000000
[ 44.142483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 44.142483] CR2: 0000000000000028 CR3: 000000001c0e7000 CR4: 00000000000006f0
[ 44.142483] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 44.142483] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 44.142483] Stack:
[ 44.142483] 0000000000000001 0000000000000000 ffff88001c0e5c80 ffffffffa01923f3
[ 44.142483] ffff88001c0e5c50 ffffc90000125040 0000000000008000 ffff88001d16df60
[ 44.142483] 0000000000003000 ffff88001c0e5c18 ffffffff00008000 0000000000000001
[ 44.142483] Call Trace:
[ 44.142483] [<ffffffffa01923f3>] make_request+0x4c3/0xcd0 [raid1]
[ 44.142483] [<ffffffff810c8ec6>] ? check_preempt_wakeup+0x166/0x250
[ 44.142483] [<ffffffff81555e85>] md_make_request+0xe5/0x230
[ 44.142483] [<ffffffff81326c20>] generic_make_request+0xe0/0x130
[ 44.142483] [<ffffffff81326ce8>] submit_bio+0x78/0x160
[ 44.142483] [<ffffffff81220bfe>] ? bio_alloc_bioset+0x1ce/0x2f0
[ 44.142483] [<ffffffff811fcc73>] ? pollwake+0x73/0x90
[ 44.142483] [<ffffffff8133243b>] blkdev_issue_discard+0x1fb/0x2c0
[ 44.142483] [<ffffffff81336da5>] blkdev_ioctl+0x635/0x7d0
[ 44.142483] [<ffffffff811e83a7>] ? do_sync_write+0x67/0xa0
[ 44.142483] [<ffffffff81222d11>] block_ioctl+0x41/0x50
[ 44.142483] [<ffffffff811fbf90>] do_vfs_ioctl+0x2e0/0x4a0
[ 44.142483] [<ffffffff811fc1f1>] SyS_ioctl+0xa1/0xc0
[ 44.142483] [<ffffffff816fbbe9>] system_call_fastpath+0x16/0x1b
[ 44.142483] Code: 01 e9 75 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 54 41 89 d4 41 c1 e4 09 85 f6 53 48 89 fb 75 06 <44> 3b 67 28 74 14 3e 80 63 10 f7 c1 e6 09 48 89 df e8 f0 fe ff
[ 44.142483] RIP [<ffffffff8122040a>] bio_trim+0x1a/0x40
[ 44.142483] RSP <ffff88001c0e5b88>
[ 44.142483] CR2: 0000000000000028
[ 44.144483] ---[ end trace f318ded04f590341 ]---
guestfsd: error: ext2: /dev/VG/LV1: mke2fs 1.42.9 (28-Dec-2013)
libguestfs: trace: mkfs = -1 (error)
Version-Release number of selected component (if applicable):
kernel 0:3.14.0-0.rc1.git0.1.fc21
e2fsprogs-1.42.9-2.fc21.x86_64
How reproducible:
Unknown, at least once.
Steps to Reproduce:
1. Run the libguestfs test suite in Rawhide.
Additional info:
http://kojipkgs.fedoraproject.org//work/tasks/9085/6489085/build.log
http://kojipkgs.fedoraproject.org//work/tasks/9085/6489085/root.log
I should note: This is running under virtualization. I don't have an easy means to test this on bare metal, so please don't ask me to. The backing disk is virtio-scsi. It was all working fine about 2 weeks ago.
Heh, userspace doing I/O should never cause a kernel bug. This is a kernel bug, not e2fsprogs. Looks like possibly a problem in dm discard handling.
I was pointed to this patch, and tested it, but it did *NOT* fix this bug. https://lkml.org/lkml/2014/2/4/107
Can you recreate this with no previous kernel oopses/warnings present? Most likely you can, but we'd like to make sure something else didn't corrupt kernel memory first, since your oops already has the 'W' taint set.
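To check for a pre-existing 'W' taint before running the reproducer, the kernel's taint bitmask can be read directly rather than inferred from an oops. A minimal Python sketch, assuming the standard /proc/sys/kernel/tainted file and the kernel's documented meaning of bit 9 (value 512) as "a warning was issued"; the helper names are hypothetical:

```python
TAINT_WARN_BIT = 9  # bit 9 (value 512) is reported as 'W' in the "Tainted:" line


def has_warn_taint(taint_value):
    """Return True if the taint bitmask has the 'W' (warning) bit set."""
    return bool((taint_value >> TAINT_WARN_BIT) & 1)


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

Calling has_warn_taint(read_taint()) in the guest just before mke2fs would show whether an earlier warning had already tainted the kernel.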
Created attachment 860554: log file
The shortest reproducer I can come up with (using guestfish) is:
guestfish -xv -N part -N part \
  md-create test "/dev/sda1 /dev/sdb1" : \
  pvcreate /dev/md/test : \
  vgcreate VG /dev/md/test : \
  lvcreate LV VG 32 : \
  mkfs ext4 /dev/VG/LV
The full output (including the actual commands being run by guestfsd) is attached. Unfortunately there is an earlier problem (in kvm_amd module). This is automatically loaded because I'm running this under TCG so the guest thinks that nested (AMD) virt is available. Not sure how to get rid of this.
I renamed the kvm-amd.ko file so it wouldn't get loaded. The mkfs bug reported here still occurs.
Given md_make_request in the stack trace, this looks like an MD bug, not DM. Reassigning to Jes.
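The frames that point at MD can be pulled out of the oops mechanically. A rough Python sketch, assuming the frame format shown in the trace above (address, optional "?", symbol+offset/size, optional module in brackets); the function name and regular expression are illustrative, not part of any kernel tooling:

```python
import re

# Matches frames like "[<ffffffffa01923f3>] make_request+0x4c3/0xcd0 [raid1]".
# A "?" before the symbol marks a frame the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(oops_text):
    """Return (symbol, module, reliable) tuples for frames after 'Call Trace:'."""
    _, _, trace = oops_text.partition("Call Trace:")
    return [
        (m.group(2), m.group(3), m.group(1) is None)
        for m in FRAME_RE.finditer(trace)
    ]


SAMPLE = """Call Trace:
 [<ffffffffa01923f3>] make_request+0x4c3/0xcd0 [raid1]
 [<ffffffff810c8ec6>] ? check_preempt_wakeup+0x166/0x250
 [<ffffffff81555e85>] md_make_request+0xe5/0x230
 [<ffffffff81326c20>] generic_make_request+0xe0/0x130
"""

frames = parse_frames(SAMPLE)
```

On the trace from this report, the reliable frames include make_request from the raid1 module and md_make_request, with no dm_* symbols, which is consistent with the reassignment to MD.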
Created attachment 860894: log file (md only case)
(In reply to Mike Snitzer from comment #7)
> Given md_make_request in the stack trace, this looks like an MD bug, not DM.
You are correct. In fact the problem happens with a pure MD device, as in this test case:
guestfish -xv -N part -N part \
  md-create test "/dev/sda1 /dev/sdb1" : \
  mkfs ext4 /dev/md/test
The full output from this test is attached.
Could you please provide the actual commands run to create the device, and the /proc/mdstat output. It would be interesting to know whether this happens on non virtio-scsi. I don't have an easy way to test this, so please don't expect me to. Jes
(In reply to Jes Sorensen from comment #9)
> Could you please provide the actual commands run to create the device, and
> the /proc/mdstat output.
>
> It would be interesting to know whether this happens on non virtio-scsi.
>
> I don't have an easy way to test this, so please don't expect me to.
...test virtio-scsi, that is.
(In reply to Jes Sorensen from comment #9)
> Could you please provide the actual commands run to create the device
It's in the output attached above, but in brief the commands run are:
mdadm --create --run test --level raid1 --raid-devices 2 /dev/sda1 /dev/sdb1
wipefs -a --force /dev/md/test
mke2fs -t ext4 -F /dev/md/test
The mke2fs command is the one which fails.
> and /proc/mdstat output.
The /proc/mdstat after creation of the MD device but before running mke2fs is:
Personalities : [raid1]
md127 : active raid1 sdb1[1] sda1[0]
      102144 blocks super 1.2 [2/2] [UU]
      [==>..................] resync = 14.5% (14848/102144) finish=0.0min speed=14848K/sec
unused devices: <none>
I guess the resync does not complete before the mke2fs runs, because the commands are run in series as fast as possible.
> It would be interesting to know whether this happens on non virtio-scsi.
The following script uses [QEMU-emulated] IDE, and it also fails in the same way, so it seems to have nothing to do with virtio-scsi.
--------------------------------------------
#!/bin/bash -
export LIBGUESTFS_BACKEND=direct
rm /tmp/test1.img /tmp/test2.img
truncate -s 100M /tmp/test1.img
truncate -s 100M /tmp/test2.img
guestfish -xv <<EOF
add-drive-opts /tmp/test1.img iface:ide
add-drive-opts /tmp/test2.img iface:ide
run
part-disk /dev/sda mbr
part-disk /dev/sdb mbr
md-create test "/dev/sda1 /dev/sdb1"
mkfs ext4 /dev/md/test
EOF
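Whether a resync is still running when mke2fs starts can be checked from the /proc/mdstat text itself. A small Python sketch, assuming the "resync = NN.N%" progress format shown in the output above; the function name is hypothetical, and states without a percentage (such as a delayed resync) are deliberately not handled:

```python
import re

# Matches the progress field, e.g. "resync = 14.5%" or "recovery = 3.0%".
PROGRESS_RE = re.compile(r"(resync|recovery)\s*=\s*([0-9.]+)%")


def resync_progress(mdstat_text):
    """Return (action, percent) if a resync/recovery is running, else None."""
    m = PROGRESS_RE.search(mdstat_text)
    if m is None:
        return None
    return m.group(1), float(m.group(2))


SAMPLE = """Personalities : [raid1]
md127 : active raid1 sdb1[1] sda1[0]
      102144 blocks super 1.2 [2/2] [UU]
      [==>..................] resync = 14.5% (14848/102144) finish=0.0min speed=14848K/sec
unused devices: <none>
"""

progress = resync_progress(SAMPLE)
```

This only tests the guess about timing. It says nothing about whether waiting for the resync to finish would avoid the oops.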
Kent Overstreet posted a patch which fixes the problem for me. https://lkml.org/lkml/2014/2/10/809 [PATCH] block: Fix cloning of discard/write same bios
This should be fixed with the rc2-git4 kernel that will be built today.