Bug 1061339
| Summary: | NULL pointer dereference when TRIM is issued on MD device |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | kernel |
| Version: | rawhide |
| Status: | CLOSED RAWHIDE |
| Severity: | unspecified |
| Priority: | unspecified |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Richard W.M. Jones <rjones> |
| Assignee: | Jes Sorensen <Jes.Sorensen> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | esandeen, gansalmon, itamar, jonathan, josef, kernel-maint, kzak, madhu.chinakonda, msnitzer, oliver |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2014-02-14 19:11:11 UTC |
Description
Richard W.M. Jones
2014-02-04 16:05:31 UTC
I should note: This is running under virtualization. I don't have an easy means to test this on baremetal, so don't ask me to do that. The backing disk is virtio-scsi. It was all working fine about 2 weeks ago.

Heh, userspace doing I/O should never cause a kernel bug. This is a kernel bug, not e2fsprogs.

Looks like possibly a problem in dm discard handling.

I was pointed to this patch, and tested it, but it did *NOT* fix this bug.
https://lkml.org/lkml/2014/2/4/107

Can you recreate this with no previous kernel oopses/warnings present? Likely so, but we'd like to make sure something else didn't mess up kernel memory and your oops has the 'W' taint set already.

Created attachment 860554 [details] log file
The shortest reproducer I can come up with (using guestfish) is:
guestfish -xv -N part -N part \
md-create test "/dev/sda1 /dev/sdb1" : \
pvcreate /dev/md/test : \
vgcreate VG /dev/md/test : \
lvcreate LV VG 32 : \
mkfs ext4 /dev/VG/LV
The full output (including the actual commands being run by
guestfsd) is attached.
Unfortunately there is an earlier problem (in kvm_amd module).
This is automatically loaded because I'm running this under TCG
so the guest thinks that nested (AMD) virt is available. Not sure
how to get rid of this.
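
The question raised in this thread about the 'W' taint can be checked from the kernel's taint mask. The following is a minimal sketch, not something posted in the original report: the helper name `has_warn_taint` and the sample mask values are hypothetical, and it assumes the kernel's documented taint layout, where bit 9 (value 512) marks that a warning was issued.

```shell
#!/bin/bash
# Sketch: report whether the W (warning) taint bit is set in a taint mask.
# Assumption: bit 9 (value 512) is the "kernel issued warning" flag, as in
# the kernel's documented taint flags. has_warn_taint is a hypothetical name.
has_warn_taint() {
    local mask=$1
    (( (mask & 512) != 0 ))
}

# Example masks for illustration only; on a live system the mask would be
# read from /proc/sys/kernel/tainted.
for mask in 0 512 4609; do
    if has_warn_taint "$mask"; then
        echo "$mask: W taint set"
    else
        echo "$mask: W taint clear"
    fi
done
```

On a running guest this could be used as `has_warn_taint "$(cat /proc/sys/kernel/tainted)"` before triggering the mkfs, to confirm that no earlier warning had already tainted the kernel.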
I renamed the kvm-amd.ko file so it wouldn't get loaded. The mkfs bug reported here still occurs.

Given md_make_request in the stack trace, this looks like an MD bug, not DM. Reassigning to Jes.

Created attachment 860894 [details] log file (md only case)

(In reply to Mike Snitzer from comment #7)
> Given md_make_request in the stack trace, this looks like an MD bug, not DM.

You are correct. In fact the problem happens with a pure MD device, as in this test case:

guestfish -xv -N part -N part \
md-create test "/dev/sda1 /dev/sdb1" : \
mkfs ext4 /dev/md/test

The full output from this test is attached.

Could you please provide the actual commands run to create the device and the /proc/mdstat output.

It would be interesting to know whether this happens on non virtio-scsi.

I don't have an easy way to test this, so please don't expect me to.

Jes

(In reply to Jes Sorensen from comment #9)
> Could you please provide the actual commands run to create the device and
> the /proc/mdstat output.
>
> It would be interesting to know whether this happens on non virtio-scsi.
>
> I don't have an easy way to test this, so please don't expect me to.

...test virtio-scsi that is.

(In reply to Jes Sorensen from comment #9)
> Could you please provide the actual commands run to create the device

It's in the output attached above, but in brief the commands run are:

mdadm --create --run test --level raid1 --raid-devices 2 /dev/sda1 /dev/sdb1
wipefs -a --force /dev/md/test
mke2fs -t ext4 -F /dev/md/test

The mke2fs command is the one which fails.

> and /proc/mdstat output.

The /proc/mdstat after creation of the MD device but before running mke2fs is:

Personalities : [raid1]
md127 : active raid1 sdb1[1] sda1[0]
      102144 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 14.5% (14848/102144) finish=0.0min speed=14848K/sec

unused devices: <none>

I guess the resync does not complete before the mke2fs runs, because the commands are run in series as fast as possible.
> It would be interesting to know whether this happens on non virtio-scsi.

The following script uses [QEMU-emulated] IDE, and it also fails in the same way, so it seems to have nothing to do with virtio-scsi.

--------------------------------------------
#!/bin/bash -
export LIBGUESTFS_BACKEND=direct
rm /tmp/test1.img /tmp/test2.img
truncate -s 100M /tmp/test1.img
truncate -s 100M /tmp/test2.img
guestfish -xv <<EOF
add-drive-opts /tmp/test1.img iface:ide
add-drive-opts /tmp/test2.img iface:ide
run
part-disk /dev/sda mbr
part-disk /dev/sdb mbr
md-create test "/dev/sda1 /dev/sdb1"
mkfs ext4 /dev/md/test
EOF

Kent Overstreet posted a patch which fixes the problem for me.

https://lkml.org/lkml/2014/2/10/809
[PATCH] block: Fix cloning of discard/write same bios

This should be fixed with the rc2-git4 kernel that will be built today.
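
The /proc/mdstat output quoted earlier in this thread shows a resync still running when mke2fs starts. Whether that resync matters to the crash was never established here, so the following is only a sketch for anyone who wants to rule it out when reproducing. The helper name `md_sync_in_progress` is hypothetical, and the sample text is the mdstat output from the comment above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original report): exits 0 if the
# mdstat text on stdin shows a resync or recovery in progress.
md_sync_in_progress() {
    grep -Eq '(resync|recovery) *= *[0-9.]+%'
}

# Sample copied from the /proc/mdstat output quoted in this thread.
sample='Personalities : [raid1]
md127 : active raid1 sdb1[1] sda1[0]
      102144 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 14.5% (14848/102144) finish=0.0min speed=14848K/sec

unused devices: <none>'

if printf '%s\n' "$sample" | md_sync_in_progress; then
    echo "resync in progress"
else
    echo "array idle"
fi
```

On a live system the same check would be `md_sync_in_progress < /proc/mdstat`. Running `mdadm --wait /dev/md/test` between array creation and mke2fs would let the resync finish first, which separates "discard on an MD device" from "discard during resync" as the trigger.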