Bug 1168434 - dm cache: kernel crashes when handling a partial block at end of device
Summary: dm cache: kernel crashes when handling a partial block at end of device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mike Snitzer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-26 22:37 UTC by Zdenek Kabelac
Modified: 2015-01-13 00:05 UTC (History)

Fixed In Version: kernel-3.17.8-200.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-11 02:57:58 UTC
Type: Bug
Embargoed:


Attachments
Script to trigger kernel oops (347 bytes, application/x-shellscript)
2014-11-26 22:37 UTC, Zdenek Kabelac

Description Zdenek Kabelac 2014-11-26 22:37:28 UTC
Created attachment 961830
Script to trigger kernel oops

Description of problem:

Creating a cache device with the attached script leads to a kernel crash with 3.18-rc6.

kernel BUG at drivers/md/dm-cache-target.c:731!
---

random: nonblocking pool is initialized
loop: module loaded
device-mapper: cache-policy-mq: version 1.2.0 loaded
------------[ cut here ]------------
kernel BUG at drivers/md/dm-cache-target.c:731!
invalid opcode: 0000 [#1] PREEMPT SMP 
Modules linked in: dm_cache_mq dm_cache dm_persistent_data dm_bio_prison dm_bufio libcrc32c loop nfsv4 nfs autofs4 nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc fuse dm_crypt dm_mod md_mod uhci_hcd ehci_hcd i2c_piix4 serio_raw i2c_core virtio_net usbcore usb_common pvpanic floppy evdev
CPU: 0 PID: 1921 Comm: udevd Not tainted 3.18.0-rc6-00030-gae2fefb #219
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880039d40000 ti: ffff880039d28000 task.ti: ffff880039d28000
RIP: 0010:[<ffffffffa035a200>]  [<ffffffffa035a200>] inc_ds+0x60/0x70 [dm_cache]
RSP: 0018:ffff880039d2b908  EFLAGS: 00010246
RAX: ffffffffffffff88 RBX: ffff880035900140 RCX: ffffffffffffff88
RDX: 0000000000000000 RSI: ffff8800359001b8 RDI: ffff88003860ec00
RBP: ffff880039d2b918 R08: 0000000000000000 R09: ffff88003861a058
R10: 0000000000000000 R11: ffff88003861a058 R12: ffff8800359001b8
R13: 0000000000000001 R14: ffff88003860ec00 R15: ffff8800359001b8
FS:  0000000000000000(0000) GS:ffff88003fc00000(0063) knlGS:00000000f73fe7c0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000f7b8312c CR3: 0000000039f60000 CR4: 00000000000006f0
Stack:
 0000001039d2b928 ffff880035900140 ffff880039d2b998 ffffffffa035b403
 ffff880039d2b948 0000000000000292 0000000000000000 0000000000000002
 0000000000000000 ffff8800358afc00 ffff8800358af900 0000000000000010
Call Trace:
 [<ffffffffa035b403>] cache_map+0x203/0x400 [dm_cache]
 [<ffffffffa00f9fae>] __map_bio+0x3e/0x290 [dm_mod]
 [<ffffffffa00fa71b>] __split_and_process_bio+0x32b/0x510 [dm_mod]
 [<ffffffffa00fa412>] ? __split_and_process_bio+0x22/0x510 [dm_mod]
 [<ffffffffa00facfa>] dm_request+0x1ba/0x310 [dm_mod]
 [<ffffffffa00fab76>] ? dm_request+0x36/0x310 [dm_mod]
 [<ffffffff81362e28>] generic_make_request+0xd8/0x130
 [<ffffffff81362ef8>] submit_bio+0x78/0x190
 [<ffffffff81184a0c>] ? __lru_cache_add+0x5c/0xb0
 [<ffffffff8122e54a>] mpage_bio_submit+0x2a/0x40
 [<ffffffff8122eed5>] mpage_readpages+0x115/0x130
 [<ffffffff812289c0>] ? I_BDEV+0x10/0x10
 [<ffffffff812289c0>] ? I_BDEV+0x10/0x10
 [<ffffffff8122925d>] blkdev_readpages+0x1d/0x20
 [<ffffffff81182e3f>] __do_page_cache_readahead+0x2af/0x350
 [<ffffffff81182d06>] ? __do_page_cache_readahead+0x176/0x350
 [<ffffffff81183424>] force_page_cache_readahead+0x34/0x50
 [<ffffffff81183486>] page_cache_sync_readahead+0x46/0x50
 [<ffffffff8117609c>] generic_file_read_iter+0x51c/0x640
 [<ffffffff8162ffb7>] ? mutex_lock_nested+0x267/0x430
 [<ffffffff81229557>] blkdev_read_iter+0x37/0x40
 [<ffffffff811e73ae>] new_sync_read+0x7e/0xb0
 [<ffffffff811e7c4b>] vfs_read+0x9b/0x180
 [<ffffffff811e87e9>] SyS_read+0x49/0xb0
 [<ffffffff81637963>] sysenter_dispatch+0x7/0x1f
 [<ffffffff8139966b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Code: 83 7b 08 00 75 2d 48 8b bf 50 03 00 00 e8 e9 41 fe ff 48 89 43 08 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 a3 f8 ff ff 0f 1f 00 <0f> 0b 66 0f 1f 44 00 00 0f 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 
RIP  [<ffffffffa035a200>] inc_ds+0x60/0x70 [dm_cache]
 RSP <ffff880039d2b908>
---[ end trace 8049ee6e6ab043da ]---


Version-Release number of selected component (if applicable):
lvm2 2.02.113


Comment 3 Joe Thornber 2014-11-28 09:31:54 UTC
Reproduced with the following new test in dmtest:

https://github.com/jthornber/device-mapper-test-suite/blob/master/lib/dmtest/tests/cache/small_cache_tests.rb#L37

Comment 4 Joe Thornber 2014-11-28 09:53:24 UTC
This patch fixes the crash:

https://github.com/jthornber/linux-2.6/commit/b522fe9a4997c3843e70ffb9eb12354db1282df2

Handing over to Mike to roll a new kernel.

Comment 6 Alasdair Kergon 2014-11-28 23:36:07 UTC
The workaround is to size your devices so that there is no partial block at the end of the device.
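
A rough sketch of the arithmetic behind this workaround, assuming sizes are expressed in 512-byte sectors (the unit device-mapper tables use). The function names and the example values (an origin of 1,000,000 sectors and a cache block size of 128 sectors) are hypothetical and do not come from the attached script:

```python
# Sketch only: helper names and example sizes are illustrative assumptions.

def has_partial_block(origin_sectors: int, block_sectors: int) -> bool:
    """Return True if the origin size is not a whole number of cache blocks."""
    if block_sectors <= 0:
        raise ValueError("block size must be positive")
    return origin_sectors % block_sectors != 0


def aligned_sectors(origin_sectors: int, block_sectors: int) -> int:
    """Round the origin size down to the nearest multiple of the block size."""
    if block_sectors <= 0:
        raise ValueError("block size must be positive")
    return origin_sectors - (origin_sectors % block_sectors)


if __name__ == "__main__":
    origin = 1_000_000   # hypothetical origin size in sectors
    block = 128          # hypothetical cache block size in sectors (64 KiB)
    if has_partial_block(origin, block):
        print("partial block at end; use", aligned_sectors(origin, block), "sectors")
```

Rounding down rather than up keeps the device within the space already allocated; the trailing sectors that do not fill a whole block are simply left unused.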

Comment 7 Josh Boyer 2014-12-16 16:34:07 UTC
(In reply to Mike Snitzer from comment #5)
> It is staged upstream for 3.19 (and stable) inclusion:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=b586694b0dc456ed3118c9b4247e2f72fa5a26c7

There's a handful of other patches tagged for stable as well.  Would you like us to grab all of them and carry them until they hit the stable trees, or would you rather wait until upstream works it out?  Specifically, I'm looking at:

f824a2af3dfbbb766c02e19df21f985bceadf0ee
1e32134a5a404e80bfb47fad8a94e9bbfcbdacc5
f29a3147e251d7ae20d3194ff67f109d71e501b4

Comment 8 Mike Snitzer 2014-12-17 18:10:25 UTC
(In reply to Josh Boyer from comment #7)
> (In reply to Mike Snitzer from comment #5)
> > It is staged upstream for 3.19 (and stable) inclusion:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.19&id=b586694b0dc456ed3118c9b4247e2f72fa5a26c7
> 
> There's a handful of other patches tagged for stable as well.  Would you
> like us to grab all of them and carry them until they hit the stable trees,
> or would you rather wait until upstream works it out?  Specifically, I'm
> looking at:
> 
> f824a2af3dfbbb766c02e19df21f985bceadf0ee
> 1e32134a5a404e80bfb47fad8a94e9bbfcbdacc5
> f29a3147e251d7ae20d3194ff67f109d71e501b4

Doesn't hurt if you pull them in until they land in the kernel you rebase to.  Thanks.

Comment 9 Josh Boyer 2014-12-18 16:31:09 UTC
Thanks Mike.  Fixed in all the relevant Fedora branches.

Comment 12 Fedora Update System 2015-01-09 13:10:00 UTC
kernel-3.17.8-300.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/kernel-3.17.8-300.fc21

Comment 13 Fedora Update System 2015-01-09 13:10:51 UTC
kernel-3.17.8-200.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.17.8-200.fc20

Comment 14 Fedora Update System 2015-01-10 03:00:17 UTC
Package kernel-3.17.8-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.17.8-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-0515/kernel-3.17.8-200.fc20
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2015-01-11 02:57:58 UTC
kernel-3.17.8-300.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 16 Fedora Update System 2015-01-13 00:05:36 UTC
kernel-3.17.8-200.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.
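
To check whether a given Fedora 20 or 21 machine is already on one of the builds named in the update notices above, something along these lines may help. This is a minimal sketch: the function names and the regular expression are assumptions, it only understands release strings shaped like `3.17.8-200.fc20.x86_64`, and it treats the builds from the update notices as the threshold even though earlier builds may already have carried the patch.

```python
import re

# Builds named in the Fedora update notices in this bug.
FIXED_BUILDS = {"fc20": "3.17.8-200.fc20", "fc21": "3.17.8-300.fc21"}


def parse_release(release: str):
    """Split e.g. '3.17.8-200.fc20.x86_64' into ((3, 17, 8), 200, 'fc20')."""
    m = re.match(r"(\d+(?:\.\d+)*)-(\d+)\.(fc\d+)", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2)), m.group(3)


def has_fixed_build(release: str):
    """Return True/False for fc20/fc21 kernels, None for other releases."""
    version, build, dist = parse_release(release)
    fixed = FIXED_BUILDS.get(dist)
    if fixed is None:
        return None  # no build for this Fedora release is named in this bug
    fixed_version, fixed_build, _ = parse_release(fixed)
    return (version, build) >= (fixed_version, fixed_build)
```

On a live system the argument can be taken from `platform.release()` in the standard library.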

