Bug 1080894
| Summary: | dm-cache: crash on creating cache | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Heinz Mauelshagen <heinzm> |
| Component: | kernel | Assignee: | Mike Snitzer <msnitzer> |
| Status: | CLOSED ERRATA | QA Contact: | XiaoNi <xni> |
| Severity: | urgent | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | agk, hartsjc, jbrassow, msnitzer, prockai, xni, yanwang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-210.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 11:46:35 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1113511, 1119326, 1159001 | | |
Description
Heinz Mauelshagen 2014-03-26 09:46:50 UTC
*** Bug 1081934 has been marked as a duplicate of this bug. ***

Still happens with 3.10.0-123.el7.x86_64:

[ 14.020028] ------------[ cut here ]------------
[ 14.021004] kernel BUG at drivers/md/persistent-data/dm-btree-spine.c:169!
[ 14.021004] invalid opcode: 0000 [#1] SMP
[ 14.021004] Modules linked in: dm_cache_mq dm_cache() nls_utf8 dm_thin_pool dm_bio_prison dm_persistent_data libcrc32c dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_zero dm_mirror dm_region_hash dm_log dm_snapshot dm_bufio dm_mod loop sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_piix libata des_generic md4 virtio_net cifs dns_resolver ext4 jbd2 mbcache virtio_balloon virtio_blk virtio_pci virtio_ring virtio
[ 14.021004] CPU: 0 PID: 508 Comm: dmsetup Tainted: G -------------- T 3.10.0-123.el7.x86_64 #1
[ 14.021004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 14.021004] task: ffff8801ad8bcfa0 ti: ffff8801ab83e000 task.ti: ffff8801ab83e000
[ 14.021004] RIP: 0010:[<ffffffffa029273a>] [<ffffffffa029273a>] ro_pop+0x2a/0x30 [dm_persistent_data]
[ 14.021004] RSP: 0018:ffff8801ab83fb20 EFLAGS: 00010246
[ 14.021004] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[ 14.021004] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff8801ab83fb80
[ 14.021004] RBP: ffff8801ab83fb70 R08: 0000000000000000 R09: 0000000000000000
[ 14.021004] R10: 0000000000000004 R11: ffff88012c5e2ff8 R12: 0000000000000004
[ 14.021004] R13: ffffffffa028b7a0 R14: ffff8801ab83fbd0 R15: ffff88003704a000
[ 14.021004] FS: 00007f88415e8800(0000) GS:ffff8801b6c00000(0000) knlGS:0000000000000000
[ 14.021004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 14.021004] CR2: 00007fd9471812e0 CR3: 00000001ab28a000 CR4: 00000000000006f0
[ 14.021004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.021004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 14.021004] Stack:
[ 14.021004] ffffffffa02905a2 00000000000001fd ffff8801ab83fb80 ffff8801ab83fba0
[ 14.021004] 0000000073895023 0000000000000028 ffffffffa028b7a0 ffff8801ab83fbd0
[ 14.021004] ffff88007ff78800 ffff8801aafc4000 ffff8801ab83fbc0 ffffffffa029121e
[ 14.021004] Call Trace:
[ 14.021004] [<ffffffffa02905a2>] ? walk_node+0xc2/0x100 [dm_persistent_data]
[ 14.021004] [<ffffffffa028b7a0>] ? block_dec+0x160/0x160 [dm_persistent_data]
[ 14.021004] [<ffffffffa029121e>] dm_btree_walk+0x4e/0x80 [dm_persistent_data]
[ 14.021004] [<ffffffffa02c1a30>] ? complete_migration+0x30/0x30 [dm_cache]
[ 14.021004] [<ffffffffa028b5dc>] dm_array_walk+0x3c/0x60 [dm_persistent_data]
[ 14.021004] [<ffffffffa02c4700>] ? blocks_are_unmapped_or_clean+0xd0/0xd0 [dm_cache]
[ 14.021004] [<ffffffffa02c551f>] dm_cache_load_mappings+0x7f/0xe0 [dm_cache]
[ 14.021004] [<ffffffffa02c1a30>] ? complete_migration+0x30/0x30 [dm_cache]
[ 14.021004] [<ffffffff810f0001>] ? kdb_register+0x1/0x20
[ 14.021004] [<ffffffffa02c3ef9>] cache_preresume+0xf9/0x1a0 [dm_cache]
[ 14.021004] [<ffffffffa01d2ff9>] dm_table_resume_targets+0x49/0xe0 [dm_mod]
[ 14.021004] [<ffffffffa01d089c>] dm_resume+0x4c/0xd0 [dm_mod]
[ 14.021004] [<ffffffffa01d5bcb>] dev_suspend+0x12b/0x250 [dm_mod]
[ 14.021004] [<ffffffffa01d5aa0>] ? table_load+0x380/0x380 [dm_mod]
[ 14.021004] [<ffffffffa01d64e5>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 14.021004] [<ffffffffa01d67a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 14.021004] [<ffffffff811c2f25>] do_vfs_ioctl+0x2e5/0x4c0
[ 14.021004] [<ffffffff81257a2e>] ? file_has_perm+0xae/0xc0
[ 14.021004] [<ffffffff811c31a1>] SyS_ioctl+0xa1/0xc0
[ 14.021004] [<ffffffff815ea325>] ? do_device_not_available+0x35/0x60
[ 14.021004] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
[ 14.021004] Code: 90 0f 1f 44 00 00 8b 47 08 85 c0 74 1e 83 e8 01 55 89 47 08 48 98 48 8b 74 c7 10 48 8b 07 48 89 e5 48 8b 38 e8 38 d4 ff ff 5d c3 <0f> 0b 0f 1f 40 00 0f 1f 44 00 00 8b 47 08 85 c0 74 15 83 e8 01
[ 14.021004] RIP [<ffffffffa029273a>] ro_pop+0x2a/0x30 [dm_persistent_data]
[ 14.021004] RSP <ffff8801ab83fb20>
[ 14.074847] ---[ end trace 2f328e1677444d10 ]---
[ 14.075464] Kernel panic - not syncing: Fatal exception
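For context on what that assertion is: the faulting ro_pop() pops one node off the read-only spine used while walking the on-disk btree, and the opcode dump above is consistent with the count-underflow check at the top of the function. The following is a paraphrase of that function as it appears in drivers/md/persistent-data/dm-btree-spine.c in kernels of this vintage (approximate, not a verbatim quote of the RHEL source; the line 169 placement is an assumption):

```c
/*
 * Paraphrase of ro_pop() from drivers/md/persistent-data/dm-btree-spine.c,
 * for context only; the BUG_ON() is the check reported as line 169 in the
 * panics in this bug.
 */
void ro_pop(struct ro_spine *s)
{
	BUG_ON(!s->count);	/* popping an empty spine -> the BUG seen here */
	--s->count;
	unlock_block(s->info, s->nodes[s->count]);
}
```

As the call traces show, the walk that trips it is the one loading the cache mappings during preresume (dm_cache_load_mappings() via dm_array_walk()/dm_btree_walk()).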
I am going to try with a more recent kernel build (but it's going to take a while). Also, with 4G of RAM the kernel runs out of memory creating the 2.5T cache device; it gives the above panic with 6G of RAM.

Also happens with the latest RHEL 7.1 compose (20141021):

[ 23.990481] ------------[ cut here ]------------
[ 23.991004] kernel BUG at drivers/md/persistent-data/dm-btree-spine.c:169!
[ 23.991004] invalid opcode: 0000 [#1] SMP
[ 23.991004] Modules linked in: dm_cache_mq dm_cache() nls_utf8 dm_thin_pool dm_bio_prison dm_persistent_data libcrc32c dm_raid raid10 raid1 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx dm_zero dm_mirror dm_region_hash dm_log dm_snapshot dm_bufio dm_mod loop sd_mod crc_t10dif crct10dif_generic crct10dif_common ata_piix libata des_generic md4 virtio_net cifs dns_resolver ext4 jbd2 mbcache virtio_balloon virtio_blk virtio_pci virtio_ring virtio
[ 23.991004] CPU: 0 PID: 541 Comm: dmsetup Tainted: G -------------- T 3.10.0-189.el7.x86_64 #1
[ 23.991004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 23.991004] task: ffff8800ba9116c0 ti: ffff8800ba924000 task.ti: ffff8800ba924000
[ 23.991004] RIP: 0010:[<ffffffffa029a6fa>] [<ffffffffa029a6fa>] ro_pop+0x2a/0x30 [dm_persistent_data]
[ 23.991004] RSP: 0018:ffff8800ba927b20 EFLAGS: 00010246
[ 23.991004] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[ 23.991004] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff8800ba927b80
[ 23.991004] RBP: ffff8800ba927b70 R08: 0000000000000000 R09: 0000000000000000
[ 23.991004] R10: 0000000000000004 R11: ffff880034f1eff8 R12: 0000000000000004
[ 23.991004] R13: ffffffffa02937a0 R14: ffff8800ba927bd0 R15: ffff88012ebc3000
[ 23.991004] FS: 00007f22b5681800(0000) GS:ffff8801b6c00000(0000) knlGS:0000000000000000
[ 23.991004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 23.991004] CR2: 00007f8ccdb76d3c CR3: 00000000ba8c4000 CR4: 00000000000006f0
[ 23.991004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 23.991004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 23.991004] Stack:
[ 23.991004] ffffffffa0298562 00000000000001fd ffff8800ba927b80 ffff8800ba927ba0
[ 23.991004] 00000000d698142d 0000000000000028 ffffffffa02937a0 ffff8800ba927bd0
[ 23.991004] ffff8800ba920400 ffff8801adb7f800 ffff8800ba927bc0 ffffffffa02991de
[ 23.991004] Call Trace:
[ 23.991004] [<ffffffffa0298562>] ? walk_node+0xc2/0x100 [dm_persistent_data]
[ 23.991004] [<ffffffffa02937a0>] ? block_dec+0x160/0x160 [dm_persistent_data]
[ 23.991004] [<ffffffffa02991de>] dm_btree_walk+0x4e/0x80 [dm_persistent_data]
[ 23.991004] [<ffffffffa02c99f0>] ? complete_migration+0x30/0x30 [dm_cache]
[ 23.991004] [<ffffffffa02935dc>] dm_array_walk+0x3c/0x60 [dm_persistent_data]
[ 23.991004] [<ffffffffa02cc640>] ? blocks_are_unmapped_or_clean+0xd0/0xd0 [dm_cache]
[ 23.991004] [<ffffffffa02cd50f>] dm_cache_load_mappings+0x7f/0xe0 [dm_cache]
[ 23.991004] [<ffffffffa02c99f0>] ? complete_migration+0x30/0x30 [dm_cache]
[ 23.991004] [<ffffffff81110001>] ? irq_create_mapping+0x211/0x240
[ 23.991004] [<ffffffffa02cbe69>] cache_preresume+0xf9/0x1a0 [dm_cache]
[ 23.991004] [<ffffffffa01d84a9>] dm_table_resume_targets+0x49/0xe0 [dm_mod]
[ 23.991004] [<ffffffffa01d592c>] dm_resume+0x4c/0xd0 [dm_mod]
[ 23.991004] [<ffffffffa01daccb>] dev_suspend+0x12b/0x250 [dm_mod]
[ 23.991004] [<ffffffffa01daba0>] ? table_load+0x380/0x380 [dm_mod]
[ 23.991004] [<ffffffffa01db5e5>] ctl_ioctl+0x255/0x500 [dm_mod]
[ 23.991004] [<ffffffffa01db8a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 23.991004] [<ffffffff811d9205>] do_vfs_ioctl+0x2e5/0x4c0
[ 23.991004] [<ffffffff8126e0fe>] ? file_has_perm+0xae/0xc0
[ 23.991004] [<ffffffff811d9481>] SyS_ioctl+0xa1/0xc0
[ 23.991004] [<ffffffff8160baa5>] ? do_device_not_available+0x35/0x60
[ 23.991004] [<ffffffff816134e9>] system_call_fastpath+0x16/0x1b
[ 23.991004] Code: 90 0f 1f 44 00 00 8b 47 08 85 c0 74 1e 83 e8 01 55 89 47 08 48 98 48 8b 74 c7 10 48 8b 07 48 89 e5 48 8b 38 e8 38 d4 ff ff 5d c3 <0f> 0b 0f 1f 40 00 0f 1f 44 00 00 8b 47 08 85 c0 74 15 83 e8 01
[ 23.991004] RIP [<ffffffffa029a6fa>] ro_pop+0x2a/0x30 [dm_persistent_data]
[ 23.991004] RSP <ffff8800ba927b20>
[ 24.030925] ---[ end trace 5b03bba261d2c923 ]---
[ 24.031260] Kernel panic - not syncing: Fatal exception

The kernel is this build:

[ 0.000000] Linux version 3.10.0-189.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC) ) #1 SMP Fri Oct 17 11:04:19 EDT 2014

This appears to be directly linked to the number of blocks in the "ssd" device. Increasing the block size or shrinking the ssd device both make the tripped BUG_ON go away. The threshold appears to be somewhere between 2^22 and 2^23 blocks. I'm reading the source code trying to narrow the problem down.

I have narrowed the threshold down to between 2^23 - 2^13 and 2^23 - 2^14 cache-device blocks. That is definitely an odd number. Just FYI. I need to move on to other things, but ping me if you need any other details I could provide.

Reproduced with the following dmtest test: https://github.com/jthornber/device-mapper-test-suite/blob/master/lib/dmtest/tests/cache/large_cache_tests.rb#L139

The real problem here is commit 64ab346a360a4b15c28fb8531918d4a01f4eabd9, made at the end of March. Keeping track of which blocks on the origin have been discarded allows us to optimise migration to/from the cache by avoiding a copy (there is no point copying discarded data). Originally the discard block size was a large multiple of the cache block size, because the discard bitset size depends on the size of the _origin_ rather than the fast ssd device. The offending patch makes these two block sizes the same; when the origin is large and the cache block size is small, this causes an outrageous amount of metadata and memory to be used to store the discard bitset. We just can't go live with this patch. For the record, the testing that NA did with very large setups was done before this patch went in. So I'm backing this patch out and investigating the issue that caused it to go in in the first place.
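To give a feel for the scale involved, here is a small, self-contained sketch (assumed sizes only; not kernel code and not the reporter's exact configuration). It shows how a 2.5T ssd device with a smallish cache block size ends up with more than 2^23 cache blocks, and how sizing the discard bitset by origin size divided by cache block size, which is effectively what commit 64ab346a does, inflates it compared with a larger discard block size:

```c
/*
 * Back-of-the-envelope sketch only; the device and block sizes below are
 * assumptions chosen to resemble the 2.5T setup in this report.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t ssd_bytes       = 2560ULL << 30;	/* ~2.5T cache (ssd) device */
	uint64_t origin_bytes    = 10ULL << 40;		/* assumed 10T origin device */
	uint64_t cache_block     = 256 * 1024;		/* assumed 256 KiB cache blocks */
	uint64_t old_discard_blk = 16 * cache_block;	/* assumed "large multiple" discard block */

	/* Cache-device block count: crossing ~2^23 is where the BUG_ON starts to trip. */
	uint64_t cache_blocks = ssd_bytes / cache_block;
	printf("cache blocks on ssd: %llu (2^23 = %llu)\n",
	       (unsigned long long)cache_blocks, 1ULL << 23);

	/* Discard bitset: one bit per discard block of the *origin* device. */
	uint64_t bits_old = origin_bytes / old_discard_blk;
	uint64_t bits_new = origin_bytes / cache_block;	/* discard block == cache block */
	printf("discard bitset, old scheme: %llu bits (~%llu KiB)\n",
	       (unsigned long long)bits_old, (unsigned long long)(bits_old / 8 / 1024));
	printf("discard bitset, after 64ab346a: %llu bits (~%llu KiB)\n",
	       (unsigned long long)bits_new, (unsigned long long)(bits_new / 8 / 1024));
	return 0;
}
```

With even smaller cache blocks and larger origins the bitset grows proportionally, which is the "outrageous amount of metadata and memory" referred to above.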
Two patches are making their way upstream:

https://github.com/jthornber/linux-2.6/commit/e28c5a7e0c8208b7f3080744c9d90ce97b359f09
https://github.com/jthornber/linux-2.6/commit/c84dd912f9fef71ce8f1d3d801ee7cacf9412635

They fix this test case: https://github.com/jthornber/device-mapper-test-suite/blob/master/lib/dmtest/tests/cache/large_cache_tests.rb#L139

Handing over to Mike Snitzer to roll a RHEL kernel for testing.

Patch(es) available in kernel-3.10.0-210.el7.

Hi all,

The problem is fixed in kernel-3.10.0-210.el7. Set Verified.

Thanks
Xiao

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0290.html