+++ This bug was initially created as a clone of Bug #1111748 +++ Description of problem: This issue appears to require the following: 1. cached volume 2. xfs on top and mounted 3. some i/o written to the fs 4. snapshot attempt If any of the above aren't present, the kernel BUG isn't tripped and the snapshot attempt works. lvcreate -L 4G -n corigin cache_sanity /dev/sdf1 Create cache data and cache metadata (fast) volumes lvcreate -L 2G -n pool cache_sanity /dev/sda1 lvcreate -L 8M -n pool_meta cache_sanity /dev/sda1 WARNING: xfs signature detected on /dev/cache_sanity/pool_meta at offset 0. Wipe it? [y/n] 1 existing signature left on the device. Create cache pool volume by combining the cache data and cache metadata (fast) volumes lvconvert --type cache-pool --cachemode writethrough --poolmetadata cache_sanity/pool_meta cache_sanity/pool Create cached volume by combining the cache pool (fast) and origin (slow) volumes lvconvert --type cache --cachepool cache_sanity/pool cache_sanity/corigin Placing an xfs filesystem on origin volume Mounting origin volume [root@host-015 ~]# lvs -a -o +devices LV Attr LSize Pool Origin Data% Devices corigin Cwi-aoC--- 4.00g pool [corigin_corig] corigin_corig(0) [corigin_corig] -wi-ao---- 4.00g /dev/sdf1(0) [lvol0_pmspare] ewi------- 8.00m /dev/sdg1(0) pool Cwi-a-C--- 2.00g pool_cdata(0) [pool_cdata] Cwi-aoC--- 2.00g /dev/sda1(0) [pool_cmeta] ewi-aoC--- 8.00m /dev/sda1(512) [root@host-015 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/cache_sanity-corigin 4.0G 33M 4.0G 1% /mnt/corigin [root@host-015 ~]# dd if=/dev/zero of=/mnt/corigin/ddfile bs=1M count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB) copied, 0.595037 s, 176 MB/s [root@host-015 ~]# lvcreate -s /dev/cache_sanity/corigin -c 64 -n merge -L 500M [883088.722481] XFS (dm-2): Mounting Filesystem [883088.765490] XFS (dm-2): Ending clean mount [883120.221176] node_check: 6 callbacks suppressed [883120.222037] device-mapper: btree spine: node_check failed: csum 549844194 != wanted 549945722 [883120.223520] device-mapper: block manager: btree_node validator check failed for block 4 [883120.224939] ------------[ cut here ]------------ [883120.225743] kernel BUG at drivers/md/persistent-data/dm-btree-spine.c:178! [883120.225881] invalid opcode: 0000 [#1] SMP [883120.225881] Modules linked in: dm_snapshot dm_cache_mq dm_cache() dm_raid raid456 raid1 raid10 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 binfmt_misc bridge stp llc sd_mod crct10dif_generic crc_t10dif crct10dif_common iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables sctp sg i2c_piix4 pcspkr i2c_core serio_raw virtio_balloon virtio_net mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk ata_piix libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [883120.225881] CPU: 0 PID: 18467 Comm: lvcreate Tainted: G -------------- T 3.10.0-123.el7.x86_64 #1 [883120.225881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [883120.225881] task: ffff88003bc06660 ti: ffff88000b1c0000 task.ti: ffff88000b1c0000 [883120.225881] RIP: 0010:[<ffffffffa052f761>] [<ffffffffa052f761>] ro_node+0x21/0x30 [dm_persistent_data] [883120.225881] RSP: 0018:ffff88000b1c1b20 EFLAGS: 00010246 [883120.225881] RAX: 0000000000000000 RBX: ffff88000b1c1b80 RCX: 0000000000000000 [883120.225881] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88000b1c1b80 [883120.225881] RBP: ffff88000b1c1b70 R08: 0000000000000000 R09: 0000000000020c4f [883120.225881] R10: 6567616e616d206b R11: 636f6c62203a7265 R12: ffffffffa05287a0 [883120.225881] R13: ffffffffa05287a0 R14: ffff88000b1c1bd0 R15: ffff88003b780800 [883120.225881] FS: 00007fb22160e880(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [883120.225881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [883120.225881] CR2: 00007fd47003e000 CR3: 000000000c35a000 CR4: 00000000000006f0 [883120.225881] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [883120.225881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [883120.225881] Stack: [883120.225881] ffffffffa052d522 ffffffac810dead1 ffff88000b1c1b80 ffffffff819e28e0 [883120.225881] 000000009204f187 0000000000000004 ffffffffa05287a0 ffff88000b1c1bd0 [883120.225881] ffff88000455f800 ffff88003b780800 ffff88000b1c1bc0 ffffffffa052e21e [883120.225881] Call Trace: [883120.225881] [<ffffffffa052d522>] ? walk_node+0x42/0x100 [dm_persistent_data] [883120.225881] [<ffffffffa05287a0>] ? block_dec+0x160/0x160 [dm_persistent_data] [883120.225881] [<ffffffffa052e21e>] dm_btree_walk+0x4e/0x80 [dm_persistent_data] [883120.225881] [<ffffffffa05bba30>] ? complete_migration+0x30/0x30 [dm_cache] [883120.225881] [<ffffffffa05285dc>] dm_array_walk+0x3c/0x60 [dm_persistent_data] [883120.225881] [<ffffffffa05be700>] ? blocks_are_unmapped_or_clean+0xd0/0xd0 [dm_cache] [883120.225881] [<ffffffffa05bf51f>] dm_cache_load_mappings+0x7f/0xe0 [dm_cache] [883120.225881] [<ffffffffa05bba30>] ? complete_migration+0x30/0x30 [dm_cache] [883120.225881] [<ffffffff810f0000>] ? kdb_register_repeat+0x270/0x270 [883120.225881] [<ffffffffa05bdef9>] cache_preresume+0xf9/0x1a0 [dm_cache] [883120.225881] [<ffffffffa0005ff9>] dm_table_resume_targets+0x49/0xe0 [dm_mod] [883120.225881] [<ffffffffa000389c>] dm_resume+0x4c/0xd0 [dm_mod] [883120.225881] [<ffffffffa0008bcb>] dev_suspend+0x12b/0x250 [dm_mod] [883120.225881] [<ffffffffa0008aa0>] ? table_load+0x380/0x380 [dm_mod] [883120.225881] [<ffffffffa00094e5>] ctl_ioctl+0x255/0x500 [dm_mod] [883120.225881] [<ffffffff8123db34>] ? SYSC_semtimedop+0x264/0xd10 [883120.225881] [<ffffffffa00097a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [883120.225881] [<ffffffff811c2f25>] do_vfs_ioctl+0x2e5/0x4c0 [883120.225881] [<ffffffff81257a2e>] ? file_has_perm+0xae/0xc0 [883120.225881] [<ffffffff8123c601>] ? wake_up_sem_queue_do+0x31/0x60 [883120.225881] [<ffffffff811c31a1>] SyS_ioctl+0xa1/0xc0 [883120.225881] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b [883120.225881] Code: ff ff 5d c3 0f 0b 0f 1f 40 00 0f 1f 44 00 00 8b 47 08 85 c0 74 15 83 e8 01 55 48 98 48 8b 7c c7 10 48 89 e5 e8 91 99 ff ff 5d c3 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 [883120.225881] RIP [<ffffffffa052f761>] ro_node+0x21/0x30 [dm_persistent_data] [883120.225881] RSP <ffff88000b1c1b20> [883120.289367] ---[ end trace 39cd9818e7719e97 ]--- [883120.290102] Kernel panic - not syncing: Fatal exception Version-Release number of selected component (if applicable): 3.10.0-123.el7.x86_64 lvm2-2.02.105-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 lvm2-libs-2.02.105-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 lvm2-cluster-2.02.105-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 device-mapper-1.02.84-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 device-mapper-libs-1.02.84-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 device-mapper-event-1.02.84-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 device-mapper-event-libs-1.02.84-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 device-mapper-persistent-data-0.3.2-1.el7 BUILT: Thu Apr 3 09:58:51 CDT 2014 cmirror-2.02.105-14.el7 BUILT: Wed Mar 26 08:29:41 CDT 2014 How reproducible: Everytime --- Additional comment from Joe Thornber on 2014-06-24 05:06:12 EDT --- I've added a test to the device-mapper-test-suite that recreates this scenario driving device-mapper directly (cutting LVM out of the loop): https://github.com/jthornber/device-mapper-test-suite/blob/master/lib/dmtest/tests/stacking/stacking_tests.rb With my kernel tree this passes fine. The next step is to test with a RHEL7 kernel. --- Additional comment from Joe Thornber on 2014-06-24 10:15:57 EDT --- Test passes on a RHEL7 kernel. Handing bug back to LVM team for their take.
This needs to be fixed for full cache support in 6.6 as well. 2.6.32-485.el6.x86_64 lvm2-2.02.107-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 lvm2-libs-2.02.107-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 lvm2-cluster-2.02.107-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 udev-147-2.55.el6 BUILT: Wed Jun 18 06:30:21 CDT 2014 device-mapper-1.02.86-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 device-mapper-libs-1.02.86-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 device-mapper-event-1.02.86-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 device-mapper-event-libs-1.02.86-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014 device-mapper-persistent-data-0.3.2-1.el6 BUILT: Fri Apr 4 08:43:06 CDT 2014 cmirror-2.02.107-1.el6 BUILT: Mon Jun 23 09:44:45 CDT 2014
moving to 6.7 because: 1) this bug is not a blocker candidate due to tech preview status 2) snapshots of cache LVs have been disabled in 6.6 and the bug is avoided in this way.