RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1113759 - kernel BUG at drivers/md/persistent-data/dm-btree-spine.c when taking snapshot of a cached XFS
Summary: kernel BUG at drivers/md/persistent-data/dm-btree-spine.c when taking snapsho...
Keywords:
Status: CLOSED DUPLICATE of bug 1169431
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.6
Hardware: x86_64
OS: Linux
urgent
urgent
Target Milestone: rc
: ---
Assignee: Mike Snitzer
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-06-26 20:35 UTC by Corey Marthaler
Modified: 2015-01-05 20:30 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1111748
Environment:
Last Closed: 2015-01-05 20:28:57 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Corey Marthaler 2014-06-26 20:35:03 UTC
+++ This bug was initially created as a clone of Bug #1111748 +++

Description of problem:

This issue appears to require the following:
 1. cached volume
 2. xfs on top and mounted
 3. some i/o written to the fs
 4. snapshot attempt

If any of the above aren't present, the kernel BUG isn't tripped and the snapshot attempt works.

lvcreate -L 4G -n corigin cache_sanity /dev/sdf1

Create cache data and cache metadata (fast) volumes
lvcreate -L 2G -n pool cache_sanity /dev/sda1
lvcreate -L 8M -n pool_meta cache_sanity /dev/sda1
WARNING: xfs signature detected on /dev/cache_sanity/pool_meta at offset 0. Wipe it? [y/n] 
  1 existing signature left on the device.

Create cache pool volume by combining the cache data and cache metadata (fast) volumes
lvconvert --type cache-pool --cachemode writethrough --poolmetadata cache_sanity/pool_meta cache_sanity/pool
Create cached volume by combining the cache pool (fast) and origin (slow) volumes
lvconvert --type cache --cachepool cache_sanity/pool cache_sanity/corigin

Placing an xfs filesystem on origin volume
Mounting origin volume

[root@host-015 ~]# lvs -a -o +devices
  LV               Attr       LSize   Pool Origin          Data%  Devices
  corigin          Cwi-aoC---   4.00g pool [corigin_corig]        corigin_corig(0)
  [corigin_corig]  -wi-ao----   4.00g                             /dev/sdf1(0)
  [lvol0_pmspare]  ewi-------   8.00m                             /dev/sdg1(0)
  pool             Cwi-a-C---   2.00g                             pool_cdata(0)
  [pool_cdata]     Cwi-aoC---   2.00g                             /dev/sda1(0)
  [pool_cmeta]     ewi-aoC---   8.00m                             /dev/sda1(512)

[root@host-015 ~]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/cache_sanity-corigin  4.0G   33M  4.0G   1% /mnt/corigin

[root@host-015 ~]# dd if=/dev/zero of=/mnt/corigin/ddfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.595037 s, 176 MB/s

[root@host-015 ~]# lvcreate -s /dev/cache_sanity/corigin -c 64 -n merge -L 500M

[883088.722481] XFS (dm-2): Mounting Filesystem
[883088.765490] XFS (dm-2): Ending clean mount
[883120.221176] node_check: 6 callbacks suppressed
[883120.222037] device-mapper: btree spine: node_check failed: csum 549844194 != wanted 549945722
[883120.223520] device-mapper: block manager: btree_node validator check failed for block 4
[883120.224939] ------------[ cut here ]------------
[883120.225743] kernel BUG at drivers/md/persistent-data/dm-btree-spine.c:178!
[883120.225881] invalid opcode: 0000 [#1] SMP 
[883120.225881] Modules linked in: dm_snapshot dm_cache_mq dm_cache() dm_raid raid456 raid1 raid10 async_raid6_recov async_memcpy async_pq async_xor async_tx dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 binfmt_misc bridge stp llc sd_mod crct10dif_generic crc_t10dif crct10dif_common iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables sctp sg i2c_piix4 pcspkr i2c_core serio_raw virtio_balloon virtio_net mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk ata_piix libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[883120.225881] CPU: 0 PID: 18467 Comm: lvcreate Tainted: G            -------------- T 3.10.0-123.el7.x86_64 #1
[883120.225881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[883120.225881] task: ffff88003bc06660 ti: ffff88000b1c0000 task.ti: ffff88000b1c0000
[883120.225881] RIP: 0010:[<ffffffffa052f761>]  [<ffffffffa052f761>] ro_node+0x21/0x30 [dm_persistent_data]
[883120.225881] RSP: 0018:ffff88000b1c1b20  EFLAGS: 00010246
[883120.225881] RAX: 0000000000000000 RBX: ffff88000b1c1b80 RCX: 0000000000000000
[883120.225881] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88000b1c1b80
[883120.225881] RBP: ffff88000b1c1b70 R08: 0000000000000000 R09: 0000000000020c4f
[883120.225881] R10: 6567616e616d206b R11: 636f6c62203a7265 R12: ffffffffa05287a0
[883120.225881] R13: ffffffffa05287a0 R14: ffff88000b1c1bd0 R15: ffff88003b780800
[883120.225881] FS:  00007fb22160e880(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[883120.225881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[883120.225881] CR2: 00007fd47003e000 CR3: 000000000c35a000 CR4: 00000000000006f0
[883120.225881] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[883120.225881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[883120.225881] Stack:
[883120.225881]  ffffffffa052d522 ffffffac810dead1 ffff88000b1c1b80 ffffffff819e28e0
[883120.225881]  000000009204f187 0000000000000004 ffffffffa05287a0 ffff88000b1c1bd0
[883120.225881]  ffff88000455f800 ffff88003b780800 ffff88000b1c1bc0 ffffffffa052e21e
[883120.225881] Call Trace:
[883120.225881]  [<ffffffffa052d522>] ? walk_node+0x42/0x100 [dm_persistent_data]
[883120.225881]  [<ffffffffa05287a0>] ? block_dec+0x160/0x160 [dm_persistent_data]
[883120.225881]  [<ffffffffa052e21e>] dm_btree_walk+0x4e/0x80 [dm_persistent_data]
[883120.225881]  [<ffffffffa05bba30>] ? complete_migration+0x30/0x30 [dm_cache]
[883120.225881]  [<ffffffffa05285dc>] dm_array_walk+0x3c/0x60 [dm_persistent_data]
[883120.225881]  [<ffffffffa05be700>] ? blocks_are_unmapped_or_clean+0xd0/0xd0 [dm_cache]
[883120.225881]  [<ffffffffa05bf51f>] dm_cache_load_mappings+0x7f/0xe0 [dm_cache]
[883120.225881]  [<ffffffffa05bba30>] ? complete_migration+0x30/0x30 [dm_cache]
[883120.225881]  [<ffffffff810f0000>] ? kdb_register_repeat+0x270/0x270
[883120.225881]  [<ffffffffa05bdef9>] cache_preresume+0xf9/0x1a0 [dm_cache]
[883120.225881]  [<ffffffffa0005ff9>] dm_table_resume_targets+0x49/0xe0 [dm_mod]
[883120.225881]  [<ffffffffa000389c>] dm_resume+0x4c/0xd0 [dm_mod]
[883120.225881]  [<ffffffffa0008bcb>] dev_suspend+0x12b/0x250 [dm_mod]
[883120.225881]  [<ffffffffa0008aa0>] ? table_load+0x380/0x380 [dm_mod]
[883120.225881]  [<ffffffffa00094e5>] ctl_ioctl+0x255/0x500 [dm_mod]
[883120.225881]  [<ffffffff8123db34>] ? SYSC_semtimedop+0x264/0xd10
[883120.225881]  [<ffffffffa00097a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[883120.225881]  [<ffffffff811c2f25>] do_vfs_ioctl+0x2e5/0x4c0
[883120.225881]  [<ffffffff81257a2e>] ? file_has_perm+0xae/0xc0
[883120.225881]  [<ffffffff8123c601>] ? wake_up_sem_queue_do+0x31/0x60
[883120.225881]  [<ffffffff811c31a1>] SyS_ioctl+0xa1/0xc0
[883120.225881]  [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
[883120.225881] Code: ff ff 5d c3 0f 0b 0f 1f 40 00 0f 1f 44 00 00 8b 47 08 85 c0 74 15 83 e8 01 55 48 98 48 8b 7c c7 10 48 89 e5 e8 91 99 ff ff 5d c3 <0f> 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 
[883120.225881] RIP  [<ffffffffa052f761>] ro_node+0x21/0x30 [dm_persistent_data]
[883120.225881]  RSP <ffff88000b1c1b20>
[883120.289367] ---[ end trace 39cd9818e7719e97 ]---
[883120.290102] Kernel panic - not syncing: Fatal exception



Version-Release number of selected component (if applicable):
3.10.0-123.el7.x86_64
lvm2-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-libs-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
lvm2-cluster-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-event-libs-1.02.84-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014
device-mapper-persistent-data-0.3.2-1.el7    BUILT: Thu Apr  3 09:58:51 CDT 2014
cmirror-2.02.105-14.el7    BUILT: Wed Mar 26 08:29:41 CDT 2014


How reproducible:
Everytime

--- Additional comment from Joe Thornber on 2014-06-24 05:06:12 EDT ---

I've added a test to the device-mapper-test-suite that recreates this scenario driving device-mapper directly (cutting LVM out of the loop):

https://github.com/jthornber/device-mapper-test-suite/blob/master/lib/dmtest/tests/stacking/stacking_tests.rb

With my kernel tree this passes fine.

The next step is to test with a RHEL7 kernel.

--- Additional comment from Joe Thornber on 2014-06-24 10:15:57 EDT ---

Test passes on a RHEL7 kernel.

Handing bug back to LVM team for their take.

Comment 1 Corey Marthaler 2014-06-26 20:36:01 UTC
This needs to be fixed for full cache support in 6.6 as well.

2.6.32-485.el6.x86_64

lvm2-2.02.107-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
lvm2-libs-2.02.107-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
lvm2-cluster-2.02.107-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
udev-147-2.55.el6    BUILT: Wed Jun 18 06:30:21 CDT 2014
device-mapper-1.02.86-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
device-mapper-libs-1.02.86-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
device-mapper-event-1.02.86-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
device-mapper-event-libs-1.02.86-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 08:43:06 CDT 2014
cmirror-2.02.107-1.el6    BUILT: Mon Jun 23 09:44:45 CDT 2014

Comment 3 Jonathan Earl Brassow 2014-08-27 12:55:06 UTC
moving to 6.7 because:
1) this bug is not a blocker candidate due to tech preview status
2) snapshots of cache LVs have been disabled in 6.6 and the bug is avoided in this way.


Note You need to log in before you can comment on or make changes to this bug.