Description of problem: When thin-pool is supposed to be committing and thin-pool's metadata device has invalid size (due to other table manipulation errors with raid reshape) target generates this OOPS message: device-mapper: raid: raid456 discard support disabled due to discard_zeroes_data uncertainty. device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y to override. device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown. device-mapper: table: 253:2: dm-1 too small for target: start=0, len=67584, dev_size=61440 BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000002ab8a067 P4D 800000002ab8a067 PUD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 3520 Comm: dmsetup Not tainted 5.5.0-rc5+ #109 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1.fc32 04/01/2014 RIP: 0010:metadata_pre_commit_callback+0x15/0x60 [dm_thin_pool] Code: fe e9 9f ce b3 c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 8d 6f 50 53 48 89 fb 48 89 ef e8 2f 79 e1 c4 48 8b 43 10 <48> 8b 10 48 8b 92 80 00 00 00 48 39 53 58 74 10 66 81 63 64 ff fd RSP: 0018:ffffb527006c3bf0 EFLAGS: 00010216 RAX: 0000000000000000 RBX: ffff8e8b6a126900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff8e8b6a1269b8 RBP: ffff8e8b6a126950 R08: ffffb527006c3c30 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff8623aca8 R12: 0000000000000001 R13: ffff8e8b6b258800 R14: ffff8e8b6e7f8a00 R15: ffff8e8b6c65e000 FS: 0000000000000000(0000) GS:ffff8e8b6f100000(0063) knlGS:00000000f7be7a40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002ab8c000 CR4: 00000000000006e0 Call Trace: __commit_transaction+0x36/0x200 [dm_thin_pool] dm_pool_commit_metadata+0x28/0x60 [dm_thin_pool] commit+0x1d/0xc0 [dm_thin_pool] dm_table_postsuspend_targets+0x40/0x50 __dm_destroy+0x128/0x140 dm_hash_remove_all+0x58/0x130 remove_all+0x19/0x30 ctl_ioctl+0x1ae/0x460 ? dm_hash_remove_all+0x130/0x130 dm_compat_ctl_ioctl+0x7/0x10 __ia32_compat_sys_ioctl+0x164/0x240 do_fast_syscall_32+0xa4/0x2c5 entry_SYSENTER_compat+0x7f/0x91 Modules linked in: dm_raid dm_thin_pool dm_persistent_data dm_bio_prison nfsv3 nfs_acl nfs lockd grace joydev serio_raw virtio_balloon sunrpc virtio_rng raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear virtio_net net_failover virtio_blk failover virtio_pci virtio_ring virtio ata_generic pata_acpi CR2: 0000000000000000 ---[ end trace ac1a948a9460b1cd ]--- RIP: 0010:metadata_pre_commit_callback+0x15/0x60 [dm_thin_pool] Code: fe e9 9f ce b3 c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 48 8d 6f 50 53 48 89 fb 48 89 ef e8 2f 79 e1 c4 48 8b 43 10 <48> 8b 10 48 8b 92 80 00 00 00 48 39 53 58 74 10 66 81 63 64 ff fd RSP: 0018:ffffb527006c3bf0 EFLAGS: 00010216 RAX: 0000000000000000 RBX: ffff8e8b6a126900 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff8e8b6a1269b8 RBP: ffff8e8b6a126950 R08: ffffb527006c3c30 R09: 0000000000000000 R10: 0000000000000000 R11: ffffffff8623aca8 R12: 0000000000000001 R13: ffff8e8b6b258800 R14: ffff8e8b6e7f8a00 R15: ffff8e8b6c65e000 FS: 0000000000000000(0000) GS:ffff8e8b6f100000(0063) knlGS:00000000f7be7a40 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000002ab8c000 CR4: 00000000000006e0 Version-Release number of selected component (if applicable): Current upstream 5.5.0-rc5+ How reproducible: Not so easy to describe - latest patch for raid fix by Heinz is needed. lvcreate -T -L10 vg/pool lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata now it ends with failed DM - pool remains suspend. After resume - just run 'dmsetup remove...' for pool device.
I can repo a similar deadlock in the latest rhel8.2 rpms. kernel-4.18.0-167.el8 BUILT: Sat Dec 14 19:43:52 CST 2019 lvm2-2.03.07-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 lvm2-libs-2.03.07-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 lvm2-dbusd-2.03.07-1.el8 BUILT: Mon Dec 2 00:12:23 CST 2019 device-mapper-1.02.167-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 device-mapper-libs-1.02.167-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 device-mapper-event-1.02.167-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 device-mapper-event-libs-1.02.167-1.el8 BUILT: Mon Dec 2 00:09:32 CST 2019 [root@hayes-02 ~]# lvcreate -T -L10 vg/pool Rounding up size to full physical extent 12.00 MiB Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data. Logical volume "pool" created. [root@hayes-02 ~]# lvs -a -o +devices LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices [lvol0_pmspare] vg ewi------- 4.00m /dev/sdb1(0) pool vg twi-a-tz-- 12.00m 0.00 10.84 pool_tdata(0) [pool_tdata] vg Twi-ao---- 12.00m /dev/sdb1(1) [pool_tmeta] vg ewi-ao---- 4.00m /dev/sdg1(0) [root@hayes-02 ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata Using default stripesize 64.00 KiB. Replaced LV type raid5 (same as raid5_ls) with possible type raid1. Repeat this command to convert to raid5 after an interim conversion has finished. --stripes not allowed for LV vg/pool_tdata when converting from linear to raid1. Logical volume vg/pool_tdata successfully converted. [root@hayes-02 ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata Using default stripesize 64.00 KiB. --stripes not allowed for LV vg/pool_tdata when converting from raid1 to raid5. Logical volume vg/pool_tdata successfully converted. [root@hayes-02 ~]# lvs -a -o +devices LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices [lvol0_pmspare] vg ewi------- 4.00m /dev/sdb1(0) pool vg twi-a-tz-- 12.00m 0.00 10.84 pool_tdata(0) [pool_tdata] vg rwi-aor--- 12.00m 100.00 pool_tdata_rimage_0(0),pool_tdata_rimage_1(0) [pool_tdata_rimage_0] vg iwi-aor--- 12.00m /dev/sdb1(1) [pool_tdata_rimage_1] vg iwi-aor--- 12.00m /dev/sdc1(1) [pool_tdata_rmeta_0] vg ewi-aor--- 4.00m /dev/sdb1(4) [pool_tdata_rmeta_1] vg ewi-aor--- 4.00m /dev/sdc1(0) [pool_tmeta] vg ewi-ao---- 4.00m /dev/sdg1(0) [root@hayes-02 ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata Using default stripesize 64.00 KiB. WARNING: Adding stripes to active and open logical volume vg/pool_tdata will grow it from 3 to 9 extents! Run "lvresize -l3 vg/pool_tdata" to shrink it or use the additional capacity. Internal error: Performing unsafe table load while 11 device(s) are known to be suspended: (253:1) Jan 7 10:20:51 hayes-02 kernel: device-mapper: raid: Device 3 specified for rebuild; clearing superblock Jan 7 10:20:51 hayes-02 kernel: md/raid:mdX: device dm-4 operational as raid disk 0 Jan 7 10:20:51 hayes-02 kernel: md/raid:mdX: device dm-6 operational as raid disk 1 Jan 7 10:20:51 hayes-02 kernel: md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 2 Jan 7 10:20:51 hayes-02 lvm[29105]: No longer monitoring RAID device vg-pool_tdata for events. Jan 7 10:20:51 hayes-02 dmeventd[29105]: No longer monitoring thin pool vg-pool. Jan 7 10:20:59 hayes-02 systemd-logind[1532]: New session 3 of user root. Jan 7 10:20:59 hayes-02 systemd[1]: Started Session 3 of user root. Jan 7 10:23:25 hayes-02 kernel: INFO: task lvconvert:29276 blocked for more than 120 seconds. Jan 7 10:23:25 hayes-02 kernel: Not tainted 4.18.0-167.el8.x86_64 #1 Jan 7 10:23:25 hayes-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 7 10:23:25 hayes-02 kernel: lvconvert D 0 29276 2349 0x00000080 Jan 7 10:23:25 hayes-02 kernel: Call Trace: Jan 7 10:23:25 hayes-02 kernel: ? __schedule+0x24f/0x650 Jan 7 10:23:25 hayes-02 kernel: ? try_to_wake_up+0x54/0x570 Jan 7 10:23:25 hayes-02 kernel: ? path_lookupat.isra.50+0xa7/0x200 Jan 7 10:23:25 hayes-02 kernel: schedule+0x2f/0xa0 Jan 7 10:23:25 hayes-02 kernel: schedule_timeout+0x20d/0x310 Jan 7 10:23:25 hayes-02 kernel: ? dm_make_request+0x74/0x90 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: io_schedule_timeout+0x19/0x40 Jan 7 10:23:25 hayes-02 kernel: wait_for_completion_io+0x11f/0x190 Jan 7 10:23:25 hayes-02 kernel: ? wake_up_q+0x70/0x70 Jan 7 10:23:25 hayes-02 kernel: submit_bio_wait+0x5b/0x80 Jan 7 10:23:25 hayes-02 kernel: sync_page_io+0x127/0x160 Jan 7 10:23:25 hayes-02 kernel: read_disk_sb+0x39/0x50 [dm_raid] Jan 7 10:23:25 hayes-02 kernel: raid_ctr+0x6aa/0x1838 [dm_raid] Jan 7 10:23:25 hayes-02 kernel: ? dm_split_args+0x63/0x160 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: dm_table_add_target+0x17d/0x360 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: table_load+0x122/0x2e0 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: ? dev_status+0x40/0x40 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: ctl_ioctl+0x1af/0x3f0 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: ? selinux_file_ioctl+0x90/0x200 Jan 7 10:23:25 hayes-02 kernel: dm_ctl_ioctl+0xa/0x10 [dm_mod] Jan 7 10:23:25 hayes-02 kernel: do_vfs_ioctl+0xa4/0x630 Jan 7 10:23:25 hayes-02 kernel: ksys_ioctl+0x60/0x90 Jan 7 10:23:25 hayes-02 kernel: __x64_sys_ioctl+0x16/0x20 Jan 7 10:23:25 hayes-02 kernel: do_syscall_64+0x5b/0x1a0 Jan 7 10:23:25 hayes-02 kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca Jan 7 10:23:25 hayes-02 kernel: RIP: 0033:0x7fefb653487b Jan 7 10:23:25 hayes-02 kernel: Code: Bad RIP value. Jan 7 10:23:25 hayes-02 kernel: RSP: 002b:00007ffd2aeaf058 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 Jan 7 10:23:25 hayes-02 kernel: RAX: ffffffffffffffda RBX: 000055c6a285acb0 RCX: 00007fefb653487b Jan 7 10:23:25 hayes-02 kernel: RDX: 000055c6a49a1b10 RSI: 00000000c138fd09 RDI: 0000000000000003 Jan 7 10:23:25 hayes-02 kernel: RBP: 000055c6a2908917 R08: 0000000000000000 R09: 00007ffd2aeaeec0 Jan 7 10:23:25 hayes-02 kernel: R10: 000055c6a2974f46 R11: 0000000000000206 R12: 0000000000000000 Jan 7 10:23:25 hayes-02 kernel: R13: 000055c6a49a1b40 R14: 000055c6a49a1b10 R15: 000055c6a5296090 Jan 7 10:25:28 hayes-02 kernel: INFO: task lvconvert:29276 blocked for more than 120 seconds. Jan 7 10:25:28 hayes-02 kernel: Not tainted 4.18.0-167.el8.x86_64 #1 Jan 7 10:25:28 hayes-02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 7 10:25:28 hayes-02 kernel: lvconvert D 0 29276 2349 0x00000080 Jan 7 10:25:28 hayes-02 kernel: Call Trace: Jan 7 10:25:28 hayes-02 kernel: ? __schedule+0x24f/0x650 Jan 7 10:25:28 hayes-02 kernel: ? try_to_wake_up+0x54/0x570 Jan 7 10:25:28 hayes-02 kernel: ? path_lookupat.isra.50+0xa7/0x200 Jan 7 10:25:28 hayes-02 kernel: schedule+0x2f/0xa0 Jan 7 10:25:28 hayes-02 kernel: schedule_timeout+0x20d/0x310 Jan 7 10:25:28 hayes-02 kernel: ? dm_make_request+0x74/0x90 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: io_schedule_timeout+0x19/0x40 Jan 7 10:25:28 hayes-02 kernel: wait_for_completion_io+0x11f/0x190 Jan 7 10:25:28 hayes-02 kernel: ? wake_up_q+0x70/0x70 Jan 7 10:25:28 hayes-02 kernel: submit_bio_wait+0x5b/0x80 Jan 7 10:25:28 hayes-02 kernel: sync_page_io+0x127/0x160 Jan 7 10:25:28 hayes-02 kernel: read_disk_sb+0x39/0x50 [dm_raid] Jan 7 10:25:28 hayes-02 kernel: raid_ctr+0x6aa/0x1838 [dm_raid] Jan 7 10:25:28 hayes-02 kernel: ? dm_split_args+0x63/0x160 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: dm_table_add_target+0x17d/0x360 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: table_load+0x122/0x2e0 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: ? dev_status+0x40/0x40 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: ctl_ioctl+0x1af/0x3f0 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: ? selinux_file_ioctl+0x90/0x200 Jan 7 10:25:28 hayes-02 kernel: dm_ctl_ioctl+0xa/0x10 [dm_mod] Jan 7 10:25:28 hayes-02 kernel: do_vfs_ioctl+0xa4/0x630 Jan 7 10:25:28 hayes-02 kernel: ksys_ioctl+0x60/0x90 Jan 7 10:25:28 hayes-02 kernel: __x64_sys_ioctl+0x16/0x20 Jan 7 10:25:28 hayes-02 kernel: do_syscall_64+0x5b/0x1a0
Yep - it's the original BZ 1782045 - which lead to this report - but to hit the oops - patch for raid needs to be applied on RAID target. otherwise we end with suspended devices blocking progress.
Seems like pool_ctr() completes but the table load goes on to fail due to size issues: device-mapper: table: 253:2: dm-1 too small for target: start=0, len=67584, dev_size=61440 Then __commit_transaction() is called via postsuspend. But somehow pmd->pre_commit_context is NULL. Not _exactly_ sure on the exact sequence of what is occurring so I'll just reproduce and hopefully chase it down.
Just showing my work, please scroll to bottom of this comment to see unexpected success (no crash). [root@thegoat ~]# vgcreate vg /dev/nvme0n1p[1-4] File descriptor 6 (/dev/pts/2) leaked on vgcreate invocation. Parent PID 2041: /bin/bash Physical volume "/dev/nvme0n1p1" successfully created. Physical volume "/dev/nvme0n1p2" successfully created. Physical volume "/dev/nvme0n1p3" successfully created. Physical volume "/dev/nvme0n1p4" successfully created. Volume group "vg" successfully created [root@thegoat ~]# lvcreate -T -L10 vg/pool File descriptor 6 (/dev/pts/2) leaked on lvcreate invocation. Parent PID 2041: /bin/bash Rounding up size to full physical extent 12.00 MiB Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data. Logical volume "pool" created. [root@thegoat ~]# lvs -a -o +devices File descriptor 6 (/dev/pts/2) leaked on lvs invocation. Parent PID 2041: /bin/bash LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices [lvol0_pmspare] vg ewi------- 4.00m /dev/nvme0n1p1(0) pool vg twi-a-tz-- 12.00m 0.00 10.84 pool_tdata(0) [pool_tdata] vg Twi-ao---- 12.00m /dev/nvme0n1p1(1) [pool_tmeta] vg ewi-ao---- 4.00m /dev/nvme0n1p4(0) [root@thegoat ~]# [root@thegoat ~]# [root@thegoat ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata File descriptor 6 (/dev/pts/2) leaked on lvconvert invocation. Parent PID 2041: /bin/bash Using default stripesize 64.00 KiB. Replaced LV type raid5 (same as raid5_ls) with possible type raid1. Repeat this command to convert to raid5 after an interim conversion has finished. --stripes not allowed for LV vg/pool_tdata when converting from linear to raid1. Logical volume vg/pool_tdata successfully converted. [root@thegoat ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata File descriptor 6 (/dev/pts/2) leaked on lvconvert invocation. Parent PID 2041: /bin/bash Using default stripesize 64.00 KiB. --stripes not allowed for LV vg/pool_tdata when converting from raid1 to raid5. Logical volume vg/pool_tdata successfully converted. [root@thegoat ~]# lvconvert -y --ty raid5 --stripes 3 vg/pool_tdata File descriptor 6 (/dev/pts/2) leaked on lvconvert invocation. Parent PID 2041: /bin/bash Using default stripesize 64.00 KiB. WARNING: Adding stripes to active and open logical volume vg/pool_tdata will grow it from 3 to 9 extents! Run "lvresize -l3 vg/pool_tdata" to shrink it or use the additional capacity. device-mapper: resume ioctl on (253:2) failed: Invalid argument Unable to resume vg-pool (253:2). Problem reactivating logical volume vg/pool. Reshape request failed on LV vg/pool_tdata. Releasing activation in critical section. libdevmapper exiting with 1 device(s) still suspended. [root@thegoat ~]# dmsetup info vg-pool Name: vg-pool State: SUSPENDED Read Ahead: 8192 Tables present: LIVE Open count: 0 Event number: 0 Major, minor: 253, 2 Number of targets: 1 UUID: LVM-TiaRkyhhNy4nb38EvoL7JOGQBcC0RWic7VPIDfk5SrqzvQALskYsXdtquWP9nkig-tpool [root@thegoat ~]# dmsetup table vg-pool_tdata_rimage_2: 0 32768 linear 259:8 10240 vg-pool_tdata_rmeta_0: 0 8192 linear 259:6 34816 vg-pool_tdata_rimage_1: 0 32768 linear 259:7 10240 vg-pool: 0 24576 thin-pool 253:0 253:1 128 0 0 vg-pool_tdata_rimage_0: 0 24576 linear 259:6 10240 vg-pool_tdata_rimage_0: 24576 8192 linear 259:6 43008 vg-pool_tdata: 0 73728 raid raid5_ls 7 128 rebuild 2 rebuild 3 region_size 4096 4 253:3 253:4 253:5 253:6 253:7 253:8 253:9 253:10 vg-pool_tmeta: 0 8192 linear 259:9 2048 vg-pool_tdata_rmeta_3: 0 8192 linear 259:9 10240 vg-pool_tdata_rmeta_2: 0 8192 linear 259:8 2048 vg-pool_tdata_rimage_3: 0 32768 linear 259:9 18432 vg-pool_tdata_rmeta_1: 0 8192 linear 259:7 2048 [root@thegoat ~]# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT nvme0n1 259:1 0 260.9G 0 disk ├─nvme0n1p3 259:8 0 60G 0 part │ ├─vg-pool_tdata_rimage_2 253:8 0 16M 0 lvm │ │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ │ └─vg-pool 253:2 0 12M 0 lvm │ └─vg-pool_tdata_rmeta_2 253:7 0 4M 0 lvm │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ └─vg-pool 253:2 0 12M 0 lvm ├─nvme0n1p1 259:6 0 60G 0 part │ ├─vg-pool_tdata_rimage_0 253:4 0 16M 0 lvm │ │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ │ └─vg-pool 253:2 0 12M 0 lvm │ └─vg-pool_tdata_rmeta_0 253:3 0 4M 0 lvm │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ └─vg-pool 253:2 0 12M 0 lvm ├─nvme0n1p4 259:9 0 60G 0 part │ ├─vg-pool_tdata_rimage_3 253:10 0 16M 0 lvm │ │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ │ └─vg-pool 253:2 0 12M 0 lvm │ ├─vg-pool_tmeta 253:0 0 4M 0 lvm │ │ └─vg-pool 253:2 0 12M 0 lvm │ └─vg-pool_tdata_rmeta_3 253:9 0 4M 0 lvm │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ └─vg-pool 253:2 0 12M 0 lvm └─nvme0n1p2 259:7 0 60G 0 part ├─vg-pool_tdata_rimage_1 253:6 0 16M 0 lvm │ └─vg-pool_tdata 253:1 0 36M 0 lvm │ └─vg-pool 253:2 0 12M 0 lvm └─vg-pool_tdata_rmeta_1 253:5 0 4M 0 lvm └─vg-pool_tdata 253:1 0 36M 0 lvm └─vg-pool 253:2 0 12M 0 lvm [root@thegoat ~]# dmsetup info vg-pool_tdata Name: vg-pool_tdata State: ACTIVE Read Ahead: 1024 Tables present: LIVE Open count: 1 Event number: 1 Major, minor: 253, 1 Number of targets: 1 UUID: LVM-TiaRkyhhNy4nb38EvoL7JOGQBcC0RWic2L9qCKncfl51biu9tyE515aGZCBLvMBK-tdata [root@thegoat ~]# dmsetup resume vg-pool console shows: [ 1840.500421] md/raid:mdX: device dm-4 operational as raid disk 0 [ 1840.506351] md/raid:mdX: device dm-6 operational as raid disk 1 [ 1840.512551] md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 2 [ 1840.525984] device-mapper: raid: raid456 discard support disabled due to discard_zeroes_data uncertainty. [ 1840.535561] device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y to override. [ 1853.716383] device-mapper: raid: Device 2 specified for rebuild; clearing superblock [ 1853.724128] device-mapper: raid: Device 3 specified for rebuild; clearing superblock [ 1853.732717] md/raid:mdX: device dm-4 operational as raid disk 0 [ 1853.738642] md/raid:mdX: device dm-6 operational as raid disk 1 [ 1853.744769] md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 2 [ 1853.759966] device-mapper: raid: raid456 discard support disabled due to discard_zeroes_data uncertainty. [ 1853.769530] device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y to override. [ 1853.847610] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown. [ 1853.856738] device-mapper: table: 253:2: dm-1 too small for target: start=0, len=98304, dev_size=73728 # dmsetup remove vg-pool but no crash... nothing on console.. and vg-pool is gone: # dmsetup info vg-pool Device does not exist. Command failed. But I'm testing latest dm-5.6 kernel (5.5-rc5 based, which also has Heinz's fix) with RHEL7.8-beta1's lvm2: # rpm -q lvm2 device-mapper lvm2-2.02.186-2.el7.x86_64 device-mapper-1.02.164-2.el7.x86_64
For the reproduction this patch for raid target is mandatory: https://www.redhat.com/archives/dm-devel/2019-December/msg00065.html Then after the 3rd. call of lvconvert - lvm should finish command with just a single device left in suspend (aka vg-pool) Once this device is resumed then removing devices will provoke OOPS. Note - it's pretty much clear ATM the table manipulation is wrong with reshaping - as we are changing size of reshaped device up & DOWN and clearly this can be going one while device is stacked - so upon 'standard' work I assume this OOPS can't be triggered. It's likely some corner case of error handling.
Created attachment 1650621 [details] Kernel config file for 5.5-rc5+
(In reply to Zdenek Kabelac from comment #5) > For the reproduction this patch for raid target is mandatory: > > https://www.redhat.com/archives/dm-devel/2019-December/msg00065.html > > Then after the 3rd. call of lvconvert - lvm should finish command with just > a single device left in suspend (aka vg-pool) > Once this device is resumed then removing devices will provoke OOPS. I did that, doesn't OOPS for me. > Note - it's pretty much clear ATM the table manipulation is wrong with > reshaping - as we are changing size of reshaped device up & DOWN and clearly > this can be going one while device is stacked - so upon 'standard' work I > assume this OOPS can't be triggered. It's likely some corner case of error > handling. In the end you're saying that resuming the suspended device (which still has old table) is trigger for OOPS. You clearly hit an OOPS, I just don't see why and cannot reproduce. But I'll analyze more and think harder ;)
(In reply to Mike Snitzer from comment #7) > In the end you're saying that resuming the suspended device (which still has > old table) is trigger for OOPS. Sorry, the device remove after suspend is triggering your OOPS. Regardless, by point still stands, not yet seeing _why_.
(In reply to Mike Snitzer from comment #8) > (In reply to Mike Snitzer from comment #7) > > > In the end you're saying that resuming the suspended device (which still has > > old table) is trigger for OOPS. > > Sorry, the device remove after suspend is triggering your OOPS. Regardless, > by point still stands, not yet seeing _why_. Ha, remove after _resume_
Been trying _hard_ to reproduce, still cannot, resorting to trying to run Zdenek's minimalist 2GB debian based kvm image and have hit all sorts of redhat vs debian KVM usage quirks. Bottomline, Zdenek is doing his testing on the least "enterprise" testbed that exists (even using loopback devices for his storage devices). Inclined to close WORKSFORME
Reopening with log where the test is purely executer on top /dev/sda (scsi_debug) and 6 DM devices are made on top of this target. (same happens when scsi_debug is replaced with brd device aka /dev/ram0) Providing full log of lvm2 test suite element - hopefully this will help more - I'm thinkinkg which of the _DEBUG_ symbols should I enable for kernel compilation to see it more easily. Few more interesting notes might also help here - the OOPS issue is not occurring when lvcoversion 'breaks' pool_tmeta. And for this test - actually _tdata was not likely used *AT ALL* (since there we not any thin LV actually created) When I create 1 thin LV on this badly reshaped&resized thin-pool the thin-pool destruction proceeds and finishes normally. So it seems the OOPS occurs only for unused thin-pool case (on my setup) (and happens without SMP)
Created attachment 1651806 [details] Log of test execution using scsi_debug backend device
I've looked more closely in the ordering - there is needed one extra call for 'dmsetup status'. (as I've this embedded into my debugging version of lvm2 - and it didn't came to me initially as an important difference) dmsetup status vg-pool dmsetup resume vg-pool <--- hits OOPS
This seems to be caused by the commit 694cfe7f31db36912725e63a38a5179c8628a496
I can confirm that replacing 'metadata_pre_commit_callback()' internals with just 'return 0' fixes the problem for me. So there is something 'weird' with _tdata flushing logic (if _tdata are on badly working device)
(In reply to Mikuláš Patočka from comment #15) > This seems to be caused by the commit > 694cfe7f31db36912725e63a38a5179c8628a496 This isn't news, well aware of that commit being the source of this issue. The bdev is getting free'd after the thin-pool table load that fails (during the last lvconvert that fails). After resuming the thin-pool, running 'dmsetup vg-pool' triggers OOPS during commit of the postsuspend hook when the new precommit hook tries to access the thin-pool's data device that _should_ still exist because the resumed vg-pool still has a holder on it via that thin-pool table's dm_get_device(). Yet oops is occurring...
(In reply to Mike Snitzer from comment #17) > (In reply to Mikuláš Patočka from comment #15) > > This seems to be caused by the commit > > 694cfe7f31db36912725e63a38a5179c8628a496 > > This isn't news, well aware of that commit being the source of this issue. > > The bdev is getting free'd after the thin-pool table load that fails (during > the last lvconvert that fails). > After resuming the thin-pool, running 'dmsetup vg-pool' triggers OOPS during > commit of the postsuspend hook when the new precommit hook tries to access > the thin-pool's data device that _should_ still exist because the resumed > vg-pool still has a holder on it via that thin-pool table's dm_get_device(). > > Yet oops is occurring... Mikulas says oops is due to stale 'struct pool_c' pointer stored in in-core pool metadata. I'll buy that, just need to track down why that is occurring. Reopening this bug was appropriate.
I can easily reproduce using procedure detailed in comment#0 with these additional Kconfig CONFIG_ options enabled (thanks to Mikulas): # diff -Naur .config.5.5.0-rc5.snitm+_without_slub_debgging_on_and_enhanced .config --- .config.5.5.0-rc5.snitm+_without_slub_debgging_on_and_enhanced 2020-01-13 10:41:17.903198330 -0500 +++ .config 2020-01-13 10:41:27.389207112 -0500 @@ -5198,11 +5198,20 @@ # CONFIG_PAGE_EXTENSION is not set # CONFIG_DEBUG_PAGEALLOC is not set # CONFIG_PAGE_OWNER is not set -# CONFIG_PAGE_POISONING is not set +CONFIG_PAGE_POISONING=y +CONFIG_PAGE_POISONING_NO_SANITY=y +# CONFIG_PAGE_POISONING_ZERO is not set # CONFIG_DEBUG_PAGE_REF is not set CONFIG_DEBUG_RODATA_TEST=y -# CONFIG_DEBUG_OBJECTS is not set -# CONFIG_SLUB_DEBUG_ON is not set +CONFIG_DEBUG_OBJECTS=y +# CONFIG_DEBUG_OBJECTS_SELFTEST is not set +CONFIG_DEBUG_OBJECTS_FREE=y +CONFIG_DEBUG_OBJECTS_TIMERS=y +# CONFIG_DEBUG_OBJECTS_WORK is not set +# CONFIG_DEBUG_OBJECTS_RCU_HEAD is not set +# CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER is not set +CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1 +CONFIG_SLUB_DEBUG_ON=y # CONFIG_SLUB_STATS is not set CONFIG_HAVE_DEBUG_KMEMLEAK=y # CONFIG_DEBUG_KMEMLEAK is not set @@ -5273,6 +5282,7 @@ CONFIG_STACKTRACE=y # CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set # CONFIG_DEBUG_KOBJECT is not set +# CONFIG_DEBUG_KOBJECT_RELEASE is not set # # Debug kernel data structures
These 4 commits have been staged for upstream Linux 5.6 inclusion: 0031fd86398f dm thin: change data device's flush_bio to be member of struct pool fe64369163c5 dm thin: don't allow changing data device during thin-pool load 08285c05b4ce dm thin: fix use-after-free in metadata_pre_commit_callback 6d773d89a444 dm thin metadata: use pool locking at end of dm_pool_metadata_close They'll get sent to Linus when the 5.6 merge window opens (likely Jan 26).
I had to rebase (to tweak some commit headers) so here are the updated 5.6 commits: 44d8ebf43639 dm thin metadata: use pool locking at end of dm_pool_metadata_close a4a8d286586d dm thin: fix use-after-free in metadata_pre_commit_callback 873937e75f9a dm thin: don't allow changing data device during thin-pool reload f06c03d1ded2 dm thin: change data device's flush_bio to be member of struct pool
These all went upstream for Linux 5.6, closing CURRENTRELEASE