Red Hat Bugzilla – Bug 1439399
RAID TAKEOVER: takeover on raid volumes containing snapshots doesn't work
Last modified: 2018-04-10 11:21:33 EDT
Description of problem:

[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV2 -L 100M black_bird
  Logical volume "LV2" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV3 -L 100M black_bird
  Logical volume "LV3" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV4 -L 100M black_bird
  Logical volume "LV4" created.

# raid1 -> raid5_n appears to work fine w/o a snapshot
[root@host-122 ~]# lvconvert --type raid5_n black_bird/LV2
  Using default stripesize 64.00 KiB.
  Logical volume black_bird/LV2 successfully converted.

[root@host-122 ~]# lvs -a -o +devices,segtype
  LV             VG         Attr       LSize   Cpy%Sync Devices                         Type
  LV2            black_bird rwi-a-r--- 100.00m 100.00   LV2_rimage_0(0),LV2_rimage_1(0) raid5_n
  [LV2_rimage_0] black_bird iwi-aor--- 100.00m          /dev/sdf1(27)                   linear
  [LV2_rimage_1] black_bird iwi-aor--- 100.00m          /dev/sdh1(27)                   linear
  [LV2_rmeta_0]  black_bird ewi-aor---   4.00m          /dev/sdf1(26)                   linear
  [LV2_rmeta_1]  black_bird ewi-aor---   4.00m          /dev/sdh1(26)                   linear

# Create snapshots of raids LV3 and LV4
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV3
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV4
  Using default stripesize 64.00 KiB.
  Logical volume "lvol1" created.

[root@host-122 ~]# lvconvert --type raid4 black_bird/LV3
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical Volume LV3_rimage_0 already exists in volume group black_bird.

[root@host-122 ~]# lvconvert --type raid5 black_bird/LV4
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical volume black_bird/LV4 successfully converted.

# This is with an actual filesystem and running I/O:
[root@host-121 ~]# lvconvert --type raid5_n black_bird/synced_primary_raid1_2legs_1
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
[DEADLOCK]

Apr 5 17:21:15 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr 5 17:21:15 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr 5 17:21:15 host-121 kernel: md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 5
Apr 5 17:21:15 host-121 lvm[21484]: No longer monitoring RAID device black_bird-synced_primary_raid1_2legs_1-real for events.
Apr 5 17:21:15 host-121 dmeventd[21484]: No longer monitoring snapshot black_bird-bb_snap1.
Apr 5 17:23:51 host-121 kernel: INFO: task xfsaild/dm-6:21594 blocked for more than 120 seconds.
Apr 5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 5 17:23:51 host-121 kernel: xfsaild/dm-6    D ffff88003b000fb0     0 21594      2 0x00000080
Apr 5 17:23:51 host-121 kernel: ffff88002028fd48 0000000000000046 ffff88002028ffd8 ffff88002028ffd8
Apr 5 17:23:51 host-121 kernel: ffff88002028ffd8 0000000000016cc0 ffffffffbcbdd460 ffff88003cf81f00
Apr 5 17:23:51 host-121 kernel: 0000000000000000 ffff88003b000fb0 ffff88003b3aed28 ffff88001f5ae000
Apr 5 17:23:51 host-121 kernel: Call Trace:
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr 5 17:23:51 host-121 kernel: [<ffffffffc0303d36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2c2280>] ? wake_up_state+0x20/0x20
Apr 5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] ? xfsaild+0x16c/0x6f0 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc0303e5c>] xfs_log_force+0x2c/0x70 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] xfsaild+0x16c/0x6f0 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2ae9bf>] kthread+0xcf/0xe0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8a1b18>] ret_from_fork+0x58/0x90
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr 5 17:23:51 host-121 kernel: INFO: task xdoio:21696 blocked for more than 120 seconds.
Apr 5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 5 17:23:51 host-121 kernel: xdoio           D ffff88003ab01f60     0 21696  21695 0x00000080
Apr 5 17:23:51 host-121 kernel: ffff88001e8b3e78 0000000000000082 ffff88001e8b3fd8 ffff88001e8b3fd8
Apr 5 17:23:51 host-121 kernel: ffff88001e8b3fd8 0000000000016cc0 ffff88001e133ec0 ffff88003b232800
Apr 5 17:23:51 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88003b232b08
Apr 5 17:23:51 host-121 kernel: Call Trace:
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc4014ae>] __sb_start_write+0xde/0x110
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2afa00>] ? wake_up_atomic_t+0x30/0x30
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc3fe5eb>] vfs_write+0x1ab/0x1e0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc3ff30f>] SyS_write+0x7f/0xe0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b
Apr 5 17:23:51 host-121 kernel: INFO: task lvconvert:21730 blocked for more than 120 seconds.
Apr 5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 5 17:23:51 host-121 kernel: lvconvert       D ffff88003b04bec0     0 21730   2606 0x00000080
Apr 5 17:23:51 host-121 kernel: ffff88001b1078b0 0000000000000086 ffff88001b107fd8 ffff88001b107fd8
Apr 5 17:23:51 host-121 kernel: ffff88001b107fd8 0000000000016cc0 ffff88001e136dd0 ffff88003fc16cc0
Apr 5 17:23:51 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff6ace8 ffffffffbc894380
Apr 5 17:23:51 host-121 kernel: Call Trace:
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc893d69>] schedule_timeout+0x239/0x2c0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2c803e>] ? account_entity_dequeue+0xae/0xd0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2cba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc260ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8958dd>] io_schedule_timeout+0xad/0x130
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc895978>] io_schedule+0x18/0x1a
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc894391>] bit_wait_io+0x11/0x50
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc893eb5>] __wait_on_bit+0x65/0x90
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc37f231>] wait_on_page_bit+0x81/0xa0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc2afac0>] ? wake_bit_function+0x40/0x40
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc37f361>] __filemap_fdatawait_range+0x111/0x190
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc382157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc42af9d>] sync_inodes_sb+0x16d/0x1f0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc430833>] sync_filesystem+0x63/0xb0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc4017bf>] freeze_super+0x8f/0x130
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc43b705>] freeze_bdev+0x75/0xd0
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01c7868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01c9ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01cf414>] dev_suspend+0x194/0x250 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01cf280>] ? table_load+0x390/0x390 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01cfc45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffc01cff73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc41264d>] do_vfs_ioctl+0x33d/0x540
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc4b072f>] ? file_has_perm+0x9f/0xb0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc4009ee>] ? ____fput+0xe/0x10
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc4128f1>] SyS_ioctl+0xa1/0xc0
Apr 5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
3.10.0-635.el7.x86_64

lvm2-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-libs-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-cluster-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
This appears to be the case w/ all raid types.

Scenario raid6_nr: Convert Striped raid6_nr volume
********* Take over hash info for this scenario *********
* from type:    raid6_nr
* to type:      raid6_la_6
* snapshot:     1
******************************************************

Creating original volume on host-121...
host-121: lvcreate --type raid6_nr -i 3 -n takeover -L 500M centipede2
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  LV                  Attr       LSize   Cpy%Sync Devices
  takeover            rwi-a-r--- 504.00m 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] iwi-aor--- 168.00m          /dev/sdg1(1)
  [takeover_rimage_1] iwi-aor--- 168.00m          /dev/sde1(1)
  [takeover_rimage_2] iwi-aor--- 168.00m          /dev/sda1(1)
  [takeover_rimage_3] iwi-aor--- 168.00m          /dev/sdd1(1)
  [takeover_rimage_4] iwi-aor--- 168.00m          /dev/sdc1(1)
  [takeover_rmeta_0]  ewi-aor---   4.00m          /dev/sdg1(0)
  [takeover_rmeta_1]  ewi-aor---   4.00m          /dev/sde1(0)
  [takeover_rmeta_2]  ewi-aor---   4.00m          /dev/sda1(0)
  [takeover_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
  [takeover_rmeta_4]  ewi-aor---   4.00m          /dev/sdc1(0)

Creating ext on top of mirror(s) on host-121...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-121...

Writing verification files (checkit) to mirror(s) on...
        ---- host-121 ----

Sleeping 15 seconds to get some outstanding I/O locks before the failure

Creating a snapshot volume of raid to be changed
lvcreate --type snapshot -L 100M -n snap -s centipede2/takeover

Verifying files (checkit) on mirror(s) on...
        ---- host-121 ----

lvconvert --yes --type raid6_la_6 centipede2/takeover
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
Apr 12 15:24:30 host-121 qarshd[31678]: Running cmdline: lvconvert --yes --type raid6_la_6 centipede2/takeover
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-7 operational as raid disk 2
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-9 operational as raid disk 3
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-11 operational as raid disk 4
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: raid level 6 active with 5 out of 5 devices, algorithm 9
Apr 12 15:24:31 host-121 lvm[9616]: No longer monitoring RAID device centipede2-takeover-real for events.
Apr 12 15:24:31 host-121 dmeventd[9616]: No longer monitoring snapshot centipede2-snap.
Apr 12 15:26:31 host-121 kernel: INFO: task jbd2/dm-12-8:31512 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: jbd2/dm-12-8    D ffff88003b003ec0     0 31512      2 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000496ba60 0000000000000046 ffff88000496bfd8 ffff88000496bfd8
Apr 12 15:26:31 host-121 kernel: ffff88000496bfd8 0000000000016cc0 ffff880020845e20 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5a260 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a1799>] ? __split_and_process_bio+0x2e9/0x520 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee6c3c>] ? ktime_get_ts64+0x4c/0xf0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493f61>] out_of_line_wait_on_bit+0x81/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03394a>] __wait_on_buffer+0x2a/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3110>] jbd2_write_superblock+0xa0/0x180 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3229>] jbd2_journal_update_sb_log_tail+0x39/0xa0 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06ac7f4>] jbd2_journal_commit_transaction+0x17a4/0x1990 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ebe>] ? kvm_clock_read+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde29557>] ? __switch_to+0xd7/0x4c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde96edb>] ? lock_timer_base.isra.34+0x2b/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde9738e>] ? try_to_del_timer_sync+0x5e/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b1a89>] kjournald2+0xc9/0x260 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b19c0>] ? commit_timeout+0x10/0x10 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae9bf>] kthread+0xcf/0xe0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde8bf0b>] ? do_exit+0x6bb/0xa40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1b18>] ret_from_fork+0x58/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: INFO: task xdoio:31533 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: xdoio           D ffff88003b005e20     0 31533  31532 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff8800186efdb0 0000000000000082 ffff8800186effd8 ffff8800186effd8
Apr 12 15:26:31 host-121 kernel: ffff8800186effd8 0000000000016cc0 ffff880020c41f60 ffff88002379f000
Apr 12 15:26:31 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88002379f308
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0014ae>] __sb_start_write+0xde/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff81e>] do_readv_writev+0x20e/0x260
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06d4e10>] ? ext4_dax_fault+0x150/0x150 [ext4]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdffd9c0>] ? do_sync_read+0xd0/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee816a>] ? __getnstimeofday64+0x3a/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff905>] vfs_writev+0x35/0x60
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfffabf>] SyS_writev+0x7f/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b
Apr 12 15:26:31 host-121 kernel: INFO: task lvconvert:31679 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: lvconvert       D ffff880021862f10     0 31679  31678 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000107f8b0 0000000000000086 ffff88000107ffd8 ffff88000107ffd8
Apr 12 15:26:31 host-121 kernel: ffff88000107ffd8 0000000000016cc0 ffff880020840000 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5d7e8 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f231>] wait_on_page_bit+0x81/0xa0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f361>] __filemap_fdatawait_range+0x111/0x190
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf82157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe02af9d>] sync_inodes_sb+0x16d/0x1f0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe030833>] sync_filesystem+0x63/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0017bf>] freeze_super+0x8f/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03b705>] freeze_bdev+0x75/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a0868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a2ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8414>] dev_suspend+0x194/0x250 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8280>] ? table_load+0x390/0x390 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8c45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8f73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe01264d>] do_vfs_ioctl+0x33d/0x540
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0b072f>] ? file_has_perm+0x9f/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0009ee>] ? ____fput+0xe/0x10
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0128f1>] SyS_ioctl+0xa1/0xc0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b
Created attachment 1273641 [details]
verbose lvconvert w/ snapshot attempt

This was attempted w/o running I/O so it wouldn't deadlock.
Disallowing reshape/takeover while the LV is under a snapshot until a future release.
Output for disallowing. For completeness related to comment #6:

[root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
  LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices
  r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
  [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)
  [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)
  [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)
  [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)
  s             12.00m linear    1     1                r      /dev/sda(33)

[root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
  Using default stripesize 64.00 KiB.
  Can't convert snapshot origin nvm/r.
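The rejection is keyed off the origin relationship visible in the lvs output above (LV "s" lists "r" in its Origin column). Purely as an illustration, a tiny sketch that derives which LVs are snapshot origins from pre-captured name/origin pairs; this is hypothetical text processing, not an lvm2 interface:

```shell
# Find snapshot origins from name/origin pairs as reported by
# `lvs --noheadings -o name,origin <vg>`. The sample data below is copied
# from the output in this comment; only LV "s" carries an origin ("r").
lvs_sample='r
s r'
# Lines with two fields are snapshots; field 2 is the origin LV.
origins=$(printf '%s\n' "$lvs_sample" | awk 'NF==2 {print $2}' | sort -u)
echo "snapshot origin(s): $origins"   # -> snapshot origin(s): r
```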
CANTFIX reasoning:
- though commit f1b78665ef181ccd630209243b74df0627322a35 fixes the 2-legged raid1 -> raid5 conversion, this does not provide any advantage over just keeping the raid1 layout unless additionally reshaping to more stripes
- reshaping to more (or fewer; not in this BZ's context) stripes involves a RaidLV size change after adding (or before removing) stripes
- active classic snapshots require the size of an origin LV to be constant and hence need the origin LV to be inactive when resizing via e.g. lvresize or "lvconvert --stripes ..."
- on the other hand, inactive RaidLVs can't be resized/converted, because kernel state is not available but is mandatory to decide whether the RaidLV is fully synchronized/reshaped

-> we can't allow active RaidLVs to be reshaped when classic snapshots are on top of them (done in commit f342e803ba3c32890a2b08736fa94bdd541d5e9c as of comment #6)
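Given that constraint, the practical route for users is to remove the snapshot before the takeover and recreate it afterwards. A hedged sketch, reusing the nvm/r and s names from comment #11 (snapshot size illustrative; this needs root and a disposable test VG, so it is a procedure outline rather than something to run blindly):

```shell
# Workaround sketch: takeover/reshape of a RaidLV is rejected while a classic
# snapshot sits on top of it, so drop the snapshot, convert, then recreate it.
# LV names (nvm/r, nvm/s) follow comment #11; the 12M size is illustrative.
lvremove -y nvm/s                  # drop the blocking snapshot (its contents are lost)
lvconvert -y --type raid5 nvm/r    # takeover is now allowed
lvcreate -s -L 12M -n s nvm/r      # recreate a fresh snapshot afterwards
```

Note that removing the snapshot discards whatever it held; if the snapshot data matters it has to be merged or backed up first.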
(In reply to Heinz Mauelshagen from comment #11)
> Output for disallowing For completeness related to comment #6:
>
> [root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
>   LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices
>   r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
>   [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)
>   [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)
>   [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)
>   [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)
>   s             12.00m linear    1     1                r      /dev/sda(33)
>
> [root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
>   Using default stripesize 64.00 KiB.
>   Can't convert snapshot origin nvm/r.

Can we get a clean-up of that error message? Something like:
"Unable to convert nvm/r while under snapshot(s)."
or
"Snapshots must be removed in order to convert nvm/r."

Otherwise, the user will simply ask, "why the hell not? what's wrong?".
(In reply to Jonathan Earl Brassow from comment #13)
> (In reply to Heinz Mauelshagen from comment #11)
> > Output for disallowing For completeness related to comment #6:
> >
> > [root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
> >   LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices
> >   r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
> >   [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)
> >   [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)
> >   [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)
> >   [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)
> >   s             12.00m linear    1     1                r      /dev/sda(33)
> >
> > [root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
> >   Using default stripesize 64.00 KiB.
> >   Can't convert snapshot origin nvm/r.
>
> Can we get a clean-up of that error message? Something like:
> "Unable to convert nvm/r while under snapshot(s)."
> or
> "Snapshots must be removed in order to convert nvm/r."
>
> Otherwise, the user will simply ask, "why the hell not? what's wrong?".

Done, commit a95f656d0df0fb81d68fa27bfee2350953677174 enhances the rejection message.
Fix verified in the latest rpms.

3.10.0-772.el7.x86_64
lvm2-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-libs-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-cluster-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-lockd-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-python-boom-0.8-3.el7    BUILT: Fri Nov 10 07:16:45 CST 2017
cmirror-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017

[root@host-116 ~]# lvs -o +segtype
  LV       VG         Attr       LSize   Pool Origin   Data%  Meta%  Move Log Cpy%Sync Convert Type
  snap     centipede2 swi-a-s--- 100.00m      takeover 1.22                                    linear
  takeover centipede2 owi-a-r---   4.06g                                    100.00             raid6_rs_6

[root@host-116 ~]# lvconvert --yes -R 16384.00k --type raid5_rs centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.
[root@host-116 ~]# lvconvert --yes --stripes 2 centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0853