Bug 1439399

Summary: RAID TAKEOVER: takeover on raid volumes containing snapshots doesn't work
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Mirroring and RAID
Version: 7.4
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Keywords: Reopened
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, zkabelac
Target Milestone: rc
Target Release: ---
Fixed In Version: lvm2-2.02.175-1.el7
Last Closed: 2018-04-10 15:20:44 UTC
Type: Bug
Bug Depends On: 1782045
Bug Blocks: 1469559
Attachments: verbose lvconvert w/ snapshot attempt (flags: none)

Description Corey Marthaler 2017-04-05 22:48:38 UTC
Description of problem:
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV2 -L 100M black_bird
  Logical volume "LV2" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV3 -L 100M black_bird
  Logical volume "LV3" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV4 -L 100M black_bird
  Logical volume "LV4" created.

# raid1 -> raid5_n appears to work fine w/o a snapshot
[root@host-122 ~]# lvconvert --type raid5_n black_bird/LV2
  Using default stripesize 64.00 KiB.
  Logical volume black_bird/LV2 successfully converted.
[root@host-122 ~]# lvs -a -o +devices,segtype
  LV             VG          Attr       LSize   Cpy%Sync Devices                         Type   
  LV2            black_bird  rwi-a-r--- 100.00m 100.00   LV2_rimage_0(0),LV2_rimage_1(0) raid5_n
  [LV2_rimage_0] black_bird  iwi-aor--- 100.00m          /dev/sdf1(27)                   linear 
  [LV2_rimage_1] black_bird  iwi-aor--- 100.00m          /dev/sdh1(27)                   linear 
  [LV2_rmeta_0]  black_bird  ewi-aor---   4.00m          /dev/sdf1(26)                   linear 
  [LV2_rmeta_1]  black_bird  ewi-aor---   4.00m          /dev/sdh1(26)                   linear 


# Create snapshots of raids LV3 and LV4
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV3
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV4
  Using default stripesize 64.00 KiB.
  Logical volume "lvol1" created.

[root@host-122 ~]# lvconvert --type raid4 black_bird/LV3
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical Volume LV3_rimage_0 already exists in volume group black_bird.

[root@host-122 ~]# lvconvert --type raid5 black_bird/LV4
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical volume black_bird/LV4 successfully converted.






# This is with an actual filesystem and running I/O:

[root@host-121 ~]# lvconvert --type raid5_n black_bird/synced_primary_raid1_2legs_1
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.

[DEADLOCK]

Apr  5 17:21:15 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr  5 17:21:15 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr  5 17:21:15 host-121 kernel: md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 5
Apr  5 17:21:15 host-121 lvm[21484]: No longer monitoring RAID device black_bird-synced_primary_raid1_2legs_1-real for events.
Apr  5 17:21:15 host-121 dmeventd[21484]: No longer monitoring snapshot black_bird-bb_snap1.
Apr  5 17:23:51 host-121 kernel: INFO: task xfsaild/dm-6:21594 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: xfsaild/dm-6    D ffff88003b000fb0     0 21594      2 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88002028fd48 0000000000000046 ffff88002028ffd8 ffff88002028ffd8
Apr  5 17:23:51 host-121 kernel: ffff88002028ffd8 0000000000016cc0 ffffffffbcbdd460 ffff88003cf81f00
Apr  5 17:23:51 host-121 kernel: 0000000000000000 ffff88003b000fb0 ffff88003b3aed28 ffff88001f5ae000
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffc0303d36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2c2280>] ? wake_up_state+0x20/0x20
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] ? xfsaild+0x16c/0x6f0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc0303e5c>] xfs_log_force+0x2c/0x70 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] xfsaild+0x16c/0x6f0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae9bf>] kthread+0xcf/0xe0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1b18>] ret_from_fork+0x58/0x90
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr  5 17:23:51 host-121 kernel: INFO: task xdoio:21696 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: xdoio           D ffff88003ab01f60     0 21696  21695 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88001e8b3e78 0000000000000082 ffff88001e8b3fd8 ffff88001e8b3fd8
Apr  5 17:23:51 host-121 kernel: ffff88001e8b3fd8 0000000000016cc0 ffff88001e133ec0 ffff88003b232800
Apr  5 17:23:51 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88003b232b08
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4014ae>] __sb_start_write+0xde/0x110
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2afa00>] ? wake_up_atomic_t+0x30/0x30
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc3fe5eb>] vfs_write+0x1ab/0x1e0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc3ff30f>] SyS_write+0x7f/0xe0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b
Apr  5 17:23:51 host-121 kernel: INFO: task lvconvert:21730 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: lvconvert       D ffff88003b04bec0     0 21730   2606 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88001b1078b0 0000000000000086 ffff88001b107fd8 ffff88001b107fd8
Apr  5 17:23:51 host-121 kernel: ffff88001b107fd8 0000000000016cc0 ffff88001e136dd0 ffff88003fc16cc0
Apr  5 17:23:51 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff6ace8 ffffffffbc894380
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc893d69>] schedule_timeout+0x239/0x2c0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2c803e>] ? account_entity_dequeue+0xae/0xd0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2cba5c>] ? dequeue_entity+0x11c/0x5d0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc260ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8958dd>] io_schedule_timeout+0xad/0x130
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc895978>] io_schedule+0x18/0x1a
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894391>] bit_wait_io+0x11/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc893eb5>] __wait_on_bit+0x65/0x90
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc37f231>] wait_on_page_bit+0x81/0xa0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2afac0>] ? wake_bit_function+0x40/0x40
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc37f361>] __filemap_fdatawait_range+0x111/0x190
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc382157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc42af9d>] sync_inodes_sb+0x16d/0x1f0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc430833>] sync_filesystem+0x63/0xb0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4017bf>] freeze_super+0x8f/0x130
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc43b705>] freeze_bdev+0x75/0xd0
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01c7868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01c9ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cf414>] dev_suspend+0x194/0x250 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cf280>] ? table_load+0x390/0x390 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cfc45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cff73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc41264d>] do_vfs_ioctl+0x33d/0x540
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4b072f>] ? file_has_perm+0x9f/0xb0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4009ee>] ? ____fput+0xe/0x10
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4128f1>] SyS_ioctl+0xa1/0xc0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b
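
The trace above reports three blocked tasks (xfsaild, xdoio and lvconvert). As a reading aid, a small helper along the following lines can pull the hung-task names out of a kernel log. This is only a sketch: the function name blocked_tasks is made up for illustration, and it assumes the "blocked for more than N seconds" wording shown in the log above.

```shell
#!/usr/bin/env bash
# Summarise hung-task reports found in kernel log text read from stdin.
# Prints one "task:pid" entry per line, in order of first appearance.
# blocked_tasks is a hypothetical helper name, not part of any package.
blocked_tasks() {
    # Keep only the task name between "INFO: task " and " blocked for more than".
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds\..*/\1/p' |
        awk '!seen[$0]++'   # drop repeated reports of the same task
}
```

It could be fed from a saved log, for example `blocked_tasks < /var/log/messages`.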


Version-Release number of selected component (if applicable):
3.10.0-635.el7.x86_64

lvm2-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-libs-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-cluster-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 2 Corey Marthaler 2017-04-12 20:34:06 UTC
This appears to be the case w/ all raid types.

Scenario raid6_nr: Convert Striped raid6_nr volume
********* Take over hash info for this scenario *********
* from type:     raid6_nr
* to type:       raid6_la_6
* snapshot:      1
******************************************************

Creating original volume on host-121...
host-121: lvcreate  --type raid6_nr -i 3 -n takeover -L 500M centipede2

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  LV                  Attr       LSize   Cpy%Sync Devices
  takeover            rwi-a-r--- 504.00m 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] iwi-aor--- 168.00m          /dev/sdg1(1)                                                                                            
  [takeover_rimage_1] iwi-aor--- 168.00m          /dev/sde1(1)
  [takeover_rimage_2] iwi-aor--- 168.00m          /dev/sda1(1)
  [takeover_rimage_3] iwi-aor--- 168.00m          /dev/sdd1(1)
  [takeover_rimage_4] iwi-aor--- 168.00m          /dev/sdc1(1)
  [takeover_rmeta_0]  ewi-aor---   4.00m          /dev/sdg1(0)
  [takeover_rmeta_1]  ewi-aor---   4.00m          /dev/sde1(0)
  [takeover_rmeta_2]  ewi-aor---   4.00m          /dev/sda1(0)
  [takeover_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
  [takeover_rmeta_4]  ewi-aor---   4.00m          /dev/sdc1(0)


Creating ext on top of mirror(s) on host-121...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-121...

Writing verification files (checkit) to mirror(s) on...
        ---- host-121 ----

Sleeping 15 seconds to get some outsanding I/O locks before the failure 

Creating a snapshot volume of raid to be changed
        lvcreate --type snapshot -L 100M -n snap -s centipede2/takeover
Verifying files (checkit) on mirror(s) on...
        ---- host-121 ----

lvconvert --yes  --type raid6_la_6 centipede2/takeover
  Internal error: Writing metadata in critical section.


Apr 12 15:24:30 host-121 qarshd[31678]: Running cmdline: lvconvert --yes --type raid6_la_6 centipede2/takeover
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-7 operational as raid disk 2
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-9 operational as raid disk 3
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-11 operational as raid disk 4
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: raid level 6 active with 5 out of 5 devices, algorithm 9
Apr 12 15:24:31 host-121 lvm[9616]: No longer monitoring RAID device centipede2-takeover-real for events.
Apr 12 15:24:31 host-121 dmeventd[9616]: No longer monitoring snapshot centipede2-snap.
Apr 12 15:26:31 host-121 kernel: INFO: task jbd2/dm-12-8:31512 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: jbd2/dm-12-8    D ffff88003b003ec0     0 31512      2 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000496ba60 0000000000000046 ffff88000496bfd8 ffff88000496bfd8
Apr 12 15:26:31 host-121 kernel: ffff88000496bfd8 0000000000016cc0 ffff880020845e20 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5a260 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a1799>] ? __split_and_process_bio+0x2e9/0x520 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee6c3c>] ? ktime_get_ts64+0x4c/0xf0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493f61>] out_of_line_wait_on_bit+0x81/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03394a>] __wait_on_buffer+0x2a/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3110>] jbd2_write_superblock+0xa0/0x180 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3229>] jbd2_journal_update_sb_log_tail+0x39/0xa0 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06ac7f4>] jbd2_journal_commit_transaction+0x17a4/0x1990 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ebe>] ? kvm_clock_read+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde29557>] ? __switch_to+0xd7/0x4c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde96edb>] ? lock_timer_base.isra.34+0x2b/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde9738e>] ? try_to_del_timer_sync+0x5e/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b1a89>] kjournald2+0xc9/0x260 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b19c0>] ? commit_timeout+0x10/0x10 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae9bf>] kthread+0xcf/0xe0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde8bf0b>] ? do_exit+0x6bb/0xa40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1b18>] ret_from_fork+0x58/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: INFO: task xdoio:31533 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: xdoio           D ffff88003b005e20     0 31533  31532 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff8800186efdb0 0000000000000082 ffff8800186effd8 ffff8800186effd8
Apr 12 15:26:31 host-121 kernel: ffff8800186effd8 0000000000016cc0 ffff880020c41f60 ffff88002379f000
Apr 12 15:26:31 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88002379f308
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0014ae>] __sb_start_write+0xde/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff81e>] do_readv_writev+0x20e/0x260
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06d4e10>] ? ext4_dax_fault+0x150/0x150 [ext4]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdffd9c0>] ? do_sync_read+0xd0/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee816a>] ? __getnstimeofday64+0x3a/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff905>] vfs_writev+0x35/0x60
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfffabf>] SyS_writev+0x7f/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b
Apr 12 15:26:31 host-121 kernel: INFO: task lvconvert:31679 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: lvconvert       D ffff880021862f10     0 31679  31678 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000107f8b0 0000000000000086 ffff88000107ffd8 ffff88000107ffd8
Apr 12 15:26:31 host-121 kernel: ffff88000107ffd8 0000000000016cc0 ffff880020840000 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5d7e8 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f231>] wait_on_page_bit+0x81/0xa0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f361>] __filemap_fdatawait_range+0x111/0x190
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf82157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe02af9d>] sync_inodes_sb+0x16d/0x1f0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe030833>] sync_filesystem+0x63/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0017bf>] freeze_super+0x8f/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03b705>] freeze_bdev+0x75/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a0868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a2ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8414>] dev_suspend+0x194/0x250 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8280>] ? table_load+0x390/0x390 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8c45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8f73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe01264d>] do_vfs_ioctl+0x33d/0x540
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0b072f>] ? file_has_perm+0x9f/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0009ee>] ? ____fput+0xe/0x10
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0128f1>] SyS_ioctl+0xa1/0xc0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b

Comment 3 Corey Marthaler 2017-04-24 15:57:47 UTC
Created attachment 1273641 [details]
verbose lvconvert w/ snapshot attempt

This was attempted w/o running I/O so it wouldn't deadlock.

Comment 6 Jonathan Earl Brassow 2017-06-19 17:48:25 UTC
Disallowing reshape/takeover while the LV is under a snapshot until a future release.

Comment 11 Heinz Mauelshagen 2017-09-28 16:40:51 UTC
For completeness, here is the output of the now-disallowed conversion, related to comment #6:

[root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
  LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices                    
  r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
  [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)                
  [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)                
  [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)                
  [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)                
  s             12.00m linear    1     1                r      /dev/sda(33)

[root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
  Using default stripesize 64.00 KiB.
  Can't convert snapshot origin nvm/r.

Comment 12 Heinz Mauelshagen 2017-09-29 13:13:04 UTC
CANTFIX reasoning:

- though commit f1b78665ef181ccd630209243b74df0627322a35 fixes
  the 2-legged raid1 -> raid5 conversion, this does not provide
  any advantage over just keeping the raid1 layout unless additionally
  reshaping to more stripes

- reshaping to more (or less; not in this BZs context) stripes involves
  a RaidLV size change after adding (or before removing) stripes

- active classic snapshots require the size of an origin LV to be constant
  and hence need the origin LV to be inactive when resizing via
  e.g. lvresize or "lvconvert --stripes ..."

- on the other hand, inactive RaidLVs can't be resized/converted because
  kernel state is not available but mandatory to decide if the RaidLV is fully
  synchronized/reshaped

-> we can't allow active RaidLVs to be reshaped when classic snapshots
   are on top of them (done in commit f342e803ba3c32890a2b08736fa94bdd541d5e9c,
   as per comment #6)
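
A script that drives lvconvert could apply the same rule up front by checking whether the RaidLV is a snapshot origin before attempting a reshape or takeover. The following is a minimal sketch, not the check lvm2 performs internally: the helper names snapshots_of and can_convert and the refusal text are invented for illustration, and the input is assumed to be "lv_name|origin" lines such as `lvs --noheadings --separator '|' -o lv_name,origin VG` would print.

```shell
#!/usr/bin/env bash
# Print the names of LVs whose Origin column equals $1.
# stdin: lines of "lv_name|origin" (assumed lvs report format, see above).
# snapshots_of and can_convert are hypothetical helper names.
snapshots_of() {
    local origin_lv=$1
    awk -F'|' -v o="$origin_lv" '
        {
            # Trim the padding lvs puts around each field.
            gsub(/^[ \t]+|[ \t]+$/, "", $1)
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
        }
        $2 == o { print $1 }
    '
}

# Return 0 if $1 has no snapshots in the report on stdin, 1 otherwise.
can_convert() {
    local snaps
    snaps=$(snapshots_of "$1")
    if [ -n "$snaps" ]; then
        # Illustrative wording only, not an lvm2 message.
        echo "refusing: $1 is the origin of: $snaps" >&2
        return 1
    fi
}
```

For the volume group shown in comment #11, the report lists s with origin r, so `can_convert r` would return 1.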

Comment 13 Jonathan Earl Brassow 2017-10-04 00:38:25 UTC
(In reply to Heinz Mauelshagen from comment #11)
> Output for disallowing For completeness related to comment #6:
> 
> [root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
>   LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices
>   r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
>   [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)
>   [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)
>   [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)
>   [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)
>   s             12.00m linear    1     1                r      /dev/sda(33)
> 
> [root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
>   Using default stripesize 64.00 KiB.
>   Can't convert snapshot origin nvm/r.

Can we get a clean-up of that error message?  Something like:
"Unable to convert nvm/r while under snapshot(s)."
or
"Snapshots must be removed in order to convert nvm/r."

Otherwise, the user will simply ask, "why the hell not?  what's wrong?".

Comment 14 Heinz Mauelshagen 2017-10-04 15:07:55 UTC
(In reply to Jonathan Earl Brassow from comment #13)
> Can we get a clean-up of that error message?  Something like:
> "Unable to convert nvm/r while under snapshot(s)."
> or
> "Snapshots must be removed in order to convert nvm/r."
> 
> Otherwise, the user will simply ask, "why the hell not?  what's wrong?".

Done, commit a95f656d0df0fb81d68fa27bfee2350953677174 enhances the rejection message.
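
The wording proposed in comment #13 implies one possible route for users: remove the snapshots first, then convert. The sketch below only prints such a plan and runs nothing, since removing a snapshot discards it. The function name print_convert_plan is hypothetical; the snapshot names are assumed to arrive one per line on stdin.

```shell
#!/usr/bin/env bash
# Print (do not execute) the commands for: remove snapshots, then convert.
# $1 = VG, $2 = origin LV, $3 = target RAID type; snapshot names on stdin.
# print_convert_plan is a hypothetical helper name used for illustration.
print_convert_plan() {
    local vg=$1 lv=$2 type=$3 snap
    while IFS= read -r snap; do
        # Skip blank lines; emit one lvremove per snapshot.
        if [ -n "$snap" ]; then
            printf 'lvremove -y %s/%s\n' "$vg" "$snap"
        fi
    done
    printf 'lvconvert --type %s %s/%s\n' "$type" "$vg" "$lv"
}
```

With the layout from comment #11, `printf 's\n' | print_convert_plan nvm r raid5` would print an lvremove line for nvm/s followed by the lvconvert line for nvm/r.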

Comment 16 Corey Marthaler 2017-11-16 17:33:33 UTC
Fix verified in the latest rpms.

3.10.0-772.el7.x86_64
lvm2-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-libs-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-cluster-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-lockd-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-python-boom-0.8-3.el7    BUILT: Fri Nov 10 07:16:45 CST 2017
cmirror-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017


[root@host-116 ~]# lvs -o +segtype
  LV       VG            Attr       LSize   Pool Origin   Data%  Meta%  Move Log Cpy%Sync Convert Type      
  snap     centipede2    swi-a-s--- 100.00m      takeover 1.22                                    linear    
  takeover centipede2    owi-a-r---   4.06g                                      100.00           raid6_rs_6

[root@host-116 ~]# lvconvert --yes -R 16384.00k  --type raid5_rs centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.

[root@host-116 ~]# lvconvert --yes  --stripes 2 centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.
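
A test harness could tell the clean rejection above apart from the earlier failure mode by matching on the message text. This is a rough sketch under the assumption that the messages keep the wording quoted in this report; the function name classify_lvconvert_output and the labels it prints are made up for illustration.

```shell
#!/usr/bin/env bash
# Classify captured lvconvert output read from stdin.
# Prints one of: rejected-snapshot, internal-error, other.
# The message fragments are copied from this bug report; the function
# name and the labels are hypothetical.
classify_lvconvert_output() {
    local text
    text=$(cat)
    case $text in
        *"while under snapshot"*)          echo rejected-snapshot ;;
        *"Can't convert snapshot origin"*) echo rejected-snapshot ;;
        *"Internal error: Writing metadata in critical section"*)
                                           echo internal-error ;;
        *)                                 echo other ;;
    esac
}
```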

Comment 19 errata-xmlrpc 2018-04-10 15:20:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853