Bug 1439399 - RAID TAKEOVER: takeover on raid volumes containing snapshots doesn't work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1782045
Blocks: 1469559
 
Reported: 2017-04-05 22:48 UTC by Corey Marthaler
Modified: 2021-09-03 12:39 UTC
CC List: 8 users

Fixed In Version: lvm2-2.02.175-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 15:20:44 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
verbose lvconvert w/ snapshot attempt (358.87 KB, text/plain)
2017-04-24 15:57 UTC, Corey Marthaler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:0853 0 None None None 2018-04-10 15:21:32 UTC

Description Corey Marthaler 2017-04-05 22:48:38 UTC
Description of problem:
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV2 -L 100M black_bird
  Logical volume "LV2" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV3 -L 100M black_bird
  Logical volume "LV3" created.
[root@host-122 ~]# lvcreate -m 1 --type raid1 -n LV4 -L 100M black_bird
  Logical volume "LV4" created.

# raid1 -> raid5_n appears to work fine w/o a snapshot
[root@host-122 ~]# lvconvert --type raid5_n black_bird/LV2
  Using default stripesize 64.00 KiB.
  Logical volume black_bird/LV2 successfully converted.
[root@host-122 ~]# lvs -a -o +devices,segtype
  LV             VG          Attr       LSize   Cpy%Sync Devices                         Type   
  LV2            black_bird  rwi-a-r--- 100.00m 100.00   LV2_rimage_0(0),LV2_rimage_1(0) raid5_n
  [LV2_rimage_0] black_bird  iwi-aor--- 100.00m          /dev/sdf1(27)                   linear 
  [LV2_rimage_1] black_bird  iwi-aor--- 100.00m          /dev/sdh1(27)                   linear 
  [LV2_rmeta_0]  black_bird  ewi-aor---   4.00m          /dev/sdf1(26)                   linear 
  [LV2_rmeta_1]  black_bird  ewi-aor---   4.00m          /dev/sdh1(26)                   linear 


# Create snapshots of raids LV3 and LV4
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV3
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.
[root@host-122 ~]# lvcreate -L 12M -s black_bird/LV4
  Using default stripesize 64.00 KiB.
  Logical volume "lvol1" created.

[root@host-122 ~]# lvconvert --type raid4 black_bird/LV3
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical Volume LV3_rimage_0 already exists in volume group black_bird.

[root@host-122 ~]# lvconvert --type raid5 black_bird/LV4
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.
  Logical volume black_bird/LV4 successfully converted.






# This is with an actual filesystem and running I/O:

[root@host-121 ~]# lvconvert --type raid5_n black_bird/synced_primary_raid1_2legs_1
  Using default stripesize 64.00 KiB.
  Internal error: Writing metadata in critical section.

[DEADLOCK]

Apr  5 17:21:15 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr  5 17:21:15 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr  5 17:21:15 host-121 kernel: md/raid:mdX: raid level 5 active with 2 out of 2 devices, algorithm 5
Apr  5 17:21:15 host-121 lvm[21484]: No longer monitoring RAID device black_bird-synced_primary_raid1_2legs_1-real for events.
Apr  5 17:21:15 host-121 dmeventd[21484]: No longer monitoring snapshot black_bird-bb_snap1.
Apr  5 17:23:51 host-121 kernel: INFO: task xfsaild/dm-6:21594 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: xfsaild/dm-6    D ffff88003b000fb0     0 21594      2 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88002028fd48 0000000000000046 ffff88002028ffd8 ffff88002028ffd8
Apr  5 17:23:51 host-121 kernel: ffff88002028ffd8 0000000000016cc0 ffffffffbcbdd460 ffff88003cf81f00
Apr  5 17:23:51 host-121 kernel: 0000000000000000 ffff88003b000fb0 ffff88003b3aed28 ffff88001f5ae000
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffc0303d36>] _xfs_log_force+0x1c6/0x2c0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2c2280>] ? wake_up_state+0x20/0x20
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] ? xfsaild+0x16c/0x6f0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc0303e5c>] xfs_log_force+0x2c/0x70 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fbbc>] xfsaild+0x16c/0x6f0 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc030fa50>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae9bf>] kthread+0xcf/0xe0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1b18>] ret_from_fork+0x58/0x90
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2ae8f0>] ? insert_kthread_work+0x40/0x40
Apr  5 17:23:51 host-121 kernel: INFO: task xdoio:21696 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: xdoio           D ffff88003ab01f60     0 21696  21695 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88001e8b3e78 0000000000000082 ffff88001e8b3fd8 ffff88001e8b3fd8
Apr  5 17:23:51 host-121 kernel: ffff88001e8b3fd8 0000000000016cc0 ffff88001e133ec0 ffff88003b232800
Apr  5 17:23:51 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88003b232b08
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4014ae>] __sb_start_write+0xde/0x110
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2afa00>] ? wake_up_atomic_t+0x30/0x30
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc3fe5eb>] vfs_write+0x1ab/0x1e0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc3ff30f>] SyS_write+0x7f/0xe0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b
Apr  5 17:23:51 host-121 kernel: INFO: task lvconvert:21730 blocked for more than 120 seconds.
Apr  5 17:23:51 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  5 17:23:51 host-121 kernel: lvconvert       D ffff88003b04bec0     0 21730   2606 0x00000080
Apr  5 17:23:51 host-121 kernel: ffff88001b1078b0 0000000000000086 ffff88001b107fd8 ffff88001b107fd8
Apr  5 17:23:51 host-121 kernel: ffff88001b107fd8 0000000000016cc0 ffff88001e136dd0 ffff88003fc16cc0
Apr  5 17:23:51 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff6ace8 ffffffffbc894380
Apr  5 17:23:51 host-121 kernel: Call Trace:
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8960f9>] schedule+0x29/0x70
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc893d69>] schedule_timeout+0x239/0x2c0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2c803e>] ? account_entity_dequeue+0xae/0xd0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2cba5c>] ? dequeue_entity+0x11c/0x5d0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc260ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894380>] ? bit_wait+0x50/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8958dd>] io_schedule_timeout+0xad/0x130
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc895978>] io_schedule+0x18/0x1a
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc894391>] bit_wait_io+0x11/0x50
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc893eb5>] __wait_on_bit+0x65/0x90
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc37f231>] wait_on_page_bit+0x81/0xa0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc2afac0>] ? wake_bit_function+0x40/0x40
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc37f361>] __filemap_fdatawait_range+0x111/0x190
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc382157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc42af9d>] sync_inodes_sb+0x16d/0x1f0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc430833>] sync_filesystem+0x63/0xb0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4017bf>] freeze_super+0x8f/0x130
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc43b705>] freeze_bdev+0x75/0xd0
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01c7868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01c9ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cf414>] dev_suspend+0x194/0x250 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cf280>] ? table_load+0x390/0x390 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cfc45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffc01cff73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc41264d>] do_vfs_ioctl+0x33d/0x540
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4b072f>] ? file_has_perm+0x9f/0xb0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4009ee>] ? ____fput+0xe/0x10
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc4128f1>] SyS_ioctl+0xa1/0xc0
Apr  5 17:23:51 host-121 kernel: [<ffffffffbc8a1bc9>] system_call_fastpath+0x16/0x1b


Version-Release number of selected component (if applicable):
3.10.0-635.el7.x86_64

lvm2-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-libs-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
lvm2-cluster-2.02.169-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-event-libs-1.02.138-3.el7    BUILT: Wed Mar 29 09:17:46 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 2 Corey Marthaler 2017-04-12 20:34:06 UTC
This appears to be the case w/ all raid types.

Scenario raid6_nr: Convert Striped raid6_nr volume
********* Take over hash info for this scenario *********
* from type:     raid6_nr
* to type:       raid6_la_6
* snapshot:      1
******************************************************

Creating original volume on host-121...
host-121: lvcreate  --type raid6_nr -i 3 -n takeover -L 500M centipede2

Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Current volume device structure:
  LV                  Attr       LSize   Cpy%Sync Devices
  takeover            rwi-a-r--- 504.00m 100.00   takeover_rimage_0(0),takeover_rimage_1(0),takeover_rimage_2(0),takeover_rimage_3(0),takeover_rimage_4(0)
  [takeover_rimage_0] iwi-aor--- 168.00m          /dev/sdg1(1)                                                                                            
  [takeover_rimage_1] iwi-aor--- 168.00m          /dev/sde1(1)
  [takeover_rimage_2] iwi-aor--- 168.00m          /dev/sda1(1)
  [takeover_rimage_3] iwi-aor--- 168.00m          /dev/sdd1(1)
  [takeover_rimage_4] iwi-aor--- 168.00m          /dev/sdc1(1)
  [takeover_rmeta_0]  ewi-aor---   4.00m          /dev/sdg1(0)
  [takeover_rmeta_1]  ewi-aor---   4.00m          /dev/sde1(0)
  [takeover_rmeta_2]  ewi-aor---   4.00m          /dev/sda1(0)
  [takeover_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
  [takeover_rmeta_4]  ewi-aor---   4.00m          /dev/sdc1(0)


Creating ext on top of mirror(s) on host-121...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-121...

Writing verification files (checkit) to mirror(s) on...
        ---- host-121 ----

Sleeping 15 seconds to get some outstanding I/O locks before the failure

Creating a snapshot volume of raid to be changed
        lvcreate --type snapshot -L 100M -n snap -s centipede2/takeover
Verifying files (checkit) on mirror(s) on...
        ---- host-121 ----

lvconvert --yes  --type raid6_la_6 centipede2/takeover
  Internal error: Writing metadata in critical section.


Apr 12 15:24:30 host-121 qarshd[31678]: Running cmdline: lvconvert --yes --type raid6_la_6 centipede2/takeover
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-3 operational as raid disk 0
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-5 operational as raid disk 1
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-7 operational as raid disk 2
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-9 operational as raid disk 3
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: device dm-11 operational as raid disk 4
Apr 12 15:24:31 host-121 kernel: md/raid:mdX: raid level 6 active with 5 out of 5 devices, algorithm 9
Apr 12 15:24:31 host-121 lvm[9616]: No longer monitoring RAID device centipede2-takeover-real for events.
Apr 12 15:24:31 host-121 dmeventd[9616]: No longer monitoring snapshot centipede2-snap.
Apr 12 15:26:31 host-121 kernel: INFO: task jbd2/dm-12-8:31512 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: jbd2/dm-12-8    D ffff88003b003ec0     0 31512      2 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000496ba60 0000000000000046 ffff88000496bfd8 ffff88000496bfd8
Apr 12 15:26:31 host-121 kernel: ffff88000496bfd8 0000000000016cc0 ffff880020845e20 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5a260 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a1799>] ? __split_and_process_bio+0x2e9/0x520 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee6c3c>] ? ktime_get_ts64+0x4c/0xf0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493f61>] out_of_line_wait_on_bit+0x81/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03394a>] __wait_on_buffer+0x2a/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3110>] jbd2_write_superblock+0xa0/0x180 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b3229>] jbd2_journal_update_sb_log_tail+0x39/0xa0 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06ac7f4>] jbd2_journal_commit_transaction+0x17a4/0x1990 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ebe>] ? kvm_clock_read+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde29557>] ? __switch_to+0xd7/0x4c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde96edb>] ? lock_timer_base.isra.34+0x2b/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde9738e>] ? try_to_del_timer_sync+0x5e/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b1a89>] kjournald2+0xc9/0x260 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06b19c0>] ? commit_timeout+0x10/0x10 [jbd2]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae9bf>] kthread+0xcf/0xe0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde8bf0b>] ? do_exit+0x6bb/0xa40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1b18>] ret_from_fork+0x58/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeae8f0>] ? insert_kthread_work+0x40/0x40
Apr 12 15:26:31 host-121 kernel: INFO: task xdoio:31533 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: xdoio           D ffff88003b005e20     0 31533  31532 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff8800186efdb0 0000000000000082 ffff8800186effd8 ffff8800186effd8
Apr 12 15:26:31 host-121 kernel: ffff8800186effd8 0000000000016cc0 ffff880020c41f60 ffff88002379f000
Apr 12 15:26:31 host-121 kernel: 0000000000000001 0000000000000001 0000000000000000 ffff88002379f308
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0014ae>] __sb_start_write+0xde/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafa00>] ? wake_up_atomic_t+0x30/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff81e>] do_readv_writev+0x20e/0x260
Apr 12 15:26:31 host-121 kernel: [<ffffffffc06d4e10>] ? ext4_dax_fault+0x150/0x150 [ext4]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdffd9c0>] ? do_sync_read+0xd0/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdee816a>] ? __getnstimeofday64+0x3a/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfff905>] vfs_writev+0x35/0x60
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdfffabf>] SyS_writev+0x7f/0x110
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b
Apr 12 15:26:31 host-121 kernel: INFO: task lvconvert:31679 blocked for more than 120 seconds.
Apr 12 15:26:31 host-121 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 12 15:26:31 host-121 kernel: lvconvert       D ffff880021862f10     0 31679  31678 0x00000080
Apr 12 15:26:31 host-121 kernel: ffff88000107f8b0 0000000000000086 ffff88000107ffd8 ffff88000107ffd8
Apr 12 15:26:31 host-121 kernel: ffff88000107ffd8 0000000000016cc0 ffff880020840000 ffff88003fc16cc0
Apr 12 15:26:31 host-121 kernel: 0000000000000000 7fffffffffffffff ffff88003ff5d7e8 ffffffffbe494380
Apr 12 15:26:31 host-121 kernel: Call Trace:
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4960f9>] schedule+0x29/0x70
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493d69>] schedule_timeout+0x239/0x2c0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdec803e>] ? account_entity_dequeue+0xae/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdecba5c>] ? dequeue_entity+0x11c/0x5d0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbde60ede>] ? kvm_clock_get_cycles+0x1e/0x20
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494380>] ? bit_wait+0x50/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4958dd>] io_schedule_timeout+0xad/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe495978>] io_schedule+0x18/0x1a
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe494391>] bit_wait_io+0x11/0x50
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe493eb5>] __wait_on_bit+0x65/0x90
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f231>] wait_on_page_bit+0x81/0xa0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdeafac0>] ? wake_bit_function+0x40/0x40
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf7f361>] __filemap_fdatawait_range+0x111/0x190
Apr 12 15:26:31 host-121 kernel: [<ffffffffbdf82157>] filemap_fdatawait_keep_errors+0x27/0x30
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe02af9d>] sync_inodes_sb+0x16d/0x1f0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe030833>] sync_filesystem+0x63/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0017bf>] freeze_super+0x8f/0x130
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe03b705>] freeze_bdev+0x75/0xd0
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a0868>] __dm_suspend+0xf8/0x210 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a2ea0>] dm_suspend+0xc0/0xd0 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8414>] dev_suspend+0x194/0x250 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8280>] ? table_load+0x390/0x390 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8c45>] ctl_ioctl+0x1e5/0x500 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffc02a8f73>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe01264d>] do_vfs_ioctl+0x33d/0x540
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0b072f>] ? file_has_perm+0x9f/0xb0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0009ee>] ? ____fput+0xe/0x10
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe0128f1>] SyS_ioctl+0xa1/0xc0
Apr 12 15:26:31 host-121 kernel: [<ffffffffbe4a1bc9>] system_call_fastpath+0x16/0x1b

Comment 3 Corey Marthaler 2017-04-24 15:57:47 UTC
Created attachment 1273641 [details]
verbose lvconvert w/ snapshot attempt

This was attempted w/o running I/O so it wouldn't deadlock.

Comment 6 Jonathan Earl Brassow 2017-06-19 17:48:25 UTC
Disallowing reshape/takeover while the LV is under a snapshot until a future release.

Comment 11 Heinz Mauelshagen 2017-09-28 16:40:51 UTC
For completeness, output of the disallowed conversion related to comment #6:

[root@vm254 ~]# lvs -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,devices nvm
  LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices                    
  r            128.00m raid1     2     2 100.00                r_rimage_0(0),r_rimage_1(0)
  [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)                
  [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)                
  [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)                
  [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)                
  s             12.00m linear    1     1                r      /dev/sda(33)

[root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
  Using default stripesize 64.00 KiB.
  Can't convert snapshot origin nvm/r.

Comment 12 Heinz Mauelshagen 2017-09-29 13:13:04 UTC
CANTFIX reasoning:

- though commit f1b78665ef181ccd630209243b74df0627322a35 fixes
  the 2-legged raid1 -> raid5 conversion, this does not provide
  any advantage over just keeping the raid1 layout unless additionally
  reshaping to more stripes

- reshaping to more (or less; not in this BZ's context) stripes involves
  a RaidLV size change after adding (or before removing) stripes

- active classic snapshots require the size of an origin LV to be constant
  and hence need the origin LV to be inactive when resizing via
  e.g. lvresize or "lvconvert --stripes ..."

- on the other hand, inactive RaidLVs can't be resized/converted because
  kernel state is not available but mandatory to decide if the RaidLV is fully
  synchronized/reshaped

-> we can't allow active RaidLVs to be reshaped when classic snapshots
   are on top of them (done in commit f342e803ba3c32890a2b08736fa94bdd541d5e9c
   as of comment #6)
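
The reasoning above reduces to a small decision rule: a reshape may change the RaidLV size, classic snapshots require a constant (and hence inactive-for-resize) origin, and an inactive RaidLV lacks the kernel sync state needed to convert. The sketch below is a toy shell model of that rule only; the function name and messages are illustrative, not lvm2 source.

```shell
# Toy model of the comment #12 constraint -- not lvm2 code.
# snapshot on top  => conversion refused (origin size must stay constant)
# inactive RaidLV  => conversion refused (no kernel sync state available)
can_convert_raid() {
  local active=$1 has_snapshot=$2
  if [ "$has_snapshot" = yes ]; then
    echo "Can't convert RAID LV while under snapshot."
    return 1
  fi
  if [ "$active" = no ]; then
    echo "RAID LV must be active: kernel sync state needed for conversion."
    return 1
  fi
  echo "conversion allowed"
}

can_convert_raid yes yes || true   # refused: snapshot present
can_convert_raid yes no            # allowed: active, no snapshot
```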

Comment 13 Jonathan Earl Brassow 2017-10-04 00:38:25 UTC
(In reply to Heinz Mauelshagen from comment #11)
> Output for disallowing For completeness related to comment #6:
> 
> [root@vm254 ~]# lvs
> -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,
> devices nvm
>   LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices      
> 
>   r            128.00m raid1     2     2 100.00               
> r_rimage_0(0),r_rimage_1(0)
>   [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)  
> 
>   [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)  
> 
>   [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)  
> 
>   [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)  
> 
>   s             12.00m linear    1     1                r      /dev/sda(33)
> 
> [root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
>   Using default stripesize 64.00 KiB.
>   Can't convert snapshot origin nvm/r.

Can we get a clean-up of that error message?  Something like:
"Unable to convert nvm/r while under snapshot(s)."
or
"Snapshots must be removed in order to convert nvm/r."

Otherwise, the user will simply ask, "why the hell not?  what's wrong?".

Comment 14 Heinz Mauelshagen 2017-10-04 15:07:55 UTC
(In reply to Jonathan Earl Brassow from comment #13)
> (In reply to Heinz Mauelshagen from comment #11)
> > Output for disallowing For completeness related to comment #6:
> > 
> > [root@vm254 ~]# lvs
> > -aoname,size,segtype,stripes,datastripes,syncpercent,reshapelen,origin,
> > devices nvm
> >   LV           LSize   Type   #Str #DStr Cpy%Sync RSize Origin Devices      
> > 
> >   r            128.00m raid1     2     2 100.00               
> > r_rimage_0(0),r_rimage_1(0)
> >   [r_rimage_0] 128.00m linear    1     1                       /dev/sda(1)  
> > 
> >   [r_rimage_1] 128.00m linear    1     1                       /dev/sdq(1)  
> > 
> >   [r_rmeta_0]    4.00m linear    1     1                       /dev/sda(0)  
> > 
> >   [r_rmeta_1]    4.00m linear    1     1                       /dev/sdq(0)  
> > 
> >   s             12.00m linear    1     1                r      /dev/sda(33)
> > 
> > [root@vm254 ~]# lvconvert --ty raid5 -y nvm/r
> >   Using default stripesize 64.00 KiB.
> >   Can't convert snapshot origin nvm/r.
> 
> Can we get a clean-up of that error message?  Something like:
> "Unable to convert nvm/r while under snapshot(s)."
> or
> "Snapshots must be removed in order to convert nvm/r."
> 
> Otherwise, the user will simply ask, "why the hell not?  what's wrong?".

Done, commit a95f656d0df0fb81d68fa27bfee2350953677174 enhances the rejection message.

Comment 16 Corey Marthaler 2017-11-16 17:33:33 UTC
Fix verified in the latest rpms.

3.10.0-772.el7.x86_64
lvm2-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-libs-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-cluster-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-lockd-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
lvm2-python-boom-0.8-3.el7    BUILT: Fri Nov 10 07:16:45 CST 2017
cmirror-2.02.176-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-event-libs-1.02.145-3.el7    BUILT: Fri Nov 10 07:12:10 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017


[root@host-116 ~]# lvs -o +segtype
  LV       VG            Attr       LSize   Pool Origin   Data%  Meta%  Move Log Cpy%Sync Convert Type      
  snap     centipede2    swi-a-s--- 100.00m      takeover 1.22                                    linear    
  takeover centipede2    owi-a-r---   4.06g                                      100.00           raid6_rs_6

[root@host-116 ~]# lvconvert --yes -R 16384.00k  --type raid5_rs centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.

[root@host-116 ~]# lvconvert --yes  --stripes 2 centipede2/takeover
  Using default stripesize 64.00 KiB.
  Can't convert RAID LV centipede2/takeover while under snapshot.
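
Given the rejection above, the only route to a takeover (assuming the snapshot contents can be discarded and recreated) would be to remove the snapshot, convert, and re-snapshot. This sequence is an assumption, not a procedure stated in this bug; VG/LV names follow comment 16. The sketch is a dry run that prints the commands instead of executing them.

```shell
# Hypothetical workaround sketch: drop the snapshot, perform the takeover,
# recreate the snapshot. Dry run -- commands are printed, not executed.
vg=centipede2 lv=takeover snap=snap
steps=(
  "lvremove -y $vg/$snap"                 # snapshot blocks the conversion
  "lvconvert -y --type raid5_rs $vg/$lv"  # takeover now permitted
  "lvcreate -s -L 100M -n $snap $vg/$lv"  # recreate the snapshot afterwards
)
for s in "${steps[@]}"; do
  echo "would run: $s"
done
```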

Comment 19 errata-xmlrpc 2018-04-10 15:20:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0853

