Bug 1533639 - scrubbing raids can lead to resource busy failures
Summary: scrubbing raids can lead to resource busy failures
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1653744
Blocks:
 
Reported: 2018-01-11 20:49 UTC by Corey Marthaler
Modified: 2021-09-03 12:36 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1653744
Environment:
Last Closed: 2018-12-21 16:50:40 UTC
Target Upstream Version:
Embargoed:


Attachments
verbose lvchange attempt (56.40 KB, text/plain)
2018-01-11 20:51 UTC, Corey Marthaler

Description Corey Marthaler 2018-01-11 20:49:57 UTC
Description of problem:
I've only reproduced this a few times in many iterations.

SCENARIO (raid10) - [small_raid_creation]
Make small raids
50 host-083: lvcreate  --type raid10 -i 2 -n small_50M -L 50M raid_sanity
45 host-083: lvcreate  --type raid10 -i 2 -n small_45M -L 45M raid_sanity
40 host-083: lvcreate  --type raid10 -i 2 -n small_40M -L 40M raid_sanity
35 host-083: lvcreate  --type raid10 -i 2 -n small_35M -L 35M raid_sanity
30 host-083: lvcreate  --type raid10_near -i 2 -n small_30M -L 30M raid_sanity
25 host-083: lvcreate  --type raid10_near -i 2 -n small_25M -L 25M raid_sanity
20 host-083: lvcreate  --type raid10_near -i 2 -n small_20M -L 20M raid_sanity
15 host-083: lvcreate  --type raid10 -i 2 -n small_15M -L 15M raid_sanity
10 host-083: lvcreate  --type raid10 -i 2 -n small_10M -L 10M raid_sanity
5 host-083: lvcreate  --type raid10_near -i 2 -n small_5M -L 5M raid_sanity

Removing the small raids
5
perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/small_5M
  raid_sanity/small_5M state is currently "resync".  Unable to switch to "repair".
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Deactivating raid small_5M... and removing
10
perform raid scrubbing (lvchange --syncaction check) on raid raid_sanity/small_10M
  device-mapper: message ioctl on  LVM-zxytISeT3ilzK6YxO3tAZxiIWVcncdwppdP9PqAAnuE0sRk8pv4fQEuNCMNjgLSf failed: Device or resource busy
raid scrubbing failed


Version-Release number of selected component (if applicable):
3.10.0-827.el7.x86_64

lvm2-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-libs-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-cluster-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-lockd-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-python-boom-0.8.1-5.el7    BUILT: Wed Dec  6 11:15:40 CET 2017
cmirror-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
sanlock-3.6.0-1.el7    BUILT: Tue Dec  5 18:47:21 CET 2017
sanlock-lib-3.6.0-1.el7    BUILT: Tue Dec  5 18:47:21 CET 2017
vdo-6.1.0.106-13    BUILT: Thu Dec 21 16:00:07 CET 2017
kmod-kvdo-6.1.0.106-11.el7    BUILT: Thu Dec 21 17:09:12 CET 2017


How reproducible:
Seldom
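
For reference, a minimal way to make sure the array is idle before requesting a scrub (a sketch only; it assumes the lvs "raid_sync_action" report field available in this lvm2 build and reuses the LV name from the transcript above):

lv=raid_sanity/small_10M
# Poll until dm-raid reports "idle"; "resync"/"recover" mean a sync is still in flight.
until [ "$(lvs --noheadings -o raid_sync_action "$lv" | tr -d ' ')" = "idle" ]; do
    sleep 5
done
lvchange --syncaction check "$lv"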

Comment 2 Corey Marthaler 2018-01-11 20:51:54 UTC
Created attachment 1380222 [details]
verbose lvchange attempt

Comment 3 Corey Marthaler 2018-02-14 18:51:29 UTC
I don't think this is "small raid" dependent, as I just repro'ed this on a 3G raid1 thin pool volume.

SCENARIO - [full_EXT_thin_snap_verification]
Verify full snapshots on top of EXT can be unmounted and removed
Making pool volume
Converting *Raid* volumes to thin pool and thin pool metadata devices
lvcreate  --type raid1 -m 1  --zero n -L 4M -n meta snapper_thinp
  WARNING: Logical volume snapper_thinp/meta not zeroed.
lvcreate  --type raid1 -m 1  --zero n -L 1G -n POOL snapper_thinp
  WARNING: Logical volume snapper_thinp/POOL not zeroed.
Waiting until all mirror|raid volumes become fully syncd...
   1/2 mirror(s) are fully synced: ( 62.95% 100.00% )
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
Sleeping 15 sec
Sleeping 15 sec
lvconvert --zero n --thinpool snapper_thinp/POOL --poolmetadata meta --yes
  WARNING: Converting snapper_thinp/POOL and snapper_thinp/meta to thin pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap.503
examining superblock
examining devices tree
examining mapping tree
checking space map counts


Making origin volume
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n origin
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  -V 2G -T snapper_thinp/POOL -n other1
  WARNING: Sum of all thin volume sizes (4.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other2
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other3
  WARNING: Sum of all thin volume sizes (8.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  -V 2G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (10.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (12.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
Placing an EXT filesystem on origin volume
mke2fs 1.42.9 (28-Dec-2013)
Making snapshot of origin volume
lvcreate  -y -k n -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Writing files to /mnt/full_snap
Checking files on /mnt/full_snap

Filling snapshot filesystem: /mnt/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=1500 bs=1M oflag=direct
dd: error writing ‘/mnt/full_snap/fill_file’: Read-only file system
777+0 records in
776+0 records out
813694976 bytes (814 MB) copied, 74.567 s, 10.9 MB/s
Attempt to create other virtual volumes while pool is full and in RO mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other6
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
Extend the pool to kick it back into RW mode
lvextend -L +2G snapper_thinp/POOL
  WARNING: Sum of all thin volume sizes (14.00 GiB) exceeds the size of thin pools (3.00 GiB).
Again, attempt to create other virtual volumes while pool is full back in RW mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other7
  WARNING: Sum of all thin volume sizes (15.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (3.00 GiB).

Checking files on /mnt/full_snap

Removing snap volume snapper_thinp/full_snap
lvremove -f /dev/snapper_thinp/full_snap

perform raid scrubbing on POOL_tdata
(lvchange --syncaction check snapper_thinp/POOL_tdata)
  device-mapper: message ioctl on  LVM-IQd88oo4U73QeJzlVHkP9S9EADIglPe5brRykOxO8D1TQWA7q5nF16aPSQmNrWxb-tdata failed: Device or resource busy
raid scrubbing check failed


3.10.0-851.el7.x86_64

lvm2-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-libs-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-cluster-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-lockd-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-python-boom-0.8.3-2.el7    BUILT: Wed Feb  7 17:42:09 CET 2018
cmirror-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
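
For completeness, POOL_tdata is a hidden sub-LV, so its raid state only shows with "lvs -a"; a pre-check before the scrub could look roughly like this (a sketch; the report fields are assumed to be available in this lvm2 build):

# Show the sync action/progress of the hidden tdata/tmeta images:
lvs -a -o lv_name,raid_sync_action,sync_percent snapper_thinp
# Only issue the check once POOL_tdata reports "idle":
lvchange --syncaction check snapper_thinp/POOL_tdata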

Comment 8 Jonathan Earl Brassow 2018-12-05 15:42:58 UTC
corey, is it possible this is a testing bug?  i.e. not waiting for a conflicting sync action to finish before starting a scrub?

Comment 9 Jonathan Earl Brassow 2018-12-05 16:17:48 UTC
(In reply to Jonathan Earl Brassow from comment #8)
> corey, is it possible this is a testing bug?  i.e. not waiting for a
> conflicting sync action to finish before starting a scrub?

<corey> the scrubbing tests actually wait for the raid to be sync'ed before attempting the repair scrub. Also, a "state is currently "resync".  Unable..." message doesn't result in a failure, and certainly shouldn't produce a d-m ioctl error, so the test just moves on in those cases:

[root@hayes-03 ~]# lvcreate --type raid1 -m 1 -n raid -L 800G test
  Logical volume "raid" created.
[root@hayes-03 ~]# lvs -a -o +devices
  LV              VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                          
  raid            test rwi-a-r--- 800.00g                                    0.07             raid_rimage_0(0),raid_rimage_1(0)
  [raid_rimage_0] test Iwi-aor--- 800.00g                                                     /dev/sdf1(1)                     
  [raid_rimage_1] test Iwi-aor--- 800.00g                                                     /dev/sdg1(1)                     
  [raid_rmeta_0]  test ewi-aor---   4.00m                                                     /dev/sdf1(0)                     
  [raid_rmeta_1]  test ewi-aor---   4.00m                                                     /dev/sdg1(0)                     
[root@hayes-03 ~]# lvchange --syncaction repair test/raid
  test/raid state is currently "resync".  Unable to switch to "repair".
[root@hayes-03 ~]# echo $?
0
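
A test wrapper could distinguish the benign "unable to switch" case (exit status 0, as shown above) from the real ioctl failure roughly like this (a hypothetical helper, not part of the actual test suite):

if ! out=$(lvchange --syncaction repair test/raid 2>&1); then
    echo "scrub request failed: $out"           # e.g. the "Device or resource busy" ioctl error
elif echo "$out" | grep -q 'Unable to switch'; then
    echo "raid still syncing; skipping scrub"   # benign case, exit status 0
fi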

Comment 11 Heinz Mauelshagen 2018-12-07 17:10:59 UTC
(In reply to Jonathan Earl Brassow from comment #9)
> (In reply to Jonathan Earl Brassow from comment #8)
> > corey, is it possible this is a testing bug?  i.e. not waiting for a
> > conflicting sync action to finish before starting a scrub?
> 
> <corey> the scrubbing tests actually wait for the raid to be sync'ed before
> attempting the repair scrub. Also, a "state is currently "resync".
> Unable..." message doesn't result in a failure, and certainly shouldn't
> produce a d-m ioctl error, so the test just moves on in those cases:

dm-raid reports -EBUSY when a check/repair is requested
(which we do via the target's message interface) while the state
of the raid set is not 'idle'.  This looks like a remaining race in the
sync-action checks done during message processing, because there I'm not
using decipher_sync_action() as the status interface does, but still check
the recovery flags directly as before...
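
The message interface referred to here can also be poked directly with dmsetup to observe the same -EBUSY while the array is busy (a sketch; the dm device name for the sub-LV is assumed to follow the usual vg-lv name mangling):

# Ask dm-raid to start a "check" via the target message interface:
dmsetup message snapper_thinp-POOL_tdata 0 check
# While the raid set is not idle this fails with "Device or resource busy".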


Comment 12 Heinz Mauelshagen 2018-12-21 16:50:40 UTC
This is unlikely to get triggered in the field, because the scrub needs to be requested immediately after creation or a resync request.
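
For context, the window is essentially issuing the scrub while the initial resync is still being set up; a minimal reproducer sketch (hypothetical VG name) is:

lvcreate --type raid1 -m 1 -L 100M -n r testvg
lvchange --syncaction check testvg/r
# Usually prints: testvg/r state is currently "resync".  Unable to switch to "check".
# Occasionally fails with: device-mapper: message ioctl ... Device or resource busy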

Closing; please reopen if there is further insight indicating that the fix needs to go into RHEL 7.

