Bug 1533639 - scrubbing raids can lead to resource busy failures

Product: Red Hat Enterprise Linux 7
Component: lvm2 (sub component: Mirroring and RAID)
Version: 7.5
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED DEFERRED
Type: Bug
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, rhandlin, zkabelac
Target Milestone: rc
Last Closed: 2018-12-21 16:50:40 UTC
Bug Depends On: 1653744

Attachments:
  verbose lvchange attempt (flags: none)

Description Corey Marthaler 2018-01-11 20:49:57 UTC
Description of problem:
I've only reproduced this a few times in many iterations.

SCENARIO (raid10) - [small_raid_creation]
Make small raids
50 host-083: lvcreate  --type raid10 -i 2 -n small_50M -L 50M raid_sanity
45 host-083: lvcreate  --type raid10 -i 2 -n small_45M -L 45M raid_sanity
40 host-083: lvcreate  --type raid10 -i 2 -n small_40M -L 40M raid_sanity
35 host-083: lvcreate  --type raid10 -i 2 -n small_35M -L 35M raid_sanity
30 host-083: lvcreate  --type raid10_near -i 2 -n small_30M -L 30M raid_sanity
25 host-083: lvcreate  --type raid10_near -i 2 -n small_25M -L 25M raid_sanity
20 host-083: lvcreate  --type raid10_near -i 2 -n small_20M -L 20M raid_sanity
15 host-083: lvcreate  --type raid10 -i 2 -n small_15M -L 15M raid_sanity
10 host-083: lvcreate  --type raid10 -i 2 -n small_10M -L 10M raid_sanity
5 host-083: lvcreate  --type raid10_near -i 2 -n small_5M -L 5M raid_sanity

Removing the small raids
5
perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/small_5M
  raid_sanity/small_5M state is currently "resync".  Unable to switch to "repair".
Waiting until all mirror|raid volumes become fully syncd...
   1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec

Deactivating raid small_5M... and removing
10
perform raid scrubbing (lvchange --syncaction check) on raid raid_sanity/small_10M
  device-mapper: message ioctl on  LVM-zxytISeT3ilzK6YxO3tAZxiIWVcncdwppdP9PqAAnuE0sRk8pv4fQEuNCMNjgLSf failed: Device or resource busy
raid scrubbing failed

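A rough reproducer sketch of the loop the test drives (VG name, raid type and sizes are taken from the log above; the copy_percent polling is my approximation of the harness' "wait until fully synced" step, not the harness code itself):

  # create a small raid10, wait for the initial sync, then scrub and remove it
  lvcreate --type raid10 -i 2 -n small_10M -L 10M raid_sanity
  # poll Cpy%Sync until the initial resync reports 100%
  until [ "$(lvs --noheadings -o copy_percent raid_sanity/small_10M | tr -d ' ')" = "100.00" ]; do
      sleep 1
  done
  # this occasionally fails with "Device or resource busy" as reported above
  lvchange --syncaction check raid_sanity/small_10M
  lvremove -f raid_sanity/small_10M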

Version-Release number of selected component (if applicable):
3.10.0-827.el7.x86_64

lvm2-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-libs-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-cluster-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-lockd-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
lvm2-python-boom-0.8.1-5.el7    BUILT: Wed Dec  6 11:15:40 CET 2017
cmirror-2.02.176-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-event-libs-1.02.145-5.el7    BUILT: Wed Dec  6 11:13:07 CET 2017
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017
sanlock-3.6.0-1.el7    BUILT: Tue Dec  5 18:47:21 CET 2017
sanlock-lib-3.6.0-1.el7    BUILT: Tue Dec  5 18:47:21 CET 2017
vdo-6.1.0.106-13    BUILT: Thu Dec 21 16:00:07 CET 2017
kmod-kvdo-6.1.0.106-11.el7    BUILT: Thu Dec 21 17:09:12 CET 2017


How reproducible:
Seldom

Comment 2 Corey Marthaler 2018-01-11 20:51:54 UTC
Created attachment 1380222 [details]
verbose lvchange attempt

Comment 3 Corey Marthaler 2018-02-14 18:51:29 UTC
I don't think this is "small raid" dependent, as I just reproduced this on a 3G raid1 thin pool volume.

SCENARIO - [full_EXT_thin_snap_verification]
Verify full snapshots on top of EXT can be unmounted and removed
Making pool volume
Converting *Raid* volumes to thin pool and thin pool metadata devices
lvcreate  --type raid1 -m 1  --zero n -L 4M -n meta snapper_thinp
  WARNING: Logical volume snapper_thinp/meta not zeroed.
lvcreate  --type raid1 -m 1  --zero n -L 1G -n POOL snapper_thinp
  WARNING: Logical volume snapper_thinp/POOL not zeroed.
Waiting until all mirror|raid volumes become fully syncd...
   1/2 mirror(s) are fully synced: ( 62.95% 100.00% )
   2/2 mirror(s) are fully synced: ( 100.00% 100.00% )
Sleeping 15 sec
Sleeping 15 sec
lvconvert --zero n --thinpool snapper_thinp/POOL --poolmetadata meta --yes
  WARNING: Converting snapper_thinp/POOL and snapper_thinp/meta to thin pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap.503
examining superblock
examining devices tree
examining mapping tree
checking space map counts


Making origin volume
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n origin
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  -V 2G -T snapper_thinp/POOL -n other1
  WARNING: Sum of all thin volume sizes (4.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other2
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other3
  WARNING: Sum of all thin volume sizes (8.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  -V 2G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (10.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
lvcreate  --virtualsize 2G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (12.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB).
Placing an EXT filesystem on origin volume
mke2fs 1.42.9 (28-Dec-2013)
Making snapshot of origin volume
lvcreate  -y -k n -s /dev/snapper_thinp/origin -n full_snap
Mounting snapshot volume

Writing files to /mnt/full_snap
Checking files on /mnt/full_snap

Filling snapshot filesystem: /mnt/full_snap
dd if=/dev/zero of=/mnt/full_snap/fill_file count=1500 bs=1M oflag=direct
dd: error writing ‘/mnt/full_snap/fill_file’: Read-only file system
777+0 records in
776+0 records out
813694976 bytes (814 MB) copied, 74.567 s, 10.9 MB/s
Attempt to create other virtual volumes while pool is full and in RO mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other6
  Cannot create new thin volume, free space in thin pool snapper_thinp/POOL reached threshold.
Extend the pool to kick it back into RW mode
lvextend -L +2G snapper_thinp/POOL
  WARNING: Sum of all thin volume sizes (14.00 GiB) exceeds the size of thin pools (3.00 GiB).
Again, attempt to create other virtual volumes while pool is full back in RW mode
lvcreate  -V 1G -T snapper_thinp/POOL -n other7
  WARNING: Sum of all thin volume sizes (15.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (3.00 GiB).

Checking files on /mnt/full_snap

Removing snap volume snapper_thinp/full_snap
lvremove -f /dev/snapper_thinp/full_snap

perform raid scrubbing on POOL_tdata
(lvchange --syncaction check snapper_thinp/POOL_tdata)
  device-mapper: message ioctl on  LVM-IQd88oo4U73QeJzlVHkP9S9EADIglPe5brRykOxO8D1TQWA7q5nF16aPSQmNrWxb-tdata failed: Device or resource busy
raid scrubbing check failed
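
For reference, the scrub here targets the pool's hidden raid1 sub-LVs rather than the pool LV itself; they can be listed and scrubbed individually like this (a sketch only, using the names from the log above):

  lvs -a -o name,segtype snapper_thinp            # shows [POOL_tdata] and [POOL_tmeta] as raid1
  lvchange --syncaction check snapper_thinp/POOL_tdata
  lvchange --syncaction check snapper_thinp/POOL_tmeta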


3.10.0-851.el7.x86_64

lvm2-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-libs-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-cluster-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-lockd-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
lvm2-python-boom-0.8.3-2.el7    BUILT: Wed Feb  7 17:42:09 CET 2018
cmirror-2.02.177-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-event-libs-1.02.146-2.el7    BUILT: Wed Feb  7 17:39:26 CET 2018
device-mapper-persistent-data-0.7.3-3.el7    BUILT: Tue Nov 14 12:07:18 CET 2017

Comment 8 Jonathan Earl Brassow 2018-12-05 15:42:58 UTC
corey, is it possible this is a testing bug?  i.e. not waiting for a conflicting sync action to finish before starting a scrub?

Comment 9 Jonathan Earl Brassow 2018-12-05 16:17:48 UTC
(In reply to Jonathan Earl Brassow from comment #8)
> corey, is it possible this is a testing bug?  i.e. not waiting for a
> conflicting sync action to finish before starting a scrub?

<corey> the scrubbing tests actually wait for the raid to be sync'ed before attempting the repair scrub. Also, a "state is currently "resync".  Unable..." doesn't result in a failure, and certainly shouldn't produce some d-m ioctl message, so the test just moves on in those cases:

[root@hayes-03 ~]# lvcreate --type raid1 -m 1 -n raid -L 800G test
  Logical volume "raid" created.
[root@hayes-03 ~]# lvs -a -o +devices
  LV              VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                          
  raid            test rwi-a-r--- 800.00g                                    0.07             raid_rimage_0(0),raid_rimage_1(0)
  [raid_rimage_0] test Iwi-aor--- 800.00g                                                     /dev/sdf1(1)                     
  [raid_rimage_1] test Iwi-aor--- 800.00g                                                     /dev/sdg1(1)                     
  [raid_rmeta_0]  test ewi-aor---   4.00m                                                     /dev/sdf1(0)                     
  [raid_rmeta_1]  test ewi-aor---   4.00m                                                     /dev/sdg1(0)                     
[root@hayes-03 ~]# lvchange --syncaction repair test/raid
  test/raid state is currently "resync".  Unable to switch to "repair".
[root@hayes-03 ~]# echo $?
0
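
A stricter guard the test could add on top of the Cpy%Sync wait: raid_sync_action is the lvs reporting field for the sync action the kernel currently reports, so the test could wait for "idle" before issuing the scrub. A sketch only, using the test/raid names from the transcript above; whether this fully closes the window is an open question:

  # wait until the kernel reports the array as idle, not just 100% synced
  until [ "$(lvs --noheadings -o raid_sync_action test/raid | tr -d ' ')" = "idle" ]; do
      sleep 1
  done
  lvchange --syncaction repair test/raid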

Comment 11 Heinz Mauelshagen 2018-12-07 17:10:59 UTC
(In reply to Jonathan Earl Brassow from comment #9)
> (In reply to Jonathan Earl Brassow from comment #8)
> > corey, is it possible this is a testing bug?  i.e. not waiting for a
> > conflicting sync action to finish before starting a scrub?
> 
> <corey> the scrubbing tests actually wait for the raid to be sync'ed before
> attempting the repair scrub. Also, a "state is currently "resync". 
> Unable..." doesn't result in a failure, and certainly shouldn't produce some
> d-m ioctl message, so the test just moves on in those cases:

dm-raid reports -EBUSY when a check/repair is requested (which we do via the
target's message interface) in case the state of the raid is not 'idle'.
This looks like a remaining race effect in the message-processing sync-action
checks to me, because in the message path I'm not using decipher_sync_action()
as the status interface does, but check the recovery flags directly as before...

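From user space, the kernel-side state described above can be observed directly on the raid mapping, and the scrub retried if the message path still returns EBUSY. A workaround sketch only (device names taken from the log in comment 3; the retry count and delay are arbitrary):

  # the dm-raid status line includes the sync action the target itself reports
  dmsetup status snapper_thinp-POOL_tdata

  # retry the scrub a few times if the message ioctl comes back busy
  for i in 1 2 3; do
      lvchange --syncaction check snapper_thinp/POOL_tdata && break
      sleep 5
  done
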
> 
> [root@hayes-03 ~]# lvcreate --type raid1 -m 1 -n raid -L 800G test
>   Logical volume "raid" created.
> [root@hayes-03 ~]# lvs -a -o +devices
>   LV              VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log
> Cpy%Sync Convert Devices                          
>   raid            test rwi-a-r--- 800.00g                                   
> 0.07             raid_rimage_0(0),raid_rimage_1(0)
>   [raid_rimage_0] test Iwi-aor--- 800.00g                                   
> /dev/sdf1(1)                     
>   [raid_rimage_1] test Iwi-aor--- 800.00g                                   
> /dev/sdg1(1)                     
>   [raid_rmeta_0]  test ewi-aor---   4.00m                                   
> /dev/sdf1(0)                     
>   [raid_rmeta_1]  test ewi-aor---   4.00m                                   
> /dev/sdg1(0)                     
> [root@hayes-03 ~]# lvchange --syncaction repair test/raid
>   test/raid state is currently "resync".  Unable to switch to "repair".
> [root@hayes-03 ~]# echo $?
> 0

Comment 12 Heinz Mauelshagen 2018-12-21 16:50:40 UTC
This is unlikely to get triggered in the field, because the scrub needs to be requested immediately after creation or a resync request.

Closing; please reopen in case there's further insight showing that the fix needs to go into RHEL 7.