Bug 1653744
| Summary: | scrubbing raids can lead to resource busy failures | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | kernel | Assignee: | Heinz Mauelshagen <heinzm> |
| kernel sub component: | RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | agk, cluster-qe, heinzm, hkrzesin, jbrassow, msnitzer, prajnoha, rhandlin, zkabelac |
| Version: | 8.0 | Keywords: | Reopened |
| Target Milestone: | rc | | |
| Target Release: | 8.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-4.18.0-58.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1533639 | Environment: | |
| Last Closed: | 2019-06-14 01:18:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1533639 | | |
Comment 1
Corey Marthaler
2018-11-27 14:35:16 UTC
Sending a check/repair message infrequently leads to -EBUSY instead of properly identifying an active resync. The cause is raid_message() testing the recovery bits in a racy way. The fix is to call decipher_sync_action() from raid_message() so that the idle state of the RAID device is identified properly (a simplified before/after sketch of the change is included below).

Upstream patch -> https://patchwork.kernel.org/patch/10736023/

*** This bug has been marked as a duplicate of bug 1658757 ***

Brew test kernel build -> http://brew-task-repos.usersys.redhat.com/repos/scratch/heinzm/kernel/4.18.0/57.el8.bz1653744/

Without the patch I was able to reproduce the "Device or resource busy" / "raid scrubbing failed" failure within a few cycles of:

for i in {0..99}; do lvremove -y $vg/$lv ; lvcreate -n$lv -L8 -m1 -y $vg ; lvchange --syncaction check/repair $vg/$lv ; done

With the patch it did not regress within a thousand cycles (an expanded form of this loop is sketched below).
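For reference, an expanded sketch of that reproducer loop. This is only an illustration: the "testvg" and "busylv" names are placeholders (any volume group with a few hundred MB free will do), and the loop alternates the two sync actions that the one-liner's "check/repair" shorthand stands for.

#!/bin/bash
# Expanded reproducer sketch; "testvg" and "busylv" are placeholder names.
vg=testvg
lv=busylv

for i in {0..99}; do
    # The very first lvremove may complain if the LV does not exist yet; that is harmless.
    lvremove -y "$vg/$lv"
    lvcreate -n "$lv" -L 8M -m 1 -y "$vg"
    # Alternate the two scrub actions; without the fix this intermittently
    # fails with "Device or resource busy" / "raid scrubbing failed".
    if (( i % 2 )); then
        lvchange --syncaction repair "$vg/$lv"
    else
        lvchange --syncaction check "$vg/$lv"
    fi
done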
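And a minimal before/after sketch of the raid_message() change described at the top of this comment, using the names found in the 4.18-era drivers/md/dm-raid.c (decipher_sync_action(), MD_RECOVERY_RUNNING/MD_RECOVERY_NEEDED, st_idle). It only illustrates the idea; the actual diff is in the patchwork link above.

/* Before: raid_message() tested individual recovery bits. These bits can
 * remain set for a moment after a sync action has completed, so an idle
 * array was occasionally reported as busy: */
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
    test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
	return -EBUSY;

/* After: decipher_sync_action() interprets the recovery state as a whole,
 * so the message is only refused when the device is genuinely not idle: */
if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
	return -EBUSY;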
Marking this verified in the latest kernel.

kernel-4.18.0-75.el8    BUILT: Fri Mar  1 11:37:34 CST 2019
lvm2-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-libs-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
lvm2-dbusd-2.03.02-6.el8    BUILT: Fri Feb 22 04:50:28 CST 2019
lvm2-lockd-2.03.02-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-event-libs-1.02.155-6.el8    BUILT: Fri Feb 22 04:47:54 CST 2019
device-mapper-persistent-data-0.7.6-1.el8    BUILT: Sun Aug 12 04:21:55 CDT 2018

Raid scrubbing has been turned back on in LVM testing and no errors have been seen of late. We'll open a new bug if issues are seen again.

============================================================
Iteration 135 of 20000 started at Fri Mar 15 10:19:25 CDT 2019
============================================================
SCENARIO (raid10) - [small_raid_creation]
Make small raids
50 hayes-03: lvcreate --type raid10_near -i 2 -n small_50M -L 50M raid_sanity
45 hayes-03: lvcreate --type raid10_near -i 2 -n small_45M -L 45M raid_sanity
40 hayes-03: lvcreate --type raid10_near -i 2 -n small_40M -L 40M raid_sanity
35 hayes-03: lvcreate --type raid10 -i 2 -n small_35M -L 35M raid_sanity
30 hayes-03: lvcreate --type raid10_near -i 2 -n small_30M -L 30M raid_sanity
25 hayes-03: lvcreate --type raid10 -i 2 -n small_25M -L 25M raid_sanity
20 hayes-03: lvcreate --type raid10 -i 2 -n small_20M -L 20M raid_sanity
15 hayes-03: lvcreate --type raid10 -i 2 -n small_15M -L 15M raid_sanity
10 hayes-03: lvcreate --type raid10_near -i 2 -n small_10M -L 10M raid_sanity
5 hayes-03: lvcreate --type raid10 -i 2 -n small_5M -L 5M raid_sanity

Removing the small raids
5 perform raid scrubbing (lvchange --syncaction check) on raid raid_sanity/small_5M
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_5M... and removing

10 perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/small_10M
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_10M... and removing

15 Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
perform raid rebuilding (lvchange --yes --rebuild /dev/sdd1 raid_sanity/small_15M)
rebuild attr:raid_sanity-small_15M: 0 32768 raid raid10 4 AAAA 32768/32768 idle 0 0 -
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_15M... and removing

20 perform raid scrubbing (lvchange --syncaction repair) on raid raid_sanity/small_20M
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_20M... and removing

25 Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
perform raid rebuilding (lvchange --yes --rebuild /dev/sdd1 raid_sanity/small_25M)
rebuild attr:raid_sanity-small_25M: 0 65536 raid raid10 4 AAAA 65536/65536 idle 0 0 -
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_25M... and removing

30 Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
perform raid rebuilding (lvchange --yes --rebuild /dev/sdd1 raid_sanity/small_30M)
rebuild attr:raid_sanity-small_30M: 0 65536 raid raid10 4 AAAA 65536/65536 idle 0 0 -
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_30M... and removing

35 Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
perform raid rebuilding (lvchange --yes --rebuild /dev/sdd1 raid_sanity/small_35M)
rebuild attr:raid_sanity-small_35M: 0 81920 raid raid10 4 aAAA 40320/81920 recover 0 0 -
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_35M... and removing

40 Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
perform raid rebuilding (lvchange --yes --rebuild /dev/sdd1 raid_sanity/small_40M)
rebuild attr:raid_sanity-small_40M: 0 81920 raid raid10 4 aAAA 37632/81920 recover 0 0 -
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_40M... and removing

45 perform raid scrubbing (lvchange --syncaction check) on raid raid_sanity/small_45M
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_45M... and removing

50 perform raid scrubbing (lvchange --syncaction check) on raid raid_sanity/small_50M
Waiting until all mirror|raid volumes become fully syncd...
1/1 mirror(s) are fully synced: ( 100.00% )
Sleeping 15 sec
Deactivating raid small_50M... and removing