Bug 1751887
| Summary: | allocation policy does not work for NON synced raid image failures (Unable to replace devices while it is not in-sync) | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | agk, heinzm, jbrassow, msnitzer, ncroxon, pasik, prajnoha, rhandlin, zkabelac |
| Version: | 8.1 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.5 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1765638 (view as bug list) | Environment: | |
| Last Closed: | 2021-03-12 07:31:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1765638 | | |
Description by Corey Marthaler, 2019-09-12 21:03:36 UTC
Here is almost the exact same scenario as in comment #0, but this iteration passes as expected even though the array is just as out of sync before the failure. This makes the verification of bug 1729303 difficult, since we can't definitively lift all the hacks in our tests if results can be random.

================================================================================
Iteration 0.7 started at Thu Sep 12 17:20:50 CDT 2019
================================================================================
Scenario kill_random_non_synced_raid1_3legs: Kill random leg of NON synced 3 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              non_synced_random_raid1_3legs_1
* sync:               0
* type:               raid1
* -m |-i value:       3
* leg devices:        /dev/sdm1 /dev/sdl1 /dev/sdp1 /dev/sdn1
* spanned legs:       0
* manual repair:      1
* no MDA devices:
* failpv(s):          /dev/sdn1
* additional snap:    /dev/sdm1
* failnode(s):        hayes-02
* raid fault policy:  allocate
******************************************************

Creating raids(s) on hayes-02...
hayes-02: lvcreate --type raid1 -m 3 -n non_synced_random_raid1_3legs_1 -L 3G black_bird /dev/sdm1:0-3600 /dev/sdl1:0-3600 /dev/sdp1:0-3600 /dev/sdn1:0-3600

Current mirror/raid device structure(s):
  LV                                          Attr        LSize    Cpy%Sync  Devices
  non_synced_random_raid1_3legs_1             rwi-a-r---    3.00g  0.00      non_synced_random_raid1_3legs_1_rimage_0(0),non_synced_random_raid1_3legs_1_rimage_1(0),non_synced_random_raid1_3legs_1_rimage_2(0),non_synced_random_raid1_3legs_1_rimage_3(0)
  [non_synced_random_raid1_3legs_1_rimage_0]  Iwi-aor---    3.00g            /dev/sdm1(1)
  [non_synced_random_raid1_3legs_1_rimage_1]  Iwi-aor---    3.00g            /dev/sdl1(1)
  [non_synced_random_raid1_3legs_1_rimage_2]  Iwi-aor---    3.00g            /dev/sdp1(1)
  [non_synced_random_raid1_3legs_1_rimage_3]  Iwi-aor---    3.00g            /dev/sdn1(1)
  [non_synced_random_raid1_3legs_1_rmeta_0]   ewi-aor---    4.00m            /dev/sdm1(0)
  [non_synced_random_raid1_3legs_1_rmeta_1]   ewi-aor---    4.00m            /dev/sdl1(0)
  [non_synced_random_raid1_3legs_1_rmeta_2]   ewi-aor---    4.00m            /dev/sdp1(0)
  [non_synced_random_raid1_3legs_1_rmeta_3]   ewi-aor---    4.00m            /dev/sdn1(0)

Creating xfs on top of mirror(s) on hayes-02...
Mounting mirrored xfs filesystems on hayes-02...

PV=/dev/sdn1
    non_synced_random_raid1_3legs_1_rimage_3: 1.0
    non_synced_random_raid1_3legs_1_rmeta_3: 1.0

Creating a snapshot volume of each of the raids
Writing verification files (checkit) to mirror(s) on...
    ---- hayes-02 ----
Verifying files (checkit) on mirror(s) on...
    ---- hayes-02 ----

Current sync percent just before failure ( 69.94% )

Disabling device sdn on hayes-02
rescan device...

Attempting I/O to cause mirror down conversion(s) on hayes-02
dd if=/dev/zero of=/mnt/non_synced_random_raid1_3legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.0198669 s, 2.1 GB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  /dev/sdn: open failed: No such device or address
  /dev/sdn1: open failed: No such device or address
  /dev/sdn: open failed: No such device or address
  /dev/sdn1: open failed: No such device or address
  WARNING: Couldn't find device with uuid x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
  WARNING: VG black_bird is missing PV x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
  LV                                          Attr        LSize    Cpy%Sync  Devices
  bb_snap1                                    swi-a-s---  252.00m            /dev/sdm1(769)
  non_synced_random_raid1_3legs_1             owi-aor---    3.00g  100.00    non_synced_random_raid1_3legs_1_rimage_0(0),non_synced_random_raid1_3legs_1_rimage_1(0),non_synced_random_raid1_3legs_1_rimage_2(0),non_synced_random_raid1_3legs_1_rimage_3(0)
  [non_synced_random_raid1_3legs_1_rimage_0]  iwi-aor---    3.00g            /dev/sdm1(1)
  [non_synced_random_raid1_3legs_1_rimage_1]  iwi-aor---    3.00g            /dev/sdl1(1)
  [non_synced_random_raid1_3legs_1_rimage_2]  iwi-aor---    3.00g            /dev/sdp1(1)
  [non_synced_random_raid1_3legs_1_rimage_3]  iwi-aor---    3.00g            /dev/sdo1(1)
  [non_synced_random_raid1_3legs_1_rmeta_0]   ewi-aor---    4.00m            /dev/sdm1(0)
  [non_synced_random_raid1_3legs_1_rmeta_1]   ewi-aor---    4.00m            /dev/sdl1(0)
  [non_synced_random_raid1_3legs_1_rmeta_2]   ewi-aor---    4.00m            /dev/sdp1(0)
  [non_synced_random_raid1_3legs_1_rmeta_3]   ewi-aor---    4.00m            /dev/sdo1(0)

Verifying FAILED device /dev/sdn1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sdm1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdl1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdp1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of non_synced_random_raid1_3legs_1_rimage_3 on: hayes-02

This automatic repair worked

Sep 12 17:21:11 hayes-02 qarshd[40643]: Running cmdline: sync
Sep 12 17:21:12 hayes-02 kernel: sd 0:2:13:0: rejecting I/O to offline device
Sep 12 17:21:12 hayes-02 kernel: print_req_error: I/O error, dev sdn, sector 12496 flags 0
Sep 12 17:21:12 hayes-02 kernel: raid1_end_read_request: 55 callbacks suppressed
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2216
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2224
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2232
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2240
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2248
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2256
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2264
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2272
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2280
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: dm-7: rescheduling sector 2288
Sep 12 17:21:12 hayes-02 kernel: sd 0:2:13:0: rejecting I/O to offline device
Sep 12 17:21:12 hayes-02 kernel: print_req_error: I/O error, dev sdn, sector 12496 flags 800
Sep 12 17:21:12 hayes-02 kernel: sd 0:2:13:0: rejecting I/O to offline device
Sep 12 17:21:12 hayes-02 kernel: print_req_error: I/O error, dev sdn, sector 12496 flags 801
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: Disk failure on dm-7, disabling device.#012md/raid1:mdX: Operation continuing on 3 devices.
Sep 12 17:21:12 hayes-02 kernel: raid1_read_request: 55 callbacks suppressed
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2216 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Device #3 of raid1 array, black_bird-non_synced_random_raid1_3legs_1-real, has failed.
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2224 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2232 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 lvm[18462]: /dev/sdn: open failed: No such device or address
Sep 12 17:21:12 hayes-02 kernel: sd 0:2:13:0: rejecting I/O to offline device
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2240 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: print_req_error: I/O error, dev sdn, sector 40 flags 0
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2248 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2256 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find device with uuid x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: VG black_bird is missing PV x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find all devices for LV black_bird/non_synced_random_raid1_3legs_1_rimage_3 while checking used and assumed devices.
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find all devices for LV black_bird/non_synced_random_raid1_3legs_1_rmeta_3 while checking used and assumed devices.
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find device with uuid x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2264 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2272 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2280 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: redirecting sector 2288 to other mirror: dm-1
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find device with uuid x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
Sep 12 17:21:12 hayes-02 kernel: device-mapper: raid: Device 3 specified for rebuild; clearing superblock
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: active with 3 out of 4 mirrors
Sep 12 17:21:12 hayes-02 kernel: md: recovery of RAID array mdX
Sep 12 17:21:12 hayes-02 lvm[18462]: WARNING: Couldn't find device with uuid x9IWTZ-18yX-fzcZ-20W7-ffzP-Upqx-ndrFCj.
Sep 12 17:21:12 hayes-02 kernel: md/raid1:mdX: active with 3 out of 4 mirrors
Sep 12 17:21:12 hayes-02 kernel: md: mdX: recovery interrupted.
Sep 12 17:21:12 hayes-02 kernel: md: recovery of RAID array mdX
Sep 12 17:21:12 hayes-02 lvm[18462]: Faulty devices in black_bird/non_synced_random_raid1_3legs_1 successfully replaced.
Sep 12 17:21:12 hayes-02 lvm[18462]: raid1 array, black_bird-non_synced_random_raid1_3legs_1-real, is not in-sync.
Sep 12 17:21:28 hayes-02 kernel: md: mdX: recovery done.

Just a side comment: the warning "Sep 12 13:06:02 hayes-02 lvm[18462]: WARNING: Sync status for black_bird/non_synced_random_raid1_3legs_1 is inconsistent." simply means that MD is likely undergoing a state transition and the status will be retried. I believe it is harmless.

In the case that failed (comment 0), we correctly read the RAID array as not-in-sync:
  "Sep 12 13:06:02 hayes-02 lvm[18462]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync."
Therefore, we do not replace the failed device. We /could/ replace it, given that there is plenty of redundancy in the array. However, we choose to warn the user - requiring them to run 'repair' later on their own. This is also acceptable behavior, as long as it is defined and consistent.
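For context, the manual follow-up that this warn path leaves to the administrator would look roughly like the following. This is a hedged sketch using the VG/LV names from this test and standard lvm2 commands, not output captured from the test run:

    # Wait for the initial resync to finish, then replace the failed leg;
    # dmeventd will not do this automatically while the array is not in-sync.
    lvconvert --repair black_bird/non_synced_random_raid1_3legs_1
    # Optionally drop the dead PV from the VG metadata afterwards.
    vgreduce --removemissing black_bird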
It would be better to add some language to the raid_fault_policy setting to read:

    # allocate
    #   Attempt to use any extra physical volumes in the VG as spares and
    #   replace faulty devices. If the array has not yet completed an initial resync...

In the case that succeeded (comment 1), we seem to have a strange sequence:

Sep 12 17:21:12 hayes-02 lvm[18462]: Faulty devices in black_bird/non_synced_random_raid1_3legs_1 successfully replaced.
Sep 12 17:21:12 hayes-02 lvm[18462]: raid1 array, black_bird-non_synced_random_raid1_3legs_1-real, is not in-sync.

The 'not in-sync' should have been detected first and the replacement rejected... but here is what I /think/ is going on. The initial sync /did/ succeed! Evidence of this would have been listed in the logs before those provided in comment 1. The report of "*-real is not in-sync" came after the repair was made, when an event (likely from the snapshot over the top of the RAID) triggered a status inquiry on the RAID that saw it was not yet in-sync after the repair.

My conclusion is that there is no bug here at all. However:
1) the 'allocate' fault policy could be better documented
2) I find it strange that the output of the 'lvs' command after a device failure in comment 1 sees the array as in-sync (100%) and all the subLVs in perfect order. That would not have been the case for 16 seconds after the automated repair was made, so what's up with that?

Another important artifact that I noticed when testing myself (I apparently forgot RAID did this): when a RAID with a failed device in it finishes sync'ing, an event is raised and dmeventd responds. In the normal case, dmeventd would simply respond by noting the RAID is now in-sync. However, when a failed device is present and 'allocate' is set, it immediately performs a repair - after all, it did just receive an event. The end result is that the repair happens anyway once the sync is finished. I've confirmed this on RAID LVs with and without snapshots on them. I even get this handy log message:

Sep 19 13:30:42 bp-01 lvm[22398]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device vg-lv.

...which, strangely, I don't see in comment 0 or comment 1.

So far, we are up to three work items for this bug (assuming I am correct and that there is no bug here):
1) update comments for 'allocate' fault policy in conf file
2) revert commit ad560a286a0b5d08086324e6194b060c136e9353, but follow up by only pulling the 'goto' (basically, restore the warning)
3) figure out why the 'lvs' in comment 1 is the way it is (see comment 2 for further explanation)

Heinz

Seconded

> 2) revert commit ad560a286a0b5d08086324e6194b060c136e9353, but follow up by

Ok

> only pulling the 'goto' (basically, restore the warning)

That would render us with the wrong message semantics, because the repair may still succeed anyway when not bailing out and calling lvconvert. I.e. when reverting and removing the goto, we have to change the message to "not in-sync, we try the repair anyway".

This is then down to the basic kernel race I was trying to explain a few times on bzs and calls (the DM status thread running concurrently with the MD sync thread, which updates mddev/md_rdev properties non-atomically).

Even without tackling the basic kernel race, we may be able to do better in the dmeventd raid plugin by rechecking the sync action and sync ratio in a loop with sub-second delays, imposing a short timeout. I know this is suboptimal, but we'd again be working around the aforementioned kernel race.
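Purely as an illustration of that recheck idea (not the actual dmeventd plugin code; the 0.2 s delay, 5 s total timeout, and use of lvs to read the sync ratio are arbitrary choices made here), the loop amounts to something like:

    # Re-read the kernel's reported sync ratio a few times before concluding the
    # array is not in-sync, to ride out the transient DM-status/MD-sync-thread race.
    for i in $(seq 1 25); do
        pct=$(lvs --noheadings -o sync_percent black_bird/non_synced_random_raid1_3legs_1 | tr -d ' ')
        [ "$pct" = "100.00" ] && break
        sleep 0.2
    done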
> 3) figure out why the 'lvs' in comment 1 is the way it is (see comment 2 for further explanation)

Sorry, "seconded" was relative to "update comments for 'allocate' fault policy in conf file".

We concluded to reintroduce the warning message and let lvconvert run as before.

lvm2 master commit ad560a286a0b5d08086324e6194b060c136e9353:
  reverted in 6fc46af8ecd9532aca41df43fd588fb207ed4e92
  warning reintroduced in 6f355c673631b0d7959191c8a56a577b3a0e97c9

lvm2 stable-2.02 commit 9e438b4bc6b9240b63fc79acfef3c77c01a848d8:
  reverted in 0585754593d7c010d83274c3a25dd6c3e8c8b4a8
  warning reintroduced in 8d3e01ff4f94a8d36b16520a5e402dbc7539dd2c

Corey will try to add more gathered info about when his STS tests believe the mirror/raid to be out of sync but the kernel believes it to be "in sync".

I've been able to reproduce the scenario where the raid is *not* auto repaired due to not being in-sync quite a few times (comment #0).

[root@hayes-03 ~]# grep "while it is not in-sync" /var/log/messages
Aug 12 22:01:33 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 10:26:48 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 11:41:08 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 11:54:15 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 12:07:04 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 12:19:57 hayes-03 lvm[3436]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 13:19:56 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 13:32:46 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 13:45:30 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 13:58:01 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 14:10:31 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 14:49:56 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 16:06:03 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.
Aug 13 18:02:09 hayes-03 lvm[2812]: Unable to replace devices in black_bird/non_synced_random_raid1_3legs_1 while it is not in-sync.

However, I have not been able to reproduce the scenario where it *is* auto repaired (comment #1). So maybe we should add to the lvm.conf file (like Jon mentioned) "If the array has not yet completed an initial resync..." at a minimum. I'll adjust the tests to look for scenarios where it is auto repaired and report a bug if that happens again.
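One possible shape for that lvm.conf text, building on Jon's draft wording above (the exact phrasing and comment layout are only a suggestion in the style of the existing lvm.conf annotations, not a committed change):

    # Configuration option activation/raid_fault_policy.
    # raid_fault_policy = "allocate"
    #   Attempt to use any extra physical volumes in the VG as spares and
    #   replace faulty devices. If the array has not yet completed its initial
    #   resynchronization, the failure is only reported; the failed device must
    #   be replaced manually (e.g. with 'lvconvert --repair') once the resync
    #   has finished.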
kernel-4.18.0-232.el8    BUILT: Mon Aug 10 02:17:54 CDT 2020
lvm2-2.03.09-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
lvm2-libs-2.03.09-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
lvm2-dbusd-2.03.09-5.el8    BUILT: Wed Aug 12 15:49:44 CDT 2020
lvm2-lockd-2.03.09-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-1.02.171-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-libs-1.02.171-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-event-1.02.171-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020
device-mapper-event-libs-1.02.171-5.el8    BUILT: Wed Aug 12 15:51:50 CDT 2020

Continued testing over the weekend appears to prove that a raid not in sync at failure time will *not* be auto repaired and will need to be manually repaired. We should mention this in the docs or in the raid_fault_policy section of the lvm.conf file, as mentioned in the above comment.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.