Bug 903249
| Field | Value |
|---|---|
| Summary | LVM RAID: 'lvs' does not always report the proper status of a RAID LV |
| Product | Red Hat Enterprise Linux 6 |
| Component | lvm2 |
| Version | 6.4 |
| Status | CLOSED ERRATA |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Jonathan Earl Brassow <jbrassow> |
| Assignee | Jonathan Earl Brassow <jbrassow> |
| QA Contact | Cluster QE <mspqa-list> |
| Docs Contact | |
| CC | agk, cmarthal, dwysocha, heinzm, jbrassow, lnovich, msnitzer, nperic, prajnoha, prockai, slevine, thornber, zkabelac |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | lvm2-2.02.100-1.el6 |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | |
| Clones | 987094 |
| Environment | |
| Last Closed | 2013-11-21 23:19:29 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 912336, 987094 |

Doc Text:

Previously, if a device temporarily failed, the kernel would notice the interruption and regard the device as failed. The kernel must be notified before it regards the device as alive again. LVM, however, would be able to see the device, and 'lvs' would report it as operating normally (i.e. without the partial attribute), even though the kernel still regarded the device as failed. The user had to use 'dmsetup' to find out the true state of the device.

Now 'lvs' prints a 'p' (partial) attribute if a device is missing, and also prints an 'r' (refresh/replace) attribute if the device is present but the kernel still regards it as missing. Upon seeing an 'r' attribute for a RAID logical volume, the user can then decide whether the array should be refreshed (reloaded into the kernel using 'lvchange --refresh') or whether the device should be replaced.
Description
Jonathan Earl Brassow
2013-01-23 15:00:04 UTC
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

It is not just the (p)artial flag that is misreported; the character that indicates whether a particular image in a RAID LV is in-sync or not can also be improved. Here is an excerpt from a recent commit:

The other case where 'lvs' gives incomplete or improper output is when a device is replaced or added to a RAID LV. It should display that the RAID LV is in the process of sync'ing and that the new device is the only one that is not in-sync - as indicated by a leading 'I' in the Attr column. (Remember that 'i' indicates an (i)mage that is in-sync and 'I' indicates an (I)mage that is not in sync.)

Here's an example of the old incorrect behaviour:

[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
  LV            VG   Attr      Cpy%Sync Devices
  lv            vg   rwi-a-r--   100.00 lv_rimage_0(0),lv_rimage_1(0)
  [lv_rimage_0] vg   iwi-aor--          /dev/sda1(1)
  [lv_rimage_1] vg   iwi-aor--          /dev/sdb1(1)
  [lv_rmeta_0]  vg   ewi-aor--          /dev/sda1(0)
  [lv_rmeta_1]  vg   ewi-aor--          /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_p
  LV            VG   Attr      Cpy%Sync Devices
  lv            vg   rwi-a-r--     0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rim
  [lv_rimage_0] vg   Iwi-aor--          /dev/sda1(1)
  [lv_rimage_1] vg   Iwi-aor--          /dev/sdb1(1)
  [lv_rimage_2] vg   Iwi-aor--          /dev/sdc1(1)
  [lv_rmeta_0]  vg   ewi-aor--          /dev/sda1(0)
  [lv_rmeta_1]  vg   ewi-aor--          /dev/sdb1(0)
  [lv_rmeta_2]  vg   ewi-aor--          /dev/sdc1(0)

** Note that only the last device that has been added should be marked 'I'.

Here is an example of the correct output after this patch is applied:

[root@bp-01 lvm2]# lvs -a -o name,vg_name,attr,copy_percent,devices vg
  LV            VG   Attr      Cpy%Sync Devices
  lv            vg   rwi-a-r--   100.00 lv_rimage_0(0),lv_rimage_1(0)
  [lv_rimage_0] vg   iwi-aor--          /dev/sda1(1)
  [lv_rimage_1] vg   iwi-aor--          /dev/sdb1(1)
  [lv_rmeta_0]  vg   ewi-aor--          /dev/sda1(0)
  [lv_rmeta_1]  vg   ewi-aor--          /dev/sdb1(0)
[root@bp-01 lvm2]# lvconvert -m +1 vg/lv; lvs -a -o name,vg_name,attr,copy_p
  LV            VG   Attr      Cpy%Sync Devices
  lv            vg   rwi-a-r--     0.00 lv_rimage_0(0),lv_rimage_1(0),lv_rim
  [lv_rimage_0] vg   iwi-aor--          /dev/sda1(1)
  [lv_rimage_1] vg   iwi-aor--          /dev/sdb1(1)
  [lv_rimage_2] vg   Iwi-aor--          /dev/sdc1(1)
  [lv_rmeta_0]  vg   ewi-aor--          /dev/sda1(0)
  [lv_rmeta_1]  vg   ewi-aor--          /dev/sdb1(0)
  [lv_rmeta_2]  vg   ewi-aor--          /dev/sdc1(0)

** Note only the last image is marked with an 'I'. This is correct, and we can tell that it isn't the whole array that is sync'ing, but just the new device.

The following 3 commits are in LVM upstream version 2.02.99:

PATCH3: commit 801d4f96a8a2333361d7292d9c79ffdb5a96fac3
Author: Jonathan Brassow <jbrassow>
Date:   Fri Feb 1 11:33:54 2013 -0600

PATCH2: commit 37ffe6a13ad56122abdc808c13af9eeb1adf6731
Author: Jonathan Brassow <jbrassow>
Date:   Fri Feb 1 11:32:18 2013 -0600

PATCH1: commit c8242e5cf4895f13e16b598b387c876c6fab7180
Author: Jonathan Brassow <jbrassow>
Date:   Fri Feb 1 11:31:47 2013 -0600

The following commit improves the 'lvs' output even further:

PATCH4 (see other 3 in comment 4): commit ff64e3500f6acf93dce017388445c4828111d06f
Author: Jonathan Brassow <jbrassow>
Date:   Thu Apr 11 15:33:59 2013 -0500

RAID: Add scrubbing support for RAID LVs

New options to 'lvchange' allow users to scrub their RAID LVs.

Synopsis:
  lvchange --syncaction {check|repair} vg/raid_lv

RAID scrubbing is the process of reading all the data and parity blocks in an array and checking to see whether they are coherent. 'lvchange' can now initiate the two scrubbing operations: "check" and "repair". "check" will go over the array and record the number of discrepancies but not repair them. "repair" will correct the discrepancies as it finds them.

'lvchange --syncaction repair vg/raid_lv' is not to be confused with 'lvconvert --repair vg/raid_lv'. The former initiates a background synchronization operation on the array, while the latter is designed to repair/replace failed devices in a mirror or RAID logical volume.

Additional reporting has been added for 'lvs' to support the new operations. Two new printable fields (which are not printed by default) have been added: "syncaction" and "mismatches". These can be accessed using the '-o' option to 'lvs', like:

  lvs -o +syncaction,mismatches vg/lv

"syncaction" will print the current synchronization operation that the RAID volume is performing. It can be one of the following:
- idle:    All sync operations complete (doing nothing)
- resync:  Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check:   Looking for array inconsistencies
- repair:  Looking for and repairing inconsistencies

The "mismatches" field will print the number of discrepancies found during a check or repair operation. The 'Cpy%Sync' field already available to 'lvs' will print the progress of any of the above syncactions, including check and repair.

Finally, the lv_attr field has changed to accommodate the scrubbing operation as well. The role of the 'p'artial character in the lv_attr report field has been expanded. "Partial" is really an indicator of the health of a logical volume, and it makes sense to extend this to include other health indicators as well, specifically:
- 'm'ismatches: Indicates that there are discrepancies in a RAID LV. This character is shown after a scrubbing operation has detected that portions of the RAID are not coherent.
- 'r'efresh: Indicates that a device in a RAID array has suffered a failure and the kernel regards it as failed - even though LVM can read the device label and considers the device to be ok. The LV should be 'r'efreshed to notify the kernel that the device is now available, or the device should be 'r'eplaced if it is suspected of failing.
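As a minimal sketch of the scrubbing workflow described in the commit text above (not part of the original comment; vg/raid_lv is a placeholder name):

  lvchange --syncaction check vg/raid_lv    # read all data and parity blocks; count discrepancies, do not fix them
  lvs -o name,copy_percent vg/raid_lv       # progress of the check is reported in the existing Cpy%Sync field
  lvchange --syncaction repair vg/raid_lv   # second pass that corrects any discrepancies found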
The printable fields syncaction and mismatches are not present in lvs:

lvs -o +mismatches vg/raid10
  Unrecognised field: mismatches
lvs -o +syncaction vg/raid10
  Unrecognised field: syncaction

Tested with version: lvm2-2.02.100-3.el6.x86_64

According to Comment 7, this should be present. Could you please clarify if this was maybe not included for some reason and should not be tested for?

Try lvs -o help:

  copy_percent           - For RAID, mirrors and pvmove, current percentage in-sync.
  sync_percent           - For RAID, mirrors and pvmove, current percentage in-sync.
  raid_mismatch_count    - For RAID, number of mismatches found or repaired.
  raid_sync_action       - For RAID, the current synchronization action being performed.
  raid_write_behind      - For RAID1, the number of outstanding writes allowed to writemostly devices.
  raid_min_recovery_rate - For RAID1, the minimum recovery I/O load in kiB/sec/disk.
  raid_max_recovery_rate - For RAID1, the maximum recovery I/O load in kiB/sec/disk.

Ok cool, so this was changed compared to the instructions/remarks in Comment 7. Thanks for the pointer.
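For reference when verifying, a minimal invocation using the field names that actually shipped, as listed in the 'lvs -o help' output above (vg/raid_lv is a placeholder name):

  lvs -a -o name,attr,raid_sync_action,raid_mismatch_count vg/raid_lv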
Tested lvs output with raid1, raid4, raid5, raid6 and raid10. The 'I' flag only shows on a device which was replaced (as expected). Here's the output from the raid5 test:

[root@virt-013 yum.repos.d]# lvs -a -o name,vg_name,attr,raid_sync_action,lv_size,copy_percent,devices
  /dev/sdc1: read failed after 0 of 1024 at 10733879296: Input/output error
  /dev/sdc1: read failed after 0 of 1024 at 10733948928: Input/output error
  /dev/sdc1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdc1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdc1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid 6w287X-9pmZ-hZGL-rQW3-pUY5-6O7j-zcdPrf.
  LV               VG         Attr       SyncAction LSize   Cpy%Sync Devices
  raid5            vg         rwi-a-r-p- idle         2.00g   100.00 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg         iwi-aor---            684.00m          /dev/sda1(1)
  [raid5_rimage_1] vg         iwi-aor---            684.00m          /dev/sdb1(1)
  [raid5_rimage_2] vg         iwi-aor-p-            684.00m          unknown device(1)
  [raid5_rimage_3] vg         iwi-aor---            684.00m          /dev/sdd1(1)
  [raid5_rmeta_0]  vg         ewi-aor---              4.00m          /dev/sda1(0)
  [raid5_rmeta_1]  vg         ewi-aor---              4.00m          /dev/sdb1(0)
  [raid5_rmeta_2]  vg         ewi-aor-p-              4.00m          unknown device(0)
  [raid5_rmeta_3]  vg         ewi-aor---              4.00m          /dev/sdd1(0)
  lv_root          vg_virt013 -wi-ao----              6.71g          /dev/vda2(0)
  lv_swap          vg_virt013 -wi-ao----            816.00m          /dev/vda2(1718)

[root@virt-013 yum.repos.d]# dd if=/dev/urandom of=/dev/vg/raid5 bs=512 count=10240
[root@virt-013 yum.repos.d]# lvs -a -o name,vg_name,attr,raid_sync_action,lv_size,copy_percent,devices
  Couldn't find device with uuid 6w287X-9pmZ-hZGL-rQW3-pUY5-6O7j-zcdPrf.
  LV               VG Attr       SyncAction LSize   Cpy%Sync Devices
  raid5            vg rwi-a-r--- recover      2.00g    60.62 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg iwi-aor---            684.00m          /dev/sda1(1)
  [raid5_rimage_1] vg iwi-aor---            684.00m          /dev/sdb1(1)
  [raid5_rimage_2] vg Iwi-aor---            684.00m          /dev/sde1(1)
  [raid5_rimage_3] vg iwi-aor---            684.00m          /dev/sdd1(1)
  [raid5_rmeta_0]  vg ewi-aor---              4.00m          /dev/sda1(0)
  [raid5_rmeta_1]  vg ewi-aor---              4.00m          /dev/sdb1(0)
  [raid5_rmeta_2]  vg ewi-aor---              4.00m          /dev/sde1(0)
  [raid5_rmeta_3]  vg ewi-aor---              4.00m          /dev/sdd1(0)

Only the added device /dev/sde1 was shown as syncing (I).

Marking verified with: lvm2-2.02.100-4.el6.x86_64
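Note (not part of the original transcript): the step that swapped the failed leg for /dev/sde1 is not shown above. One plausible way to drive it is the 'lvconvert --repair' path mentioned in the commit text earlier in this report, roughly:

  lvconvert --repair vg/raid5   # replace the failed image with one allocated on an available PV in the VG

After such a replacement, the new image resynchronizes and carries the 'I' attribute until Cpy%Sync reaches 100%, matching the output above.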
Adding one more test just showing the behavior during removal and restoration of the same device.

Removed /dev/sdd1:

  LV               VG Attr       SyncAction LSize   Cpy%Sync Devices
  raid5            vg rwi-a-r-p- idle         2.00g   100.00 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg iwi-aor---            684.00m          /dev/sda1(1)
  [raid5_rimage_1] vg iwi-aor---            684.00m          /dev/sdb1(1)
  [raid5_rimage_2] vg iwi-aor---            684.00m          /dev/sde1(1)
  [raid5_rimage_3] vg iwi-aor-p-            684.00m          unknown device(1)
  [raid5_rmeta_0]  vg ewi-aor---              4.00m          /dev/sda1(0)
  [raid5_rmeta_1]  vg ewi-aor---              4.00m          /dev/sdb1(0)
  [raid5_rmeta_2]  vg ewi-aor---              4.00m          /dev/sde1(0)
  [raid5_rmeta_3]  vg ewi-aor-p-              4.00m          unknown device(0)

Wrote some data, and brought the device back:

[root@virt-013 yum.repos.d]# lvs -a -o name,vg_name,attr,raid_sync_action,lv_size,copy_percent,devices
  Couldn't find device with uuid 6w287X-9pmZ-hZGL-rQW3-pUY5-6O7j-zcdPrf.
  LV               VG Attr       SyncAction LSize   Cpy%Sync Devices
  raid5            vg rwi-a-r-r- idle         2.00g   100.00 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg iwi-aor---            684.00m          /dev/sda1(1)
  [raid5_rimage_1] vg iwi-aor---            684.00m          /dev/sdb1(1)
  [raid5_rimage_2] vg iwi-aor---            684.00m          /dev/sde1(1)
  [raid5_rimage_3] vg iwi-aor-r-            684.00m          /dev/sdd1(1)
  [raid5_rmeta_0]  vg ewi-aor---              4.00m          /dev/sda1(0)
  [raid5_rmeta_1]  vg ewi-aor---              4.00m          /dev/sdb1(0)
  [raid5_rmeta_2]  vg ewi-aor---              4.00m          /dev/sde1(0)
  [raid5_rmeta_3]  vg ewi-aor-r-              4.00m          /dev/sdd1(0)

The LV has 'r' in the 9th position of the attr field, indicating that a refresh is needed. dmsetup still sees the device as dead:

  vg-raid5: 0 4202496 raid raid5_ls 4 AAAD 1400832/1400832 idle 0

[root@virt-013 yum.repos.d]# lvchange --refresh vg/raid5
[root@virt-013 yum.repos.d]# lvs -a -o name,vg_name,attr,raid_sync_action,raid_mismatch_count,copy_percent,devices
  LV               VG         Attr       SyncAction Mismatches Cpy%Sync Devices
  raid5            vg         rwi-a-r--- idle                0   100.00 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg         iwi-aor---                              /dev/sda1(1)
  [raid5_rimage_1] vg         iwi-aor---                              /dev/sdb1(1)
  [raid5_rimage_2] vg         iwi-aor---                              /dev/sde1(1)
  [raid5_rimage_3] vg         iwi-aor---                              /dev/sdd1(1)
  [raid5_rmeta_0]  vg         ewi-aor---                              /dev/sda1(0)
  [raid5_rmeta_1]  vg         ewi-aor---                              /dev/sdb1(0)
  [raid5_rmeta_2]  vg         ewi-aor---                              /dev/sde1(0)
  [raid5_rmeta_3]  vg         ewi-aor---                              /dev/sdd1(0)
  lv_root          vg_virt013 -wi-ao----                              /dev/vda2(0)
  lv_swap          vg_virt013 -wi-ao----                              /dev/vda2(1718)

[root@virt-013 yum.repos.d]# lvs -a -o name,vg_name,attr,raid_sync_action,raid_mismatch_count,copy_percent,devices
  LV               VG Attr       SyncAction Mismatches Cpy%Sync Devices
  raid5            vg rwi-a-r--- idle                0   100.00 raid5_rimage_0(0),raid5_rimage_1(0),raid5_rimage_2(0),raid5_rimage_3(0)
  [raid5_rimage_0] vg iwi-aor---                              /dev/sda1(1)
  [raid5_rimage_1] vg iwi-aor---                              /dev/sdb1(1)
  [raid5_rimage_2] vg iwi-aor---                              /dev/sde1(1)
  [raid5_rimage_3] vg iwi-aor---                              /dev/sdd1(1)
  [raid5_rmeta_0]  vg ewi-aor---                              /dev/sda1(0)
  [raid5_rmeta_1]  vg ewi-aor---                              /dev/sdb1(0)
  [raid5_rmeta_2]  vg ewi-aor---                              /dev/sde1(0)
  [raid5_rmeta_3]  vg ewi-aor---                              /dev/sdd1(0)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html