Summary: | When lvmetad is used, LVs do not properly report as 'p'artial
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jonathan Earl Brassow <jbrassow> |
Component: | lvm2 | Assignee: | Petr Rockai <prockai> |
lvm2 sub component: | LVM Metadata / lvmetad (RHEL6) | QA Contact: | Cluster QE <mspqa-list> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified
Priority: | unspecified | CC: | agk, cmarthal, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, slevine, zkabelac
Version: | 6.5
Target Milestone: | rc
Target Release: | ---
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: |
Fixed In Version: | lvm2-2.02.108-1.el6 | Doc Type: | Bug Fix |
Doc Text: |
Cause: Information about physical volume availability can be out of date when lvmetad is in use.
Consequence: The status string in the output of the 'lvs' command for a RAID volume may be different in identical situations depending on whether lvmetad is used or not (indicating 'r'efresh instead of 'p'artial in the lvmetad case).
Fix: The dmeventd volume monitoring daemon now updates physical volume information in lvmetad for devices participating in a RAID array that has encountered an error.
Result: If dmeventd is active (which is recommended regardless of this issue), the lvs output is the same in both the lvmetad and non-lvmetad cases. When dmeventd is disabled, it is recommended to run an 'lvscan --cache' for faulty RAID arrays, to ensure up-to-date information in lvs output.
|
Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-10-14 08:25:05 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Bug Depends On: |
Bug Blocks: | 1089170, 1089369
Description   Jonathan Earl Brassow   2014-04-08 22:18:58 UTC
This behavior could have something to do with the way I am killing the device:

    # echo offline > /sys/block/$dev/device/state

I believe QA uses some other mechanism. I don't know if that means this bug should be closed, or if a customer would encounter a problem with a device in a similar state. Has anyone tried pulling the plug on a device to see what happens, rather than using software to emulate failures?

(In reply to Jonathan Earl Brassow from comment #3)
> This behavior could have something to do with the way I am killing the
> device.
> # echo offline > /sys/block/$dev/device/state

The "echo offline" does not generate an event (and the device is only half gone, since the /sysfs content is still there). It is probably better to use "echo 1 > /sys/block/$dev/device/delete", which removes the device completely from the system, with a REMOVE event generated. You can still use "echo offline", but then you always need to call "pvscan --cache $dev".

(In reply to Peter Rajnoha from comment #4)
> You can still use the "echo offline", but then you always need to call
> "pvscan --cache $dev".

(...in which case we're not testing the whole thing with udev rules, btw. So using "echo 1 > ...device/delete" and then rescanning the SCSI bus to make the device present again is probably the correct way to test this completely, with all events and mechanisms included.)

OK, that makes sense. However, is a udev event generated when power or connectivity is lost to a drive in all cases? I'm still curious whether a real failure event can look like "echo offline". If we are sure that real failure events are all handled, then this bug can be closed.

Is there a case where power and connectivity are still available, but the drive throws errors for I/O? Would that trigger a REMOVE event? We may need to document the 'pvscan --cache $dev' step for users in those cases - or augment the RAID code to print something sensible or detect the problem.
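The two failure-injection methods discussed above can be sketched as a small shell helper. This is an illustration only: the `run` dry-run wrapper, the `DRYRUN` variable, and the `sdc` device name are assumptions, not part of the bug report, and the real commands should only ever be run as root against a scratch device.

```shell
# Dry-run wrapper: print the command instead of executing it unless DRYRUN=0.
run() {
    if [ "${DRYRUN:-1}" = 1 ]; then
        echo "would run: $1"
    else
        sh -c "$1"
    fi
}

DEV=${DEV:-sdc}   # hypothetical scratch device

# Method 1: "offline" keeps the sysfs entry and emits no uevent, so
# lvmetad must be refreshed by hand with pvscan --cache afterwards.
run "echo offline > /sys/block/$DEV/device/state"
run "pvscan --cache /dev/$DEV"

# Method 2: "delete" removes the device entirely and generates a REMOVE
# event, so the normal udev rules (and the lvmetad update) run on their own.
run "echo 1 > /sys/block/$DEV/device/delete"
```

Method 2 exercises the full udev/lvmetad event path, which is why the comments above recommend it for testing.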
For the RAID code, this may be as simple as running another check for kernel device status...

If dmeventd is running and monitoring RAID devices, and you do something to trigger a failure (echo offline plus writing to the device should do), the status string in device mapper should reflect that the leg is offline. When that happens and dmeventd runs lvconvert --repair, the latter will notice the status and mark the LV as missing even if there was no REMOVE event. So in a production system, you should be covered even for "echo offline"-like events. If this doesn't happen, it might be a problem with lvconvert --repair not parsing raid1 status info correctly (this definitely used to work for old-style mirrors).

Well, actually, it doesn't work quite so well with lvmetad - bug #1089170, bug #1089369.

(In reply to Jonathan Earl Brassow from comment #0)
> This is not good, because it causes repair operations to fail. They fail
> because if the device is seen by LVM, it assumes that the failed device has
> returned - thus, it only needs a refresh. If the device can't be seen by
> LVM (and has a 'p'artial flag), then it will be replaced by repair.

Bug 1089170 and bug 1089369 are now tested examples of how this is manifested. We need a solution to this lvmetad problem. The solution could be to trigger a rescan if '--repair' or '--refresh' is used on the command line. The device must be reread in order to determine whether the device is dead or whether the failure was transient. It is not enough to simply check the kernel status.

So should dmeventd/lvconvert --repair run with lvmetad disabled, then? lvconvert --repair must see the I/O error, and without touching the device directly, I can't imagine how we could detect that...

(In reply to Peter Rajnoha from comment #13)
> So should dmeventd/lvconvert --repair run with lvmetad disabled then? The
> lvconvert --repair must see the IO error and without touching the device
> directly, I can't imagine how we can detect that..
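The per-leg status check described above can be illustrated with a tiny parser for the dm-raid status line. The `legs_failed` helper is hypothetical; it assumes the kernel's dm-raid status format, in which the sixth field of the `dmsetup status` line carries one health character per leg ('A' = alive, 'D' = dead/failed) -- the same information lvconvert --repair needs to consult.

```shell
# Parse a "dmsetup status" line for a raid target and report leg health.
# Example input: "0 2097152 raid raid1 2 AD 2097152/2097152 idle 0 0"
legs_failed() {
    health=$(echo "$1" | awk '{print $6}')   # per-leg health characters
    case "$health" in
        *D*) echo "degraded: health=$health"; return 1 ;;   # at least one dead leg
        *)   echo "healthy: health=$health";  return 0 ;;
    esac
}

legs_failed "0 2097152 raid raid1 2 AA 2097152/2097152 idle 0 0"   # both legs alive
legs_failed "0 2097152 raid raid1 2 AD 1048576/2097152 idle 0 0"   # one leg dead
```

In real use the input would come from `dmsetup status vg-raid1`; the point is that the kernel already exposes the failed-leg information, whether or not an udev REMOVE event was ever seen.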
Probably a good idea, but then 'lvs' would still be wrong - and it needs to be right for customers to take the appropriate action. For example, on RAID, if I saw a 'r'efresh flag, I would perform 'lvchange --refresh vg/lv'. OTOH, if I saw a 'p'artial flag, I would rather perform 'lvconvert --repair vg/lv'. 'dmeventd' is notified when there is a write failure. Perhaps there is a rescan command that could be run by dmeventd to inform lvmetad about the issue?

I have checked the code, and lvconvert --repair for RAID does not take device status into account at all -- a big chunk of code is entirely missing. For old-style mirrors, we check the device-mapper status of mirror LVs and feed that into --repair. When the code was adapted to RAID, this was left out and needs to be added. This is in fact not related to lvmetad: if a device goes away, you write to the mirror, and the device comes back, lvconvert --repair will not work either. The only change with lvmetad is that this also happens if the device is still inaccessible at the time of lvconvert --repair.

The problem here is that a lot of the code that existed for mirrors is duplicated for RAID, and a straightforward fix would make this even worse. So we are in for some refactoring of the status-parsing code so that it can work with both old-style mirrors and RAID. Most of the issues should go away then.

This should be fixed upstream in 5dc6671bb550f4b480befee03d234373d08e188a, as long as dmeventd is in use. Non-dmeventd users need to issue lvscan --cache on the affected LV to update the partial/refresh flags on RAID LVs.

Marking this one VERIFIED.
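The refresh-versus-repair triage rule in the comment above can be sketched from the lvs attribute string alone. The `suggest_action` helper is hypothetical; it assumes the documented lv_attr layout, in which the ninth character is the volume-health bit ('p' = partial, 'r' = refresh needed).

```shell
# Map the 9th lv_attr character to the recovery command suggested above.
# Obtain the attribute string with: lvs --noheadings -o lv_attr vg/lv
suggest_action() {
    case $(printf '%s' "$1" | cut -c9) in
        p) echo "lvconvert --repair" ;;   # a device is missing: replace it
        r) echo "lvchange --refresh" ;;   # transient failure: reload the tables
        *) echo "no action needed" ;;
    esac
}

suggest_action "rwi-a-r-p-"   # partial RAID LV, as in the lvs output below
suggest_action "rwi-a-r-r-"   # refresh-needed RAID LV
```

This is exactly why the flag must be accurate: the two attribute strings differ in one character, but they call for very different administrator actions.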
Although this is not the same thing as a device which is still present in the system but gives out I/O errors, the echo 'offline' now works in such a way that LVM can see that an LV is partial, and will either replace the failed device or mark the LV as partial, depending on the settings:

```
[root@tardis-01 ~]# echo offline >/sys/block/sdc/device/state
[root@tardis-01 ~]# lvs -a -o+devices
  /dev/sdc1: read failed after 0 of 512 at 16104947712: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 16105054208: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error
  LV               VG          Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  raid1            vg          rwi-a-r---   1.00g                                  100.00           raid1_rimage_0(0),raid1_rimage_1(0)
  [raid1_rimage_0] vg          iwi-aor---   1.00g                                                   /dev/sdb1(1)
  [raid1_rimage_1] vg          iwi-aor---   1.00g                                                   /dev/sdc1(1)
  [raid1_rmeta_0]  vg          ewi-aor---   4.00m                                                   /dev/sdb1(0)
  [raid1_rmeta_1]  vg          ewi-aor---   4.00m                                                   /dev/sdc1(0)
  lv_home          vg_tardis01 -wi-ao---- 224.88g                                                   /dev/sda2(12800)
  lv_root          vg_tardis01 -wi-ao----  50.00g                                                   /dev/sda2(0)
  lv_swap          vg_tardis01 -wi-ao----   4.00g                                                   /dev/sda2(70368)

[root@tardis-01 ~]# dd if=/dev/zero of=/dev/vg/raid1 count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0295471 s, 173 kB/s

[root@tardis-01 ~]# lvs -a -o+devices
  PV TzPlnL-QIfn-5TAn-PPHH-2eDs-OGzq-HYbGtq not recognised. Is the device missing?
  PV TzPlnL-QIfn-5TAn-PPHH-2eDs-OGzq-HYbGtq not recognised. Is the device missing?
  LV               VG          Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  raid1            vg          rwi-a-r-p-   1.00g                                  100.00           raid1_rimage_0(0),raid1_rimage_1(0)
  [raid1_rimage_0] vg          iwi-aor---   1.00g                                                   /dev/sdb1(1)
  [raid1_rimage_1] vg          iwi-aor-p-   1.00g                                                   unknown device(1)
  [raid1_rmeta_0]  vg          ewi-aor---   4.00m                                                   /dev/sdb1(0)
  [raid1_rmeta_1]  vg          ewi-aor-p-   4.00m                                                   unknown device(0)
  lv_home          vg_tardis01 -wi-ao---- 224.88g                                                   /dev/sda2(12800)
  lv_root          vg_tardis01 -wi-ao----  50.00g                                                   /dev/sda2(0)
  lv_swap          vg_tardis01 -wi-ao----   4.00g                                                   /dev/sda2(70368)
[root@tardis-01 ~]#
```

with:

```
kernel 2.6.32-495.el6.x86_64

lvm2-2.02.109-1.el6                          BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-libs-2.02.109-1.el6                     BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-cluster-2.02.109-1.el6                  BUILT: Tue Aug  5 17:36:23 CEST 2014
udev-147-2.57.el6                            BUILT: Thu Jul 24 15:48:47 CEST 2014
device-mapper-1.02.88-1.el6                  BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-libs-1.02.88-1.el6             BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-1.02.88-1.el6            BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-libs-1.02.88-1.el6       BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.109-1.el6                       BUILT: Tue Aug  5 17:36:23 CEST 2014
```

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html