Bug 1085553 - When lvmetad is used, LVs do not properly report as 'p'artial
Summary: When lvmetad is used, LVs do not properly report as 'p'artial
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Petr Rockai
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 1089170 1089369
 
Reported: 2014-04-08 22:18 UTC by Jonathan Earl Brassow
Modified: 2014-10-14 08:25 UTC (History)
10 users

Fixed In Version: lvm2-2.02.108-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Information about physical volume availability can be out of date when lvmetad is in use.
Consequence: The status string in the output of the 'lvs' command for a RAID volume may differ between otherwise identical situations depending on whether lvmetad is used ('r'efresh is reported instead of 'p'artial in the lvmetad case).
Fix: The dmeventd volume monitoring daemon now updates physical volume information in lvmetad for devices participating in a RAID array that has encountered an error.
Result: If dmeventd is active (which is recommended regardless of this issue), the lvs output is the same in both the lvmetad and non-lvmetad cases. When dmeventd is disabled, run 'lvscan --cache' on faulty RAID arrays to ensure up-to-date information in the lvs output.
Clone Of:
Environment:
Last Closed: 2014-10-14 08:25:05 UTC
Target Upstream Version:




Links
Red Hat Product Errata RHBA-2014:1387 (SHIPPED_LIVE): lvm2 bug fix and enhancement update - 2014-10-14 01:39:47 UTC

Description Jonathan Earl Brassow 2014-04-08 22:18:58 UTC
When using lvmetad, if you fail a device in a RAID LV, the LV does not report as 'p'artial.

[root@bp-01 ~]# off.sh sdf
Turning off sdf
[root@bp-01 ~]# devices vg
  /dev/sdf1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  LV            Attr       Cpy%Sync Devices                                                    
  lv            rwi-a-r---   100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0),lv_rimage_3(0)
  [lv_rimage_0] iwi-aor---          /dev/sdb1(1)                                               
  [lv_rimage_0] iwi-aor---          /dev/sdf1(0)                                               
  [lv_rimage_1] iwi-aor---          /dev/sdc1(1)                                               
  [lv_rimage_1] iwi-aor---          /dev/sdg1(0)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdd1(1)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdh1(0)                                               
  [lv_rimage_3] iwi-aor---          /dev/sde1(1)                                               
  [lv_rimage_3] iwi-aor---          /dev/sdi1(0)                                               
  [lv_rmeta_0]  ewi-aor---          /dev/sdb1(0)                                               
  [lv_rmeta_1]  ewi-aor---          /dev/sdc1(0)                                               
  [lv_rmeta_2]  ewi-aor---          /dev/sdd1(0)                                               
  [lv_rmeta_3]  ewi-aor---          /dev/sde1(0)                                


If you perform some writes to the LV, the kernel notices the problem and the affected LVs are then reported with the 'r'eplace/'r'efresh flag.
[root@bp-01 ~]# dd if=/dev/zero of=/dev/vg/lv bs=4M count=10
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.478818 s, 87.6 MB/s
[root@bp-01 ~]# devices vg
  /dev/sdf1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  LV            Attr       Cpy%Sync Devices                                                    
  lv            rwi-a-r-r-   100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0),lv_rimage_3(0)
  [lv_rimage_0] iwi-aor-r-          /dev/sdb1(1)                                               
  [lv_rimage_0] iwi-aor-r-          /dev/sdf1(0)                                               
  [lv_rimage_1] iwi-aor---          /dev/sdc1(1)                                               
  [lv_rimage_1] iwi-aor---          /dev/sdg1(0)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdd1(1)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdh1(0)                                               
  [lv_rimage_3] iwi-aor---          /dev/sde1(1)                                               
  [lv_rimage_3] iwi-aor---          /dev/sdi1(0)                                               
  [lv_rmeta_0]  ewi-aor-r-          /dev/sdb1(0)                                               
  [lv_rmeta_1]  ewi-aor---          /dev/sdc1(0)                                               
  [lv_rmeta_2]  ewi-aor---          /dev/sdd1(0)                                               
  [lv_rmeta_3]  ewi-aor---          /dev/sde1(0)                                


This is not good, because it causes repair operations to fail.  They fail because, if LVM can see the device, it assumes the failed device has returned and therefore only needs a refresh.  If LVM cannot see the device (and the LV carries the 'p'artial flag), then repair will replace it.

The distinction is important.

Without lvmetad, the 'p'artial flag shows up correctly:
[root@bp-01 ~]# nano /etc/lvm/lvm.conf 
[root@bp-01 ~]# killall -9 lvmetad
[root@bp-01 ~]# devices vg
  WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
  /dev/sdf1: read failed after 0 of 512 at 898381381632: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 898381488128: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 512 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid ceqxK0-V1Pe-i640-BlKJ-bYWW-7iaH-nwmCfW.
  LV            Attr       Cpy%Sync Devices                                                    
  lv            rwi-a-r-p-   100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0),lv_rimage_3(0)
  [lv_rimage_0] iwi-aor-p-          /dev/sdb1(1)                                               
  [lv_rimage_0] iwi-aor-p-          unknown device(0)                                          
  [lv_rimage_1] iwi-aor---          /dev/sdc1(1)                                               
  [lv_rimage_1] iwi-aor---          /dev/sdg1(0)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdd1(1)                                               
  [lv_rimage_2] iwi-aor---          /dev/sdh1(0)                                               
  [lv_rimage_3] iwi-aor---          /dev/sde1(1)                                               
  [lv_rimage_3] iwi-aor---          /dev/sdi1(0)                                               
  [lv_rmeta_0]  ewi-aor-r-          /dev/sdb1(0)                                               
  [lv_rmeta_1]  ewi-aor---          /dev/sdc1(0)                                               
  [lv_rmeta_2]  ewi-aor---          /dev/sdd1(0)                                               
  [lv_rmeta_3]  ewi-aor---          /dev/sde1(0)
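
(For reference: "disabling lvmetad" via the lvm.conf edit above presumably means turning off the global use_lvmetad switch; a minimal sketch of the relevant setting, assuming the stock RHEL 6 lvm.conf layout:

global {
    # 0 = tools scan disks directly; 1 = tools query the lvmetad cache
    use_lvmetad = 0
}

The already-running daemon then still has to be stopped, hence the 'killall -9 lvmetad' above.)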

Comment 3 Jonathan Earl Brassow 2014-04-08 22:25:47 UTC
This behavior could have something to do with the way I am killing the device.
# echo offline > /sys/block/$dev/device/state

I believe QA uses some other mechanism.  I don't know if that means this bug should be closed or if a customer would encounter a problem with a device in a similar state.  Has anyone tried pulling the plug on a device to see what happens, rather than using software to emulate failures?

Comment 4 Peter Rajnoha 2014-04-09 07:12:07 UTC
(In reply to Jonathan Earl Brassow from comment #3)
> This behavior could have something to do with the way I am killing the
> device.
> # echo offline > /sys/block/$dev/device/state

The "echo offline" does not generate an event (and the device is gone just in half since the /sysfs content is still there). It's probably better to use "echo 1 > /sys/block/$dev/device/delete" which removes the device completely from the system with the REMOVE event generated.

You can still use the "echo offline", but then you always need to call "pvscan --cache $dev".
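
To make the two approaches concrete, a minimal sketch (device names are examples only):

# half-failure: no uevent is generated and the sysfs entry stays around...
echo offline > /sys/block/sdf/device/state
# ...so lvmetad has to be told about the change by hand:
pvscan --cache /dev/sdf1

# full removal: the kernel emits a REMOVE uevent and the udev rules update lvmetad
echo 1 > /sys/block/sdf/device/delete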

Comment 5 Peter Rajnoha 2014-04-09 07:44:02 UTC
(In reply to Peter Rajnoha from comment #4)
> You can still use the "echo offline", but then you always need to call
> "pvscan --cache $dev".

(...in which case we're not testing the whole thing with the udev rules, btw. So using "echo 1 > ...device/delete" and then rescanning the SCSI bus to make the device present again is probably the correct way to test this completely, with all events and mechanisms included.)
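
A sketch of bringing the deleted device back by rescanning the SCSI bus (the host glob is an example; adjust to the controller in question):

for host in /sys/class/scsi_host/host*; do
    echo "- - -" > "$host/scan"
done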

Comment 6 Jonathan Earl Brassow 2014-04-09 15:51:56 UTC
Ok, that makes sense.  However, is a udev event generated in all cases when power or connectivity to a drive is lost?  I'm still curious whether a real failure event can look like "echo offline".  If we are sure that all real failure events are handled, then this bug can be closed.

Comment 7 Jonathan Earl Brassow 2014-04-10 16:03:51 UTC
Is there a case where power and connectivity are still available, but the drive throws errors for I/O?  Would that trigger a REMOVE event?

We may need to document the 'pvscan --cache $dev' step for users in those cases - or augment the RAID code to print something sensible or detect the problem.  For the RAID code, this may be as simple as running another check for kernel device status...

Comment 8 Petr Rockai 2014-04-10 16:39:59 UTC
If dmeventd is running and monitoring RAID devices, and you do something to trigger a failure ("echo offline" and writing to the device should do), the status string in device-mapper should reflect that the leg is offline. When that happens and dmeventd runs lvconvert --repair, the latter will notice the status and mark the LV as missing even if there was no REMOVE event. So in a production system, you should be covered even for "echo offline"-like events. If this doesn't happen, it might be a problem with lvconvert --repair not parsing the raid1 status info correctly (this definitely used to work for old-style mirrors).
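
For reference, the device-mapper status string mentioned above can be inspected directly; a sketch with a made-up raid1 example (the LV name and output are illustrative and the exact fields vary by kernel/dm-raid version - the relevant part is the per-device health characters, 'A' for alive and 'D' for dead/failed):

dmsetup status vg-lv
# 0 2097152 raid raid1 2 AD 2097152/2097152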

Comment 9 Peter Rajnoha 2014-04-22 11:37:45 UTC
Well, actually, it doesn't work quite well with lvmetad - bug #1089170, bug #1089369.

Comment 10 Jonathan Earl Brassow 2014-04-22 19:28:46 UTC
(In reply to Jonathan Earl Brassow from comment #0)
> This is not good, because it causes repair operations to fail.  They fail
> because if the device is seen by LVM, it assumes that the failed device has
> returned - thus, it only needs a refresh.  If the device can't be seen by
> LVM (and has a 'p'artial flag), then it will be replaced by repair.
> 

Bug 1089170 and bug 1089369 are now tested examples of how this is manifested.  We need a solution to this lvmetad problem.

The solution could be to cause a rescan if '--repair' or '--refresh' are used on the command line.  The device must be reread in order to determine if the device is dead or if it was a transient failure.  It is not enough to simply check the kernel status.

Comment 13 Peter Rajnoha 2014-04-23 07:15:34 UTC
So should dmeventd/lvconvert --repair run with lvmetad disabled then? lvconvert --repair must see the I/O error, and without touching the device directly I can't imagine how we could detect that...

Comment 15 Jonathan Earl Brassow 2014-05-02 04:41:26 UTC
(In reply to Peter Rajnoha from comment #13)
> So should dmeventd/lvconvert --repair run with lvmetad disabled then? The
> lvconvert --repair must see the IO error and without touching the device
> directly, I can't imagine how we can detect that..

Probably a good idea, but then 'lvs' would still be wrong - and it needs to be right for customers to take the appropriate action.  For example, on RAID, if I saw the 'r'efresh flag I would run 'lvchange --refresh vg/lv'; if I saw the 'p'artial flag, I would instead run 'lvconvert --repair vg/lv'.

'dmeventd' is notified when there is a write failure.  Perhaps there is a rescan command that could be run by dmeventd to inform lvmetad about the issue?
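
To spell out the two recovery paths (vg/lv stands for the affected volume):

# 'r'efresh flag - the device came back; reload the LV from current metadata:
lvchange --refresh vg/lv

# 'p'artial flag - the device is gone; replace the failed image:
lvconvert --repair vg/lv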

Comment 16 Petr Rockai 2014-05-05 11:05:55 UTC
I have checked the code, and lvconvert --repair for RAID does not take device status into account at all -- a big chunk of code is entirely missing. For old-style mirrors, we check the device-mapper status of the mirror LVs and feed that into --repair. When the code was adapted to RAID, this was left out and needs to be added. In fact, this is not specific to lvmetad: if a device goes away, you write to the mirror, and the device then comes back, lvconvert --repair will not work either. The only change with lvmetad is that this also happens if the device is still inaccessible at the time of lvconvert --repair.

The problem here is that lots of code is duplicated for raid that was there for mirrors, and a straightforward fix will make this even worse. So we are in for some refactoring of the status parsing code so it can work with both old-style mirrors and RAID. Most of the issues should go away then.

Comment 18 Petr Rockai 2014-07-28 15:23:46 UTC
This should be fixed upstream in 5dc6671bb550f4b480befee03d234373d08e188a, as long as dmeventd is in use. Non-dmeventd users need to issue lvscan --cache on the affected LV to update the partial/refresh flags on RAID LVs.
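
A sketch of that manual step for non-dmeventd users (the LV name is a placeholder):

# push the current state of the LV's devices into lvmetad so that
# the partial/refresh flags shown by lvs are accurate again:
lvscan --cache vg/lv
lvs -a -o +devices vg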

Comment 21 Nenad Peric 2014-08-07 17:04:36 UTC
Marking this one VERIFIED.
Although this is not the same thing as a device that is still present in the system but throws I/O errors, 'echo offline' now works in such a way that LVM can see that an LV is partial, and it will either replace the failed device or mark the LV as partial, depending on the settings:

[root@tardis-01 ~]# echo offline >/sys/block/sdc/device/state

[root@tardis-01 ~]# lvs -a -o+devices
  /dev/sdc1: read failed after 0 of 512 at 16104947712: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 16105054208: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 0: Input/output error
  /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error
  LV               VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                            
  raid1            vg          rwi-a-r---   1.00g                                    100.00           raid1_rimage_0(0),raid1_rimage_1(0)
  [raid1_rimage_0] vg          iwi-aor---   1.00g                                                     /dev/sdb1(1)                       
  [raid1_rimage_1] vg          iwi-aor---   1.00g                                                     /dev/sdc1(1)                       
  [raid1_rmeta_0]  vg          ewi-aor---   4.00m                                                     /dev/sdb1(0)                       
  [raid1_rmeta_1]  vg          ewi-aor---   4.00m                                                     /dev/sdc1(0)                       
  lv_home          vg_tardis01 -wi-ao---- 224.88g                                                     /dev/sda2(12800)                   
  lv_root          vg_tardis01 -wi-ao----  50.00g                                                     /dev/sda2(0)                       
  lv_swap          vg_tardis01 -wi-ao----   4.00g                                                     /dev/sda2(70368)        

[root@tardis-01 ~]# dd if=/dev/zero of=/dev/vg/raid1 count=10
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0295471 s, 173 kB/s

[root@tardis-01 ~]# lvs -a -o+devices
  PV TzPlnL-QIfn-5TAn-PPHH-2eDs-OGzq-HYbGtq not recognised. Is the device missing?
  PV TzPlnL-QIfn-5TAn-PPHH-2eDs-OGzq-HYbGtq not recognised. Is the device missing?
  LV               VG          Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                            
  raid1            vg          rwi-a-r-p-   1.00g                                    100.00           raid1_rimage_0(0),raid1_rimage_1(0)
  [raid1_rimage_0] vg          iwi-aor---   1.00g                                                     /dev/sdb1(1)                       
  [raid1_rimage_1] vg          iwi-aor-p-   1.00g                                                     unknown device(1)                  
  [raid1_rmeta_0]  vg          ewi-aor---   4.00m                                                     /dev/sdb1(0)                       
  [raid1_rmeta_1]  vg          ewi-aor-p-   4.00m                                                     unknown device(0)                  
  lv_home          vg_tardis01 -wi-ao---- 224.88g                                                     /dev/sda2(12800)                   
  lv_root          vg_tardis01 -wi-ao----  50.00g                                                     /dev/sda2(0)                       
  lv_swap          vg_tardis01 -wi-ao----   4.00g                                                     /dev/sda2(70368)                   
[root@tardis-01 ~]# 


with:

kernel 2.6.32-495.el6.x86_64

lvm2-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-libs-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
lvm2-cluster-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
udev-147-2.57.el6    BUILT: Thu Jul 24 15:48:47 CEST 2014
device-mapper-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-event-libs-1.02.88-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
device-mapper-persistent-data-0.3.2-1.el6    BUILT: Fri Apr  4 15:43:06 CEST 2014
cmirror-2.02.109-1.el6    BUILT: Tue Aug  5 17:36:23 CEST 2014
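
(For reference, the "settings" mentioned above that decide between replacing the failed device and merely marking the LV partial are presumably the dmeventd RAID fault policy in lvm.conf; the value shown is an example, not this system's configuration:

activation {
    # "warn" only logs the failure; "allocate" makes dmeventd replace the
    # failed image with free space from another PV via lvconvert --repair
    raid_fault_policy = "allocate"
}
)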

Comment 22 errata-xmlrpc 2014-10-14 08:25:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1387.html

