Bug 672816 - LVM2: lvs output is missing an existing lv
Summary: LVM2: lvs output is missing an existing lv
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Assignee: Milan Broz
QA Contact: Corey Marthaler
URL:
Whiteboard:
Depends On:
Blocks: 673615 673981
 
Reported: 2011-01-26 14:14 UTC by Avi Tal
Modified: 2016-04-26 14:43 UTC
22 users

Fixed In Version: lvm2-2.02.74-6.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 673615 (view as bug list)
Environment:
Last Closed: 2011-07-21 10:50:13 UTC
Target Upstream Version:
Embargoed:


Attachments
spm vdsm log (1.26 MB, application/gzip)
2011-01-26 14:14 UTC, Avi Tal
output of only lvs command (15.87 KB, text/plain)
2011-01-26 14:14 UTC, Avi Tal
output of lvs command followed by lvm name (254 bytes, text/plain)
2011-01-26 14:15 UTC, Avi Tal
no spm vdsm log (1.68 MB, application/gzip)
2011-01-26 14:17 UTC, Avi Tal
diff of direct read from disk and normal read (50.74 KB, image/jpeg)
2011-01-26 16:24 UTC, Milan Broz


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1071 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2011-07-21 10:50:01 UTC

Description Avi Tal 2011-01-26 14:14:13 UTC
Created attachment 475394 [details]
spm vdsm log

Description of problem:
lvs output is missing an existing lv

Running the same lvs command with the LV name prints the LV, but running lvs with no further parameters misses this specific LV.

Steps to Reproduce:
1. run a VM export
2. stop vdsmd while the export process is running
  
system output:
[root@navy-vds3 ~]# lvs | grep 84df6830-decf-4f73-8d86-f07a8ec372fc
[root@navy-vds3 ~]#

but 

[root@navy-vds3 ~]# lvs f9d17796-5304-45dc-a3ac-f1f1c8a7d14a/84df6830-decf-4f73-8d86-f07a8ec372fc
  LV                                   VG                                   Attr   LSize Origin Snap%  Move Log Copy%  Convert
  84df6830-decf-4f73-8d86-f07a8ec372fc f9d17796-5304-45dc-a3ac-f1f1c8a7d14a -wi--- 1.00G

Comment 1 Avi Tal 2011-01-26 14:14:50 UTC
Created attachment 475395 [details]
output of only lvs command

Comment 2 Avi Tal 2011-01-26 14:15:30 UTC
Created attachment 475396 [details]
output of lvs command followed by lvm name

Comment 3 Avi Tal 2011-01-26 14:17:10 UTC
Created attachment 475397 [details]
no spm vdsm log

Comment 4 Milan Broz 2011-01-26 15:20:18 UTC
For some reason lvm reads different metadata in each command:

#lvmcmdline.c:1060         Processing: lvs --config 'devices { filter= [ "a/big01/", "r/.*/" ] }' -vvvv

Metadata version 111:

#format_text/format-text.c:525         Read f9d17796-5304-45dc-a3ac-f1f1c8a7d14a metadata (111) from /dev/mpath/1atal_data_big01 at 400384 size 2650
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 0:      0      4: metadata(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 1:      4     16: leases(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 2:     20      1: ids(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 3:     21      1: inbox(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 4:     22      1: outbox(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 5:     23      8: master(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 6:     31     27: NULL(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 7:     58     27: 6580ec0f-3bdc-4bbf-885c-9d91e510e88c(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 8:     85    114: NULL(0:0)

---
#lvmcmdline.c:1060         Processing: lvs --config 'devices { filter= [ "a/big01/", "r/.*/" ] }' f9d17796-5304-45dc-a3ac-f1f1c8a7d14a -vvvv

Metadata version 115:

#format_text/format-text.c:525         Read f9d17796-5304-45dc-a3ac-f1f1c8a7d14a metadata (115) from /dev/mpath/1atal_data_big01 at 412672 size 3025
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 0:      0      4: metadata(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 1:      4     16: leases(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 2:     20      1: ids(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 3:     21      1: inbox(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 4:     22      1: outbox(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 5:     23      8: master(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 6:     31      8: 84df6830-decf-4f73-8d86-f07a8ec372fc(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 7:     39     19: NULL(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 8:     58     27: 6580ec0f-3bdc-4bbf-885c-9d91e510e88c(0:0)
#metadata/pv_manip.c:296         /dev/mpath/1atal_data_big01 9:     85    114: NULL(0:0)

Comment 5 Milan Broz 2011-01-26 16:17:40 UTC
lvm seems to use direct I/O for one operation only, and reads different data:

 open("/dev/mpath/1atal_data_big01", O_RDONLY|O_NOATIME) = 5
 open("/dev/mpath/1atal_data_big01", O_RDONLY|O_DIRECT|O_NOATIME) = 5

so let's try this:

# dd if=/dev/mpath/1atal_data_big01 of=mpath2.img bs=1M count=1
# dd if=/dev/mpath/1atal_data_big01 of=mpath.img bs=1M count=1 iflag=direct
# diff mpath.img mpath2.img 
Binary files mpath.img and mpath2.img differ

Comment 6 Milan Broz 2011-01-26 16:24:19 UTC
Created attachment 475431 [details]
diff of direct read from disk and normal read

Comment 7 Mike Snitzer 2011-01-26 19:14:27 UTC
(In reply to comment #5) 
> so let's try this:
> 
> # dd if=/dev/mpath/1atal_data_big01 of=mpath2.img bs=1M count=1
> # dd if=/dev/mpath/1atal_data_big01 of=mpath.img bs=1M count=1 iflag=direct
> # diff mpath.img mpath2.img 
> Binary files mpath.img and mpath2.img differ

The following resolved the buffered vs directio inconsistency:
  echo 3 > /proc/sys/vm/drop_caches

NOTE: echo 1 > /proc/sys/vm/drop_caches should work too...

Both drop the page cache (1 confines the drop to the page cache; 3 also drops dentries and inodes).

So for some reason the associated pages in the page cache were _not_ invalidated by the directio access (either directio read or write should invalidate the page cache).

Comment 8 Mike Snitzer 2011-01-26 20:11:33 UTC
(In reply to comment #7)
> So for some reason the associated pages in the page cache were _not_
> invalidated by the directio access (either directio read or write should
> invalidate the page cache).

To clarify (from Jeff Moyer):

So for this BZ the underlying concern is point "2)" in the following sequence:
1) buffered read
2) direct write w/o invalidate
3) buffered read shows stale data

Meaning, it's more important that the direct write invalidates the page cache than any direct read.

direct reads don't invalidate the page cache.
they write out the pages, to ensure that the read gets the most recent data.
</end snippets from irc with jeff moyer>

Though, if a direct I/O read writes back any dirty pages in the page cache, we'd see the stale page-cache data get written to disk and there wouldn't be any inconsistency between the direct read and the buffered read.  So it would seem that the direct read saw no reason to write any pages back?


Another concern is: are we certain that this 'navy-vds3' system is _the_ system that performed the LVM operation that resulted in metadata being written via a direct write?

If it was _not_ then the LVM2 operation that mistakenly used a buffered read would always get stale data (because the kernel never had a reason to invalidate the page it already had for the associated region of disk).
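The three-step staleness sequence above can be illustrated with a toy sketch (plain Python, nothing LVM- or kernel-specific; the `page_cache` dict merely stands in for the kernel page cache, and all names here are hypothetical):

```python
# Toy model of the stale-read sequence from comment 8.
# "disk" is the backing storage; "page_cache" stands in for the kernel page cache.

disk = {"metadata": "version 111"}
page_cache = {}

def buffered_read(key):
    """A buffered read populates the cache and is served from it afterwards."""
    if key not in page_cache:
        page_cache[key] = disk[key]
    return page_cache[key]

def direct_write(key, value, invalidate=True):
    """A direct write bypasses the cache; a correct implementation must also
    invalidate any cached copy of the written range."""
    disk[key] = value
    if invalidate:
        page_cache.pop(key, None)

# 1) buffered read caches version 111
assert buffered_read("metadata") == "version 111"

# 2) direct write WITHOUT invalidation (the problematic behaviour)
direct_write("metadata", "version 115", invalidate=False)

# 3) buffered read now returns stale data from the cache
assert buffered_read("metadata") == "version 111"   # stale!

# dropping the cache (cf. echo 3 > /proc/sys/vm/drop_caches) repairs the view
page_cache.clear()
assert buffered_read("metadata") == "version 115"
```

This is only a model of the interaction being debated, not LVM or kernel code; it shows why mixing an un-invalidated write path with buffered reads yields the "lvs sees metadata 111, lvs VG/LV sees 115" symptom.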

Comment 9 Jeff Moyer 2011-01-26 20:40:10 UTC
(In reply to comment #8)

> Though, if direct io read writes back any dirty pages in page cache; we'd see
> the stale page cache data get written to disk and there wouldn't be any
> inconsistency between the directio read and the buffered read.  So it would
> seem that the directio read saw no reason to write any pages back?

Right.  Why would the page in the page cache be dirty?  The direct write won't dirty it.

Now, having said that, the code in RHEL 5 absolutely does invalidate page cache pages for the range written both before AND after the direct write.  I doubt this is your problem.

> Another concern is: are we certain that this 'navy-vds3' system is _the_ system
> that performed the LVM operation that resulted metadata being written via
> directio write?
> 
> If it was _not_ then the LVM2 operation that mistakenly used a buffered read
> would always get stale data (because the kernel never had a reason to
> invalidate the page it already had for the associated region of disk).

Right, and this sounds more likely to be the problem.

Comment 10 Mike Snitzer 2011-01-26 21:35:47 UTC
Setting needinfo based on questions at the bottom of comment#8

Comment 11 Alasdair Kergon 2011-01-27 00:20:04 UTC
(In reply to comment #5)
> lvm seems to use direct for one operation and reads different data
>  open("/dev/mpath/1atal_data_big01", O_RDONLY|O_NOATIME) = 5

filter.c:150
        if (!dev_open_flags(dev, O_RDONLY, 0, 1)) {
3rd parm should be 1

Speculating it only became a problem due to recent optimisations that reduced repeated reading, failing to recognise that some of the re-reads (now optimised out) were with O_DIRECT?

(But it should have been changed to 1 as soon as code was added to read data during the open anyway.)

Comment 12 Avi Tal 2011-01-27 08:54:27 UTC
(In reply to comment #8)

I am not sure that is what you meant, but anyway: my scenario was running a VM export while my SPM was navy-vds1, then running a failure scenario that caused the SPM to switch to navy-vds3, which was supposed to roll back the vdsm operation.

The next time I tried to export the same VM to the same export domain, I hit this bad LVM issue.

Comment 13 Milan Broz 2011-01-27 09:24:04 UTC
There is some path which stores the PV label into the cache using a buffered read, probably a side effect of some other change.

It seems that the RHEV environment can trigger this quite often, because metadata changes are performed outside of the host.

Comment 14 Moran Goldboim 2011-01-27 10:09:14 UTC
this bug happened to me as well (on 4 of my servers).
2.6.18-238.el5
lvm2-2.02.74-5.el5
device-mapper-multipath-0.4.7-42.el5
vdsm22-4.5-63.14.el5_6

Comment 15 Milan Broz 2011-01-27 12:37:10 UTC
OK, the patch seems to be a one-liner, already in the upstream queue.

Comment 22 Milan Broz 2011-01-29 00:19:27 UTC
Fixed in lvm2-2.02.74-6.el5.

Comment 26 Ivan Makfinsky 2011-03-15 19:51:01 UTC
I'm working at a customer site that is experiencing the same issue:

lvs != lvdisplay output

(The following content has been edited for TV)

[root@ml1 ~]# lvs
  LV               VG          Attr     LSize
...
  lv_ml_gfs_1   vg_ml_gfs_1   -wi-ao     1.00T
  lv_ml_gfs_2   vg_ml_gfs_2   -wi-ao   800.00G < --------
  lv_ml_gfs_3   vg_ml_gfs_3   -wi-ao     1.00T
...
[root@ml1 ~]# lvs /dev/vg_ml_gfs_2/lv_ml_gfs_2
  LV               VG          Attr      LSize
  lv_ml_gfs_2   vg_ml_gfs_2   -wi-ao      1.00T <---------

Notice that "lvs" indicates 800G for lv_ml_gfs_2, while "lvs /dev/vg_ml_gfs_2/lv_ml_gfs_2" indicates 1T, the correct size.

+1 for the updated lvm2 rpm.

- Ivan

Comment 27 Alasdair Kergon 2011-03-15 20:05:25 UTC
Can you add '-o +vg_seqno' to that command to check if it differs too?
And (if the version of lvm you have there supports it) 'pvs -o+vg_mda_count,vg_mda_used_count,pv_mda_count,pv_mda_used_count vg_ml_gfs_2'

Comment 28 Milan Broz 2011-03-15 20:16:12 UTC
(In reply to comment #26)

> +1 for the updated lvm2 rpm.

Updated lvm packages are already released in z-stream, see bug #673981:
use lvm2-2.02.74-5.el5_6.1 and lvm2-cluster-2.02.74-3.el5_6.1

Please update packages and try again.
(This bug just covers minor update - 5.7)

Comment 29 Ivan Makfinsky 2011-03-16 15:58:38 UTC
The updated packages resolve the issue.

Thanks Milan and everyone else who helped!

- Ivan

Comment 30 Corey Marthaler 2011-04-29 18:47:34 UTC
Marking verified (Sanity Only) as QE was never able to reproduce this issue.

Comment 32 errata-xmlrpc 2011-07-21 10:50:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

Comment 33 errata-xmlrpc 2011-07-21 12:29:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

