Bug 788096

Summary: LVM: getting false report about faulty lun from pvs
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 6.2
CC: abaron, agk, dwysocha, ewarszaw, hateya, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2012-04-24 20:54:01 UTC
Bug Blocks: 798635, 896505

Description Dafna Ron 2012-02-07 13:31:54 UTC
Description of problem:

I have a LUN that is shown as added to the VG and appears to be completely fine (I can log into it), but when running pvs on that specific device it is reported as not belonging to any VG.

This was reproduced on both Fedora and RHEL 6.2 hosts with the same LUN.
The LUN is served from a Red Hat Enterprise Linux Server release 6.2 machine.


Version-Release number of selected component (if applicable):

Storage server (RHEL 6.2):

[root@orion ~]# rpm -qa |grep lvm
lvm2-libs-2.02.87-6.el6.x86_64
lvm2-2.02.87-6.el6.x86_64

The host is Fedora; the issue was also reproduced on a RHEL 6.2 host.
Fedora LVM packages:
[root@blond-vdsf ~]# rpm -qa |grep lvm
llvm-libs-2.9-4.fc16.x86_64
lvm2-libs-2.02.86-5.fc16.x86_64
lvm2-2.02.86-5.fc16.x86_64

How reproducible:

Not sure what happened to this LUN; I am keeping it in case you wish to take a closer look.

Actual results:

pvs without arguments shows the LUN as part of the VG.
pvs run against the specific LUN shows it as not part of any VG.

Expected results:

The two reports should be consistent.

Additional info: 

The LUN is Dafna-02, and I am logged in to it via iSCSI:

[root@blond-vdsf ~]# iscsiadm -m session
tcp: [154] 10.35.64.10:3260,1 Dafna-01
tcp: [155] 10.35.64.10:3260,1 Dafna-02
tcp: [156] 10.35.64.10:3260,1 Dafna-03

[root@blond-vdsf ~]# vgs
  VG                                   #PV #LV #SN Attr   VSize  VFree 
  5938a153-5eae-42cf-8089-68a89f1aa2fd   3   6   0 wz--n- 58.88g 55.00g
  vg0                                    1   3   0 wz--n- 74.01g     0 
[root@blond-vdsf ~]# pvs -o pv_name,vg_name
  PV                                                    VG                                  
  /dev/mapper/1ATA_WDC_WD800JD-75MSA3_WD-WMAM9DEH4849p3 vg0                                 
  /dev/mapper/1Dafna-011328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-021328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-031328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
[root@blond-vdsf ~]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-031328104
  PV                           VG                                  
  /dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-021328104                                     
  /dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
[root@blond-vdsf ~]#

Comment 1 Peter Rajnoha 2012-02-09 11:20:48 UTC
(In reply to comment #0)
> [root@blond-vdsf ~]# pvs -o pv_name,vg_name
>   PV                                                    VG
>   /dev/mapper/1ATA_WDC_WD800JD-75MSA3_WD-WMAM9DEH4849p3 vg0
>   /dev/mapper/1Dafna-011328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-021328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-031328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd

> [root@blond-vdsf ~]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-031328104
>   PV                           VG
>   /dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-021328104
>   /dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd

Hmm, interesting. Could you please rerun those pvs commands with "-vvvv", grab the debug output they produce, and attach it here? Also attach the output of "lsblk". Thanks.

Comment 2 Dafna Ron 2012-02-09 12:43:37 UTC
Unfortunately, it is no longer reproducible.
After I removed a large number of LVs from my storage with the -f option (not these ones), I got an error about map locks:

  Internal error: Maps lock 15212544 < unlock 15974400

and since then the LUN is shown correctly.

So it might have been a lock issue on the storage side.
The storage server is also RHEL, with package:
lvm2-2.02.87-6.el6.x86_64

If there is anything I can get you from the storage side I would be happy to, but the PVs are now shown correctly:


[root@blond-vdsh tmp]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-031328104
  PV                           VG                                  
  /dev/mapper/1Dafna-011328104 6c256292-6674-48f6-9742-39442e3d89cb
  /dev/mapper/1Dafna-021328104 6c256292-6674-48f6-9742-39442e3d89cb
  /dev/mapper/1Dafna-031328104 6c256292-6674-48f6-9742-39442e3d89cb

Comment 3 Zdenek Kabelac 2012-02-09 13:13:28 UTC
Could you be more specific about 'large' number?

You may need to set a bigger preallocated memory size in the /etc/lvm/lvm.conf file.
Look for "reserved_memory" and set it to a higher number (e.g. 32768 instead of 8192).

In the future this may become more automatic, but for now the user needs to give some memory hints to the application when there is a really huge set of LVs.
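[Editorial note: the setting referred to above lives in the "activation" section of /etc/lvm/lvm.conf. A minimal fragment matching the suggestion might look as follows; the value is in KiB, and 32768 is the example figure from the comment, not a verified recommendation.]

```
activation {
    # Memory, in KiB, that LVM preallocates and locks to avoid
    # page faults during critical device-mapper operations.
    # The default is 8192; raise it when managing very many LVs.
    reserved_memory = 32768
}
```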

Comment 4 Dafna Ron 2012-02-09 13:22:26 UTC
(In reply to comment #3)
> Could you be more specific with 'large' number ?
> 
100 LVs (small ones, 1G each, created for a specific test).

> It might be you need to set bigger preallocated memory in /etc/lvm/lvm.conf
> file.
> Look for  "reserved_memory" and set it to higher number (i.e. 32768 instead of
> 8192).
> 
> In future it might be more automatic, but for now user needs to give some
> memory hints to application if there is really a huge set of lvs.

Comment 5 Zdenek Kabelac 2012-02-09 13:45:23 UTC
OK, so ~700KB. It looks like a VG write happened during some lvremove operation, and I assume this has been addressed by this patch in version 2.02.89:

https://www.redhat.com/archives/lvm-devel/2011-November/msg00126.html

When the mda size is larger, it can easily exceed the preallocated size, even though in this case the memory does not need to be guarded at that moment.
In your case, specifying a larger preallocated size would probably fix the issue as well; with version 2.02.89, your current setting should still be a good enough value.

Comment 7 Alasdair Kergon 2012-02-09 18:35:49 UTC
The lock/unlock message is a side-issue.

For the original problem, please attach the output of 'pvs -o all -vvvv' from that system.  If that doesn't give us enough clues, then we'll close this as irreproducible.

(BTW You should use sosreport or lvmdump in future to capture the state of a
system for diagnosis.)

Comment 9 Dafna Ron 2012-02-12 08:32:22 UTC
(In reply to comment #7)
> The lock/unlock message is a side-issue.
> 
> For the original problem, please attach the output of 'pvs -o all -vvvv' from
> that system.  If that doesn't give us enough clues, then we'll close this as
> irreproducible.

I cannot attach anything, since the issue has not reproduced since I deleted the multiple LVs.
I have moved the bug to CLOSED; if I manage to reproduce it, I will reopen it with everything attached.

> 
> (BTW You should use sosreport or lvmdump in future to capture the state of a
> system for diagnosis.)

Comment 10 Alasdair Kergon 2012-02-12 14:58:36 UTC
OK.  Well the original symptoms made it look like an internal lvm problem related to device/metadata caching.  We'll see whether anyone else encounters similar problems, but we are updating this code again in 6.3.

Comment 11 Dafna Ron 2012-02-29 14:23:47 UTC
We have encountered this issue again.
It seems like an LVM cache issue.

Please see the dependent VDSM bug, which causes some of our hosts not to see a VG created out of multiple targets, so those targets will not be connected:

https://bugzilla.redhat.com/show_bug.cgi?id=798635

Please contact us for reproduction; we have a script on our VDS hosts that reproduces the issue.

Comment 12 Milan Broz 2012-03-30 11:41:38 UTC
Please attach the output of "lvmdump -m" and of "pvs -o all -vvvv".
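[Editorial note: the diagnostics requested across comments 1, 7, and 12 could be gathered with a short script along these lines. This is a sketch only; the output directory name and the guards around missing tools are my own additions, while the commands themselves (pvs -o all -vvvv, lsblk, lvmdump -m) are the ones requested in this bug.]

```shell
#!/bin/sh
# Sketch: collect the LVM diagnostics requested in this bug.
# Each tool is guarded with command -v so the script degrades
# gracefully on machines without the LVM userspace installed.
outdir="lvm-diag"
mkdir -p "$outdir"

# Full pvs report with maximum verbosity; debug output goes to stderr.
if command -v pvs >/dev/null 2>&1; then
    pvs -o all -vvvv > "$outdir/pvs-all.txt" 2> "$outdir/pvs-debug.txt"
fi

# Block-device topology, as requested in comment 1.
if command -v lsblk >/dev/null 2>&1; then
    lsblk > "$outdir/lsblk.txt"
fi

# Full LVM state dump including metadata (-m), as requested in comment 12;
# -d writes the dump into the given directory instead of the default.
if command -v lvmdump >/dev/null 2>&1; then
    lvmdump -m -d "$outdir/lvmdump"
fi
```

The resulting directory can then be archived and attached to the bug in one piece.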

Comment 13 Alasdair Kergon 2012-04-24 20:54:01 UTC
Closing.

Reopen this if it happens again and you succeed in obtaining diagnostics.