Bug 788096 - LVM: getting false report about faulty lun from pvs
Summary: LVM: getting false report about faulty lun from pvs
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: LVM and device-mapper development team
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 798635 896505
 
Reported: 2012-02-07 13:31 UTC by Dafna Ron
Modified: 2013-01-17 13:17 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-24 20:54:01 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Dafna Ron 2012-02-07 13:31:54 UTC
Description of problem:

I have a LUN that is shown as added to the VG and seems to be completely fine (I can log in to it), but when running pvs on the specific target it is shown as not part of the VG.

This was reproduced with both Fedora and RHEL 6.2 hosts against the same LUN.
The LUN is located on a Red Hat Enterprise Linux Server release 6.2 storage server.


Version-Release number of selected component (if applicable):

storage (rhel6.2)

[root@orion ~]# rpm -qa |grep lvm
lvm2-libs-2.02.87-6.el6.x86_64
lvm2-2.02.87-6.el6.x86_64

host is fedora and also reproduced on rhel6.2 host
fedora lvm is: 
[root@blond-vdsf ~]# rpm -qa |grep lvm
llvm-libs-2.9-4.fc16.x86_64
lvm2-libs-2.02.86-5.fc16.x86_64
lvm2-2.02.86-5.fc16.x86_64

How reproducible:

Not sure what happened to this LUN. I am keeping it in case you wish to take a closer look.

Actual results:

pvs with no arguments shows the LUN as part of the VG;
pvs on the specific LUN shows that it is not part of the VG.

Expected results:

The two reports should be consistent.

Additional info: 

The LUN is Dafna-02 and I am logged in to it via iSCSI:

[root@blond-vdsf ~]# iscsiadm -m session
tcp: [154] 10.35.64.10:3260,1 Dafna-01
tcp: [155] 10.35.64.10:3260,1 Dafna-02
tcp: [156] 10.35.64.10:3260,1 Dafna-03

[root@blond-vdsf ~]# vgs
  VG                                   #PV #LV #SN Attr   VSize  VFree 
  5938a153-5eae-42cf-8089-68a89f1aa2fd   3   6   0 wz--n- 58.88g 55.00g
  vg0                                    1   3   0 wz--n- 74.01g     0 
[root@blond-vdsf ~]# pvs -o pv_name,vg_name
  PV                                                    VG                                  
  /dev/mapper/1ATA_WDC_WD800JD-75MSA3_WD-WMAM9DEH4849p3 vg0                                 
  /dev/mapper/1Dafna-011328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-021328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-031328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
[root@blond-vdsf ~]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-031328104
  PV                           VG                                  
  /dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
  /dev/mapper/1Dafna-021328104                                     
  /dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
[root@blond-vdsf ~]#
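The mismatch above can be spotted mechanically by diffing the two listings. A minimal sketch follows; the two canned listings are pasted from the outputs above, while the file names and comparison pipeline are my own, not part of the report. In real use the files would be captured with `pvs --noheadings -o pv_name,vg_name` (once with no device arguments, once with the devices listed).

```shell
#!/bin/sh
# Detect PVs whose VG column differs between a full `pvs` listing and a
# per-device `pvs` query. C locale so sort and comm agree on ordering.
export LC_ALL=C

# Canned data from the report: full listing (every PV has a VG)...
cat > all.txt <<'EOF'
/dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
/dev/mapper/1Dafna-021328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
/dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
EOF

# ...and the per-device listing, where Dafna-02 lost its VG column.
cat > listed.txt <<'EOF'
/dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
/dev/mapper/1Dafna-021328104
/dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
EOF

sort -o all.txt all.txt
sort -o listed.txt listed.txt

# comm -3 prints lines unique to either file; the disagreeing PV shows up
# in both variants (with and without a VG), so keep the device name only.
comm -3 all.txt listed.txt | awk '{print $1}' | sort -u
# prints: /dev/mapper/1Dafna-021328104
```

An empty result would mean both reports agree.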

Comment 1 Peter Rajnoha 2012-02-09 11:20:48 UTC
(In reply to comment #0)
> [root@blond-vdsf ~]# pvs -o pv_name,vg_name
>   PV                                                    VG
>   /dev/mapper/1ATA_WDC_WD800JD-75MSA3_WD-WMAM9DEH4849p3 vg0
>   /dev/mapper/1Dafna-011328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-021328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-031328104                          5938a153-5eae-42cf-8089-68a89f1aa2fd

> [root@blond-vdsf ~]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-031328104
>   PV                           VG
>   /dev/mapper/1Dafna-011328104 5938a153-5eae-42cf-8089-68a89f1aa2fd
>   /dev/mapper/1Dafna-021328104
>   /dev/mapper/1Dafna-031328104 5938a153-5eae-42cf-8089-68a89f1aa2fd

Hmm, interesting. Could you please rerun those pvs commands with "-vvvv", grab the debug output they produce, and attach it here? Also attach the output of "lsblk". Thanks.

Comment 2 Dafna Ron 2012-02-09 12:43:37 UTC
Unfortunately it is not reproducible any more.
After I removed a large number of LVs from my storage with the -f option (not these ones), I got an error about map locks:

  Internal error: Maps lock 15212544 < unlock 15974400

and since then the LUN has been shown correctly.

So it might have been a lock issue on the storage side.
The storage is also RHEL, with package:
lvm2-2.02.87-6.el6.x86_64

If there is anything I can get you from the storage side I would be happy to, but the PVs are now shown correctly:


[root@blond-vdsh tmp]# pvs -o pv_name,vg_name /dev/mapper/1Dafna-011328104 /dev/mapper/1Dafna-021328104 /dev/mapper/1Dafna-031328104
  PV                           VG                                  
  /dev/mapper/1Dafna-011328104 6c256292-6674-48f6-9742-39442e3d89cb
  /dev/mapper/1Dafna-021328104 6c256292-6674-48f6-9742-39442e3d89cb
  /dev/mapper/1Dafna-031328104 6c256292-6674-48f6-9742-39442e3d89cb

Comment 3 Zdenek Kabelac 2012-02-09 13:13:28 UTC
Could you be more specific about the 'large' number?

It might be that you need to set a bigger preallocated memory size in the /etc/lvm/lvm.conf file.
Look for "reserved_memory" and set it to a higher number (e.g. 32768 instead of 8192).

In the future this might be more automatic, but for now the user needs to give the application some memory hints if there is a really huge set of LVs.
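For reference, the reserved_memory setting lives in the activation section of /etc/lvm/lvm.conf. A minimal fragment with the value suggested above; the comment text is my paraphrase, not lvm.conf's stock wording:

```
activation {
    # Size (in KB) of memory preallocated and locked while LVM performs
    # critical operations; raise it for VGs with very many LVs.
    reserved_memory = 32768
}
```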

Comment 4 Dafna Ron 2012-02-09 13:22:26 UTC
(In reply to comment #3)
> Could you be more specific about the 'large' number?
> 
100 LVs (small ones, 1G each, created for a specific test).

> It might be that you need to set a bigger preallocated memory size in the
> /etc/lvm/lvm.conf file.
> Look for "reserved_memory" and set it to a higher number (e.g. 32768 instead
> of 8192).
> 
> In the future this might be more automatic, but for now the user needs to
> give the application some memory hints if there is a really huge set of LVs.

Comment 5 Zdenek Kabelac 2012-02-09 13:45:23 UTC
OK - so ~700 KB. It looks like a VG write happened during some lvremove operation, and I assume this has been addressed by the following patch in version 2.02.89:

https://www.redhat.com/archives/lvm-devel/2011-November/msg00126.html

When there is a larger mda size, it may easily go over the preallocated size, even though in this case the memory does not need to be guarded at that moment.
In your case, specifying a larger preallocated size would probably have fixed the issue as well; with version 2.02.89 your current size should still be a good enough value.

Comment 7 Alasdair Kergon 2012-02-09 18:35:49 UTC
The lock/unlock message is a side-issue.

For the original problem, please attach the output of 'pvs -o all -vvvv' from that system.  If that doesn't give us enough clues, then we'll close this as irreproducible.

(BTW, you should use sosreport or lvmdump in future to capture the state of a system for diagnosis.)
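A capture script along the lines suggested here might look like the sketch below. The directory layout and the availability guards are my additions; it degrades to a note when the LVM tools are absent, so it is safe to run anywhere.

```shell
#!/bin/sh
# Collect LVM diagnostics into a timestamped directory for attaching to a bug.
dest=lvm-diag-$(date +%Y%m%d-%H%M%S)
mkdir -p "$dest"

if command -v pvs >/dev/null 2>&1; then
    # Verbose debug output requested in the comments above
    pvs -o all -vvvv > "$dest/pvs-all.txt" 2>&1
else
    echo "pvs not found; install lvm2 first" > "$dest/README"
fi

# lvmdump -m also captures the on-disk LVM metadata areas
if command -v lvmdump >/dev/null 2>&1; then
    lvmdump -m -d "$dest/lvmdump"
fi

echo "diagnostics saved in $dest"
```

A full `sosreport` run remains the better option when the whole system state, not just LVM, is of interest.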

Comment 9 Dafna Ron 2012-02-12 08:32:22 UTC
(In reply to comment #7)
> The lock/unlock message is a side-issue.
> 
> For the original problem, please attach the output of 'pvs -o all -vvvv' from
> that system.  If that doesn't give us enough clues, then we'll close this as
> irreproducible.

I cannot attach anything, since the issue has not reproduced since I deleted the multiple LVs.
I have moved the bug to CLOSED; if I manage to reproduce it, I will reopen it with everything attached.

> 
> (BTW, you should use sosreport or lvmdump in future to capture the state of
> a system for diagnosis.)

Comment 10 Alasdair Kergon 2012-02-12 14:58:36 UTC
OK.  Well the original symptoms made it look like an internal lvm problem related to device/metadata caching.  We'll see whether anyone else encounters similar problems, but we are updating this code again in 6.3.

Comment 11 Dafna Ron 2012-02-29 14:23:47 UTC
We have encountered this issue again.
It seems like an LVM cache issue.

Please see the dependent bug for vdsm, which causes some of our hosts to not see a VG created out of multiple targets, hence the targets will not be connected:

https://bugzilla.redhat.com/show_bug.cgi?id=798635

Please contact us for reproduction; we have a script on our VDS hosts that reproduces the issue.

Comment 12 Milan Broz 2012-03-30 11:41:38 UTC
Please attach "lvmdump -m" output and the output of "pvs -o all -vvvv".

Comment 13 Alasdair Kergon 2012-04-24 20:54:01 UTC
Closing.

Reopen this if it happens again and you succeed in obtaining diagnostics.

