Bug 261521

Summary: pvdisplay of 250 luns with 4 paths each (1000 paths) takes many hours or days and consumes 4+GB of RAM
Product: Red Hat Enterprise Linux 4
Reporter: Dave Wysochanski <dwysocha>
Component: lvm2
Assignee: Milan Broz <mbroz>
Status: CLOSED ERRATA
QA Contact: Corey Marthaler <cmarthal>
Severity: urgent
Priority: urgent
Version: 4.5
CC: agk, ahecox, andriusb, berthiaume_wayne, bhinson, bmr, coughlan, csm, dwysocha, evuraan, jbrassow, marting, mbroz, prockai, pvrabec, rjones, rsarraf, tao
Target Milestone: rc
Keywords: ZStream
Hardware: All
OS: All
Fixed In Version: RHBA-2008-0776
Doc Type: Bug Fix
Last Closed: 2008-07-24 20:07:34 UTC
Bug Blocks: 442308, 442309    
Attachments:
  lvmdump file of system with the problem
  lvmdump file

Description Dave Wysochanski 2007-08-28 19:35:40 UTC
Basic issue is pvdisplay is taking an unreasonably long time and consuming an
unreasonably large amount of RAM.

I tried a simple test with a rhel4u5 xen node connected to iSCSI LUNs (about
100 LUNs, each with 2 paths) and got an execution time of around 33s and memory
consumption of 14MB.

Did a quick check of the lvmdump provided (see attached) - the filter line in
/etc/lvm/lvm.conf looks correct, as does the .cache file.  Looks like the
pvdisplay process was blocked doing direct I/O when the command was run (not
surprising).

Might be storage dependent (EMC with powerpath).
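
For reference, the kind of filter discussed here and in the IRC log below goes
in the devices section of /etc/lvm/lvm.conf. This is only a sketch - the
/dev/emcpower* name for the powerpath devices is an assumption about this
particular setup, not something confirmed from the lvmdump:

    devices {
        # accept the multipath (powerpath) devices, reject the underlying
        # /dev/sd* paths, and reject anything else not explicitly accepted
        filter = [ "a|^/dev/emcpower|", "r|^/dev/sd|", "r|.*|" ]
    }

After changing the filter, regenerating the persistent cache (for example by
removing /etc/lvm/.cache and running vgscan) keeps stale path entries from
being scanned again.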


Snips from IRC session:

Aug 27 15:40:45 <deepthot>      is this a cluster or just single node with
multipathing?
Aug 27 15:40:55 <deepthot>      the output from lvmdump will tell a lot of the
details.
Aug 27 15:57:51 <csm-laptop>    64 gig of ram
Aug 27 15:58:05 <csm-laptop>    it is not a cluster
Aug 27 15:58:21 <deepthot>      could be related to multipath - something like
https://bugzilla.redhat.com/show_bug.cgi?id=217130
Aug 27 15:58:37 <deepthot>      so you have 1000 luns, 4 paths / LUN, and so
4000 paths total?
Aug 27 15:58:46 <deepthot>      That will definitely cause a problem with
dm-multipath
Aug 27 15:59:03 <deepthot>      I am not sure it is your pvdisplay problem
though - could be another problem altogether
Aug 27 16:00:43 <csm-laptop>    i see no mp file in etc at all
Aug 27 16:01:45 <csm-laptop>    so this is using powerpath
Aug 27 16:01:54 <csm-laptop>    i have the storage guy sitting next to me
Aug 27 16:02:50 <deepthot>      csm-laptop: ok, I'm not familiar with powerpath
Aug 27 16:02:59 <deepthot>      know what it is, but don't know it really
Aug 27 16:03:44 <csm-laptop>    225 devices *4 paths = 1000 devices
Aug 27 16:05:36 <csm-laptop>    okay so pvs is just sort of hanging there too
Aug 27 16:18:52 <deepthot>      are you getting any output at all, or is it just
hanging and consuming memory?
Aug 27 16:22:28 <deepthot>      I'm looking at the code now and have a rhel4u5
machine I can work with
Aug 27 16:22:45 <deepthot>      on my setup, I do see the memory consumption
going up as the command executes - I have 100 LUNs with 2 paths each
Aug 27 16:23:48 <deepthot>      Output is somewhat slow, but it does complete -
I'm getting like 14MB consumption and ~33s execution time for ~200 paths so
nowhere near what you are seeing
Aug 27 16:32:38 <deepthot>      have you tried tweaking the "filter" line in
/etc/lvm/lvm.conf?
Aug 27 16:32:48 <csm-laptop>    this has been running for hours and has not finished
Aug 27 16:33:34 <deepthot>      If you know, for instance, that all /dev/sd*
devices are underlying paths, and you just want to scan /dev/foobar* devices
instead (these are the multipath devices), you could add a filter line that
would exclude them
Aug 27 16:33:58 <deepthot>      dwysocha
Aug 27 16:34:28 <deepthot>      and you can't kill it?
Aug 27 16:35:42 <csm-laptop>    email on its way
Aug 27 16:37:06 <csm-laptop>    I have not tried to kill the process
Aug 27 16:37:31 <csm-laptop>    also I have not tweaked the filter
Aug 27 16:37:44 <csm-laptop>    take a look at the lvmdump and lets talk tomorrow?



Aug 28 14:49:55 <deepthot>      So are you getting any output at all, or does it
just freeze?
Aug 28 15:00:01 <csm-laptop>    eventually the process finishes.... it just
takes freaking forever!
Aug 28 15:00:45 <deepthot>      so on my system, it looks like I get output for
one pv, followed by a pause of say 500ms, followed by the next one,
Aug 28 15:01:02 <deepthot>      but on your system, the pause is in minutes,
hours, or days?
Aug 28 15:01:24 <csm-laptop>    hours
Aug 28 15:04:55 <deepthot>      did you try running pvdisplay on just one of the
PVs?
Aug 28 15:06:00 <deepthot>      you could try running on one PV in verbose mode,
e.g. "pvdisplay -vvvv /dev/mapper/mpath0 > output.txt 2>&1"
Aug 28 15:07:43 <csm-laptop>    well I can't test anything as the host is down
for maintenance right now

Comment 1 Dave Wysochanski 2007-08-28 19:35:40 UTC
Created attachment 177301 [details]
lvmdump file of system with the problem

Comment 2 Chuck Mead 2007-09-26 17:37:26 UTC
This problem continues to haunt us here at Bloomberg. Has there been any
progress on this issue?

Comment 3 Chuck Mead 2007-09-26 18:33:13 UTC
More importantly, the issue also causes extremely long boot times (1-2 hours)
because the vgscan in the init scripts has similar behavior. It appears to
perform (number of devices)^2 device stats (about ~800k in our case) - for every
device it finds, it appears to recheck every device, including the ones already
checked. Also, the filter in lvm.conf is configured to ignore /dev/sd* devices
(the LUNs); however, this appears to apply only to whether it will consider any
metadata on the devices - it still stats every device regardless.
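
One rough way to observe that behaviour (only a sketch - syscall names and
totals will vary by platform and device count) is to count the file-related
syscalls a full scan makes:

    # count stat/open and other file-related syscalls issued by one scan
    strace -c -e trace=file vgscan > /dev/null

On a box showing the problem, the call counts would be expected to grow roughly
with the square of the number of visible block devices rather than linearly.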


Comment 4 Chuck Mead 2007-09-26 18:34:59 UTC
I have changed the priority on this to medium instead of low... given the nature
of the problem and the fact that the machine, on boot, is out of service so long
it seems to merit that. Please change it back if I am wrong.

Comment 5 Chuck Mead 2007-09-26 18:44:13 UTC
Created attachment 207341 [details]
lvmdump file

Comment 6 Dave Wysochanski 2007-09-26 21:53:19 UTC
I have not made any progress on this.  I am about to leave on a short vacation
but will try to take at least a brief look when I get back next week.

I think I know roughly why this happens, but I'm not sure how hard it is to
fix.  Probably not easy, but maybe there is something we can do to improve the
situation.

Comment 7 Chuck Mead 2007-09-27 14:25:04 UTC
If there is anything you would like us to provide or test please let us know.

Comment 8 Chuck Mead 2007-10-03 18:10:33 UTC
Presuming that vacation is over, do we have anything to report on this?

Comment 9 Dave Wysochanski 2007-10-03 18:22:52 UTC
Not yet - other things keep getting in the way, sorry.

Comment 10 Alasdair Kergon 2007-10-03 18:41:46 UTC
Did you set up the VG specifically to contain a large number of PVs or are you
just using the default settings?  (See man pvcreate --metadatacopies and
--metadatasize etc.)

Comment 11 Alasdair Kergon 2007-10-03 18:48:28 UTC
[We know about the two performance enhancements needed (lack of internal
metadata caching, so operations are repeated needlessly; lack of automated VG
metadata area management).]

Comment 12 Chuck Mead 2007-10-03 19:08:12 UTC
In our testing here the first point you make about repeated operations seems to
be our likely problem. I am working on getting answers to how this was
created... since I didn't do it I really don't know.

Comment 14 Chuck Mead 2007-10-03 19:37:29 UTC
I have confirmed that the default settings were used in creation of the PVs.

Comment 16 Brad Hinson 2007-10-29 19:46:50 UTC
Customer in IT 133260 is seeing this as well.  I've reproduced this internally with
about 500 (small) PVs created with default options.  Using pvcreate with
--metadatacopies 0 gets rid of the huge delays on VG/LV/PV operations.
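
For anyone trying that workaround, a minimal sketch (the mpath device names are
placeholders, not taken from this system):

    # keep the VG metadata on a small number of PVs only...
    pvcreate --metadatacopies 1 /dev/mapper/mpath0 /dev/mapper/mpath1
    # ...and create the remaining PVs with no metadata area at all
    pvcreate --metadatacopies 0 /dev/mapper/mpath2 /dev/mapper/mpath3
    vgcreate bigvg /dev/mapper/mpath0 /dev/mapper/mpath1 \
                   /dev/mapper/mpath2 /dev/mapper/mpath3

With metadata copies on only a handful of PVs, the tools no longer read and
rewrite a metadata area on every PV for each operation; note that at least one
PV in the VG must still carry a metadata copy.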

Does the suggestion in bug 229560 make sense here (add VG name to .cache file)?

Comment 24 Chuck Mead 2007-12-19 19:26:06 UTC
What progress do we see on this? Customer wants to know.

Comment 25 Chris Evich 2007-12-19 19:44:40 UTC
Largely the answer is removing the majority of MDAs as discussed.  It's the
direction upstream seems to be taking.  There are also tool updates coming down
the pipe which will help with managing such a setup.

Comment 28 Chuck Mead 2008-01-17 19:21:33 UTC
Anything new to report on this? It's a month on from the last update and I am
sure to get hammered soon!

Comment 35 Issue Tracker 2008-02-01 14:34:36 UTC
----- Additional Comments From thoss.com  2008-02-01 03:39 EDT -------
Is there any update at the Red Hat site for this Bugzilla? Do you need any
assistance from IBM?


This event sent from IssueTracker by jkachuck 
 issue 136514

Comment 36 Milan Broz 2008-02-01 15:12:26 UTC
There are basically two steps we are working on to speed up this process:
1) use an internal cache for device labels
2) use an internal cache for metadata areas

A solution for 1) was just submitted to the upstream code (but needs some
subsequent patches for non-mda PVs); we are working on 2) now.

I will update this bugzilla when patches are ready.
Then some testing on affected configuration would be nice of course.

Comment 39 Darin Langone 2008-03-26 15:22:50 UTC
Any further updates regarding the availability of patches?

Comment 42 Darin Langone 2008-03-28 12:59:06 UTC
So it's almost 2 months from the last update at this point, the Solaris and AIX 
people are laughing about how long this is taking and the lack of patch 
availability.  I have to admit that this is less than optimal in terms of 
support for an "Enterprise" solution.  

Comment 47 Tom Coughlan 2008-03-31 19:16:32 UTC
The fix for this BZ is planned for RHEL 4.7. A prerequisite is to get the change
reviewed and accepted upstream, and thoroughly tested. This work is underway,
and continues to be a high priority.  

Comment 48 Milan Broz 2008-04-02 09:47:40 UTC
Setting this bug to POST status because the crucial patch (solving the
activation time) is now in upstream CVS.

(Several previous commits were already in the tree and solved partial problems,
like the caching of device labels - see comment #36.)

Anyway, several steps are still needed to prepare a test package for RHEL4; I
will update this bugzilla when we have packages ready.


Comment 55 Milan Broz 2008-04-17 08:52:55 UTC
A test build for RHEL4 now exists.

If anyone wants to test it before it reaches the public beta testing phase,
please contact Red Hat support.

(For reference, the upstream release containing the fixes is LVM2 2.02.35.)

Thanks for your patience.
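
For anyone verifying a candidate build, the installed LVM2 version can be
checked with, for example:

    lvm version
    rpm -q lvm2

An upstream build carrying the fixes would report 2.02.35 or later; the RHEL4
test packages may be versioned differently.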


Comment 56 Andrius Benokraitis 2008-04-23 03:37:26 UTC
Added storage-related partners to give them a heads-up and to request testing.

Comment 73 Issue Tracker 2008-07-01 13:37:41 UTC
----- Additional Comments From mgrf.com  2008-07-01 05:33 EDT -------
Hello Red Hat, 
Can you please post your test results for the improved fix?
Thanks


This event sent from IssueTracker by jkachuck 
 issue 136514

Comment 76 errata-xmlrpc 2008-07-24 20:07:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0776.html