Basic issue is pvdisplay is taking an unreasonably long time and consuming an unreasonably large amount of RAM. I tried a simple test with rhel4u5 xen node connected to iSCSI LUNs (about 100 luns each with 2 paths) and got execution time of around 33s and memory consumption of 14MB. Did a quick check of the lvmdump provided (see attached) - checked the /etc/lvm/lvm.conf filter line and it looks correct, as does the .cache file. Looks like the pvdisplay process was blocked doing direct IO when the cmd was run (not surprising). Might be storage dependent (EMC with powerpath). Snips from IRC session: Aug 27 15:40:45 <deepthot> is this a cluster or just single node with multipathing? Aug 27 15:40:55 <deepthot> the output from lvmdump will tell a lot of the details. Aug 27 15:57:51 <csm-laptop> 64 gig of ram Aug 27 15:58:05 <csm-laptop> it is not a cluster Aug 27 15:58:21 <deepthot> could be related to multipath - something like https://bugzilla.redhat.com/show_bug.cgi?id=217130 Aug 27 15:58:37 <deepthot> so you have 1000 luns, 4 paths / LUN, and so 4000 paths total? Aug 27 15:58:46 <deepthot> That will definately cause a problem with dm-multipath Aug 27 15:59:03 <deepthot> I am not sure it is your pvdisplay problem though - could be another problem altogether Aug 27 16:00:43 <csm-laptop> i see no mp file in etc at all Aug 27 16:01:45 <csm-laptop> so this is using powerpath Aug 27 16:01:54 <csm-laptop> i have the storage guy sitting next to me Aug 27 16:02:50 <deepthot> csm-laptop: ok, I'm not familar with powerpath Aug 27 16:02:59 <deepthot> know what it is, but don't know it really Aug 27 16:03:44 <csm-laptop> 225 devices *4 paths = 1000 devices Aug 27 16:05:36 <csm-laptop> okay so pvs is just sort of hanging there too Aug 27 16:18:52 <deepthot> are you getting any output at all, or is it just hanging and consuming memory? Aug 27 16:22:28 <deepthot> I'm looking at the code now and have a rhel4u5 machine I can work with Aug 27 16:22:45 <deepthot> on my setup, I do see the memory consumption going up as the command executes - I have 100 LUNs with 2 paths each Aug 27 16:23:48 <deepthot> Output is somewhat slow, but it does complete - I'm getting like 14MB consumption and ~33s execution time for ~200 paths so nowhere near what you are seeing Aug 27 16:32:38 <deepthot> have you tried tweaking the "filter" line in /etc/lvm/lvm.conf? Aug 27 16:32:48 <csm-laptop> this has been running for hours and has not finished Aug 27 16:33:34 <deepthot> If you know, for instance, that all /dev/sd* devices are underlying paths, and you just want to scan /dev/foobar* devices instead (these are the multipath devices), you could add a filter line that would exclude them Aug 27 16:33:58 <deepthot> dwysocha Aug 27 16:34:28 <deepthot> and you can't kill it? Aug 27 16:35:42 <csm-laptop> email on it's way Aug 27 16:37:06 <csm-laptop> I have not tried to kill the process Aug 27 16:37:31 <csm-laptop> also I have not tweaked the filter Aug 27 16:37:44 <csm-laptop> take a look at the lvmdump and lets talk tomorrow? Aug 28 14:49:55 <deepthot> So are you getting any output at all, or does it just freeze? Aug 28 15:00:01 <csm-laptop> eventually the process finishes.... it just takes freaking forever! Aug 28 15:00:45 <deepthot> so on my system, it looks like I get output for one pv, followed by a pause of say 500ms, followed by the next one, Aug 28 15:01:02 <deepthot> but on your system, the pause is in minutes, hours, or days? Aug 28 15:01:24 <csm-laptop> hours Aug 28 15:04:55 <deepthot> did you try running pvdisplay on just one of the PVs? Aug 28 15:06:00 <deepthot> you could try running on one PV in verbose mode, e.g. " pvdisplay -vvvv /dev/mapper/mpath0 2&>output.txt" Aug 28 15:07:43 <csm-laptop> well I can't test anything as the host is down for maintenance right now
Created attachment 177301 [details] lvmdump file of system with the problem
This problem continues to haunt us here at Bloomberg. Has there been any progress on this issue?
More importantly, the issue also causes extremely long boot times (1-2 hours) because the vgscan in the init scripts has similar behavior. It appears to perform (number of devices)^(number of devices) device stats (about ~800k in our case) - for every device it finds, it appears to recheck every device including the ones already checked. Also, the filter in lvm.conf is configured to ignore /dev/sd* devices (the LUNs); however this appears to only apply as to whether or not it will consider any metadata on the devices - it still stats the device nonetheless.
I have changed the priority on this to medium instead of low... given the nature of the problem and the fact that the machine, on boot, is out of service so long it seems to merit that. Please change it back if I am wrong.
Created attachment 207341 [details] lvmdump file
I have not made any progress on this. I am about to leave on a short vacation but will try to take at least a brief look when I get back next week. I think I know roughly why this is but not sure how hard it is to fix. Probably not easy but maybe there is something we can do to improve the situation.
If there is anything you would like us to provide or test please let us know.
presuming that vacation is over do we have anything to report about this?
Not yet - other things getting in the way sorry.
Did you set up the VG specifically to contain a large number of PVs or are you just using the default settings? (See man pvcreate --metadatacopies and --metadatasize etc.)
[We know about the two performance enhancements needed (lack of internal metadata caching so operations are repeated needlessly; lack of automated VG metadata area mangement).]
In our testing here the first point you make about repeated operations seems to be our likely problem. I am working on getting answers to how this was created... since I didn't do it I really don't know.
I have confirmed that the default settings were used in creation of the PVs.
Customer in IT 133260 seeing this as well. I've reproduced this internally with about 500 (small) PVs created with default options. Using pvcreate with --metadatacopies 0 gets rid of the huge delays on VG/LV/PV operations. Does the suggestion in bug 229560 make sense here (add VG name to .cache file)?
What progress do we see on this? Customer wants to know.
Largely the answer is removing the majority of MDAs as discussed. It's the direction upstream seems to be taking. There are also tool updates coming down the pipe which will help with managing such a setup.
Anything new to report on this? It's a month on from the last update and I am sure to get hammered soon!
----- Additional Comments From thoss.com 2008-02-01 03:39 EDT ------- Is there any update at the RedHat site for that Bugzilla ? Do you need any assistance from IBM ? This event sent from IssueTracker by jkachuck issue 136514
There are basically two steps to speed up this process we are working on 1) use internal cache for device labels 2) use internal cache for metadata areas A solution for problem 1) was just submitted in upstream code (but need some subsequent patches for non-mda PVs), we are working on 2) issue. I will update this bugzilla when patches are ready. Then some testing on affected configuration would be nice of course.
Any further updates regarding the availability of patches.
So it's almost 2 months from the last update at this point, the Solaris and AIX people are laughing about how long this is taking and the lack of patch availability. I have to admit that this is less than optimal in terms of support for an "Enterprise" solution.
The fix for this BZ is planned for RHEL 4.7. A prerequisite is to get the change reviewed and accepted upstream, and thoroughly tested. This work is underway, and continues to be a high priority.
Setting this bug to POST status because crucial patch (solving the activation time) is now in upstream CVS. (Several previous commits were already in tree and solved partial problems - like caching of device labels (see comment #36). Anyway, several steps are needed now to prepare test package for RHEL4, I will update this bugzilla when we have packages ready.
Testing build for RHEL4 already exist now. If anyone want test it before it reach public beta testing phase, please contact Red Hat support. (For reference, upstream package containing fixes is LVM2 2.02.35 release.) Thanks for your patience.
Added storage-related partners for their heads-up and request for testing.
----- Additional Comments From mgrf.com 2008-07-01 05:33 EDT ------- Hello Red Hat, Can you please post your test results for the improved fix ? Thx This event sent from IssueTracker by jkachuck issue 136514
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0776.html