Description of problem: While testing a system with a large number of LUNs, I isolated a slowdown in the boot sequence to LVM. The original configuration was a set of 256 LUNs accessible via four paths, with a volume group spanning all LUNs. This caused a major slowdown in booting. To isolate the cause, I removed multipathing and made many measurements. The goal was to test configurations with 1024 LUNs. However, this became difficult, because booting the system took several hours with a volume group on the devices.

I wrote a quick script that grows a volume group one PV at a time and records timing information for three commands: vgextend, vgscan, and vgdisplay. All three commands had similar run times, and those run times increased rapidly as the number of physical volumes grew. With 200 volumes, each of these commands took close to two minutes. With 400 volumes, each took nearly 12 minutes. I fit a curve to approximately 425 data points. Extrapolated to 1024 physical volumes, each operation would take approximately two hours.

Boot times followed a similar trend. Boot durations were measured from the runlevel-change times reported by `last -x`. With 1024 LUNs (not configured as physical volumes), the system booted in 9 minutes. There was no measurable slowdown in boot time (at 1-minute granularity) until more than 100 physical volumes were in use. At 200 volumes, boot time had increased by two minutes. At 250 volumes, it had increased by 7 minutes. With 500 volumes, booting took a full hour. Using the same curve fit as for the LVM operations, I estimate that a boot with 1024 volumes would take four hours. This is consistent with my observation of one actual boot with 1024 volumes in a volume group, although I had no accurate timing information for that boot beyond the message logs.
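The report does not say which curve was fit to the ~425 data points. As a rough check, the sketch below assumes that run time follows a power law, t = a * n**k, in the number of PVs n. It fits that law exactly through the two figures quoted above: about 2 minutes at 200 PVs and about 12 minutes at 400 PVs. It only illustrates the extrapolation and is not the author's fit.

```python
import math


def power_law_from_two_points(n1, t1, n2, t2):
    """Fit t = a * n**k exactly through two (n, t) points and return (a, k)."""
    k = math.log(t2 / t1) / math.log(n2 / n1)
    a = t1 / n1 ** k
    return a, k


def predict(a, k, n):
    """Evaluate the fitted power law at n physical volumes."""
    return a * n ** k


# Approximate per-command run times quoted in the report, in minutes.
a, k = power_law_from_two_points(200, 2.0, 400, 12.0)
minutes_at_1024 = predict(a, k, 1024)
print(f"exponent k = {k:.2f}")
print(f"predicted run time at 1024 PVs: {minutes_at_1024:.0f} min "
      f"({minutes_at_1024 / 60:.1f} h)")
```

Under this assumption, doubling the PV count multiplies run time by 6, so the exponent is log2(6), about 2.6. The prediction for 1024 PVs comes out at roughly 136 minutes (about 2.3 hours), which is the same order as the two-hour estimate above.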
All measurements were made on an 8 x 2.0 GHz Xeon machine with 50 GB of RAM. The devices used as physical volumes were 1 GB LUNs on a CLARiiON CX500 Fibre Channel array. The array was connected via two links to an Emulex HBA using the lpfc module. Each LUN had only a single path, but multipath was also disabled entirely during the measurements.

Version-Release number of selected component (if applicable):
lvm2-2.02.06-3.0.RHEL4

How reproducible:
Always

Steps to Reproduce:
1. Create a volume group with a large number of physical volumes (> 200).
2. Time vgdisplay.
3. Time a boot.

Actual results:
With more than 200 volumes, boots take between 2 minutes and 4 hours longer than usual. Operations like vgdisplay or vgextend take between 2 minutes and 2 hours.

Expected results:
Some delay is expected. However, given the resources available in the system, some of these values seem unreasonable.

Additional info:
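The report does not include the original timing script. The Python sketch below shows one possible version of the loop described above, which grows the volume group one PV at a time and times vgextend, vgscan, and vgdisplay. The function names are hypothetical. The sketch assumes that the volume group already exists and that every device has already been initialized with pvcreate. The `runner` parameter exists only so the loop can be exercised without touching real disks.

```python
import subprocess
import time


def timed(cmd, runner=subprocess.run):
    """Run cmd (a list of arguments) and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    runner(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - start


def grow_and_time(vg, pvs, runner=subprocess.run):
    """Extend vg by one PV at a time, timing vgextend, vgscan and vgdisplay.

    Assumes vg already exists and each device in pvs was initialised with
    pvcreate beforehand. Returns a list of (pvs_added, {command: seconds}).
    """
    rows = []
    for added, pv in enumerate(pvs, start=1):
        # The dict is built in order, so the commands run in this sequence.
        timings = {
            "vgextend": timed(["vgextend", vg, pv], runner),
            "vgscan": timed(["vgscan"], runner),
            "vgdisplay": timed(["vgdisplay", vg], runner),
        }
        rows.append((added, timings))
    return rows
```

Run as root against real devices, a call would look like `grow_and_time("testvg", ["/dev/sdc", "/dev/sdd"])`. The volume group name and device paths in that call are placeholders.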
This looks like a problem caused by too many copies of the LVM metadata. You should probably use --metadatacopies 0 for most of the PVs, keeping copies on only a few PVs and increasing the size of their metadata area. See the --metadatacopies remark in the vgcreate and pvcreate man pages (and also the comment in 158687).
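The helper below sketches that advice. It only builds pvcreate argument lists and does not run them. A handful of PVs keep one metadata area with an enlarged --metadatasize, and the rest get --metadatacopies 0. The default of 10 metadata-bearing PVs mirrors the quick test reported later in this bug. The 16M metadata size, the function name, and the device names in the usage line are hypothetical choices, not values from the report.

```python
def pvcreate_commands(devices, copies_on=10, metadata_size="16M"):
    """Build pvcreate argument lists that keep metadata on only a few PVs.

    The first `copies_on` devices get one metadata area of `metadata_size`
    (a hypothetical size); every other device gets --metadatacopies 0.
    """
    if copies_on < 1:
        # At least one PV in the volume group must carry the metadata.
        raise ValueError("at least one PV needs a metadata copy")
    cmds = []
    for i, dev in enumerate(devices):
        if i < copies_on:
            cmds.append(["pvcreate", "--metadatacopies", "1",
                         "--metadatasize", metadata_size, dev])
        else:
            cmds.append(["pvcreate", "--metadatacopies", "0", dev])
    return cmds
```

For example, `pvcreate_commands([f"/dev/mapper/lun{i}" for i in range(1024)])` yields 10 commands that keep a metadata area and 1014 that do not. The device paths in that example are placeholders.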
I'm fairly certain I used --metadatacopies when originally creating the physical volumes. I did see the remark in the man pages. However, I have already pvremoved all the volumes, so I have no way to be sure if it was used or not. I will go back and test a few data points explicitly with --metadatacopies 0 and report findings.
It turns out I misunderstood the notes in the man page. I just ran a quick test that kept metadata copies on only 10 of the 1024 physical volumes, and the run times of these tasks dropped to very acceptable levels. I'm marking this NOTABUG because it was user error.
Hello Ryan,

A couple of questions:
1. Were you using MPIO to manage the multiple paths? If so, were the logical volumes created on the mpath devices, or were you still using sd names?
2. Did you have any special definitions for "filter" and "types" in /etc/lvm/lvm.conf?

Regards,
Hari (eLab EMC)
The original configuration was 256 LUNs x 4 paths with dm-multipath, and the volumes were created on the mapped devices. During boot, I isolated the slowdown to LVM. It was caused by the misconfiguration, which left too many copies of the volume group metadata. At the time, a filter was set up to blacklist /dev/sd* from LVM so that it would use the multipath maps. However, I don't recall defining any types. All the timing tests were done with 1024 unique LUNs, with one path for each set of 256 LUNs.
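The report does not give the exact filter line. A hypothetical lvm.conf filter that rejects /dev/sd* and accepts everything else would be `filter = [ "r|/dev/sd.*|", "a|.*|" ]` in the devices section. The Python sketch below is a simplified emulation of how such a list is evaluated. Rules are tried in order, the first matching rule decides, and a device that matches no rule is accepted. Real LVM uses its own regex engine and may test every alias of a device, so treat this as an approximation. The function names are hypothetical.

```python
import re


def parse_filter(entries):
    """Turn lvm.conf-style strings such as "r|/dev/sd.*|" into (action, regex) pairs."""
    rules = []
    for entry in entries:
        # Format: an action letter ('a' accept, 'r' reject), a delimiter,
        # the regular expression, then the same delimiter again.
        if len(entry) < 3 or entry[0] not in ("a", "r") or entry[-1] != entry[1]:
            raise ValueError(f"unsupported filter entry: {entry!r}")
        rules.append((entry[0], re.compile(entry[2:-1])))
    return rules


def accepted(device, rules):
    """Return True if the device name passes the filter (first match wins)."""
    for action, regex in rules:
        # Patterns are unanchored here; add ^ or $ to anchor them.
        if regex.search(device):
            return action == "a"
    # A device that matches no rule is accepted.
    return True
```

With the hypothetical filter above, `/dev/sdb` is rejected and `/dev/mapper/mpath0` is accepted. This is the intended effect of making LVM use only the multipath maps.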