Description of problem:
During testing of a system with a large number of LUNs, I was able to isolate a
slowdown in the boot sequence to LVM. The original configuration of the
system was a set of 256 LUNs accessible via four paths, with a volume group
spanning all LUNs. This caused a major slowdown in booting. To isolate the
cause, I removed multipathing and made many measurements.
The goal was to test configurations with 1024 LUNs; however, this became
difficult, as booting the system would take several hours with a volume group on
that many LUNs.
I created a quick script to increase the number of PVs in a volume group one by
one, recording timing information for three commands: vgextend, vgscan, and
vgdisplay. I found that each command had similar run times, which increased
rapidly with a growing number of physical volumes.
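The quick script was roughly of this shape (a hypothetical reconstruction, not the original script — the device names, the volume group name "bigvg", and the log file names are all assumptions):

```shell
#!/bin/sh
# Sketch: grow a VG one PV at a time and record how long each LVM command takes.
# Device names and the VG name are assumed; extend the list for all LUNs.
vgcreate bigvg /dev/sdb
for dev in /dev/sdc /dev/sdd /dev/sde; do
    pvcreate "$dev"
    { time vgextend bigvg "$dev"; } 2>> vgextend.times
    { time vgscan;                } 2>> vgscan.times
    { time vgdisplay bigvg;       } 2>> vgdisplay.times
done
```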
With 200 volumes, the run time of any of the aforementioned commands was nearly
two minutes; with 400 volumes, it was nearly 12 minutes. I collected
approximately 425 data points and fit a curve to them; by that fit, with 1024
physical volumes any of the operations would take approximately two hours.
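For illustration, this kind of extrapolation can be sketched with a simple power-law fit; the numbers below use only the two run times quoted above (200 PVs / ~2 min and 400 PVs / ~12 min), not the full 425-point data set, so the result is a rough check rather than the original fit:

```python
import math

# Two of the reported data points: (number of PVs, run time in minutes)
n1, t1 = 200, 2.0
n2, t2 = 400, 12.0

# Fit t = c * n**p through both points (power law, for illustration only)
p = math.log(t2 / t1) / math.log(n2 / n1)   # growth exponent
c = t1 / n1 ** p

# Extrapolate to 1024 physical volumes
t_1024 = c * 1024 ** p   # minutes
print(f"exponent ~ {p:.2f}, predicted run time at 1024 PVs ~ {t_1024 / 60:.1f} hours")
```

With just these two points the exponent comes out around 2.6 and the predicted run time at 1024 PVs is a bit over two hours, in the same range as the estimate above.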
Boot times followed a similar trend. Measurements were based on the times
associated with runlevel changes reported by `last -x`. The system, with 1024
LUNs (not configured as physical volumes), booted in 9 minutes. There was no
measurable slowdown in boot time (1-minute granularity) until the system used
more than 100 physical volumes. At 200 volumes, the boot time had increased by
two minutes; at 250 volumes, it had increased by 7 minutes.
With 500 volumes, booting took a full hour. Using the same curve fit as for the
LVM operations, a boot with 1024 volumes is estimated to take four hours. This
is consistent with my observation of one actual boot with 1024 volumes in a
volume group (however, I had no accurate timing information for that boot beyond
the message logs).
All measurements were made on an 8 x 2.0GHz Xeon machine with 50GB of RAM. The
devices used for physical volumes were 1GB LUNs on a CLARiiON CX500 Fibre
Channel array connected via two links to an Emulex HBA using the lpfc module.
Although there was only a single path to each LUN, multipath was entirely
disabled during measurement.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a volume group with a large number of volumes (> 200)
2. time vgdisplay
3. Time a boot
With more than 200 volumes, boots take anywhere from 2 minutes to 4 hours longer
than usual, and operations like vgdisplay or vgextend take from 2 minutes to
2 hours. While some delay is expected, given the resources available in the
system, some of these values seem unreasonable.
This seems like a problem with too many copies of the LVM metadata.
You should probably use --metadatacopies 0 for most of the PVs (keep only a few
backup copies and increase the metadata area).
See the --metadatacopies remark in the vgcreate and pvcreate man pages (and also
the comment in 158687).
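For example (a sketch only, with assumed device names — not the reporter's actual commands): keep an enlarged metadata area on a couple of PVs and none on the rest, so LVM does not have to read and rewrite hundreds of metadata copies on every operation:

```shell
# A few PVs carry the VG metadata, with a larger-than-default metadata area...
pvcreate --metadatacopies 1 --metadatasize 10m /dev/sdb /dev/sdc
# ...and the bulk of the PVs carry no metadata area at all.
pvcreate --metadatacopies 0 /dev/sdd /dev/sde    # ...and so on for all LUNs
vgcreate bigvg /dev/sd[b-e]
```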
I'm fairly certain I used --metadatacopies when originally creating the physical
volumes, and I did see the remark in the man pages. However, I have already
pvremoved all the volumes, so I have no way to be sure whether it was used or
not. I will go back and test a few data points explicitly with
--metadatacopies 0 and report the results.
It turns out I misunderstood the notes in the man page. I just did a quick test
with metadata kept on only 10 of the 1024 physical volumes, and the run times of
these tasks dropped to very acceptable levels.
I'm marking this NOTABUG because it was user error.
A couple of questions:
1. Were you using MPIO to manage the multiple paths? If so, were the logical
volumes created on the mpath devices, or were you still using sd names?
2. Did you have any special definitions for the "filter" and "types" settings in
lvm.conf?
Hari (eLab EMC)
The original configuration was 256 LUNs x 4 paths using dm-multipath, with
volumes created on the mapped devices. During boot, I isolated the slowdown to
LVM, which was due to the misconfiguration: too many copies of the volume group
metadata.
At the time, a filter was set up to blacklist /dev/sd* from LVM (so that it
would use the multipath maps); however, I don't recall defining any types.
All the timing testing was done with 1024 unique LUNs. There was one path for
each set of 256 LUNs.