Bug 453929
| Summary: | 100 LVs (in single PV/VG) cause long hang when booting | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Richard W.M. Jones <rjones> |
| Component: | mkinitrd | Assignee: | Peter Jones <pjones> |
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 11 | CC: | agk, ahecox, bmarzins, bmr, dcantrell, dwysocha, jmh, kzak, mbroz, prockai, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-01-12 15:32:25 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Richard W.M. Jones
2008-07-03 10:02:07 UTC
Can you be more precise? What are the last messages shown before it hangs? If you have 50 LVs instead of 100, how long is the hang in comparison?

This is the last message before it hangs (I copied this by hand so it may not be exactly correct):
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: dm-devel
I will try with 50 LVs in a moment and let you know.

50 LVs => 3 minutes (These are all wallclock times, so accurate to the nearest minute).

Can you also attach the actual 'init' script from inside the initrd? And what messages come next when it wakes up? Try to spot whereabouts in the script the delay is happening if you can. And 3 mins (50) -> 15 mins (100): what about something in between, like 75?

(You said it's fully up-to-date rawhide, so I'm assuming that means you ran mkinitrd *after* updating the lvm2 package.)

(If not, make sure you keep the problematic initrd before replacing it with a new one.)

Well, it should work; if not, it is probably another problem related to lvmcache. Anyway, assigning to me; I have a test system for this.

Version of LVM in the saved initrd.img ('bin/lvm version'):
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
The version of nash in the saved initrd.img is 6.0.54.

Created attachment 310911
init script from the saved initrd

I'm now completely up to date, and initrd has been rebuilt. 75 LVs cause a 10 minute boot delay.

Created attachment 310918
lvmdump with 75 LVs
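
The three measurements reported so far (50 LVs: 3 minutes, 75 LVs: 10 minutes, 100 LVs: 15 minutes) suggest the delay grows faster than linearly. One rough way to check this, sketched here in shell with awk, is to estimate the exponent k in "delay grows like n^k" between consecutive samples. The function name `growth_exponents` is made up for illustration, and with only three samples accurate to the nearest minute the result is indicative at best.

```shell
#!/bin/sh
# Boot delays reported in this bug: "LV count, minutes" (wallclock, nearest minute).
samples="50 3
75 10
100 15"

# For each consecutive pair of samples, estimate k in delay ~ n^k
# as log(t2/t1) / log(n2/n1). Hypothetical helper, not part of any tool.
growth_exponents() {
    printf '%s\n' "$samples" | awk '
        NR > 1 { printf "%d->%d LVs: exponent %.2f\n", pn, $1, log($2 / pt) / log($1 / pn) }
        { pn = $1; pt = $2 }'
}

growth_exponents
```

An exponent above 1 on both intervals would be consistent with something in the boot path making more than one pass over the set of devices, though these numbers alone cannot say where.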

lvm2-2.02.39-3.fc10.i386
Linux thinkpad 2.6.26-0.98.rc8.git1.fc10.i686 #1 SMP Mon Jun 30 15:27:47 EDT 2008 i686 i686 i386 GNU/Linux
nash-6.0.54-1.fc10.i386
mkinitrd-6.0.54-1.fc10.i386

OK, I have a system with 1000 LVs. vgchange in the initrd works fine (<2 min), but there is another delay later. I also see:
Creating root device.
Mounting root filesystem.
get_netlink_msg returned No buffer space available
...
Probably nash has some problem here. I'll add more debug info later.

As a data point, there is no problem with RHEL 5.2.

Created attachment 310924
nash log screen
Screenshot with timing info. The time of 1:40 for vgchange is OK: it is the same as on a normal system (but see the delay in "mount /sysroot"). The screenshot was produced by this nash script:
date
echo ----------------
echo Scanning logical volumes
time lvm vgscan --ignorelockingfailure
echo Activating logical volumes
time lvm vgchange -ay --ignorelockingfailure vg_test
echo ----------------
date
echo ----------------
echo resume UUID=...
resume UUID=f7e8fede-d4f0-42b8-8250-7c31fa8d0c87
date
echo Creating root device.
mkrootdev -t ext3 -o noatime,ro UUID=eb7582a3-b3d1-4244-ba4b-5c3b1e79e948
date
echo Mounting root filesystem.
mount /sysroot
date
echo Setting up other filesystems.
setuproot
date
echo loadpolicy
loadpolicy
Note that I have root on a normal partition in the previous example; the lvm commands were just added there to show the problem.

# rpm -q nash
nash-6.0.54-1.fc10.x86_64

Reassigning to mkinitrd (nash is not in the package list).

mkinitrd also has problems (during automatic kernel update). Updating the initrd with 1000 LVs is almost impossible: mkinitrd waits for

... echo nash-resolveDevice ... | /sbin/nash --forcequiet

Doesn't this make this a dupe of bug 277271? (with a 9-month-old patch never integrated :-(

Seems so. I think with device-mapper (LVM devices) there is another scanning loop on top of the problem mentioned in bug 277271...

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

This is still a problem; I cannot believe that it still remains unfixed...

Running on rawhide, mkinitrd-6.0.81-1.fc11.x86_64

Without DM devices:

# time /usr/libexec/plymouth/plymouth-update-initrd
real 0m16.124s
user 0m5.060s
sys 0m11.870s

Now let's create some fake DM devices (150 new mapped devices that return zeros on access):

# for i in $(seq 1 150) ; do dmsetup create "pv$i" --table "0 1024 zero" ; done
# time /usr/libexec/plymouth/plymouth-update-initrd
real 112m21.046s
user 111m53.578s
sys 0m32.850s

nash eats a lot of memory too, and 100% CPU:

root 18471 99.9 6.2 421484 371376 pts/0 R+ 13:53 56:02 /sbin/nash --forcequiet

This problem was observed with a real lvm config, about 200 PVs. The example above is just an easy reproducer without lvm involved. A kernel update on a system with such a config and activated volumes is still almost impossible.

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

This is a mass edit of all mkinitrd bugs.
Thanks for taking the time to file this bug report (and/or commenting on it). As you may have heard, in Fedora 12 mkinitrd has been replaced by dracut. In Fedora 12 the mkinitrd package is still around, as some programs depend on certain libraries it provides, but mkinitrd itself is no longer used. In Fedora 13 mkinitrd will be removed completely. This means that all work on mkinitrd has stopped. Rather than keeping mkinitrd bugs open and giving false hope that they might get fixed, we are mass closing them, so as to clearly communicate that no more work will be done on mkinitrd. We apologize for any inconvenience this may cause. If you are using Fedora 11 and are experiencing a mkinitrd bug you cannot work around, please upgrade to Fedora 12. If you experience problems with the initrd in Fedora 12, please file a bug against dracut.
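
For reference, the two `time` results quoted earlier in this report for the reproducer (0m16.124s without extra DM devices, 112m21.046s with 150 zero-target devices) can be turned into a slowdown factor and an average cost per added device. This is only a sketch of the arithmetic; `to_seconds` is a hypothetical helper name, and averaging over the 150 devices assumes nothing about how the cost is actually distributed among them.

```shell
#!/bin/sh
# Convert a `time`-style value such as "112m21.046s" into seconds.
# Hypothetical helper, not part of mkinitrd, nash, or dmsetup.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

base=$(to_seconds 0m16.124s)       # "real" time without extra DM devices
loaded=$(to_seconds 112m21.046s)   # "real" time with 150 fake DM devices

awk -v b="$base" -v l="$loaded" 'BEGIN {
    printf "slowdown: %.0fx\n", l / b
    printf "average extra time per added device: %.1fs\n", (l - b) / 150
}'
```

Almost all of the extra time in the quoted run is user time (111m53.578s), which matches the observation that nash sits at 100% CPU.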