Bug 453929

Summary:

100 LVs (in single PV/VG) cause long hang when booting

Product:

[Fedora] Fedora

Reporter:

Richard W.M. Jones <rjones>

Component:

mkinitrd

Assignee:

Peter Jones <pjones>

Status:

CLOSED WONTFIX

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Docs Contact:

Priority:

medium

Version:

CC:

agk, ahecox, bmarzins, bmr, dcantrell, dwysocha, jmh, kzak, mbroz, prockai, wtogami

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2010-01-12 15:32:25 UTC

Attachments:

Description	Flags
init script from the saved initrd	none
lvmdump with 75 LVs	none
nash log screen	none

Description Richard W.M. Jones 2008-07-03 10:02:07 UTC

Description of problem:

  I need to create a large number of LVs for unrelated testing purposes.
  However when I do this, it causes a very long hang during boot
  (apparently somewhere in initrd).  This is very easy to reproduce (see
  the steps below).

Version-Release number of selected component (if applicable):

  lvm2-2.02.39-3.fc10.i386

  The machine is running Rawhide and is up to date as of about
  two days ago (2008-07-01).

How reproducible:

  Always

Steps to Reproduce:

  On a machine with a single PV and VG (i.e. an ordinary Rawhide install)
  and just a little bit of free space in the VG, you can reproduce this
  easily:

  for i in `seq 0 99`; do /sbin/lvcreate -L 32M -n Temp$i VolGroup00; done
  /sbin/reboot
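
Once testing is done, the temporary LVs need to be removed again. A minimal sketch, assuming the Temp0..Temp99 names and the VolGroup00 VG used in the loop above; cleanup_temp_lvs and DRY_RUN are hypothetical names introduced here, not part of LVM. By default it only prints the lvremove commands it would run:

```shell
#!/bin/sh
# cleanup_temp_lvs is a hypothetical helper, not an LVM tool.
# It undoes the reproduction loop by removing VG/Temp0 .. VG/Temp(count-1).
# With DRY_RUN=1 (the default) it only prints the commands it would run.
cleanup_temp_lvs() {
    vg=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "/sbin/lvremove -f $vg/Temp$i"
        else
            /sbin/lvremove -f "$vg/Temp$i"
        fi
        i=$((i + 1))
    done
}

# Dry run: print the 100 removal commands without touching the VG.
cleanup_temp_lvs VolGroup00 100
```

Setting DRY_RUN=0 and running as root would perform the removal for real.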

Actual results:

  The machine will hang for around 15 minutes at one stage in the boot
  process.  There is no visual indication during this time that anything
  is happening at all, but if you wait long enough the machine should
  eventually reboot.

  The hang appears to happen somewhere in initrd.

Expected results:

  Machine should either reboot more quickly, or give some indication
  of progress.

Additional info:

Comment 1 Alasdair Kergon 2008-07-03 10:15:59 UTC

Can you be more precise?
What are the last messages shown before it hangs?

Comment 2 Alasdair Kergon 2008-07-03 10:19:48 UTC

If you have 50 LVs instead of 100, how long is the hang in comparison?

Comment 3 Richard W.M. Jones 2008-07-03 10:21:47 UTC

This is the last message before it hangs (I copied this by hand so it
may not be exactly correct):

device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: dm-devel

I will try with 50 LVs in a moment and let you know.

Comment 4 Richard W.M. Jones 2008-07-03 10:31:09 UTC

50 LVs => 3 minutes

(These are all wallclock times, so accurate to the nearest minute).

Comment 5 Alasdair Kergon 2008-07-03 10:34:43 UTC

Can you also attach the actual 'init' script from inside the initrd?  And what
messages come next when it wakes up - try to spot whereabouts in the script the
delay is happening if you can.

And 3mins (50) -> 15mins (100)  - what about something in between like 75?

Comment 6 Alasdair Kergon 2008-07-03 10:36:27 UTC

(you said it's fully up-to-date rawhide - so I'm assuming that means you ran
mkinitrd *after* updating the lvm2 package)

Comment 7 Alasdair Kergon 2008-07-03 10:38:03 UTC

(if not, make sure you keep the problematic initrd before replacing it with a
new one)

Comment 8 Milan Broz 2008-07-03 10:44:21 UTC

Well, it should work; if not, it is probably another problem related to lvmcache.
Anyway, assigning to me, I have a test system for this.

Comment 9 Richard W.M. Jones 2008-07-03 11:04:44 UTC

Version of LVM in the saved initrd.img ('bin/lvm version'):

  LVM version:     2.02.39 (2008-06-27)
  Library version: 1.02.27 (2008-06-25)

Comment 10 Richard W.M. Jones 2008-07-03 11:09:52 UTC

The version of nash in the saved initrd.img is 6.0.54.

Comment 11 Richard W.M. Jones 2008-07-03 11:11:38 UTC

Created attachment 310911 [details]
init script from the saved initrd

Comment 12 Richard W.M. Jones 2008-07-03 12:24:33 UTC

I'm now completely up to date, and initrd has been rebuilt.

75 LVs cause a 10 minute boot delay.
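
Together with the earlier 50-LV (3 minute) and 100-LV (15 minute) timings, this gives three data points. A rough sketch of how fast the delay grows, assuming it follows delay ~ LVs^k between any two points; scaling_exponent is a hypothetical helper name, the timings are only accurate to the nearest minute, and the 75-LV figure was measured after the initrd was rebuilt, so the result is indicative at best:

```shell
#!/bin/sh
# Estimate the exponent k in "delay ~ LVs^k" from two measurements.
# Arguments: LV count 1, minutes 1, LV count 2, minutes 2.
scaling_exponent() {
    awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", log(t2 / t1) / log(n2 / n1) }'
}

scaling_exponent 50 3 75 10     # 50 -> 75 LVs, prints 2.97
scaling_exponent 75 10 100 15   # 75 -> 100 LVs, prints 1.41
scaling_exponent 50 3 100 15    # 50 -> 100 LVs, prints 2.32
```

All three estimates are above 1, so the delay grows faster than linearly with the number of LVs over this range.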

Comment 13 Richard W.M. Jones 2008-07-03 12:26:33 UTC

Created attachment 310918 [details]
lvmdump with 75 LVs

Comment 14 Richard W.M. Jones 2008-07-03 12:29:09 UTC

lvm2-2.02.39-3.fc10.i386

Linux thinkpad 2.6.26-0.98.rc8.git1.fc10.i686 #1 SMP Mon Jun 30 15:27:47 EDT
2008 i686 i686 i386 GNU/Linux

nash-6.0.54-1.fc10.i386

mkinitrd-6.0.54-1.fc10.i386

Comment 15 Milan Broz 2008-07-03 13:34:47 UTC

OK, I have a system with 1000 LVs. vgchange in the initrd works fine (<2 min), but
there is another delay later. I also see:

Creating root device.
Mounting root filesystem.
get_netlink_msg returned No buffer space available

... nash probably has some problem here.

I'll add more debug info later.

Comment 16 Richard W.M. Jones 2008-07-03 13:46:33 UTC

As a data point, there is no problem with RHEL 5.2.

Comment 17 Milan Broz 2008-07-03 14:12:07 UTC

Created attachment 310924 [details]
nash log screen


Screenshot with timing info for the nash script below.
The 1:40 taken by vgchange is OK - it is the same as on a normal system
(but see the delay in "mount /sysroot").

date
echo ----------------
echo Scanning logical volumes
time lvm vgscan --ignorelockingfailure
echo Activating logical volumes
time lvm vgchange -ay --ignorelockingfailure  vg_test
echo ----------------
date
echo ----------------

echo resume UUID=...
resume UUID=f7e8fede-d4f0-42b8-8250-7c31fa8d0c87

date
echo Creating root device.
mkrootdev -t ext3 -o noatime,ro UUID=eb7582a3-b3d1-4244-ba4b-5c3b1e79e948

date
echo Mounting root filesystem.
mount /sysroot

date
echo Setting up other filesystems.
setuproot

date
echo loadpolicy
loadpolicy
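
The same date-before-each-step idea can be wrapped in a small function when instrumenting an ordinary shell script. This is only a sketch for a regular POSIX-style shell, not for nash (which is not a full shell); timed_step is a hypothetical helper name, and `date +%s` is assumed to be available (it is in GNU and BSD date):

```shell
#!/bin/sh
# timed_step is a hypothetical helper: announce a step, run the given
# command, then report elapsed wall-clock seconds and the exit status.
timed_step() {
    label=$1
    shift
    echo "$label"
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "  took $((end - start))s (exit status $rc)"
    return "$rc"
}

# Example with a harmless command standing in for a real boot step.
timed_step "Sleeping briefly" sleep 1
```

The resolution is one second, which is enough to spot delays of the size reported here.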

Comment 18 Milan Broz 2008-07-03 14:20:05 UTC

Note that root is on a normal partition in the previous example; the lvm commands
were just added there to show the problem.

# rpm -q nash
nash-6.0.54-1.fc10.x86_64

Reassigning to mkinitrd (nash is not in pkg list).

Comment 19 Milan Broz 2008-07-07 14:34:55 UTC

Also mkinitrd (during automatic kernel update) has problems.

Updating initrd with 1000 LVs is almost impossible - mkinitrd waits for
...
echo nash-resolveDevice ... | /sbin/nash --forcequiet

Comment 20 Alexandre Oliva 2008-07-13 07:14:50 UTC

Doesn't this make it a dupe of bug 277271 (with a 9-month-old patch never
integrated)? :-(

Comment 21 Milan Broz 2008-07-13 09:36:02 UTC

Seems so.
I think with device-mapper (LVM devices) there is another scanning loop on top
of the problem mentioned in bug 277271...

Comment 22 Bug Zapper 2008-11-26 02:30:30 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 23 Milan Broz 2009-04-08 13:16:09 UTC

This is still a problem; I cannot believe that it still remains unfixed...

Running on rawhide, mkinitrd-6.0.81-1.fc11.x86_64

Without DM devices:
# time /usr/libexec/plymouth/plymouth-update-initrd

real    0m16.124s
user    0m5.060s
sys     0m11.870s


Now let's create some fake DM devices (150 new mapped devices - return zero on access):

# for i in $(seq 1 150) ; do dmsetup create "pv$i" --table "0 1024 zero" ; done
# time /usr/libexec/plymouth/plymouth-update-initrd

real    112m21.046s
user    111m53.578s
sys     0m32.850s

nash eats a lot of memory too, and uses 100% CPU:
root     18471 99.9  6.2 421484 371376 pts/0   R+   13:53  56:02 /sbin/nash --forcequiet

This problem was observed with a real LVM config with about 200 PVs. The example above is just an easy reproducer without LVM involved.

A kernel update on a system with such a config and activated volumes is still almost impossible.
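
To put the two "real" timings above side by side, they can be converted to seconds and divided. A small sketch, assuming the NmS.SSSs format printed by the shell's time keyword; to_seconds is a hypothetical helper name:

```shell
#!/bin/sh
# Convert a "real" time such as 112m21.046s into seconds.
# The field separator splits on 'm' and 's': minutes, then seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

before=$(to_seconds 0m16.124s)     # without DM devices
after=$(to_seconds 112m21.046s)    # with 150 fake DM devices

# Prints: slowdown: about 418x
awk -v b="$before" -v a="$after" \
    'BEGIN { printf "slowdown: about %.0fx\n", a / b }'
```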

Comment 24 Bug Zapper 2009-06-09 09:38:09 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 25 Hans de Goede 2010-01-12 15:32:25 UTC

This is a mass edit of all mkinitrd bugs.

Thanks for taking the time to file this bug report (and/or commenting on it).

As you may have heard, in Fedora 12 mkinitrd has been replaced by dracut. In Fedora 12 the mkinitrd package is still around, as some programs depend on certain libraries it provides, but mkinitrd itself is no longer used.

In Fedora 13 mkinitrd will be removed completely. This means that all work
on initrd has stopped.

Rather than keeping mkinitrd bugs open and giving false hope that they might get fixed, we are mass closing them, so as to clearly communicate that no more work will be done on mkinitrd. We apologize for any inconvenience this may cause.

If you are using Fedora 11 and are experiencing a mkinitrd bug you cannot work around, please upgrade to Fedora 12. If you experience problems with the initrd in Fedora 12, please file a bug against dracut.