Red Hat Bugzilla – Bug 232862
Anaconda exception trying to change the disk layout
Last modified: 2008-03-27 12:37:07 EDT
Description of problem:
Anaconda throws an exception (report attached) when trying to adjust the LVMs
generated by a custom configuration. Specifically, I'm simply selecting to
format some of the existing LVM pieces, and assign mount points, without making
any size or format changes (well no format changes in the LVMs - changing /boot
from ext2 to ext3.)
Version-Release number of selected component (if applicable):
How reproducible:
three times in a row
Steps to Reproduce:
1. boot rawhide boot.iso
2. use custom config for disk and assign mount points, etc.
3. watch it throw exception before formatting the filesystems
see attached exception report.
work1.private: Pentium III (Coppermine) @600MHz 512MB RAM MSI Mobo
etc. all standard and working with older rawhides.
Created attachment 150349 [details]
Anaconda dump of exception - date of rawhide 2007-03-18
Created attachment 150814 [details]
Anaconda dump of exception - date of rawhide 2007-03-23
See attached exception report for today's rawhide failure.
With anaconda-18.104.22.168 I still get tracebacks.
More relevant, however, is that I tried an install without customizing the
layout - and it works! The bug is centered somewhere where the code is
attempting to re-do the formatting of logical volumes.
I suspect that my next install, with a custom layout on a zeroed disk, will
also work. The bug might also be triggered by something with a "weird"
name: I had a LV that was to be mounted on "/data2" and the disklabel for the
volume showed up as only "data2" (no leading slash).
I will experiment with this last possibility a bit in the next several installs.
My problem with debugging this is that I still don't really understand the
internals of the LVM code and utilities. The anaconda dump may or may not
contain the failing vgchange command and arguments, which are:
("vgchange", "-ay", "-v")
and I have no idea what that is supposed to do, but it *feels* like the best
place on which to focus attention.
Let me know if there are specific tests you'd like me to make.
follow-up: wiping the boot track doesn't prevent the bug.
apparently, any major changes to the default layout (like resizing root and
swap and adding separate /usr and /var LVs) result in the lvchange error
occurring. I'm going to try a no-LVM layout and see what happens.
Created attachment 152095 [details]
Anaconda dump of exception - date of rawhide 2007-04-09 try #1 (mod default)
Created attachment 152096 [details]
dump from try #2 (No LVM configured in layout)
Created attachment 152097 [details]
dump from try #3 (accept default)
Try #1 - wipe MBR and partition table, modify the LVM and attempt install.
Try #2 - wipe MBR and partition table, generate custom NO-LVM layout and install
Try #3 - wipe MBR and partition table, accept defaults and install
Why is it trying to LVCHANGE a layout with NO LVM partitions?
Will try a few more variations and report.
I dragged out Knoppix and used QTParted to make my desired layout on the disc.
Loaded up the boot.iso and used "custom layout" to assign the mount points.
WHY is anaconda calling lvm stuff at all?
I let anaconda (disk-druid) make its default config (wipe out Linux partitions
and create default layout) and no other changes.
It's now in the process of formatting and installing....
I'm tempted (but not ready to) look at the anaconda code and try to fathom the
python stuff to see what is wrong. It has been too long since I've had to
really look at code.
Okay, rawhide of 2007-04-16 is still suffering from the bug.
I know why the vgchange is being called - to "activate" the volume groups on the
I also know WHY IT FAILS!
reading the man page for vgchange shows the example:
vgchange -a y
to activate all volumes, however, the args for the anaconda call are:
The command to do should be:
vgchange -a y
Please fix the code!
"-ay" is the same as "-a y" with the lvm tools and has been for a long time
OK, I did verify that it works as written, but that merely means I don't know
why it is failing. Are you seeing the problem at all, or am I at risk of a
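The point made in the reply above, that "-ay" and "-a y" are the same thing, is standard getopt-style behavior for short options that take a value. The sketch below illustrates it with Python's getopt module. The option string "a:v" is an assumption chosen to match the shape of the call in this report; it is not LVM's own option table, and the LVM tools use their own C parser.

```python
import getopt

# "-a" takes a value, "-v" is a plain flag.
# Assumption: this only mirrors the shape of ("vgchange", "-ay", "-v").
SHORT_OPTS = "a:v"


def parse(argv):
    """Return the parsed (option, value) pairs for an argument list."""
    opts, _rest = getopt.getopt(argv, SHORT_OPTS)
    return opts


attached = parse(["-ay", "-v"])       # value glued to the option letter
separate = parse(["-a", "y", "-v"])   # value passed as its own argument

print(attached == separate)
```

Both spellings parse to the same option list, so the spelling of the flag is unlikely to be what makes the call fail.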
Okay, as of 2007-04-23 rawhide, there is a major improvement in this situation.
Things mostly work, with a glitch coming in if the names of the VG or LVs are
changed. Non-standard names lead to a failure to format because the names are
not properly activated (no VG name in /dev)
Leaving the names standard allows the creation of different partitioning schemes
than just the default. I am getting some problems also with non-LVM layouts.
I suspect the problem lies not in anaconda *per se* but perhaps in the LVM or
Created attachment 153728 [details]
current f7t4 exception dump
have to remote copy the bug attachments from target machine by hand as the
"save to remote" doesn't seem to work.
This bug is occurring in F7t4 i386 DVD spin too.
"debugging" the exception and trying the command from the debugger results in a
"cannot allocate memory" error
memtest86+ doesn't find any errors in the 256MB of RAM on the work1.private machine.
Free from the VT2 console shows ~3.4Mb free.
will try activating swap before depsolving to see if OOM can be avoided.
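As a rough illustration of the check implied here (look at free memory, then decide whether swap should be turned on before depsolving), the sketch below parses /proc/meminfo-style text. The function names, the sample numbers and the 16 MB threshold are all assumptions for illustration; none of them come from anaconda.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def needs_early_swap(info, min_free_kb):
    """True when free memory is below the (hypothetical) threshold."""
    return info.get("MemFree", 0) < min_free_kb


# Made-up sample resembling a 256MB machine with only a few MB free.
sample = "MemTotal:  255000 kB\nMemFree:  3400 kB\nSwapTotal:  0 kB\n"
info = parse_meminfo(sample)
print(needs_early_swap(info, min_free_kb=16 * 1024))
```

On a real system the text would come from reading /proc/meminfo instead of the sample string.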
I'm seeing this same problem as in the original report. Mine is a kickstart
installation on x86_64 system with 512MB.
Can you try a rawhide tree after they start showing back up and let me know if
this is still not working for you? We've done some recent changes around
activating and deactivating LVM that I think should clear this issue up. Thanks.
This problem seems to be fixed in the current rawhide, for me.
I think that you should ignore my previous comment (#20). What
happened was that when using the mirror at mirrorservice.org it gets
beyond where I thought it failed before (with the unhandled
exception), but the mirror is clearly incomplete (many messages about
missing packages). During the "starting install" phase it fails not
being able to open glibc-headers-2.5.90-21.x86_64.rpm.
Using the mirror at ftp.funet.fi the install fails with an unhandled exception
and LvmError: vgachange failed in the gui window. I can see right at
the top of the F3 screen that there is a message: "ERROR: Running lvm:
Cannot allocate memory".
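One way a caller could tell an out-of-memory failure apart from other command failures is sketched below. This is an assumption-laden illustration, not anaconda's code: LvmError here is a hypothetical class defined for the sketch, run_lvm and its spawn parameter are invented names, and the report does not say whether the "Cannot allocate memory" message came from starting the process or from lvm itself.

```python
import errno
import subprocess


class LvmError(Exception):
    """Hypothetical error type for this sketch (not anaconda's class)."""


def run_lvm(args, spawn=subprocess.call):
    """Run a command and report out-of-memory separately from other failures."""
    try:
        rc = spawn(list(args))
    except OSError as e:
        # Starting a child process can fail with ENOMEM on a low-memory box.
        if e.errno == errno.ENOMEM:
            raise LvmError("%s: cannot allocate memory" % args[0])
        raise
    if rc != 0:
        raise LvmError("%s failed with exit status %d" % (args[0], rc))
    return rc
```

A caller would use it as run_lvm(("vgchange", "-ay", "-v")); the spawn parameter exists only so the function can be exercised without real LVM tools.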
I guess I was mistaken about where the failure occurred, and based on
the seemingly complete mirror at ftp.funet.fi, the install still
fails with 512MB.
Sorry for the confusion.
Unfortunately the rawhide of 2007-05-13 doesn't clear the issue.
Graphical mode with 256 MB RAM crashes always
Text Mode crashes if too much is added to the mix (Gnome+KDE e.g.)
Text mode works with default minus a few items
I have a couple of anacdumps here, but I'm not going to bother submitting them
unless requested since they appear very similar to the ones already on file.
if there is an existing swap partition, use it,
then unswap before committing the partition changes
then mkswap the new swap partition and use it before proceeding
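The three steps proposed above can be written out as an ordered command list. This is only a sketch of the proposal as worded; swap_handover_commands is a hypothetical helper, and the device paths in the usage line are placeholders, not values from this report.

```python
def swap_handover_commands(old_swap, new_swap):
    """Return the commands, in order, for the swap handover proposed above."""
    return [
        ["swapon", old_swap],    # use the existing swap partition
        ["swapoff", old_swap],   # release it before committing partition changes
        ["mkswap", new_swap],    # initialise the new swap partition
        ["swapon", new_swap],    # and use it before proceeding
    ]


cmds = swap_handover_commands("/dev/sda2", "/dev/sda3")
for argv in cmds:
    print(" ".join(argv))
```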
(In reply to comment #22)
> if there is an existing swap partition, use it,
> then unswap before committing the partition changes
> then mkswap the new swap partition and use it before proceeding
That's not really going to help the problem since we'll still have the data
around. It just changes things to be more likely to hit kernel bugs ;-)
I've changed our thresholds for doing early partition commits (and thus early
swap on) to be a little bit more aggressive on x86. x86_64 should already be in
better shape due to making the change there a week or so ago.
Well...Jeremy, isn't this bug report a fixed duplicate of mine (> 390 MB for
x86_64 installation, > 256 MB for x86 installation)? When reading it looks
like the same problem with the same error messages...
Oh, it's the same bug in root cause.
This one is for the ix86 architecture, and yours is for the _64 architecture.
The upshot is that 256 MB is no longer sufficient for a graphical install on the
x86 plain arch unless you don't select *any* additional packages, and even then
it might bite you due to the uncertainties of the install process.
(Deterministic? heck no.)
I was beginning to think I was the only one seeing the bug for a while, but it
has turned into a hard one to fix.
Thursday's rawhide (2007-05-24) now has a different major failure mode in low
Anaconda is detecting the low memory condition and activating swap early,
however as it goes to format the filesystems previously committed, it removes
the LVM entries in /dev/mapper
They were there, I looked at them with the VT02 window during software selection
and depsolving! Then they were gone and the format command failed.
Saturday's rawhide and RC2 are a no-go for me.
The LVM entries are there during software selection and depsolving, but
disappear when anaconda goes to format the filesystems.
This is rawhide date 2007-05-26
in packages.py:turnOnFilesystems() there is no check to see if the partitioning
has been committed already and/or that swap was activated.
It looks to me as if a check against anaconda.id.fsset.isActive() should be
somewhere in there. It should/might be similar to the sequence found in
partitioningComplete() in partitions.py
I'm struggling cause I'm not too familiar with python (yet!)
Do you have a log / crash dump from the more recent failures?
I'm trying to reproduce your problem but I'm not sure I understand what the
actual error is now. Does the lvm stuff fail because you are out of memory, or
is turnOnFilesystems() failing because the filesystems are already on?
You're right about turnOnFilesystems() lacking a check to see if the filesystems
are active, but I'm not sure if the check is needed.
To explain my previous comment a bit - turnOnFilesystems() does the following:
if not anaconda.id.fsset.isActive():
if not anaconda.id.fsset.volumesCreated:
# This stuff doesn't happen in partitioningComplete()
formatSwap() and turnOnSwap(), at least, are idempotent (i.e. safe to run
multiple times). I'm not sure about the rest, especially doMetaDeletes() and
clearDevices(), but we're definitely running them twice.
clearDevices() is the real culprit here.
What is happening is that with low memory (256MB) swap is being committed and
activating LVM happens early. The LVM points exist in /dev/mapper during
software select and depsolving. Then it goes to turnOnFilesystems and crashes
with no traceback or dump because the devices are no longer in /dev/mapper to be
available for the format commands.
Because of the early commit for swap, turnOnFilesystems() has to check for
the devices already being in place, or else it wipes the devices and doesn't
re-activate the LVM volume groups again.
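The guard described above can be sketched with a toy model. ToyStorage, activate() and turn_on_filesystems() are hypothetical names invented for this illustration; only isActive() and clearDevices() are names that appear in this report, and their real behavior in anaconda may differ.

```python
class ToyStorage:
    """Hypothetical stand-in for the installer's storage state."""

    def __init__(self):
        self.active = False
        self.devices = []
        self.cleared = 0

    def isActive(self):
        return self.active

    def clearDevices(self):
        # Destructive: forgets every device node set up so far.
        self.devices = []
        self.cleared += 1

    def activate(self, devices):
        self.devices = list(devices)
        self.active = True


def turn_on_filesystems(storage, planned_devices):
    """Prepare devices for formatting without undoing an earlier commit."""
    if not storage.isActive():
        # Only the first commit may clear and rebuild the device nodes.
        storage.clearDevices()
        storage.activate(planned_devices)
    return storage.devices
```

With an early commit already done (activate() called when swap was turned on), a later call to turn_on_filesystems() leaves the devices in place instead of clearing them a second time.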
To reproduce the error, use a machine with 256MB of RAM and make an LVM layout
using custom layout (with swap not in the LV but, for example, as partition 2).
It will activate the swap and continue with software selection and depsolving.
During this period, one can switch to VT02 and ls /dev/mapper to see that the
LVM is indeed there. Then it finishes depsolving and gets to confirminstall, and
*BOOM* the devices are no longer around to be formatted.
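The manual "ls /dev/mapper" check could be automated just before formatting. The sketch below assumes the usual device-mapper naming for logical volumes (volume group and LV joined by a hyphen, with hyphens inside either name doubled). mapper_name and missing_nodes are hypothetical helpers, LogVol01 is a made-up second volume, and the demo uses a temporary directory rather than the real /dev/mapper.

```python
import os
import tempfile


def mapper_name(vg, lv):
    """device-mapper node name for an LV; hyphens inside names are doubled."""
    return "%s-%s" % (vg.replace("-", "--"), lv.replace("-", "--"))


def missing_nodes(volumes, mapper_dir="/dev/mapper"):
    """Return the (vg, lv) pairs whose node is absent from mapper_dir."""
    return [
        (vg, lv)
        for vg, lv in volumes
        if not os.path.exists(os.path.join(mapper_dir, mapper_name(vg, lv)))
    ]


# Demo against a scratch directory standing in for /dev/mapper.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "VolumeGroup00-LogVol00"), "w").close()
result = missing_nodes(
    [("VolumeGroup00", "LogVol00"), ("VolumeGroup00", "LogVol01")], demo_dir
)
print(result)
```

A non-empty result before the format step would point at the disappearing-device problem described in this comment rather than at the format command itself.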
Hmmmm. What happens if I move the swap it's using into the LVM? Still likely
to crash. Main problem is that /home is a non-LVM partition on the drive.
ok, the bug only occurs if the swap partition is *not* part of the LVM volume
group. I move the swap partition into the LVM and the installation proceeded
without a hitch.
Still a bug, now with a workaround. Note that the partition layouts generated
by the default tools won't trigger this, but a user/custom/legacy layout could
see it very easily.
Specifically: I moved the swap definition from a real partition (/dev/sda3) into
the LVM definition (/dev/VolumeGroup00/LogVol00) and the installation properly
formatted all the filesystems.
This *old* bug is back!
Anaconda/lvm doesn't have enough memory on my Dell to make the filesystems with
320MB of memory.
Attaching the dump.
Created attachment 160719 [details]
Anaconda dump of exception - date of rawhide 2007-08-03
FC8 series didn't show this bug, something may have fixed it.
But then I haven't tried F8 on the old Dell yet either.
closing as fixed long since in RAWHIDE