Bug 232862
Description
G.Wolfe Woodbury
2007-03-19 01:57:24 UTC
Created attachment 150349
Anaconda dump of exception - date of rawhide 2007-03-18
Created attachment 150814
Anaconda dump of exception - date of rawhide 2007-03-23
See the attached exception report for today's rawhide failure.
anaconda-11.2.0.40-1

With anaconda-11.2.0.41 I still get tracebacks. More relevant, however, is that I tried an install without customizing the layout, and it works! The bug is centered somewhere in the code that re-does the formatting of logical volumes. I suspect that my next install, doing a custom layout on a zeroed disk, will work as well.

It might also have something to do with a "weird" name: I had an LV that was to be mounted on "/data2", and the disk label for the volume showed up as only "data2" (no leading slash). I will experiment with this possibility over the next several installs.

My problem with debugging this is that I still don't really understand the internals of the LVM code and utilities. The anaconda dump may or may not contain the failing vgchange command and arguments, which are ("vgchange", "-ay", "-v"). I have no idea what that is supposed to do, but it *feels* like the best place to focus attention. Let me know if there are specific tests you'd like me to run.

Follow-up: wiping the boot track doesn't prevent the bug. Apparently any major change to the default layout (like resizing root and swap and adding separate /usr and /var LVs) results in the vgchange error. I'm going to try a no-LVM layout and see what happens.

Created attachment 152095
Anaconda dump of exception - date of rawhide 2007-04-09 try #1 (mod default)
Created attachment 152096 [details]
dump from try #2 (No LVM configured in layout)
Created attachment 152097 [details]
dump from try #3 (accept default)
Try #1 - wipe MBR and partition table, modify the LVM layout and attempt install: FAILED
Try #2 - wipe MBR and partition table, generate a custom NO-LVM layout and install: FAILED
Try #3 - wipe MBR and partition table, accept defaults and install: FAILED

Why is it running vgchange on a layout with NO LVM partitions? Will try a few more variations and report.

I dragged out Knoppix and used QTParted to make my desired layout on the disk. Loaded up the boot.iso and used "custom layout" to assign the mount points. BOOM! Same exception. WHY is anaconda calling LVM stuff at all?

OK, I let anaconda (Disk Druid) make its default config (wipe out Linux partitions and create the default layout) with no other changes. It's now in the process of formatting and installing.... I'm tempted (but not ready) to look at the anaconda code and try to fathom the Python to see what is wrong. It has been too long since I've had to really look at code.

Okay, rawhide of 2007-04-16 is still suffering from the bug. BUT! I know why vgchange is being called: to "activate" the volume groups on the system. I also know WHY IT FAILS! The man page for vgchange shows the example "vgchange -a y" to activate all volumes; however, the args for the anaconda call are "-ay" "-v". The command should be "vgchange -a y", that is: "vgchange","-a","y". Please fix the code!

"-ay" is the same as "-a y" with the lvm tools and has been for a long time.

OK, I did verify that it works as written, but that merely means I don't know why it is failing. Are you seeing the problem at all, or am I at risk of a WORKSFORME disposition?

Okay, as of 2007-04-23 rawhide, there is a major improvement in this situation. Things mostly work, with a glitch if the names of the VG or LVs are changed. Non-standard names lead to a failure to format because the volumes are not properly activated (no VG name in /dev). Leaving the names standard allows partitioning schemes other than just the default.
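The "-ay" versus "-a y" exchange above comes down to ordinary short-option parsing. As far as I know, the LVM2 tools follow the common getopt convention, in which an option that takes a value accepts it either attached ("-ay") or as the next argument ("-a y"). Python's standard getopt module follows the same convention, so it is used here purely as a stand-in; the option string "a:v" is an assumption for illustration, not LVM's real option table.

```python
import getopt

# Illustrative option spec (an assumption, not LVM's actual table):
# "a:" means -a takes a value, "v" is a plain flag.
SHORT_OPTS = "a:v"


def parse(argv):
    """Return the (option, value) pairs getopt extracts from argv."""
    opts, _rest = getopt.getopt(argv, SHORT_OPTS)
    return opts


# The argument list anaconda passes versus the man-page spelling.
attached = parse(["-ay", "-v"])
separate = parse(["-a", "y", "-v"])

print(attached)  # [('-a', 'y'), ('-v', '')]
print(separate)  # [('-a', 'y'), ('-v', '')]
assert attached == separate
```

Since both spellings parse to the same options, changing the argument form would not by itself have fixed the failure, which matches the later verification in this thread.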
I am also getting some problems with non-LVM layouts. I suspect the problem lies not in anaconda *per se* but perhaps in the LVM or udev interactions.

Created attachment 153728
current f7t4 exception dump
I have to copy the bug attachments from the target machine by hand, as "save to remote" doesn't seem to work.
This bug is occurring in the F7t4 i386 DVD spin too. "Debugging" the exception and trying the command from the debugger results in a "cannot allocate memory" error. memtest86+ doesn't find any errors in the 256MB of RAM on the work1.private machine. Running free on the VT2 console shows ~3.4 MB free. Will try activating swap before depsolving to see if the OOM can be avoided.

I'm seeing the same problem as in the original report. Mine is a kickstart installation on an x86_64 system with 512MB.

Can you try a rawhide tree after they start showing up again and let me know if this is still not working for you? We've made some recent changes around activating and deactivating LVM that I think should clear this issue up. Thanks.

This problem seems to be fixed in the current rawhide, for me.

I think you should ignore my previous comment (#20). What happened was that when using the mirror at mirrorservice.org, the install gets beyond where I thought it failed before (with the unhandled exception), but the mirror is clearly incomplete (many messages about missing packages). During the "starting install" phase it fails, unable to open glibc-headers-2.5.90-21.x86_64.rpm. Using the mirror at ftp.funet.fi, the install fails with an unhandled exception and "LvmError: vgchange failed" in the GUI window. Right at the top of the F3 screen there is a message: "ERROR: Running lvm: Cannot allocate memory". I guess I was mistaken about where the failure occurred, and based on the seemingly complete mirror at ftp.funet.fi, the install still fails with 512MB. Sorry for the confusion.

Unfortunately the rawhide of 2007-05-13 doesn't clear the issue:
Graphical mode with 256 MB RAM: always crashes.
Text mode: crashes if too much is added to the mix (e.g. GNOME+KDE).
Text mode: works with the default minus a few items.
I have a couple of anacdumps here, but I won't bother submitting them unless requested, since they appear very similar to the ones already on file.
Suggestion: if there is an existing swap partition, use it, then unswap before committing the partition changes, then mkswap the new swap partition and use it before proceeding.

(In reply to comment #22)
> Suggestion:
> if there is an existing swap partition, use it,
> then unswap before committing the partition changes
> then mkswap the new swap partition and use it before proceeding

That's not really going to help the problem since we'll still have the data around. It just changes things to be more likely to hit kernel bugs ;-)

I've changed our thresholds for doing early partition commits (and thus early swap-on) to be a little more aggressive on x86. x86_64 should already be in better shape, since the change was made there a week or so ago.

Well... Jeremy, isn't this bug report a fixed duplicate of mine (> 390 MB for x86_64 installation, > 256 MB for x86 installation)? Reading it, it looks like the same problem with the same error messages...

Oh, it's the same bug in root cause. This one is for the ix86 architecture, and yours is for x86_64. The upshot is that 256 MB is no longer sufficient for a graphical install on plain x86 unless you don't select *any* additional packages, and even then it might bite you due to the uncertainties of the install process. (Deterministic? Heck no.) I was beginning to think I was the only one seeing the bug for a while, but it has turned into a hard one to fix.

Thursday's rawhide (2007-05-24) now has a different major failure mode in low-memory situations. Anaconda detects the low-memory condition and activates swap early; however, when it goes to format the filesystems previously committed, it removes the LVM entries in /dev/mapper. They were there; I looked at them from the VT2 window during software selection and depsolving! Then they were gone and the format command failed. Saturday's rawhide and RC2 are a no-go for me.
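The early partition commit (and early swap-on) described above is a decision driven by how much RAM the machine has. A minimal sketch of that kind of decision follows, assuming the total is read from the text of /proc/meminfo. The function names and the per-architecture thresholds are hypothetical; the numbers are borrowed loosely from the memory figures quoted in this bug and are not anaconda's real values.

```python
# Hypothetical per-architecture thresholds (KiB) at or below which the
# partitions would be committed and swap turned on before package
# selection. Loosely based on figures in this bug, not anaconda's values.
EARLY_COMMIT_THRESHOLD_KIB = {
    "i386": 256 * 1024,
    "x86_64": 390 * 1024,
}
DEFAULT_THRESHOLD_KIB = 256 * 1024


def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Lines look like "MemTotal:         255432 kB".
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")


def should_commit_early(meminfo_text, arch):
    """True when installed RAM is at or below the threshold for arch."""
    threshold = EARLY_COMMIT_THRESHOLD_KIB.get(arch, DEFAULT_THRESHOLD_KIB)
    return mem_total_kib(meminfo_text) <= threshold


sample = "MemTotal:         255432 kB\nMemFree:            3480 kB\n"
print(should_commit_early(sample, "i386"))  # True
```

In this framing, making the check "more aggressive" just means raising the threshold so that more machines take the early-commit path. On a real system the text would come from reading /proc/meminfo.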
The LVM entries are there during software selection and depsolving, but disappear when anaconda goes to format the filesystems. This is rawhide dated 2007-05-26, anaconda-11.2.0.66-1.

In packages.py:turnOnFilesystems() there is no check to see whether the partitioning has already been committed and/or swap was activated. It looks to me as if a check against anaconda.id.fsset.isActive() should be somewhere in there; it should (or might) be similar to the sequence found in partitioningComplete() in partitions.py. I'm struggling because I'm not too familiar with Python (yet!).

Do you have a log / crash dump from the more recent failures? I'm trying to reproduce your problem, but I'm not sure I understand what the actual error is now. Does the lvm stuff fail because you are out of memory, or is turnOnFilesystems() failing because the filesystems are already on? You're right about turnOnFilesystems() lacking a check to see if the filesystems are active, but I'm not sure the check is needed.

To explain my previous comment a bit, turnOnFilesystems() does the following:

    anaconda.id.partitions.doMetaDeletes(anaconda.id.diskset)
    anaconda.id.diskset.clearDevices()
    anaconda.id.fsset.setActive(anaconda.id.diskset)
    if not anaconda.id.fsset.isActive():
        anaconda.id.diskset.savePartitions()
    anaconda.id.fsset.checkBadblocks(anaconda.rootPath)
    if not anaconda.id.fsset.volumesCreated:
        anaconda.id.fsset.createLogicalVolumes(anaconda.rootPath)
    anaconda.id.fsset.formatSwap(anaconda.rootPath)
    anaconda.id.fsset.turnOnSwap(anaconda.rootPath)
    # This stuff doesn't happen in partitioningComplete()
    anaconda.id.fsset.makeFilesystems(anaconda.rootPath)
    anaconda.id.fsset.mountFilesystems(anaconda)

formatSwap() and turnOnSwap(), at least, are idempotent (i.e. safe to run multiple times). I'm not sure about the rest, especially doMetaDeletes() and clearDevices(), but we're definitely running them twice. clearDevices() is the real culprit here.
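The isActive() guard suggested above can be illustrated with stand-in objects. This is a hypothetical sketch, not anaconda's code or its eventual fix: DiskSet, FSSet and turn_on_filesystems are simplified stubs loosely named after the methods in the turnOnFilesystems() sequence, and they only record which steps run. It shows the idea of skipping the destructive, non-idempotent steps when an early commit already happened, while still running the idempotent swap step.

```python
class DiskSet:
    """Stand-in for the installer's diskset; only records calls."""

    def __init__(self, log):
        self.log = log

    def clear_devices(self):
        # In the installer this step tears down device-mapper nodes; per
        # this bug, running it after an early commit is what wiped the
        # LVM entries out of /dev/mapper.
        self.log.append("clear_devices")

    def save_partitions(self):
        self.log.append("save_partitions")


class FSSet:
    """Stand-in filesystem set. `active` is True when partitions were
    already committed and swap turned on early (low-memory path)."""

    def __init__(self, log, active=False):
        self.log = log
        self.active = active

    def is_active(self):
        return self.active

    def turn_on_swap(self):
        self.log.append("turn_on_swap")

    def make_filesystems(self):
        self.log.append("make_filesystems")


def turn_on_filesystems(fsset, diskset):
    """Hypothetical guarded ordering: only tear down and re-save the
    partitions if nothing was committed early."""
    if not fsset.is_active():
        diskset.clear_devices()
        diskset.save_partitions()
    # Described in the thread as idempotent, so safe to run again.
    fsset.turn_on_swap()
    fsset.make_filesystems()


# Early-commit case: the teardown is skipped, so the LVM nodes survive.
early = []
turn_on_filesystems(FSSet(early, active=True), DiskSet(early))
print(early)  # ['turn_on_swap', 'make_filesystems']
```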
What is happening is that with low memory (256MB), swap is committed and LVM activation happens early. The LVM device nodes exist in /dev/mapper during software selection and depsolving. Then it goes to turnOnFilesystems() and crashes with no traceback or dump, because the devices are no longer in /dev/mapper for the format commands to use. Because of the early commit for swap, turnOnFilesystems() has to check whether the devices are already in place; otherwise it wipes the devices and never re-activates the LVM volume groups.

To reproduce the error, use a machine with 256MB of RAM and create an LVM setup using custom layout (with swap not in the LVM but, say, as partition 2). Anaconda will activate the swap and continue with software selection and depsolving. During this period, one can switch to VT2 and run ls /dev/mapper to see that the LVM volumes are indeed there. Then it finishes depsolving, reaches the install confirmation, and *BOOM*, the devices are no longer around to be formatted.

Hmmmm. What happens if I move the swap it's using into the LVM? Still likely to crash. The main problem is that /home is a non-LVM partition on the drive.

OK, the bug only occurs if the swap partition is *not* part of the LVM volume group. I moved the swap partition into the LVM and the installation proceeded without a hitch. Still a bug, but now with a workaround. Note that the layouts generated by the default tools won't trigger this, but a user/custom/legacy layout could hit it very easily. Specifically: I moved the swap definition from a real partition (/dev/sda3) into the LVM definition (/dev/VolumeGroup00/LogVol00) and the installation properly formatted all the filesystems.

This *old* bug is back! Anaconda/lvm doesn't have enough memory on my Dell to make the filesystems with 320MB of memory. Attaching the dump.

Created attachment 160719
Anaconda dump of exception - date of rawhide 2007-08-03
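The manual check used while reproducing this bug (switching to VT2 and running ls /dev/mapper) can be expressed as a small pre-format check. The sketch below is hypothetical: the helper names and the idea of running it right before formatting are illustrative, and it assumes LVM2's usual device-mapper naming, where the node for a volume is "VG-LV" and any hyphen inside either name is doubled. The example names come from the VolumeGroup00/LogVol00 layout mentioned above.

```python
import os


def dm_name(vg, lv):
    """Name of the /dev/mapper node for vg/lv: hyphens inside each name
    are doubled, then the two parts are joined with a single hyphen."""
    return vg.replace("-", "--") + "-" + lv.replace("-", "--")


def missing_lv_nodes(volumes, mapper_dir="/dev/mapper"):
    """Return the (vg, lv) pairs whose node is absent from mapper_dir.

    `volumes` lists the (vg, lv) pairs the chosen layout expects.
    """
    try:
        present = set(os.listdir(mapper_dir))
    except FileNotFoundError:
        # No mapper directory at all means nothing is active.
        present = set()
    return [(vg, lv) for vg, lv in volumes if dm_name(vg, lv) not in present]


expected = [("VolumeGroup00", "LogVol00"), ("VolumeGroup00", "LogVol01")]
missing = missing_lv_nodes(expected)
if missing:
    print("not active, re-run vgchange before formatting:", missing)
```

Failing early with a list of missing volumes (or re-activating the volume groups) would at least turn the silent format failure described above into a clear diagnostic.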
The FC8 series didn't show this bug; something may have fixed it. But then, I haven't tried F8 on the old Dell yet either.

Closing as fixed long since in rawhide.