Red Hat Bugzilla – Bug 76541
pxegrub and kickstart exposes bugs in loader-network
Last modified: 2007-04-18 12:47:53 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020916
Description of problem:
When installing using PXE and grub to deliver the kernel and initrd, then using
http to deliver the install and the ks.cfg file, the install fails when
Version-Release number of selected component (if applicable):
Steps to Reproduce:
The conditions will be hard to set up, so you'll have to trust my setup and
1. Set up a PXE system to deliver pxegrub and a grub config file that looks like:-
title Start from network
kernel /rh8.0/vmlinuz ks=http://126.96.36.199/ks.cfg ksdevice=eth0
N.B 188.8.131.52 is where everything is held
/tftpboot/rh8.0 is populated with the files from images/pxeboot, which are
tftp'ed across by grub.
2. Set the ks.cfg file to pull the install over http
3. Boot up machine through pxe
Actual Results: Install fails when downloading netstg1.img with the simple
message "Unable to retrieve the first install image"
Expected Results: Install should continue through the kickstart process.
The root problem is that the ramdisk is too small. The loader grabs the file
/etc/ramfs.img from the initrd.img, uncompresses it and copies it to /dev/ram.
The file is just a compressed empty fs. This uncompresses to 16Mb, yet the
default ramdisk size on the installer kernel is only 8Mb. So it fails at a
little over 8Mb (8387828 bytes). Why it is exactly 8Mb is unclear to me.
This ramdisk is then mounted onto /tmp/ramfs and the netstg1.img is copied onto
it from wherever the install files are coming from (http in my case). However
netstg1.img is 8200192 bytes, so it runs off the end of the filesystem and fails.
Why doesn't this happen on a normal network install (booting from a floppy, for
example)? For some reason, in that case the ramfs.img copy doesn't fail until
somewhere after 9Mb, which is enough to hold netstg1.img. Something in
my setup is causing a difference in how much you can overrun the ramdisk by. The
fact that you can overrun it at all may well be a kernel bug.
In the process I have discovered a couple of bugs/limitations in loader/loader.c
(part of the anaconda package) in the way they report (and fail to report) problems:
1) Why doesn't the failed ramfs.img copy stop the install? If it did the fact
that the ramdisk is too small would have been spotted.
2) setupStage2Image() has a logMessage statement that tries to print the size
of the file that has been copied, however it uses a %d to print a size_t and
thus messes up the name of the file copied.
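A minimal sketch of the behaviour point 1 asks for: a copy that stops and reports how far it got when the destination fills up. The loader itself is C and the attached patch is not reproduced here; this is only an illustration in Python, and `copy_checked` and `CopyError` are hypothetical names that do not come from anaconda.

```python
import io


class CopyError(Exception):
    """Raised when the destination accepts fewer bytes than were read."""


def copy_checked(src, dst, chunk_size=4096):
    """Copy src to dst and return the number of bytes copied.

    src and dst are binary file-like objects. A failed or short write
    raises CopyError instead of being silently ignored.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        try:
            written = dst.write(chunk)
        except OSError as exc:
            raise CopyError(f"write failed after {total} bytes: {exc}") from exc
        if written != len(chunk):
            raise CopyError(
                f"short write: {written} of {len(chunk)} bytes "
                f"after {total} bytes"
            )
        total += written


# Usage: an in-memory copy that succeeds and reports its size.
copied = copy_checked(io.BytesIO(b"\0" * 10000), io.BytesIO())
print(f"copied {copied} bytes")
```

A caller that treats `CopyError` as fatal would stop at the ramfs.img copy, which is the point where the undersized ramdisk first shows up.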
I have included a patch to fix the above two problems, but note that it will
break the install until the ramdisk size is fixed by either bumping the
CONFIG_BLK_DEV_RAM_SIZE kernel parameter or providing a ramdisk_size parameter
on the kernel command line.
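As a rough sketch of the command-line workaround: the kernel's ramdisk_size parameter is given in KiB, so the value can be derived from the filesystem size. The 16000-block figure is the superblock size that e2fsck reports later in this thread, and the 1 KiB block size is an assumption (it is consistent with the 9216-block device size seen when ramdisk_size is 9216). The helper names are hypothetical.

```python
def ramdisk_size_kib(fs_blocks, block_size=1024):
    """Return the ramdisk_size value (in KiB) needed to hold a filesystem."""
    total_bytes = fs_blocks * block_size
    return -(-total_bytes // 1024)  # ceiling division


def with_ramdisk_size(kernel_line, size_kib):
    """Append ramdisk_size=<KiB> to a kernel line unless one is already set."""
    if any(arg.startswith("ramdisk_size=") for arg in kernel_line.split()):
        return kernel_line
    return f"{kernel_line} ramdisk_size={size_kib}"


# The kernel line from the grub config in the steps to reproduce.
line = "kernel /rh8.0/vmlinuz ks=http://126.96.36.199/ks.cfg ksdevice=eth0"
print(with_ramdisk_size(line, ramdisk_size_kib(16000)))
```

This prints the original kernel line with `ramdisk_size=16000` appended.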
I have marked this as low severity because I have a work-around (increasing the
ramdisk size), however I am concerned that others may hit it and not know how
to do the workaround - the errors hardly point to this being the problem!
Created attachment 81665
Patch to fix the error reporting
It's not usually a problem because we change the ramdisk_size using the kernel
parameter on the boot disks. Not including the ramdisk_size parameter when
booting the kernel is considered an incorrect setup.
Please re-open this bug - you haven't read it properly!
I can't figure out how to reopen it myself.
First of all, I don't believe you are correct about the floppy boot having a
ramdisk_size parameter - if there is, it is certainly not the 16Mb needed for
the uncompressed /etc/ramfs.img file.
Secondly, regardless of the ramdisk stuff, the loader is not reporting the errors
that it needs to. If you apply the patches to anaconda, you will see that on the
floppy boot the ramfs.img copy fails after 9431040 bytes, which, in the case of
the floppy, happens to be big enough to hold the netstg1.img file later on.
However, I believe this is more by luck than anything else - this may well be an
intermittent problem that will be difficult to find. You are relying on a corrupted ram
If you want to see it without doing all the patch stuff, bring the install up to
a point where you get a shell and try doing a dd on the ram fs:-
dd if=/dev/ram of=/dev/null bs=1k
I get 9216k - not 16000
If you can find a way of getting access to e2fsck (I mounted a copy I had on the
harddisk), you get:-
/tmp/mnt/sbin/e2fsck -n /dev/ram
e2fsck 1.27 (8-Mar-2002)
Warning! /dev/ram is mounted.
The filesystem size (according to the superblock) is 16000 blocks
The physical size of the device is 9216 blocks
Either the superblock or the partition table is likely to be corrupt!
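The numbers above can be checked with a little arithmetic. This is only a sketch: it assumes 1 KiB blocks (which matches dd reporting 9216k for a 9216-block device), and the raw-capacity check ignores filesystem metadata, so it says nothing about whether the truncated filesystem is safe to use. The function names are hypothetical.

```python
BLOCK_SIZE = 1024          # assumed block size in bytes
FS_BLOCKS = 16000          # filesystem size according to the superblock
DEVICE_BLOCKS = 9216       # physical size of /dev/ram
NETSTG1_BYTES = 8200192    # size of netstg1.img from the description


def missing_blocks(fs_blocks, device_blocks):
    """Blocks the filesystem claims to have that the device cannot back."""
    return max(0, fs_blocks - device_blocks)


def raw_headroom(file_bytes, device_blocks, block_size=BLOCK_SIZE):
    """Bytes left on the raw device after the file, before any fs overhead."""
    return device_blocks * block_size - file_bytes


print("blocks missing from the device:",
      missing_blocks(FS_BLOCKS, DEVICE_BLOCKS))
print("device size in bytes:", DEVICE_BLOCKS * BLOCK_SIZE)
print("raw headroom after netstg1.img:",
      raw_headroom(NETSTG1_BYTES, DEVICE_BLOCKS))
```

With these figures, 6784 of the 16000 blocks have no backing storage, and netstg1.img leaves 1236992 bytes of raw headroom on the 9437184-byte device.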
Please reconsider this bug report.
P.S. To make life easier for you I will attach a new bootnet.img floppy image
that has the loader program patched. You can then try doing an http install with
it and see it stop creating the ramdisk.
Created attachment 81845
Floppy boot image with patched loader to demonstrate the problem
Ooops, I apologise, I should have looked at the syslinux.cfg file on the floppy.
I see you do bump the ramdisk_size on the floppy - to 9216! So I guess you rely
on the write failing and not getting near the end of the fs!
So we're just left with a minor reporting bug, which I will file a new bug for.
Sorry about that.