From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020916

Description of problem:
When installing using pxe & grub to deliver the kernel and initrd, then using http to deliver the install and the ks.cfg file, the install fails when downloading netstg1.img.

Version-Release number of selected component (if applicable):

How reproducible: Always

Steps to Reproduce:
These conditions will be hard to set up, so you'll have to trust my setup and debugging!
1. Set up a PXE system to deliver pxegrub and a grub config file that looks like:-

    timeout 0
    title Start from network
    ifconfig --server=64.104.195.13
    root(nd)
    kernel /rh8.0/vmlinuz ks=http://64.104.195.13/ks.cfg ksdevice=eth0 ramdisk_size=16000
    initrd /rh8.0/initrd.img
    boot

   N.B. 64.104.195.13 is where everything is held. /tftpboot/rh8.0 is populated with the files from images/pxeboot, which are tftp'ed across by grub.
2. Set the ks.cfg file to pull the install over http.
3. Boot up the machine through pxe.

Actual Results:
Install fails when downloading netstg1.img with the simple message "Unable to retrieve the first install image".

Expected Results:
Install should continue through the kickstart process.

Additional info:
Further debugging:-

The root problem is that the ramdisk is too small. The loader grabs the file /etc/ramfs.img from the initrd.img, uncompresses it and copies it to /dev/ram. The file is just a compressed empty fs. This uncompresses to 16Mb, yet the default ramdisk size on the installer kernel is only 8Mb. So the copy fails at a little over 8Mb (8387828 bytes). Why it is exactly 8Mb is unclear to me.

This ramdisk is then mounted onto /tmp/ramfs and netstg1.img is copied onto it from wherever the install files are coming from (http in my case). However netstg1.img is 8200192 bytes, so it runs off the end of the filesystem and fails.

Why doesn't this happen on a normal network install (booting from a floppy, for example)?
For some reason in that case the ramfs.img copy doesn't fail until somewhere after 9Mb, which is enough to hold netstg1.img. Something in my setup is causing a difference in how far you can overrun the ramdisk. The fact that you can overrun it at all may well be a kernel bug.

In the process I have discovered a couple of bugs/limitations in loader/loader.c (part of the anaconda package) in the way it reports (and fails to report) problems:

1) Why doesn't the failed ramfs.img copy stop the install? If it did, the fact that the ramdisk is too small would have been spotted.
2) setupStage2Image() has a logMessage statement that tries to print the size of the file that has been copied; however, it uses %d to print a size_t and thus messes up the name of the file copied.

I have included a patch to fix the above two problems, but note that it will break the install until the ramdisk size is fixed, either by bumping the CONFIG_BLK_DEV_RAM_SIZE kernel parameter or by providing a ramdisk_size parameter on the kernel command line.

I have marked this as low severity because I have a workaround (increasing the ramdisk size). However, I am concerned that others may hit it and not know how to apply the workaround - the errors hardly point to this being the problem!
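To illustrate the two reporting points above, here is a minimal sketch in C. It is not the attached patch and not the real loader.c code: `copy_all` and `format_copy_message` are hypothetical names, and the message wording is made up. The idea is only that a copy helper should return an error when a write fails or comes up short, and that a size_t should be printed with a matching format specifier (`%zu` in C99) instead of `%d`.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not the real loader.c function): copy everything
 * from in_fd to out_fd. Returns 0 on success and -1 on any read or write
 * error, so the caller can stop instead of carrying on with a truncated
 * image. The number of bytes actually written is stored in *copied. */
int copy_all(int in_fd, int out_fd, size_t *copied)
{
    char buf[4096];
    ssize_t n;

    *copied = 0;
    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted read: retry */
            return -1;
        }
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted write: retry */
                return -1;          /* e.g. ENOSPC when the device is full */
            }
            if (w == 0) {           /* no progress: treat as out of space */
                errno = ENOSPC;
                return -1;
            }
            p += w;
            n -= w;
            *copied += (size_t)w;
        }
    }
    return 0;
}

/* Hypothetical message formatter: %zu matches size_t, so the following
 * %s argument is read correctly. */
int format_copy_message(char *out, size_t out_len,
                        const char *name, size_t bytes)
{
    return snprintf(out, out_len, "copied %zu bytes to %s", bytes, name);
}
```

A caller would check the return value of `copy_all` and abort the install with a clear message, rather than mounting whatever ended up on the ramdisk.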
Created attachment 81665 [details] Patch to fix the error reporting
It's not usually a problem because we change the ramdisk_size using the kernel parameter on the boot disks. Not including the ramdisk_size parameter when booting the kernel is considered an incorrect setup.
Please re-open this bug - you haven't read it properly! I can't figure out how to reopen it myself.

First of all, I don't believe you are correct about the floppy boot having a ramdisk_size parameter - if there is one, it is certainly not the 16Mb needed for the uncompressed /etc/ramfs.img file.

Secondly, regardless of the ramdisk stuff, the loader is not reporting the errors that it needs to. If you apply the patches to anaconda, you will see that on the floppy boot the ramfs.img copy fails after 9431040 bytes, which, in the case of the floppy, is big enough to hold the netstg1.img file later on. However, I believe this is more by luck than anything else - this may well be an intermittent failure that will be difficult to find. You are relying on a corrupted ram filesystem!

If you want to see it without doing all the patch stuff, bring the install up to a point where you get a shell and try doing a dd on the ram fs:-

    dd if=/dev/ram of=/dev/null bs=1k

I get 9216k - not 16000.

If you can find a way of getting access to e2fsck (I mounted a copy I had on the hard disk), you get:-

    /tmp/mnt/sbin/e2fsck -n /dev/ram
    e2fsck 1.27 (8-Mar-2002)
    Warning! /dev/ram is mounted.
    The filesystem size (according to the superblock) is 16000 blocks
    The physical size of the device is 9216 blocks
    Either the superblock or the partition table is likely to be corrupt!
    Abort? no
    ....

Please reconsider this bug report.

Damian

P.S. To make life easier for you I will attach a new bootnet.img floppy image that has the loader program patched. You can then try doing an http install with it and see it stop when creating the ramdisk.
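The mismatch that dd and e2fsck show above comes down to simple arithmetic, sketched here in C. The function names are hypothetical, and the 1 KiB filesystem block size is an assumption inferred from the e2fsck and dd figures matching (9216 blocks, 9216k). The kernel's ramdisk_size parameter is given in KiB.

```c
#include <stdbool.h>
#include <stdint.h>

/* ramdisk_size= on the kernel command line is expressed in KiB. */
uint64_t ramdisk_bytes(uint32_t ramdisk_size_kib)
{
    return (uint64_t)ramdisk_size_kib * 1024u;
}

/* Hypothetical check: does a filesystem of fs_blocks blocks, each
 * block_size bytes, fit on a device of device_bytes bytes? */
bool fs_fits_device(uint32_t fs_blocks, uint32_t block_size,
                    uint64_t device_bytes)
{
    return (uint64_t)fs_blocks * block_size <= device_bytes;
}
```

With the figures from this report, `fs_fits_device(16000, 1024, ramdisk_bytes(9216))` is false (16384000 bytes of filesystem against 9437184 bytes of device), while the same filesystem fits once ramdisk_size is 16000.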
Created attachment 81845 [details] Floppy boot image with patched loader to demonstrate the problem
Ooops, I apologise, I should have looked at the syslinux.cfg file on the floppy. I see you do bump the ramdisk_size on the floppy - to 9216! So I guess you rely on the write failing and on never getting near the end of the fs! So we're just left with a minor reporting bug, which I will file a new bug for. Duh! Sorry about that. Damian
Time tracking values updated