76541 – pxegrub and kickstart exposes bugs in loader-network

Summary: pxegrub and kickstart exposes bugs in loader-network

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	anaconda
Sub Component:
Version:	8.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	low
Target Milestone:	---
Assignee:	Jeremy Katz
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-10-23 02:34 UTC by Damian Ivereigh
Modified:	2007-04-18 16:47 UTC
CC List:	1 user
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2002-10-24 01:25:09 UTC
Embargoed:

Attachments
Patch to fix the error reporting (3.10 KB, patch) 2002-10-23 03:41 UTC, Damian Ivereigh
Floppy boot image with patched loader to demonstrate the problem (1.41 MB, application/octet-stream) 2002-10-24 01:24 UTC, Damian Ivereigh

Description Damian Ivereigh 2002-10-23 02:34:32 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020916

Description of problem:
When installing using PXE and GRUB to deliver the kernel and initrd, then using
HTTP to deliver the install files and the ks.cfg file, the install fails while
downloading netstg1.img.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
The conditions will be hard to set up, so you'll have to trust my setup and
debugging!

1. Set up a PXE system to deliver pxegrub and a grub config file that looks like this:

timeout 0
title Start from network
        ifconfig --server=64.104.195.13
        root(nd)
        kernel /rh8.0/vmlinuz ks=http://64.104.195.13/ks.cfg ksdevice=eth0 ramdisk_size=16000
        initrd /rh8.0/initrd.img
        boot

N.B. 64.104.195.13 is the server where everything is held.

/tftpboot/rh8.0 is populated with the files from images/pxeboot, which are
fetched over TFTP by grub.

2. Set the ks.cfg file to pull the install over http

3. Boot up machine through pxe
	

Actual Results:  Install fails when downloading netstg1.img with the simple
message "Unable to retrieve the first install image"

Expected Results:  Install should continue through the kickstart process.

Additional info:

Further debugging:-

The root problem is that the ramdisk is too small. The loader grabs the file
/etc/ramfs.img from the initrd.img, uncompresses it and copies it to /dev/ram.
The file is just a compressed empty filesystem. It uncompresses to 16 MB, yet the
default ramdisk size on the installer kernel is only 8 MB, so the copy fails at a
little over 8 MB (8387828 bytes). Why it stops at that exact figure is unclear to me.

This ramdisk is then mounted onto /tmp/ramfs and the netstg1.img is copied onto
it from wherever the install files are coming from (http in my case). However
netstg1.img is 8200192 bytes, so it runs off the end of the filesystem and fails.

Why doesn't this happen on a normal network install (booting from a floppy, for
example)? For some reason, in that case the ramfs.img copy doesn't fail until
somewhere after 9 MB, which is enough to hold netstg1.img. Something in
my setup is causing a difference in how far you can overrun the ramdisk. The
fact that you can overrun it at all may well be a kernel bug.
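
To put numbers on the PXE case, here is a small sketch in C that uses only the byte counts quoted above. The function and macro names are made up for illustration and are not part of the loader.

```c
#include <stdio.h>

/* Byte counts quoted in this report (PXE boot case). */
#define RAMDISK_BYTES_WRITTEN 8387828L /* where the ramfs.img copy stopped */
#define NETSTG1_BYTES         8200192L /* size of netstg1.img */

/* Bytes left for filesystem metadata once the payload is stored.
 * A negative result means the payload alone cannot fit. */
long headroom_bytes(long device_bytes, long payload_bytes)
{
    return device_bytes - payload_bytes;
}

/* Print the headroom for the two figures above. */
void print_headroom(void)
{
    printf("headroom: %ld bytes\n",
           headroom_bytes(RAMDISK_BYTES_WRITTEN, NETSTG1_BYTES));
}
```

With these figures the headroom is 187636 bytes (roughly 183 KB). All of the filesystem's own metadata has to fit in that space as well, which is presumably why netstg1.img runs off the end.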

In the process I have discovered a couple of bugs/limitations in loader/loader.c
(part of the anaconda package) in the way they report (and fail to report) problems:

1) Why doesn't the failed ramfs.img copy stop the install? If it did, the fact
that the ramdisk is too small would have been spotted.
2) setupStage2Image() has a logMessage statement that tries to print the size of
the file that has been copied; however, it uses %d to print a size_t and thus
garbles the name of the file copied.
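
The two points above can be sketched in C as follows. This is not the attached patch and not the real loader code: copy_fd_checked() and format_copy_message() are hypothetical names, and the sketch assumes a C99 library where %zu is available for size_t. Passing a size_t to a mismatched conversion such as %d is undefined behaviour, which is consistent with the following %s argument being garbled.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd.
 * Returns 0 on success and -1 on the first read or write failure, so the
 * caller can stop instead of carrying on with a truncated image.
 * *copied always holds the number of bytes actually written. */
int copy_fd_checked(int in_fd, int out_fd, size_t *copied)
{
    char buf[4096];
    ssize_t n;

    *copied = 0;
    while ((n = read(in_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted, retry the read */
            return -1;
        }
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0 && errno == EINTR)
                continue;          /* interrupted, retry the write */
            if (w <= 0)
                return -1;         /* e.g. ENOSPC when the ramdisk is full */
            p += w;
            n -= w;
            *copied += (size_t)w;
        }
    }
    return 0;
}

/* Hypothetical log formatter: %zu matches size_t, so the file name that
 * follows is read from the correct argument. */
int format_copy_message(char *out, size_t outlen, const char *name, size_t bytes)
{
    return snprintf(out, outlen, "copied %zu bytes to %s", bytes, name);
}
```

A caller would treat a -1 return from the ramfs.img copy as fatal and report how many bytes were written before the failure.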

I have included a patch to fix the above two problems, but note that it will
break the install until the ramdisk size is fixed by either bumping the
CONFIG_BLK_DEV_RAM_SIZE kernel parameter or providing a ramdisk_size parameter
on the kernel command line.
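
As an illustration of the command-line route, the following C sketch pulls ramdisk_size (given in 1 KB blocks) out of a kernel command line and compares it with the size an image needs. It is a hypothetical check, not something the loader is known to do; parse_ramdisk_size() and ramdisk_fits() are invented names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of ramdisk_size= (in 1 KB blocks)
 * from a kernel command line, or -1 if the option is absent or malformed. */
long parse_ramdisk_size(const char *cmdline)
{
    static const char key[] = "ramdisk_size=";
    const size_t keylen = sizeof(key) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a word. */
        if (p == cmdline || p[-1] == ' ') {
            char *end;
            long kb = strtol(p + keylen, &end, 10);
            if (end != p + keylen &&
                (*end == '\0' || *end == ' ' || *end == '\n'))
                return kb;
            return -1;
        }
        p += keylen;
    }
    return -1;
}

/* Nonzero if the command line asks for a ramdisk of at least image_kb blocks.
 * When the option is absent the compiled-in default applies, which this
 * sketch cannot see, so it conservatively reports "does not fit". */
int ramdisk_fits(const char *cmdline, long image_kb)
{
    return parse_ramdisk_size(cmdline) >= image_kb;
}
```

For the kernel line in the grub config above, parse_ramdisk_size() returns 16000.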

I have marked this as low severity because I have a workaround (increasing the
ramdisk size); however, I am concerned that others may hit it and not know how
to apply the workaround - the errors hardly point to this being the problem!

Comment 1 Damian Ivereigh 2002-10-23 03:41:56 UTC

Created attachment 81665
Patch to fix the error reporting

Comment 2 Jeremy Katz 2002-10-23 16:07:57 UTC

It's not usually a problem because we change the ramdisk_size using the kernel
parameter on the boot disks.  Not including the ramdisk_size parameter when
booting the kernel is considered an incorrect setup.

Comment 3 Damian Ivereigh 2002-10-24 01:21:33 UTC

Please re-open this bug - you haven't read it properly!

I can't figure out how to reopen it myself.

First of all, I don't believe you are correct about the floppy boot having a
ramdisk_size parameter - if there is, it is certainly not the 16 MB needed for
the uncompressed /etc/ramfs.img file.

Secondly regardless of the ramdisk stuff, the loader is not reporting the errors
that it needs to. If you apply the patches to anaconda, you will see that on the
floppy boot the ramfs.img copy fails after 9431040 bytes. Which luckily, in the
case of the floppy, is big enough to hold the netstg1.img file later on. However
I believe this is more by luck than anything else - this may well be an
intermittent failure that will be difficult to track down. You are relying on a
corrupted RAM filesystem!

If you want to see it without doing all the patch stuff, bring the install up to
a point where you get a shell and try doing a dd on the RAM filesystem:

dd if=/dev/ram of=/dev/null bs=1k

I get 9216k - not 16000

If you can find a way of getting access to e2fsck (I mounted a copy I had on the
hard disk), you get:

/tmp/mnt/sbin/e2fsck -n /dev/ram
e2fsck 1.27 (8-Mar-2002)
Warning! /dev/ram is mounted.
The filesystem size (according to the superblock) is 16000 blocks
The physical size of the device is 9216 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort? no
....
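
The mismatch e2fsck reports can also be checked by hand. The C sketch below is a minimal illustration, assuming the standard ext2 on-disk layout (superblock 1024 bytes into the device, s_blocks_count at offset 4, s_log_block_size at offset 24, magic 0xEF53 at offset 56). The function names are invented for this example.

```c
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024   /* superblock starts here on the device */
#define EXT2_MAGIC             0xEF53

/* Decode little-endian fields byte by byte so the host byte order is irrelevant. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* sb points at the superblock, i.e. the bytes read from
 * EXT2_SUPERBLOCK_OFFSET. Returns the size in bytes that the filesystem
 * claims to occupy, or 0 if the ext2 magic number is missing. */
uint64_t ext2_claimed_bytes(const unsigned char *sb)
{
    if (le16(sb + 56) != EXT2_MAGIC)
        return 0;
    uint32_t blocks = le32(sb + 4);    /* s_blocks_count */
    uint32_t log_bs = le32(sb + 24);   /* block size is 1024 << s_log_block_size */
    return (uint64_t)blocks * ((uint64_t)1024 << log_bs);
}

/* Nonzero if the filesystem claims more space than the device provides. */
int fs_overruns_device(const unsigned char *sb, uint64_t device_bytes)
{
    uint64_t claimed = ext2_claimed_bytes(sb);
    return claimed != 0 && claimed > device_bytes;
}
```

With 16000 blocks of 1 KB in the superblock and a 9216 KB device, as in the e2fsck output above, fs_overruns_device() reports an overrun.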

Please reconsider this bug report.

Damian

P.S. To make life easier for you I will attach a new bootnet.img floppy image
that has the loader program patched. You can then try doing an HTTP install with
it and see it stop while creating the ramdisk.

Comment 4 Damian Ivereigh 2002-10-24 01:24:16 UTC

Created attachment 81845
Floppy boot image with patched loader to demonstrate the problem

Comment 5 Damian Ivereigh 2002-10-24 01:32:09 UTC

Ooops, I apologise, I should have looked at the syslinux.cfg file on the floppy.
I see you do bump the ramdisk_size on the floppy - to 9216! So I guess you rely
on the write failing and not getting near the end of the fs!

So we're just left with a minor reporting bug, which I will file a new bug for.

Duh!

Sorry about that.

Damian

Comment 6 Michael Fulbright 2002-12-20 17:38:25 UTC

Time tracking values updated
