Bug 675685 - netboot fails on ppc64 - ramdisk is >32MB
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.1
Hardware: ppc64, OS: Linux
Priority: urgent, Severity: urgent
Target Milestone: beta
Target Release: 6.1
Assigned To: David Cantrell
QA Contact: Martin Banas
Keywords: TestBlocker
Duplicates: 675288
Depends On: 670159
Blocks: 1006043
 
Reported: 2011-02-07 05:36 EST by Martin Banas
Modified: 2013-09-12 04:16 EDT
CC List: 10 users

Fixed In Version: anaconda-13.21.96-1
Doc Type: Bug Fix
Last Closed: 2011-05-19 08:37:34 EDT


Attachments
console.log from RHEL6.1-20110206.n.0 (65.84 KB, text/plain)
2011-02-07 05:36 EST, Martin Banas
image.diff (93.92 KB, text/plain)
2011-02-08 10:59 EST, David Cantrell


External Trackers:
IBM Linux Technology Center 69818 (last updated: never)

Description Martin Banas 2011-02-07 05:36:38 EST
Created attachment 477392
console.log from RHEL6.1-20110206.n.0

Description of problem:
ramdisk.image.gz has grown beyond 32MB, and netbooting fails on ppc64.

Version-Release number of selected component (if applicable):
RHEL6.1-20110206.n.0
RHEL6.1-20110204.n.0
RHEL6.1-20110203.n.0 

How reproducible:
always

Steps to Reproduce:
1. Start a Beaker job on ppc64 with the latest nightly build.
  
Actual results:
	!EA017021 ! 
    of_net_open - result: 0 
<-- of_net_open - FILE_ERR_OK 
    yaboot_text_ui - file->buffer 2100000 -> 4100000 (2000000) 
    yaboot_text_ui - vmlinux 4100000 -> 5700000 (1600000) 
    yaboot_text_ui - initrd 0 -> 0 (0) 
    yaboot_text_ui - Looks like we do not need to move the kernel 
--> of_close 
    of_close - <@01b36000> 
    of_close - of_close called 
<-- of_close - 0 
ramdisk load failed ! 
ENTER called ok 
0 > 


Expected results:
The ramdisk loads without problems.

Additional info:
The sizes (in bytes) of the latest ramdisk.image.gz files are:
RHEL6.1-20110206.n.0 - 34566900
RHEL6.1-20110204.n.0 - 34512863
RHEL6.1-20110203.n.0 - 34533430
The latest working build was:
RHEL6.1-20110202.n.0 - 33088046

The limit appears to be about 32MB, i.e. 32 MiB = 33554432 bytes.
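A quick Python check of the numbers above. It assumes the limit is exactly 32 MiB (33554432 bytes), as stated, and uses only the sizes reported in this description.

```python
# Assumed netboot limit: exactly 32 MiB, per the figure quoted above.
LIMIT_BYTES = 32 * 1024 * 1024  # 33554432

# ramdisk.image.gz sizes (bytes) reported for each nightly tree.
SIZES = {
    "RHEL6.1-20110206.n.0": 34566900,
    "RHEL6.1-20110204.n.0": 34512863,
    "RHEL6.1-20110203.n.0": 34533430,
    "RHEL6.1-20110202.n.0": 33088046,
}


def overage(size: int, limit: int = LIMIT_BYTES) -> int:
    """Return how many bytes `size` exceeds `limit` (negative if it fits)."""
    return size - limit


if __name__ == "__main__":
    for tree, size in SIZES.items():
        delta = overage(size)
        status = "FAILS" if delta > 0 else "fits"
        print(f"{tree}: {size} bytes, {delta:+d} vs limit -> {status}")
```

Under that assumption, 20110206.n.0 is 1012468 bytes (about 1MB) over the limit, while 20110202.n.0 is 466386 bytes under it.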

A similar bug has been filed against Fedora:
https://bugzilla.redhat.com/show_bug.cgi?id=653986
Comment 1 Jiri Skala 2011-02-07 08:04:34 EST
*** Bug 675288 has been marked as a duplicate of this bug. ***
Comment 2 Martin Sivák 2011-02-07 08:21:44 EST
I went through the contents of our initrd and found about 300KB of files that might be removable (this is not certain yet). That won't be enough, since we are currently about 1MB over the limit.

For reference, the locales in stage1 are already stripped down to the bare minimum (en_US.utf8).
Comment 4 Chris Lumens 2011-02-07 10:15:33 EST
And unfortunately, since everything continues to grow over time, just removing something isn't really a sustainable fix.  There's going to come a day in RHEL6 where we simply cannot remove anything else and keep the same level of functionality in anaconda.  On that day, what's going to be the real answer?
Comment 5 David Cantrell 2011-02-08 10:26:37 EST
Based on comment #3, the last working ppc64 tree was 20110202.n.0.  The
ramdisk.image.gz file was exactly 32MB in that tree.  In the current nightly
tree, the ramdisk.image.gz file has increased to 33MB.

Unpacking the trees, here's what I see:
20110202.n.0 tree - 73548k unpacked ramdisk.image.gz
20110208.n.0 tree - 75664k unpacked ramdisk.image.gz

Gathering more details.
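Unpacked sizes like the ones above can be measured without extracting the image to disk. The Python sketch below streams a gzip file and counts the decompressed bytes. It is an illustration only, not the method used to produce the figures above. (The gzip trailer also stores the uncompressed size, but only modulo 2^32, so streaming is the more robust approach.)

```python
import gzip
import sys

CHUNK = 1 << 20  # read 1 MiB at a time to keep memory use flat


def unpacked_size(path: str) -> int:
    """Return the decompressed size in bytes of a gzip file by streaming it."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Usage: python unpacked_size.py ramdisk.image.gz [...]
    for p in sys.argv[1:]:
        print(f"{p}: {unpacked_size(p) // 1024}k unpacked")
```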
Comment 6 David Cantrell 2011-02-08 10:58:35 EST
It appears that most of the recent growth in the image comes from driver updates, new firmware files, and the iSCSI userland tools.

Attaching a diff of the 20110202.n.0 image with the 20110208.n.0 image.
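To rank which files account for the growth, one approach is to unpack both images into directories and compare per-file sizes. The Python sketch below assumes both ramdisks have already been unpacked into two directories passed on the command line. It is hypothetical and is not how image.diff was produced.

```python
import os
import sys


def file_sizes(root: str) -> dict[str, int]:
    """Map each regular file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def diff_trees(old_root: str, new_root: str) -> list[tuple[str, int]]:
    """Return (relative path, size change in bytes) for every file whose size
    changed, largest growth first. Added files count their full size, and
    removed files count as negative."""
    old, new = file_sizes(old_root), file_sizes(new_root)
    changes = []
    for path in old.keys() | new.keys():
        delta = new.get(path, 0) - old.get(path, 0)
        if delta:
            changes.append((path, delta))
    changes.sort(key=lambda item: item[1], reverse=True)
    return changes


if __name__ == "__main__":
    # Usage: python diff_trees.py OLD_UNPACKED_DIR NEW_UNPACKED_DIR
    for path, delta in diff_trees(sys.argv[1], sys.argv[2])[:50]:
        print(f"{delta:+12d}  {path}")
```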
Comment 7 David Cantrell 2011-02-08 10:59:04 EST
Created attachment 477640
image.diff
Comment 8 IBM Bug Proxy 2011-02-08 13:45:03 EST
------- Comment From mjwolf@us.ibm.com 2011-02-08 13:33 EDT-------
For the short term, can you use lzma compression instead of gzip? The ramdisk will be smaller, and the kernel can still handle it.
Comment 9 David Cantrell 2011-02-08 18:00:28 EST
(In reply to comment #8)
> ------- Comment From mjwolf@us.ibm.com 2011-02-08 13:33 EDT-------
> for the short term can you use lzma compression instead of gzip.  The ramdisk
> will be smaller and the kernel can still deal with it.

We are working on this on the master branch, but changing the compression format used for the ramdisk image affects a number of other components. In theory it should be fine, but we need to check all of them to ensure we don't introduce another problem.

For now, the fix we have removes some kernel modules and firmware files from ramdisk.image.gz so that it's below 32MB.  We are removing the following subdirectories from /lib/modules:

    firewire
    pcmcia
    sound
    wireless

And from /lib/firmware, we are removing the following subdirectories:

    matrox
    r128
    radeon
    zd1211

The test compose we just did brings us to a 31MB ramdisk.image.gz.
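A rough Python sketch of this pruning step. It assumes a hypothetical staging directory that holds the unpacked ramdisk before recompression. Because the comment names the /lib/modules subdirectories but not their full paths, the sketch removes any directory with one of those names found below lib/modules. This is an assumption, not how anaconda's image build scripts actually do it.

```python
import os
import shutil
import sys

# Directory names taken from the lists in this comment.
MODULE_DIRS = {"firewire", "pcmcia", "sound", "wireless"}
FIRMWARE_DIRS = {"matrox", "r128", "radeon", "zd1211"}


def _remove(path: str) -> None:
    """Delete a directory tree, or just the link if `path` is a symlink."""
    if os.path.islink(path):
        os.remove(path)
    else:
        shutil.rmtree(path)


def prune(staging_root: str) -> list[str]:
    """Delete the listed module and firmware subdirectories from an unpacked
    ramdisk tree rooted at `staging_root`, and return the paths removed."""
    removed = []
    modules = os.path.join(staging_root, "lib", "modules")
    # With topdown=True, removing a name from `dirnames` stops os.walk from
    # descending into a directory that has just been deleted.
    for dirpath, dirnames, _files in os.walk(modules, topdown=True):
        for name in list(dirnames):
            if name in MODULE_DIRS:
                path = os.path.join(dirpath, name)
                _remove(path)
                removed.append(path)
                dirnames.remove(name)
    firmware = os.path.join(staging_root, "lib", "firmware")
    for name in sorted(FIRMWARE_DIRS):
        path = os.path.join(firmware, name)
        if os.path.islink(path) or os.path.isdir(path):
            _remove(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Usage: python prune.py STAGING_DIR
    for path in prune(sys.argv[1]):
        print("removed", path)
```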
Comment 10 IBM Bug Proxy 2011-02-08 18:01:07 EST
------- Comment From mjwolf@us.ibm.com 2011-02-08 17:56 EDT-------
OK, I verified that I can transfer more than 32MB. Please do the following steps
and let me know what happens:

1. Boot and enter the SMS menus.
2. Type '0', then 'y'. You should now be at the Open Firmware prompt.
3. Enter: setenv real-base c00000
4. Enter: dev /packages/gui obe   (be ready to type '1' right away to go back
   into the SMS menus)
5. Select "Setup Remote IPL (Initial Program Load)".
6. Select the appropriate network device.
7. Select "IPv4 - Address Format 123.231.111.222".
8. Select "BOOTP".
9. Select "Advanced Setup: BOOTP".
10. Change the Bootp Blocksize from 512 to 1024.
11. Select "M" to go to the main menu, and try the network install again.

Again, I would also recommend using lzma compression. For the RHEL6 ramdisk
it reduced the file size significantly:
-r--r--r--.  1 root root  24041544 Feb  8 15:33 ramdisk.image.lzma
-r--r--r--  1 root root 31319913 Oct 27 10:01 ramdisk.image.gz
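A small Python sketch for comparing the two formats on the same uncompressed image before deciding. It uses the standard gzip and lzma modules and writes the .xz container (the lzma module's default). The exact format behind the ramdisk.image.lzma file above isn't stated, so sizes may differ slightly. The input path is hypothetical.

```python
import gzip
import lzma
import sys

LIMIT_BYTES = 32 * 1024 * 1024  # assumed 32 MiB netboot limit


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Compress `data` with gzip (level 9, the module default) and with xz
    (preset 6, the module default) and return the resulting sizes in bytes."""
    return {
        "gzip": len(gzip.compress(data)),
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
    }


if __name__ == "__main__":
    # Usage: python compare.py UNCOMPRESSED_RAMDISK_IMAGE
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    for fmt, size in compressed_sizes(raw).items():
        verdict = "fits" if size <= LIMIT_BYTES else "too big"
        print(f"{fmt}: {size} bytes ({verdict})")
```

For the two files listed above, lzma is about 23% smaller (24041544 vs 31319913 bytes). The files carry different dates, though, so that figure is only indicative.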
Comment 11 David Cantrell 2011-02-08 19:58:05 EST
We're not opposed to lzma for ramdisk.image; we're just not going to switch right now. All of the tree composition and booting tools that rely on the initrd being named ramdisk.image.gz, and on it being a gzip file, need testing to ensure that everything still works if it's a .xz file. We just don't have the time for that in this release.
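One way a tool could avoid depending on the ramdisk.image.gz name is to identify the compression format from the file's magic bytes (1f 8b for gzip, fd 37 7a 58 5a 00 for xz). The Python helper below is hypothetical, is not part of any of the tools mentioned, and only recognizes those two formats.

```python
# Leading magic bytes: gzip (RFC 1952) and the xz container format.
GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"


def sniff_compression(path: str) -> str:
    """Return "gzip", "xz", or "unknown" based on the file's leading bytes,
    ignoring the file name entirely."""
    with open(path, "rb") as f:
        head = f.read(len(XZ_MAGIC))
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(XZ_MAGIC):
        return "xz"
    return "unknown"
```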
Comment 13 IBM Bug Proxy 2011-02-09 00:41:02 EST
------- Comment From tonyb@au1.ibm.com 2011-02-09 00:34 EDT-------
There are 2 problems causing this failure.
1. Firmware will only TFTP up to 64k blocks of block-size bytes each. The
default block size is 512 bytes, which limits the transfer to 32MB. Changing
the TFTP block size is documented in RFC 2348. I don't know how many TFTP
servers in production support this; certainly some do.
Mike has shown how to do it for us.
2. (Red Hat's) yaboot has a default transfer buffer #defined as 32MB, so a
TFTP transfer exceeding this size is expected to fail. 32MB was chosen as a
default maximum size that will work on systems with a 128MB RMA (Real Mode Area).
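The block-count arithmetic in point 1 can be sketched in Python. The comment describes the firmware limit as 64k blocks. The sketch assumes a 16-bit block counter that does not wrap, so it caps a transfer at 65535 blocks, and it ignores the short final block TFTP uses to signal the end of a file. Treat the results as approximate.

```python
# Assumptions: TFTP block numbers are 16 bits wide (RFC 1350) and the
# firmware does not let them wrap, so one transfer is capped at 65535 data
# blocks. The short final EOF block is ignored, so results are approximate.
MAX_BLOCKS = 65535
DEFAULT_BLKSIZE = 512                  # RFC 1350 default block size
BLKSIZE_MIN, BLKSIZE_MAX = 8, 65464    # valid "blksize" range, RFC 2348


def max_transfer(blksize: int = DEFAULT_BLKSIZE) -> int:
    """Approximate largest file (in bytes) that fits in MAX_BLOCKS blocks."""
    return MAX_BLOCKS * blksize


def min_blksize(file_size: int) -> int:
    """Smallest RFC 2348 block size that keeps file_size within MAX_BLOCKS."""
    needed = max(-(-file_size // MAX_BLOCKS), BLKSIZE_MIN)  # ceiling division
    if needed > BLKSIZE_MAX:
        raise ValueError(f"{file_size} bytes cannot fit in {MAX_BLOCKS} blocks")
    return needed


if __name__ == "__main__":
    for size in (512, 1024):
        print(f"blksize {size}: up to {max_transfer(size)} bytes")
    print("blksize needed for 34566900 bytes:", min_blksize(34566900))
```

With 512-byte blocks, the bound is 33553920 bytes, just under 32 MiB. The 1024-byte Bootp Blocksize from comment 10 roughly doubles it. Under these assumptions, the failing 34566900-byte ramdisk would need a block size of at least 528.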
Going forward, there are several options:
1. Remove unneeded data from the ramdisk to keep its size under 32MB.
Provided this is done only to the ramdisk used for netbooting, it should
cause minimal breakage (if any).
2. Switch to LZMA compression, again to keep the size under 32MB.

Both of these options require changes to the Red Hat build/test process, but
should allow 6.1 to get out the door while we discuss a longer-term strategy.

3. Change yaboot to use the initrd-size option in yaboot.conf to set the
size of the buffer allocated for the TFTP load. This would also need to
change the block size of the TFTP request, and would require backporting
several changes from upstream to RHEL's yaboot. In fact, it may be easier to
rebase yaboot completely on upstream.
4. Change yaboot to request a minimum RMA of 256MB for the LPAR, and increase
the default TFTP buffer size. I believe there is a chance of getting into an
infinite boot loop should the kernel specify a different value. This option
would need to be very well tested, and we'd need to talk to the FW team.

One drawback to approaches 3 and 4 is that some portion of the yaboot change
will almost certainly not be acceptable upstream, so Red Hat would need to
accept the risk of shipping a version of yaboot with code that isn't upstream.

We need to think carefully about the pros and cons and, of course, the timeline for RHEL 6.1.
Comment 16 Martin Banas 2011-02-28 07:48:55 EST
Retested on build RHEL6.1-20110224.2, anaconda-13.21.100-1.
-rw-r--r--. 2 root root 32961917 2011-02-25 07:02 ramdisk.image.gz

Moving to VERIFIED.
Comment 17 errata-xmlrpc 2011-05-19 08:37:34 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0530.html
