Bug 191995

Summary: LTC23914-kernel panic starting install
Product: [Fedora] Fedora Reporter: IBM Bug Proxy <bugproxy>
Component: yaboot Assignee: Paul Nasrat <nobody+pnasrat>
Status: CLOSED CANTFIX QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: 5 CC: miyer
Target Milestone: ---   
Target Release: ---   
Hardware: powerpc   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2006-11-16 13:16:14 UTC Type: ---

Description IBM Bug Proxy 2006-05-16 18:32:40 UTC
LTC Owner is: thinh.com
LTC Originator is: marksmit.com


Problem description:
Kernel panic when starting the install. Recreates with the FC5 kernel through
today's (May 15) rawhide ppc64.img netboot image.
Does not recreate with RHEL4 U3.
Recreates with or without "nodmraid".
...
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
RAMDISK: Compressed image found at block 0
RAMDISK: ran out of compressed data
invalid compressed format (err=1)
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
Call Trace:
[C0000000030B3C30] [C00000000000ED40] .show_stack+0x68/0x1b0 (unreliable)
[C0000000030B3CD0] [C00000000005C680] .panic+0x84/0x218
[C0000000030B3D70] [C0000000003FCEE0] .mount_block_root+0x254/0x28c
[C0000000030B3E30] [C0000000003FD094] .prepare_namespace+0xf4/0x164
[C0000000030B3EC0] [C0000000000095B4] .init+0x2e8/0x3c0
[C0000000030B3F90] [C000000000025BF0] .kernel_thread+0x4c/0x68

P5 570 - floralp2.upt.austin.ibm.com
The Ethernet adapter is the first (and only) boot device in the SMS boot list.
To recreate, reboot the LPAR and press '8' during the SMS menu to go to the Open Firmware prompt.
of> boot  <add any extra boot commands you want>

access: telnet sqh7lte.upt  (hscroot : abc123)
chsysstate -m fsp-flora --id 2 -o shutdown -r lpar --immed --restart
rmvterm -m fsp-flora --id 2  (note: that is two dashes in front of id)
mkvterm -m fsp-flora --id 2
~. <enter>  (tilde, dot) to exit from the vterm.
The LPAR recreates the problem in 2 minutes. I left it at the Open Firmware prompt,
ready for a recreate. (The first time, just run the mkvterm line; there is no need for the chsysstate.)
Awaiting suggestions on how to debug.

>RAMDISK: ran out of compressed data
>invalid compressed format (err=1)

This is the same as RHEL4 U3 Beta bug 20403 (RIT 85890).
The patch is:
https://enterprise.redhat.com/issue-tracker/?module=download&fid=53180

On IBM LTC:
https://bugzilla.linux.ibm.com/show_bug.cgi?id=20403#c58


Since this recreates on FC5 as well as rawhide, I suspect the patch is not
upstream. That is, the version in rawhide today is yaboot-1.3.13-0.18.src.rpm from
March 6, 2006, which is the same version as in the March "gold" release of FC5.

I tried searching here, but the search does not find yaboot.c:
http://sosdg.org/~coywolf/lxr/find?a=ppc&string=yaboot.c
So I assume the patch is applied _not_ to the kernel, but only to the yaboot SRPM.

Comment 1 IBM Bug Proxy 2006-05-17 20:47:43 UTC
----- Additional Comments From marksmit.com  2006-05-17 16:49 EDT -------
As this problem was encountered on JS21 with RHEL4 U3,
a patch appears to have been dropped into RHEL4 U3:
yaboot-1.3.12/second/yaboot.c

What is unique about this LPAR is that 1) the problem recreated for the first time on a
POWER5 system, not a JS21, and 2) there is an existing AIX install plus a RHEL4
install on the disks for this LPAR. I do not know whether that is pertinent,
because I am not sure how far along the netboot process we are when we hit
the kernel panic.

Comment 2 Paul Nasrat 2006-05-17 22:05:19 UTC
Please refer to patches by name, not by file, as it is impossible to guess what
you mean from the context.

If you have AIX installed, has it changed load-base? Can you boot into OF and
check that, please?

Also, please do not switch between a bug report on rawhide/Fedora development and
discussing RHEL (other than to point out regressions), as they are separate bug
streams with different processes. This bug was filed against Fedora, so please let's
keep it focused.

Comment 3 Paul Nasrat 2006-05-17 22:06:35 UTC
Manoj, can you please try to replicate this with a rawhide tree here?

Comment 4 IBM Bug Proxy 2006-05-19 19:53:12 UTC
----- Additional Comments From thinh.com(prefers email via th2tran.com)  2006-05-19 15:57 EDT -------
The patch name is reduce-initrd-alloc-increments.patch.
The link to the patch from Red Hat issue tracker #85890: http://bugmail.austin.ibm.com/rhitLoginGetAttachFr.php?&bug_id=191995&fid=53180 

Comment 5 IBM Bug Proxy 2006-05-27 20:29:08 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |natlynch.com
            Version|FC5                         |Other




------- Additional Comments From marksmit.com  2006-05-27 00:22 EDT -------
> If you have AIX installed has it changed load-base - can you boot into OF and
> check that please.
I am unsure of the OF command syntax to discover this. If you know it, please
append it; otherwise, I will ask around.

I have verified that this is easily recreatable, both after booting one of the Linux
installations and after attempting the AIX installation. But in all our testing,
it only seems to recreate on this one POWER5 LPAR.

I am changing the version on the LTC Bugzilla from FC5 to "other" to reflect
that this is found on Fedora-devel (rawhide).

The status remains the same. This continues to recreate on today's May 26
snapshot of Fedora-devel.

Here is a picture of the drives and bootable partitions on this lpar.
from SMS, the 1st boot device, used to network boot:
 7.        1      Ethernet
                  ( loc=U7879.001.0395562-P1-C1-T1 )
(1. = Virtual eth, 2-5, 7-10 are physical eth NICs, 6=IDE cdrom)

from SMS, the bootable disk devices 11,12,13:
11.        -      SCSI 73407 MB Harddisk, part=2 (AIX 5.3.0)
                  ( loc=U7879.001.0395562-P1-T14-L3-L0 )
12.        -      SCSI 73407 MB Harddisk, part=1 ()
                  ( loc=U7879.001.0395562-P1-T14-L4-L0 )
                  2.6.9-20 installation
13.        -      SCSI 73407 MB Harddisk, part=1 ()
                  ( loc=U7879.001.0395562-P1-T12-L3-L0 )
                  2.6.9-22 installation
And from Open Firmware (of> scsiinfo), all the visible disks:
1  U7879.001.0395562-P1-T14 /pci@800000020000001/pci@2/pci1069,b166@1/scsi@0
ID 3 - 73407 MB Disk drive (bootable) (AIX 5.3.0) (VIO server)
  (does not boot as floralp2 is not configured as a VIOS partition, but instead
   is an RPA lpar)
ID 4 - 73407 MB Disk drive (bootable) (2.6.9-20)

4  U7879.001.0395562-P1-T12 /pci@800000020000003/pci@2,2/pci1069,b166@1/scsi@0
ID 3 - 73407 MB Disk drive (bootable) (2.6.9-22)
ID 4 - 73407 MB Disk drive
ID 5 - 73407 MB Disk drive 

Comment 6 IBM Bug Proxy 2006-06-06 04:12:25 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Other                       |devel




------- Additional Comments From marksmit.com  2006-06-06 00:14 EDT -------
I am changing the version to 'devel' to properly reflect that this recreates on
rawhide.

This still recreates on the one LPAR, floralp2, using today's June 5 version of
rawhide.

Comment 7 IBM Bug Proxy 2006-08-30 23:56:02 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |REJECTED
         Resolution|                            |UNREPRODUCIBLE




------- Additional Comments From marksmit.com  2006-08-30 19:53 EDT -------
After loaning the victim system to Enterprise Linux testing and getting it
back, I can no longer recreate this on floralp2 (VIO server LPAR, running
Linux). I will have to cancel this as unreproducible. I tried to recreate it using
several OS versions that all recreated it before.

Comment 8 Paul Nasrat 2006-11-16 13:16:14 UTC
Closing the Red Hat side based on comment #7.