Bug 224255 - FC6 fails to support IBM eServer xseries 300
Summary: FC6 fails to support IBM eServer xseries 300
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Konrad Rzeszutek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-01-24 21:04 UTC by Amin Astaneh
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-03-23 20:40:54 UTC
Type: ---
Embargoed:


Attachments
lspci output with acpi=off passed to the bootloader (849 bytes, text/plain)
2007-02-12 19:32 UTC, Amin Astaneh
anaconda log from install with no additional kernel options (72.23 KB, text/plain)
2007-02-14 15:51 UTC, Amin Astaneh
anaconda log from install with acpi=off (74.40 KB, text/plain)
2007-02-14 15:51 UTC, Amin Astaneh
dmesg output with normal kernel options (12.38 KB, text/plain)
2007-02-19 17:07 UTC, Amin Astaneh
dmesg output with acpi=off (12.38 KB, text/plain)
2007-02-19 17:09 UTC, Amin Astaneh
dmesg output with normal kernel options (12.43 KB, text/plain)
2007-02-19 17:11 UTC, Amin Astaneh
dmesg with kernel option "nolapic" (14.23 KB, text/plain)
2007-02-21 16:25 UTC, Amin Astaneh
contents of /proc/interrupts with "nolapic" option (467 bytes, text/plain)
2007-02-21 16:28 UTC, Amin Astaneh
contents of /proc/interrupts with no additional kernel options (358 bytes, text/plain)
2007-02-21 16:30 UTC, Amin Astaneh

Description Amin Astaneh 2007-01-24 21:04:38 UTC
Description of problem:
Fedora Core 6 does not support two pieces of hardware on an IBM eServer xSeries 300:
S3 Savage 4 Pro video on the system board
Dual Intel 'e100' cards on the system board

Fedora Core 4 properly identifies this hardware during install and configures
and uses it at runtime, whereas FC6 fails to identify the Savage 4 card and
reverts to curses install mode (headless). The e100 cards are also not
identified and no network devices appear (ifconfig shows no devices available,
even with the module loaded).


Version-Release number of selected component (if applicable):
FC6 Install CD

How reproducible:
always

Steps to Reproduce:
1. Run the FC6 installer with no kernel options on an IBM eServer xSeries 300.
2. Note that the installer reverts to headless mode.
3. Reboot after install, and attempt to configure the eth0 and eth1 devices with
the proper modules loaded.
4. Repeat steps 1-3 with FC4 disks and note the differences.
  
Actual results:
Anaconda starts in headless mode; the system is unable to connect to the network
after install.

Expected results:
Anaconda should be able to start in X mode, and the system should be able to
connect to the network after install.

Additional info:

Comment 1 Konrad Rzeszutek 2007-01-24 21:37:52 UTC
How does rawhide work?

Comment 2 Amin Astaneh 2007-02-12 19:32:36 UTC
Created attachment 147932
lspci output with acpi=off passed to the bootloader

Comment 3 Amin Astaneh 2007-02-12 19:35:09 UTC
Also, running lspci in the shell without passing "acpi=off" to the kernel
returns nothing.


Comment 4 Amin Astaneh 2007-02-12 19:40:00 UTC
In reference to comments 2 and 3:

Disabling ACPI with the kernel option acpi=off allows anaconda's hardware
detection to work properly, identifying both the e100 and the savage4 hardware.
The above attachment shows lspci's results when ACPI is disabled. When run
without the option, lspci returns nothing. 

Comment 5 Konrad Rzeszutek 2007-02-12 19:49:42 UTC
Can you attach the anaconda.log file pls?


Comment 6 Amin Astaneh 2007-02-12 21:17:26 UTC
sgrubb asked me to complete an FC6 install on the machine using acpi=off so that
I could update the machine using yum. The update completed, installing the
current 2.6.19-1 kernel. Booting into the new kernel without the boot option
still results in the same problems: the e100 cards are not identified (ifconfig
returns only lo) and lspci returns nothing. Booting into the kernel with the
option (acpi=off) still allows the hardware to work, with lspci returning
results identical to the attachment above. In short, the current kernel fixes
nothing.
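
When comparing boots with and without the option, it helps to confirm what the
running kernel was actually started with. A minimal sketch, assuming the text of
/proc/cmdline is passed in as a string; the function names and the default
watched options are illustrative, not part of any Fedora tool:

```python
def boot_options(cmdline_text):
    """Split a /proc/cmdline-style string into its individual options."""
    return cmdline_text.split()


def active_workarounds(cmdline_text, watched=("acpi=off", "nolapic")):
    """Return the watched options that appear on the kernel command line.

    The default 'watched' tuple lists the two options discussed in this bug;
    it is an assumption made for illustration.
    """
    options = set(boot_options(cmdline_text))
    return [opt for opt in watched if opt in options]
```

On the affected machine this could be called as
active_workarounds(open("/proc/cmdline").read()) before collecting lspci or
dmesg output, so each attachment can be labeled with the options in effect.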

Comment 7 Amin Astaneh 2007-02-12 21:35:42 UTC
In reference to Comment 5 (request for anaconda log): I will have the results to
post from both installs (with and without kernel option acpi=off) on Wednesday.

Comment 8 Konrad Rzeszutek 2007-02-13 16:52:58 UTC
Did you try 'ifconfig -a' instead of just 'ifconfig'?


Comment 9 Amin Astaneh 2007-02-14 15:09:08 UTC
(In reply to comment #8)
> Did you try 'ifconfig -a' instead of just 'ifconfig'?
> 
Both. Nothing is returned except for loopback in both cases.
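
The same check can be scripted against /proc/net/dev, which lists every
interface the kernel has registered whether or not it is up (roughly what
'ifconfig -a' reports). A minimal sketch, assuming the file's text is passed in
as a string; the function names are hypothetical:

```python
def list_interfaces(proc_net_dev_text):
    """Return interface names found in /proc/net/dev-style text."""
    names = []
    # The first two lines are column headers; later lines are "name: counters...".
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" in line:
            names.append(line.split(":", 1)[0].strip())
    return names


def only_loopback(names):
    """True when no interface other than 'lo' is present."""
    return all(name == "lo" for name in names)
```

If only_loopback(list_interfaces(open("/proc/net/dev").read())) is true, the
kernel never registered the e100 devices, which matches the empty lspci output.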


Comment 10 Amin Astaneh 2007-02-14 15:51:00 UTC
Created attachment 148055
anaconda log from install with no additional kernel options

Comment 11 Amin Astaneh 2007-02-14 15:51:48 UTC
Created attachment 148057
anaconda log from install with acpi=off

Comment 12 Konrad Rzeszutek 2007-02-14 19:34:16 UTC
Two more items pls:

 - the output of 'lspci -vn'
 - and the output of running 'dmesg'.

Should have asked earlier but forgot, sorry.

Comment 13 Konrad Rzeszutek 2007-02-14 19:38:40 UTC
Just to summarize: with 'acpi=off' you can install/see devices without
trouble. It is only when you don't specify anything that nothing is seen?

What version of the BIOS do you have? Do you have the latest version?

What happens when you use 'nolapic' as a bootup argument (without adding
'acpi=off') ?

Comment 14 Amin Astaneh 2007-02-19 17:06:13 UTC
(In reply to comment #12)
> Two more items pls:
> 
>  - the output of 'lspci -vn'
>  - and the output of running 'dmesg'.
> 
> Should have asked earlier but forgot, sorry.

lspci -vn returns nothing with normal options.
dmesg output of both modes attached below.


Comment 15 Amin Astaneh 2007-02-19 17:07:46 UTC
Created attachment 148337
dmesg output with normal kernel options

Comment 16 Amin Astaneh 2007-02-19 17:09:21 UTC
Created attachment 148338
dmesg output with acpi=off

Comment 17 Amin Astaneh 2007-02-19 17:11:10 UTC
Created attachment 148339
dmesg output with normal kernel options

Comment 18 Amin Astaneh 2007-02-19 17:22:04 UTC
(In reply to comment #13)
> Just to summarize: with 'acpi=off' you can install/see devices without
> trouble. It is only when you don't specify anything that nothing is seen?

Correct.

 
> What version of the BIOS do you have? Do you have the latest version?

I have ABE117A. The current version is ABE120A. According to the IBM website
http://www-304.ibm.com/jct01004c/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-39664&brandind=5000008
the update was made for MS certification testing. If necessary, I can update the
BIOS and see what changes.

> What happens when you use 'nolapic' as a bootup argument (without adding
> 'acpi=off') ?

Success. That kernel option gives the same result as acpi=off: the hardware
works.


Comment 19 Konrad Rzeszutek 2007-02-20 17:38:15 UTC
I would be curious to see what the dmesg output is when you use 'lapic', as
acpi=off is more of a "big hammer" approach, while lapic just turns off one part
of the ACPI initialization.

In regards to the BIOS, please upgrade it. Windows certification means "Vista
certification" and possibly updating the BIOS ACPI and MPT tables so that Vista
can run. In many ways Vista depends on the BIOS ACPI to configure much more
functionality, in the same manner as Linux depends on the BIOS to figure out
whether it should use the LAPIC or the legacy interrupt controller.

So sorry for making you do more stuff, but can you do:
 1). include the dmesg when running with 'lapic'
 2). include the output from /proc/interrupts when running with 'lapic' and when
 running without 'lapic' 
 3). upgrade the BIOS and try to run the kernel without any bootup options.
 4). Go into the BIOS and see if there is a 'Run in compatible layer' or 'Run in
UNIX configuration' setting. Some of the older Compaqs had this (and you needed
it to run Linux on them) and I wonder if this older box might have something
similar.

Thanks.

Comment 20 Konrad Rzeszutek 2007-02-20 17:39:31 UTC
Replace where I said 'Vista' with 'Windows Server 2003'. 

Comment 21 Amin Astaneh 2007-02-21 16:25:11 UTC
Created attachment 148497
dmesg with kernel option "nolapic"

Comment 22 Amin Astaneh 2007-02-21 16:28:23 UTC
Created attachment 148498
contents of /proc/interrupts with "nolapic" option

Comment 23 Amin Astaneh 2007-02-21 16:30:02 UTC
Created attachment 148500
contents of /proc/interrupts with no additional kernel options

Comment 24 Amin Astaneh 2007-02-21 19:47:00 UTC
I upgraded the BIOS to version ABE120A, which is the latest available on the IBM
website. After the upgrade, the kernel failed to boot, hanging on the message:
"agpgart: detected VIA Apollo Pro 133 Chipset".
This occurs regardless of whether I use the boot option "acpi=off".

This phenomenon has been confirmed on other eServer xSeries 300s with the BIOS
updated to the same version, when attempting to boot Fedora 6.

Therefore, the BIOS update results in an unusable system.



Comment 25 Konrad Rzeszutek 2007-02-22 18:32:07 UTC
The work-around where you pass in 'lapic' and use the earlier BIOS version is
the safest bet right now.

Looking at the dmesg and the Linux code, I am not sure if the problem is with
the BIOS ACPI code or with the Linux ACPI interpreter being too strict.

I don't have this box here, so I cannot do any debugging of this unless you are
willing to spend more time collecting information from a debug kernel that I
can supply to you.

Please keep in mind that this might take months before a resolution is found,
and the resolution might be: use lapic b/c the BIOS code is busted and this box
is too old to roll a new BIOS.

Comment 26 Konrad Rzeszutek 2007-03-23 20:40:54 UTC
One thing that I neglected to mention is that using the 'nolapic' option does
not decrease performance. With 'nolapic', the kernel skips initialization of the
IRQ lines and uses what the BIOS sets. In the case of the x300, the BIOS sets it
to use IO-APIC (which is good), and that is what the Linux kernel then uses
(instead of trying to parse the BIOS LAPIC entries, which somehow after
initialization are missing the IRQs for a couple of devices).

What would have been terrible is if you saw 'XT-PIC' in /proc/interrupts.
That would mean using the legacy PIC controller, which is limited in its
functionality.
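
The IO-APIC versus XT-PIC check on /proc/interrupts can be sketched as below.
This is a minimal illustration, assuming the controller name appears as a
substring of each IRQ line (for example "IO-APIC-edge" or "XT-PIC"); the exact
column layout varies between kernel versions, and the function names are
hypothetical:

```python
def controller_counts(proc_interrupts_text):
    """Count IRQ lines that mention each interrupt controller type."""
    counts = {"IO-APIC": 0, "XT-PIC": 0}
    for line in proc_interrupts_text.splitlines():
        for name in counts:
            # Substring match, so "IO-APIC-edge" and "IO-APIC-level" both count.
            if name in line:
                counts[name] += 1
    return counts


def uses_legacy_pic_only(proc_interrupts_text):
    """True when XT-PIC lines are present and no IO-APIC lines are."""
    counts = controller_counts(proc_interrupts_text)
    return counts["XT-PIC"] > 0 and counts["IO-APIC"] == 0
```

Running uses_legacy_pic_only(open("/proc/interrupts").read()) under each set of
boot options gives a quick yes/no on whether the legacy controller is the only
one in use.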

Lastly, when reading comment #25, please replace all 'lapic' strings with
'nolapic'. I think staring at too much code made me confuse the variables in
the kernel with the bootup arguments. Sorry about that confusion.

In summary, I am going to close this BZ as WORKSFORME since there is a
work-around (passing 'nolapic' as the bootup argument). If I get the hardware in
my hands I can spend more time on finding the culprit.


