Bug 470189

Summary: GUI install failed to start rx2660+Smart Array
Product: Red Hat Enterprise Linux 5
Reporter: masanari iida <masanari_iida>
Component: xorg-x11-drv-ati
Assignee: Dave Airlie <airlied>
Status: CLOSED CURRENTRELEASE
QA Contact: desktop-bugs <desktop-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 5.2
CC: airlied, alanm, dchapman, rick.beldin, tao, xgl-maint
Target Milestone: rc
Target Release: ---
Hardware: ia64
OS: Linux
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-03-11 18:51:43 UTC Type: ---
Attachments:
Description                              Flags
Xorg.0.log with radeon driver.           none
Xorg.0.log with vesa driver              none
sosreport-kec35-483993-85dde0.tar.bz2    none

Description masanari iida 2008-11-06 08:34:29 UTC
Description of problem:
The Anaconda installer fails to open the GUI (X Window)
install screen with the following configuration.

HP rx2660 (Itanium Server) + HP SmartArray P400

Version-Release number of selected component (if applicable):
RHEL 5.1 and RHEL 5.2 install CD-ROM (IA64)


How reproducible:
Always


Steps to Reproduce:
1. Start the installer.
2. Skip the media check.
3. The installer detects the video chip as ES1000 (correct)
   and selects "radeon" as the driver.
4. The installer tries to start the GUI (X Window).
  
Actual results:
The screen is blank.
If I press ALT+F1, the text installer is waiting
on virtual console 1.


Expected results:
The GUI installer starts with the Smart Array card installed.

Additional info:
If I install RHEL 5.0 on the same rx2660 + SA P400,
the GUI installer starts successfully.

If I install RHEL 5.0, 5.1, or 5.2 on the rx2660 _without_
the SA P400 card, the GUI installer starts successfully.

So the symptom depends on the SA P400 card.

Even after installing the OS in text mode,
I cannot start X Window with the radeon driver when the
Smart Array is present.
The workaround is to edit xorg.conf and change the
driver to vesa.

My question: were there any changes to the radeon driver
between the RHEL 5.0 installer and the RHEL 5.1 or 5.2 installer?
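
The xorg.conf workaround described above can be sketched in Python as
follows. This is only an illustration, not something taken from the
affected system: the sample Device section, the identifier "Videocard0",
and the helper name switch_driver are assumptions. It rewrites the text
in memory and does not touch /etc/X11/xorg.conf.

```python
import re


def switch_driver(conf_text, old="radeon", new="vesa"):
    """Return conf_text with every  Driver "old"  line changed to  Driver "new".

    Only lines whose driver name equals `old` are touched, so input
    drivers such as "kbd" or "mouse" are left alone.
    """
    pattern = re.compile(
        r'^(\s*Driver\s+")' + re.escape(old) + r'(")',
        re.MULTILINE | re.IGNORECASE,
    )
    # A function replacement avoids any backslash handling in `new`.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), conf_text)


# Hypothetical Device section, for illustration only.
SAMPLE = '''Section "Device"
    Identifier  "Videocard0"
    Driver      "radeon"
EndSection
'''

if __name__ == "__main__":
    print(switch_driver(SAMPLE))
```

On a real system the result would have to be written back to xorg.conf
(after taking a backup) before running "startx" again.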

Comment 1 masanari iida 2008-11-10 09:16:53 UTC
Created attachment 323049 [details]
Xorg.0.log with radeon driver.

This log was collected after a text-mode install,
booting the system into runlevel 3, and running "startx".

As you can see, X Window failed to start during
radeon driver initialization.
However, the system did not freeze at that point;
the kernel kept working normally.

Comment 2 masanari iida 2008-11-10 09:20:31 UTC
Created attachment 323050 [details]
Xorg.0.log with vesa driver

This log was collected after a text-mode install,
booting into runlevel 3, and running "startx".
xorg.conf was modified to use the "vesa" driver, so this time
X Window started without problems.

Comment 3 masanari iida 2008-11-10 09:54:53 UTC
I compared the xorg-x11-drv-ati RPM versions across 5.0 through 5.2.
These are the versions of the RPMs installed on the hard disk.

5.0  xorg-x11-drv-ati-6.6.3-3.2.el5
5.1  xorg-x11-drv-ati-6.6.3-3.2.el5
5.2  xorg-x11-drv-ati-6.6.3-3.13.el5

What I need to know is the difference between the radeon driver
on the RHEL 5.0 install CD and the one on the RHEL 5.1 install CD.
The question is not about the RPM package, but about the radeon
driver version that anaconda uses.

Comment 4 masanari iida 2008-11-11 09:29:52 UTC
I compared radeon_drv.so across 5.0 through 5.2.
The file was extracted from stage2.img on
install CD-ROM disc 1.

5.0 and 5.1 have the same md5 checksum.
5.2 has a different md5 checksum.
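
A checksum comparison like the one above could be scripted roughly as
follows. This is a minimal sketch: the file paths in the usage note are
hypothetical, and it assumes radeon_drv.so has already been extracted
from each release's stage2.img.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_checksum(paths):
    """Map each md5 digest to the list of labels whose files share it.

    `paths` is a dict of label -> file path, for example
    {"5.0": "/tmp/5.0/radeon_drv.so", ...} (hypothetical paths).
    """
    groups = {}
    for label, path in paths.items():
        groups.setdefault(md5_of_file(path), []).append(label)
    return groups
```

Calling group_by_checksum with one extracted copy per release and
printing the result shows which releases ship an identical driver
binary; identical checksums land in the same group.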

Comment 5 masanari iida 2008-11-13 02:48:29 UTC
IT #238402 was opened.

Comment 6 Doug Chapman 2008-11-14 21:06:24 UTC
I am working on reproducing this on my rx2660 however so far I have not seen this problem.  It is possible that this is related to the version of system firmware you are running.

Could you please go to the EFI shell and type "info fw" and attach that info here?

Comment 7 masanari iida 2008-11-17 01:48:49 UTC
These are the firmware versions of the affected rx2660.

System firmware 4.03
BMC firmware 5.23
MP firmware  F.02.17

Comment 8 Rick Beldin 2008-11-21 14:02:50 UTC
The affected system has: 

PCIe and PCI-X mixed riser card

Rev 03 of SA controller.  lspci output: 

04:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03)

Another customer has rev 04, unclear what model of riser card.

Comment 9 Doug Chapman 2008-11-21 15:02:01 UTC
(In reply to comment #8)
> The affected system has: 
> 
> PCIe and PCI-X mixed riser card
> 
> Rev 03 of SA controller.  lspci output: 
> 
> 04:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03)
> 
> Another customer has rev 04, unclear what model of riser card.

If I remember correctly, both of the systems showing the problem were using PCI slot 04:00.0, correct?

The systems I have access to do not reproduce the problem however I note 2 key differences:

1: the Smart Array cards I have access to are rev 01; I don't have any later cards

2: the PCI slot was (I think) 05:00.0, not 04:00.0.  On one system I moved the card around and was never able to get it to show up as slot 4.

I think #2 is the key to reproducing the problem.  Granted there is no reason it _shouldn't_ work but being in a different slot evidently changes something that causes it to conflict with the VGA card.

Now.... question is why would this be different between systems.  I think we have enough data to rule out system firmware.  Is it possible we have a different rev of the motherboard that makes a difference?

We should compare rev info on some components via the FRU data.  This can be found from the MP under the "CM" sub-menu and entering DF.

Key items that might be related (these are from a system that is working OK):

FRU NAME: System Board ID:00

CHASSIS INFO:
 Type:Rack Mount Chassis
 Part Number        : AB419-2101A
 Serial Number      : TWT4647256

BOARD INFO:
 Mfg Date/Time      : 2105376
 Manufacturer       : INVENTEC  
 Product Name       : 2 Socket System Board           
 S/N                : ME6BMK0146
 Part Number        : AB419-60001
 Fru File ID        : 10
 Custom Info        :         
 Custom Info        : 4642
 Custom Info        : B1
 Custom Info        : 0

PRODUCT INFO:
 Manufacturer       : hp
 Product Name       : server rx2660
 Part/Model         : AD245A
 Version            :       
 S/N                : USE4709BKP
 Asset Tag          :                                 
 FRU File ID        : 11
 Custom Info        : 601

-------------

FRU NAME: I/O Riser Board ID:01

CHASSIS INFO:

BOARD INFO:
 Mfg Date/Time      : 5793845
 Manufacturer       : INVENTEC  
 Product Name       : 3 Slot PCI-X/PCI-e IO Riser     
 S/N                : m36cmk0194
 Part Number        : AB419-60003
 Fru File ID        : 10
 Custom Info        :         
 Custom Info        : 4642
 Custom Info        : B1
 Custom Info        : 0

-------------

FRU NAME: PCIe Exp. Brd ID:04

CHASSIS INFO:

BOARD INFO:
 Mfg Date/Time      : 5711817
 Manufacturer       : INVENTEC  
 Product Name       : PCI-e Expansion Board           
 S/N                : MN6BMK0010
 Part Number        : AB419-60008
 Fru File ID        : 10
 Custom Info        :         
 Custom Info        : 4642
 Custom Info        : B1
 Custom Info        : 0

Comment 10 Rick Beldin 2008-11-25 20:02:21 UTC
We now have another site where the same thing is happening, except that this site doesn't have ANY smart array controllers.  lspci output for the two affected systems is:

 lspci:
 
 00:01.0 Class ff00: Hewlett-Packard Company RMP-3 (Remote Management Processor)
 00:01.1 Communication controller: Hewlett-Packard Company RMP-3 Shared Memory Driver
 00:01.2 Serial controller: Hewlett-Packard Company Diva Serial [GSP] Multiport UART
 00:02.0 USB Controller: NEC Corporation USB (rev 43)
 00:02.1 USB Controller: NEC Corporation USB (rev 43)
 00:02.2 USB Controller: NEC Corporation USB 2.0 (rev 04)
 00:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
 01:01.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
 01:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
 01:02.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
 02:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
 05:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
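
Comparing PCI addresses and device revisions between affected and
unaffected systems, as done in comments 8 to 10, could be automated
along these lines. This is a rough sketch that assumes the plain
"lspci" line format shown in this report; the SAMPLE text combines
lines quoted in different comments purely for illustration and does
not describe one real machine. The function names are made up here.

```python
import re

# bus:dev.fn, device class, description, optional "(rev NN)" suffix.
LSPCI_LINE = re.compile(
    r"^\s*(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\s+"
    r"(?P<cls>[^:]+):\s+(?P<desc>.*?)"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-f]+)\))?\s*$"
)


def parse_lspci(text):
    """Parse lspci output into a list of dicts; unparsable lines are skipped."""
    devices = []
    for line in text.splitlines():
        m = LSPCI_LINE.match(line)
        if m:
            devices.append(m.groupdict())
    return devices


def find_devices(devices, keyword):
    """Return devices whose description contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [d for d in devices if keyword in d["desc"].lower()]


# Illustrative input assembled from lines quoted in this report.
SAMPLE = """\
00:03.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
04:00.0 RAID bus controller: Hewlett-Packard Company Smart Array Controller (rev 03)
05:00.0 PCI bridge: Hewlett-Packard Company PCIe Root Port
"""

if __name__ == "__main__":
    for d in find_devices(parse_lspci(SAMPLE), "smart array"):
        print(f'{d["bus"]}:{d["dev"]}.{d["fn"]} rev {d["rev"]}')
```

Running the same parser over lspci output from each site would make
differences in slot address or controller revision easy to spot.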

Comment 11 Alan Matsuoka 2008-12-17 15:25:31 UTC
Created attachment 327260 [details]
sosreport-kec35-483993-85dde0.tar.bz2

Comment 12 Dave Airlie 2008-12-18 06:43:10 UTC
can someone give the RHEL5.3 snapshots a try?

Comment 13 Rick Beldin 2008-12-19 14:29:43 UTC
I tried RHEL5.3 snapshot 2 a few weeks ago on a system in-house and it did NOT fail, however, this is no guarantee that 5.3 'fixed' it.   As Doug pointed out, he was unable to reproduce the problem on a system inside RH with 5.2. 

As of now I have two customers with the problem who both have Smart Array controllers and one customer who does NOT have Smart Array controllers. 

It would be nice if we could get a stack trace from xorg, however the Xserver
simply exits.  Comment 2 has a sample.

Only known workaround is to use VESA driver instead of radeon.   This does limit the quality of graphics and the visual depth and may not be suitable for all environments.   It should be noted that graphical installs are quite common in APJ region as they can more easily provide the localization of Asian characters and they are commonly used to manipulate the console for administrative purposes as well. 

There are extensive MP logs and sysreports in IT 250421.

Comment 14 Dave Airlie 2008-12-22 00:27:24 UTC
I suspect this is a problem we've fixed in 5.3 on IA64 systems, with userspace doing optimised memory operations on MMIO areas, however you need to be using at least Snapshot 6.

Surely if you have a broken machine where 5.0 or 5.2 always fails and you run the 5.3 snapshot 6 installer and it works then we've solved the issue, whether we can reproduce it in house or not.

I would suspect stacktraces are busted on IA64 for some reason, not really sure how we could attach gdb to the installer either.

Comment 15 masanari iida 2008-12-22 02:30:17 UTC
You wrote that I need at least the snapshot 6 version of the RHEL 5.3 installer.
How can I tell the snapshot version from the CD-ROM?
I downloaded a RHEL 5.3 beta from RHN, probably later than
November 5, 2008.

Once I have confirmed that the install CD I have is snapshot 6
or later, I will send the CD-ROMs to the remote site,
because currently I can reproduce the symptom on only one
system, which is located at a remote site.

So please let me know how to find the snapshot version on the CD-ROM.

Comment 16 masanari iida 2008-12-22 03:38:24 UTC
On disc 1, the .discinfo file has a timestamp of 2008/10/21 10:35.
In the images directory, all .img files have a 2008/10/21 8 AM
timestamp.

Comment 17 David Aquilina 2008-12-26 18:20:01 UTC
(In reply to comment #12)
> can someone give the RHEL5.3 snapshots a try?

We do not yet have a system in-house that reproduces this problem. 

Doug, if someone inside of HP has reproduced this problem (masanari_iida?) can you please help them get a copy of the latest snapshot?

Comment 18 Rick Beldin 2009-01-22 13:54:29 UTC
Ok,  I'll buy that this is fixed, but can someone provide a reference to what fix is meant by: 

problem we've fixed in 5.3 on IA64 systems, with userspace
doing optimised memory operations on MMIO areas, however you need to be using
at least Snapshot 6.

This comment was made in comment 14.

Having a solid reference will provide a measure of tranquility to the customers that experience this problem.

Thanks

Comment 20 Dave Airlie 2009-02-12 23:07:49 UTC
Rick,

The fix translates as: IA64 is a badly designed architecture in that you cannot execute certain operations on memory-mapped IO regions; one of these operations is memcpy. We've rewritten parts of X for RHEL 5.3 to avoid doing optimised memory access to video RAM.

This showed up on Altix but should apply to all IA64 machines. I'm not 100% sure it fixes your issue, but since we haven't a reproducer in-house it makes sense to make sure 5.3 is still broken before I proceed to spend time tracking it any further.

Dave.

Comment 21 Rick Beldin 2009-02-13 18:30:47 UTC
Ah, so the fixes are not in the kernel but in xorg?   Ok.

Comment 23 masanari iida 2009-03-10 06:50:02 UTC
Tested on the affected rx2660 with the RHEL 5.3 install CD-ROM
and confirmed that the GUI installer started on the system.

I had also been using RHEL 5.2 with the vesa driver on the rx2660.
I updated the xorg-related RPMs to RHEL 5.3 and confirmed
that the new xorg works with the radeon driver.

Thanks for the fix.
You may close this call.