Bug 219288 - RHEL5-B2: /proc/bus/pci/devices speaks LIES
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: i386 Linux
Priority: high
Severity: urgent
Assigned To: John Feeney
QA Contact: Brian Brock
Duplicates: 219286
Depends On:
Blocks: 200812
 
Reported: 2006-12-12 09:19 EST by Raghavendra Biligiri
Modified: 2007-11-30 17:07 EST (History)

Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2007-01-15 09:21:49 EST


Attachments
Attached the xorg.conf, Xorg.log and lspci output. (30.00 KB, application/x-tar)
2006-12-12 09:22 EST, Raghavendra Biligiri
Description Raghavendra Biligiri 2006-12-12 09:19:09 EST
Description of problem:
On RHEL5-B2 (kernel-2.6.18-1.2767.el5), X fails to come up with the ati driver.
X fails to come up both during the installation and after the installation.

After installation, when we try to start X, we get the following error message:

(EE) No devices detected.

Fatal server error:
no screens found
XIO:  fatal IO error 104 (Connection reset by peer) on X server ":0.0"
      after 0 requests (0 known processed) with 0 events remaining.


Version-Release number of selected component (if applicable):
RHEL5-B2(kernel-2.6.18-1.2767.el5)
xorg-x11-drv-ati-6.6.3-2.el5

How reproducible:


Steps to Reproduce:
1. Install RHEL5-B2 (kernel-2.6.18-1.2767.el5) on PE700 or PE2650.
2. Try to start X.

Actual results:
X fails to come up.

Expected results:
X should start without any errors.

Additional info:
Attached the Xorg.log, xorg.conf and output of lspci.
Comment 1 Raghavendra Biligiri 2006-12-12 09:22:32 EST
Created attachment 143391 [details]
Attached the xorg.conf, Xorg.log and lspci output.
Comment 2 Adam Jackson 2006-12-12 11:14:44 EST
*** Bug 219286 has been marked as a duplicate of this bug. ***
Comment 3 John Feeney 2006-12-14 14:05:54 EST
Per Release Criteria (section 1 Desktop point ac Xorg x11), Dell wants this to be
a blocker for 5.0.0.

Does it need to be an exception?
Comment 5 Adam Jackson 2006-12-14 15:52:10 EST
Note that the X PCI scan and the lspci scan give different results...
Comment 7 Adam Jackson 2006-12-15 11:25:57 EST
Analysis from #212030, which appears to be an identical issue:

This appears to be the kernel's fault.  X is counting the number of PCI devices
by inspecting /proc/bus/pci/devices, same as lspci.  But it then scans down the
device trees in /proc/bus/pci/*/, and finds 16 devices!  Since the mach64
happens to be at the end of the list, we stop at 15, and miss the mach64.

Note the following entry in X's PCI scan:

(II) PCI: 00:06:0: chip 8086,257e card 0000,0000 rev 02 class 08,80,00 hdr 00

Which isn't visible in lspci or /proc/bus/pci/devices.  Seems like an odd one to
leave out...

Reassigning to kernel.
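
The miscount described above can be illustrated with a toy model. This is only a sketch and not X's actual scanning code: the struct, the function name find_display_within_count, and the idea of a flat array are all hypothetical. It assumes the scanner takes its device count from /proc/bus/pci/devices and then walks the per-bus directories in order, never looking past that count.

```c
#include <stddef.h>

/* Hypothetical, simplified record for one device found under /proc/bus/pci/<bus>/. */
struct scanned_dev {
    unsigned vendor;
    unsigned device;
    unsigned class_base;   /* PCI base class; 0x03 is a display controller */
};

/*
 * Toy model of the failure: walk the scanned devices, but never look past
 * `counted`, the number of entries read from /proc/bus/pci/devices.
 * Returns the index of the first display device, or -1 if none is seen.
 */
int find_display_within_count(const struct scanned_dev *devs,
                              size_t scanned, size_t counted)
{
    size_t limit = counted < scanned ? counted : scanned;

    for (size_t i = 0; i < limit; i++) {
        if (devs[i].class_base == 0x03)
            return (int)i;
    }
    return -1;
}
```

With 16 scanned devices, the display device last, and a count of 15, the function returns -1, which corresponds to the "No devices detected" outcome; with a correct count of 16 it finds the device.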
Comment 8 Adam Jackson 2006-12-15 16:15:10 EST
Related Fedora bug with some hints: bug #214050
Comment 9 Linda Wang 2006-12-18 17:07:20 EST
Build 2746 has the "sort PCI device list breadth-first" patch, so this issue
is seen after that patch is applied.

According to bug 212030, however, that issue is seen on 2714, so the PCI ordering
patch didn't create the regression.  Can someone narrow down the
PCI scan problem?
Comment 10 Adam Jackson 2006-12-18 19:10:34 EST
As mentioned in bug #214050, this seems to happen only with CONFIG_EXPERIMENTAL.
The device missing from lspci is the one used by the EDAC driver for that chipset,
and we only build that driver when EXPERIMENTAL=y.  Sounds like a good place to
start looking.
Comment 11 Amit Bhutani 2006-12-20 05:14:56 EST
Based on analysis from comment #7, it appears that this could break (read: No X)
*any* system where the video device shows up deep enough (>15) in the
/proc/bus/pci/*/ tree.

Bumping the severity of the issue based on that analysis. RH- Please mark as
Blocker for RHEL5.0 if not already marked that way.
Comment 12 Adam Jackson 2006-12-20 12:37:39 EST
It's not an issue of depth, it's an issue of miscounting.  The list of devices
visible in /proc/bus/pci/devices is not the same set as those visible through
/sys/bus/pci/devices.  One device in particular is consistently missing from
/proc/bus/pci/devices, and it's _not_ the VGA device.

It may be possible to work around this in X, but the correct fix is for the
kernel's filesystems to present a consistent view of the world.
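
The inconsistency between the two views can be checked from user space. The following is a minimal sketch, not a tool used in this bug: the function names and MAX_DEVS are hypothetical, and it assumes PCI domain 0, because the first field of a /proc/bus/pci/devices line carries only bus and devfn as four hex digits.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 256   /* arbitrary cap for this sketch */

/*
 * Turn one /proc/bus/pci/devices line into a sysfs-style name such as
 * "0000:00:06.0". Assumption: domain 0. If vendev_out is non-NULL it
 * receives the packed vendor/device IDs. Returns 0 on success, -1 otherwise.
 */
int proc_line_to_sysfs_name(const char *line, char *out, size_t outlen,
                            unsigned *vendev_out)
{
    unsigned busdevfn, vendev;

    if (sscanf(line, "%4x %8x", &busdevfn, &vendev) != 2)
        return -1;
    if (vendev_out)
        *vendev_out = vendev;
    int n = snprintf(out, outlen, "0000:%02x:%02x.%u",
                     (busdevfn >> 8) & 0xff,   /* bus */
                     (busdevfn >> 3) & 0x1f,   /* slot */
                     busdevfn & 0x07);         /* function */
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Print and count the entries of sysfs_dir that have no matching line in
 * proc_path. Returns the count, or -1 if either location cannot be read.
 */
int count_missing_from_proc(const char *proc_path, const char *sysfs_dir)
{
    char names[MAX_DEVS][16];
    char line[1024];
    size_t n = 0;
    FILE *fp = fopen(proc_path, "r");

    if (!fp)
        return -1;
    while (n < MAX_DEVS && fgets(line, sizeof line, fp)) {
        if (proc_line_to_sysfs_name(line, names[n], sizeof names[n], NULL) == 0)
            n++;
    }
    fclose(fp);

    DIR *dir = opendir(sysfs_dir);
    if (!dir)
        return -1;
    int missing = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        size_t i;
        if (de->d_name[0] == '.')
            continue;                /* skip "." and ".." */
        for (i = 0; i < n; i++) {
            if (strcmp(names[i], de->d_name) == 0)
                break;
        }
        if (i == n) {
            printf("missing from %s: %s\n", proc_path, de->d_name);
            missing++;
        }
    }
    closedir(dir);
    return missing;
}
```

Calling count_missing_from_proc("/proc/bus/pci/devices", "/sys/bus/pci/devices") on an affected system would be expected to report the device at slot 0:6.0, going by the analysis in this bug; on a consistent system it should return 0.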
Comment 13 Larry Troan 2006-12-20 16:33:10 EST
Per John Feeney, question whether this is a DUP of bug 212030? It may also be
related to Fedora bug 214050.
Comment 14 Adam Jackson 2006-12-20 17:33:47 EST
Probably to both.  All three bugs show exactly the same fault: 8086:257e at PCI
slot 0:6.0 missing from lspci but visible in /sys/bus/pci/devices, and X failing
to start because the device count is wrong.
Comment 15 Ken Reilly 2006-12-22 10:37:02 EST
After talking with John Feeney: he'll continue trying to isolate the problem(s)
and refine the scope/impact. If no more information is available on or before
January 5, 2007, we'll assess the impact of deferring this bug to a later release.
Comment 16 Stuart Hayes 2007-01-02 16:16:35 EST
It looks like i82875p_setup_overfl_dev() (in drivers/edac/i82875p_edac.c) is
exposing a PCI device (part of the north bridge) that was hidden by the BIOS
(at Intel's recommendation).  It calls pci_proc_attach_device(), which
creates the /proc file specific to this device, but never calls
pci_bus_add_device(), which adds the device to the global list pci_devices
that /proc/bus/pci/devices exposes.

I don't yet have a system to check this out on, though... this is all just 
based on my looking at the code--I could be missing something.
Comment 18 Larry Troan 2007-01-03 10:07:38 EST
BLOCKER: We at least need a workaround that will permit RHEL5 certification.

Comment 19 Stuart Hayes 2007-01-03 15:20:20 EST
What I said in comment #16 appears to be correct.  This patch fixed the 
problem--/proc/bus/pci/devices now shows device 8086:257e, and X windows 
started up.

--- linux-2.6.18.i386/drivers/edac/i82875p_edac.c	2006-09-19 22:42:06.000000000 -0500
+++ linux-2.6.18.i386_dec18_2006/drivers/edac/i82875p_edac.c	2007-01-03 06:37:22.000000000 -0600
@@ -297,6 +297,8 @@ static int i82875p_setup_overfl_dev(stru
 			       "device\n", __func__);
 		return 1;
 	}
+	pci_bus_add_device(dev);
+
 #endif  /* CONFIG_PROC_FS */
 	if (pci_enable_device(dev)) {
 		i82875p_printk(KERN_ERR, "%s(): Failed to enable overflow "
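
The check that /proc/bus/pci/devices now lists 8086:257e can be automated. This is a hedged sketch rather than anything that was run for this bug: list_has_device is a hypothetical helper, and it relies on the second field of each line being the vendor and device IDs packed as eight hex digits.

```c
#include <stdio.h>

/*
 * Return 1 if the device list read from `fp` has an entry with the given
 * vendor and device IDs, 0 if it does not. Expects the layout of
 * /proc/bus/pci/devices: bus/devfn as four hex digits, then vendor and
 * device packed as eight hex digits.
 */
int list_has_device(FILE *fp, unsigned vendor, unsigned device)
{
    char line[1024];
    unsigned busdevfn, vendev;

    while (fgets(line, sizeof line, fp)) {
        if (sscanf(line, "%4x %8x", &busdevfn, &vendev) != 2)
            continue;   /* skip lines that do not parse */
        if ((vendev >> 16) == vendor && (vendev & 0xffff) == device)
            return 1;
    }
    return 0;
}
```

A caller would open /proc/bus/pci/devices with fopen() and pass 0x8086 and 0x257e; a result of 1 corresponds to the patched behavior described in this comment.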
Comment 20 Jay Turner 2007-01-03 15:36:53 EST
QE ack for RHEL5.
Comment 23 John Feeney 2007-01-05 16:29:17 EST
An update: the patch provided by Dell works, but how best to implement the
change is still being worked out, given that the patch needs to be sent
upstream for approval. A discussion has been started internally to find the
best answer for both RHEL-5 and upstream. At this time, the discussion is not
expected to prevent the patch from being submitted to rhkernel in time for
RHEL-5. I have provided Stuart Hayes with details of the discussion and asked
for his input. Again, my thanks to Stuart for finding the solution.
Comment 24 John Feeney 2007-01-08 15:54:13 EST
The patch was posted on rhkernel list for review and acceptance.
Comment 25 Jay Turner 2007-01-10 10:29:36 EST
Built into 2.6.18-1.3002.el5.
Comment 26 Amit Bhutani 2007-01-10 13:33:18 EST
rsync from kernel build page of Don Z has not picked up the 3002 build yet. Dell
will report results once that build has been made available to Dell.
Comment 28 Raghavendra Biligiri 2007-01-12 05:19:35 EST
This issue is not reproducible on the test kernel (kernel-2.6.18-1.3002.el5).
X comes up fine with the test kernel (kernel-2.6.18-1.3002.el5) on PE700.

Comment 29 Amit Bhutani 2007-01-12 09:00:08 EST
Moving to VERIFIED based on previous comment.
Comment 30 Jay Turner 2007-01-15 09:21:49 EST
kernel-2.6.18-1.3002.el5 included in 20070111.1 and 20070112.3.
