Bug 214050
Summary: | /proc/bus/pci/devices missing entry (was Xorg PCI scan misses video card) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Charles Butterfield <cb20777> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | urgent | Docs Contact: | |
Priority: | medium | ||
Version: | 6 | CC: | ajax, jim.cornette |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | 2.6.20-1.2944.fc6 | Doc Type: | Bug Fix |
Last Closed: | 2007-03-28 18:22:40 UTC | Type: | --- |
Description
Charles Butterfield
2006-11-05 05:19:06 UTC
Created attachment 140379 [details]
Zip file of key text files: (xorg.conf, Xorg.0.log, scanpci, Xorg-scanpci, strace, etc)
Also submitted to freedesktop.org as bug number 8894, with more recent clues. See: https://bugs.freedesktop.org/show_bug.cgi?id=8894

I could NOT seem to add that reference to this ticket via the "External Bugzilla References" sub-form herein, so I'm just doing it as a manual comment.

Okay, based on a debugging trail (logged in freedesktop.org bugzilla #8894), it has become apparent that this is really not an Xorg bug, but a bug in the /proc/bus/pci logic (kernel?). So I'm going to try to reassign this. Here is the AHA entry from my freedesktop.org bug report.

------------------------------------------------------------------
Hmmm. Looks like an OS issue. After stepping through the Xorg scan with gdb I noticed that we are stopping after scanning 14 PCI devices, although my video card is the 15th (and last).

It turns out that there is a mismatch between the contents of /proc/bus/pci/devices (14 devices) and the nodes in /proc/bus/pci/xx/* (which number 15). The device missing from /proc/bus/pci/devices is /proc/bus/pci/00/06.0.

Xorg gets its count of PCI devices by counting the lines in /proc/bus/pci/devices (function xf86OSLinuxGetPciDevs in lnx_pci.c). Since this file is missing one PCI device, the subsequent scan stops prematurely, which is only a problem if your video device is the last one. Mine is.

So there is clearly an OS problem, which I will try to figure out how to submit. Can anybody suggest where? For that matter, is anybody reading this stuff? Maybe a triage team? Some feedback would make me feel less lonely :-)

I'm guessing at the assignment. Changing the category did NOT change the assignment from the original auto-generated value of X/OpenGL maintenance, which seems to be quite wrong given the change in component. And I'm trying for a better Summary. Sorry for not doing this all in one change.

I don't know if you have mc installed, but you can run lspci |grep VGA to get the video card's PCI ID. Afterwards you can look into the devices file with F4 to view the file details. I have two video cards:

lspci |grep VGA
01:00.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model 64/Model 64 Pro] (rev 15)
02:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200 (rev 01)

See attachment for screenshot. I take it 0100 is the NV and 0200 is the Matrox card information. Sorry for the interruption. Hopefully someone who knows what the heck they are doing will reply to this bug. Mention on the list should have added visibility.

Created attachment 141216 [details]
viewing the devices file in /proc/bus/pci
You might find the F4 edit feature for mc useful to read content. Maybe!
Your bug seems to be far out of my knowledge base,
please attach output of dmesg

Created attachment 141225 [details]
dmesg output after reboot with failing Xorg server
Here is the dmesg output.
Sorry for the delay. A few days ago I built a modified Xorg that just bumped
the device count by one as a totally crude workaround for the pci scan problem.
Thought it prudent to roll back to the nominal Xorg prior to generating your
dmesg listing (just in case it affected anything you were looking for).
It's way past bedtime, so goodnight :-)
I think the previously attached files clearly indicate a bug in the PCI-related processing for the procfs filesystem (as indicated by the fact that /proc/bus/pci/devices contains a different number of PCI devices than the /proc/bus/pci/xx/* device entries).

1) Is there any reason not to just report this upstream? That is, is there any reason to suspect this is some bug added by Fedora customization of the associated code?
2) Am I correct in assuming that the distro maintainers are the proper gatekeepers for submitting kernel bugs? If not, should I just go ahead and submit this stuff myself?

Significant Update - I finally figured out how to download a vanilla 2.6.18.1 kernel from www.kernel.org and build it. The big surprise was that in this vanilla kernel, there is NO mismatch between /proc/bus/pci/devices and /proc/bus/pci/xx/*. Wow! So it seems like the Fedora kernel mods may well be the culprit. However, at this point I'm totally unsure of how to proceed. There are a tremendous number of differences between a vanilla 2.6.18.1 kernel and the Fedora 2.6.18-1.2849.fc6 kernel. Any suggestions on next steps?

New conclusion: It appears the bug is associated with the CONFIG_EXPERIMENTAL flag in the stock 2.6.18 kernel.

Details: I rebuilt the vanilla 2.6.18 kernel two ways:
1) With the FC6 .config file (which sets EXPERIMENTAL=y) - this manifests the bug
2) With the FC6 .config file (BUT with EXPERIMENTAL undefined) - no bug!

So, it's an upstream bug. Could somebody please suggest what the correct next step is?

Try building with EXPERIMENTAL set but without the 82875 EDAC driver (should be CONFIG_EDAC_82875P). I suspect it's doing something unpleasant that ends up hiding it from /proc/bus/pci/devices.

Or, build the kernel with this patch: http://people.freedesktop.org/~ajax/i82875p-edac-fix.patch

This patch _is_ correct, but it doesn't appear to be in either the FC6 or rawhide kernels yet.

It's in rawhide now, I'll poke Chuck to get it into FC6 updates too.
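When rebuilding with different .config files as described in the comments above, it can help to confirm which options each build actually had set. The following is only a rough sketch, not a tool anyone in this thread used. The helper names are made up for illustration, and the EDAC option name is copied verbatim from the suggestion above ("should be CONFIG_EDAC_82875P"), so it may be spelled differently in the real Kconfig.

```python
def read_kconfig(text):
    """Parse the text of a kernel .config into {option_name: value}.

    Options written as '# CONFIG_FOO is not set' are stored with value None.
    """
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def enabled(options, name):
    """True if the option is built in (y) or built as a module (m)."""
    return options.get(name) in ("y", "m")


if __name__ == "__main__":
    import sys

    # Usage: python check_config.py /path/to/.config
    with open(sys.argv[1]) as f:
        opts = read_kconfig(f.read())
    # Option names are assumptions taken from the discussion above.
    for name in ("CONFIG_EXPERIMENTAL", "CONFIG_EDAC_82875P"):
        print(name, "enabled" if enabled(opts, name) else "not enabled")
```

Running it against the .config of each test build shows at a glance whether the two builds differ in the options under suspicion.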
Is somebody going to submit the i82875p EDAC patch upstream?

My problem is resolved by FC6 kernel 2.6.20-1.2944.fc6. In this release the contents of /proc/bus/pci/devices and the nodes in /proc/bus/pci/xx/* agree in the number of devices (both indicate 15). The previous release (2.6.20-1.2933.fc6) did NOT resolve the problem, so thank you to whoever fixed it between 2933 and 2944.

I have no idea if ALL of the issues discussed on this list have been fixed. I suspect not, since there seem to be several different chunks of code that need to arrive at the same conclusion about what PCI devices exist, which is a recipe for future problems. At present, on my particular hardware configuration, the various code paths seem to be in agreement.

Thanks again to all concerned!
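For anyone who wants to repeat the comparison described in this report on their own machine, here is a minimal sketch. It assumes that the first field of each line in /proc/bus/pci/devices is four hex digits (bus number followed by the devfn byte), and that the bus directories are named with a plain two-digit bus number as in the paths quoted above. Systems that expose PCI domains in the directory names would need adjusting. The function names are made up for illustration.

```python
from pathlib import Path


def listed_devices(root):
    """Return 'bb/dd.f' names for every line of <root>/devices."""
    names = set()
    for line in (Path(root) / "devices").read_text().splitlines():
        if not line.strip():
            continue
        # Assumption: first field is bus (high byte) + devfn (low byte), in hex.
        busdevfn = int(line.split()[0], 16)
        bus, devfn = busdevfn >> 8, busdevfn & 0xFF
        # devfn packs the slot number (upper 5 bits) and function (lower 3 bits).
        names.add(f"{bus:02x}/{devfn >> 3:02x}.{devfn & 7:x}")
    return names


def node_devices(root):
    """Return 'bb/dd.f' names for every node under <root>/<bus>/."""
    names = set()
    for bus_dir in Path(root).iterdir():
        if bus_dir.is_dir():
            for node in bus_dir.iterdir():
                names.add(f"{bus_dir.name}/{node.name}")
    return names


def missing_from_list(root="/proc/bus/pci"):
    """Nodes that exist as files but have no line in the devices list."""
    return sorted(node_devices(root) - listed_devices(root))


if __name__ == "__main__":
    missing = missing_from_list()
    if missing:
        print("Present as nodes but missing from devices:", ", ".join(missing))
    else:
        print("devices list and per-bus nodes agree")
```

On the affected kernels this kind of check would be expected to report 00/06.0, the node named earlier in this report, but that has not been verified here.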