Description of problem:

Sun Microsystems Inc. sells Opteron-based servers (v20z/v40z) with Trident Microsystems Blade 3D PCI/AGP video controllers (see below for details). Error messages are generated when X is started:

console and /var/log/messages:
mtrr: type mismatch for e5000000,800000 old: write-back new: write-combining

Xorg*.log:
(WW) TRIDENT(0): Failed to set up write-combining range (0xe5000000,0x800000)

Note that this error message is disturbing to customers. There also appear to be instances where a real error may occur. Finally, these error messages may keep certain versions of the RHR video certification tests from succeeding.

Please consider updating trident support in RHEL4 (and RHEL3 if possible) to resolve this issue.

Version-Release number of selected component (if applicable):

01:05.0 VGA compatible controller: Trident Microsystems Blade 3D PCI/AGP (rev 3a) (prog-if 00 [VGA])
        Subsystem: Newisys, Inc.: Unknown device 0020
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64
        Interrupt: pin A routed to IRQ 177
        Region 0: Memory at e5000000 (32-bit, non-prefetchable) [size=8M]
        Region 1: Memory at e4100000 (32-bit, non-prefetchable) [size=128K]
        Region 2: Memory at e4800000 (32-bit, non-prefetchable) [size=8M]
        Capabilities: [80] AGP version 1.0
                Status: RQ=33 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3- Rate=x1,x2
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
        Capabilities: [90] Power Management version 1
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 23 10 80 98 07 00 b0 02 3a 00 00 03 00 40 00 00
10: 00 00 00 e5 00 00 10 e4 00 00 80 e4 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 c2 17 20 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 0a 01 00 00

How reproducible:

Start the X server after automatic configuration by RHEL.
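The kernel message above suggests that an existing write-back MTRR already covers the 8M range at 0xe5000000 (Region 0 of the video controller), so the driver's write-combining request is refused. A minimal sketch for checking this against /proc/mtrr follows. The function name mtrr_overlaps is hypothetical, and the parsing assumes each entry carries a "base=0x..." field and a "size=...MB" or "size=...KB" field; the exact layout of /proc/mtrr varies between kernel versions.

```shell
#!/bin/sh
# mtrr_overlaps BASE SIZE
# Hypothetical helper: reads /proc/mtrr-style lines on stdin and prints
# every entry whose range overlaps [BASE, BASE+SIZE).
# Assumes fields of the form "base=0x<hex>" and "size=<n>MB" or "size=<n>KB".
mtrr_overlaps() {
    want_base=$(( $1 ))
    want_end=$(( $1 + $2 ))
    while IFS= read -r line; do
        base_hex=$(printf '%s\n' "$line" | sed -n 's/.*base=0x\([0-9a-fA-F]*\).*/\1/p')
        size_num=$(printf '%s\n' "$line" | sed -n 's/.*size= *\([0-9]*\)[KM]B.*/\1/p')
        size_unit=$(printf '%s\n' "$line" | sed -n 's/.*size= *[0-9]*\([KM]\)B.*/\1/p')
        # Skip lines that do not look like MTRR entries.
        if [ -z "$base_hex" ] || [ -z "$size_num" ]; then
            continue
        fi
        case $size_unit in
            M) size=$(( size_num * 1024 * 1024 )) ;;
            K) size=$(( size_num * 1024 )) ;;
        esac
        base=$(( 0x$base_hex ))
        end=$(( base + size ))
        # Two half-open ranges overlap when each starts before the other ends.
        if [ $(( base < want_end && want_base < end )) -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done
    return 0
}
```

On an affected machine this could be run as "mtrr_overlaps 0xe5000000 0x800000 < /proc/mtrr". Any write-back entry it prints would be a candidate for the "old: write-back" range named in the kernel message.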
>(WW) TRIDENT(0): Failed to set up write-combining range (0xe5000000,0x800000)
[SNIP]
>Please consider updating trident support in RHEL4 (and RHEL3 if possible)
>to resolve this issue.

This message is a "warning" and not an "error". There are many reasons why a user might see this warning in the server log file, many of which are system hardware limitations. It does not, however, have anything to do with the video driver.

Please contact your Red Hat partner manager or technical support representative at 1-888-REDHAT1 for further assistance with this or any other issue. Global Support Services is the doorway for technical support for issues of this nature for Red Hat Enterprise Linux.

Hope this helps, thanks.
There are two issues here:

1) These error messages have been seen on systems that lock up, and the X server has been suggested as the cause.
2) These error messages cause Red Hat Ready certification to fail.

The issue appears to be with the underlying vga driver for the vga chipset, and a newer version of the driver supposedly resolves both these issues.
In order to troubleshoot this issue, we need you to follow the steps below and provide us with various data to assist in diagnosis.

Please perform the following steps:

1) Update your system to the latest kernel and xorg-x11 packages that have been released as official updates for RHEL-4.
2) Ensure that you are not using any 3rd party kernel modules, or disable them from starting at or after bootup.
3) Reboot your system. Make sure it boots into the latest RHEL-4 kernel update we've released.
4) Run "system-config-display --reconfig" to generate a brand new X configuration from scratch.
5) Start the X server.
6) Indicate in specific detail the exact failure you experience, and the specific steps to reproduce the problem. If there is more than one type of failure, please file separate support requests with Red Hat GSS for each issue, so they can be investigated and resolved individually.

We'll need you to attach the following files as individual uncompressed bugzilla file attachments using the link below:

- X server log files ( /var/log/Xorg.0.log* ) - be sure to include the .old file
- X server config file
- The complete /var/log/messages from the last system boot onward
- the output of "uname -a"
- the output of "lsmod"
- the output of "lspci -vvn"

Assuming the problem is still reproducible after supplying the above information, please try adding the following to the device section of your X server config file:

Option "NoMTRR"

After this, restart the X server and attempt to reproduce the problem again. Please attach the new X server log file (and .old one) and indicate if the problem still persists or not. Then try adding the following to the device section of the config:

Option "noaccel"

Restart again, attach the log files from this invocation also, and indicate if the problem persists or not.

Once you've tried these troubleshooting tips and supplied the requested information, we'll review it and attempt to diagnose the underlying cause of the problems.

Please be very detailed in your explanation of what occurs, and how to reproduce it. Include the exact output of any error messages you see, or digital pictures of the screen if appropriate.

Thanks in advance.
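The files and command outputs requested above can be gathered in one pass. The following is a rough sketch, not an official Red Hat tool: the function name collect_diag is hypothetical, and the paths assume the X.org default log location (/var/log/Xorg.0.log) and the usual config location (/etc/X11/xorg.conf). Each item is written as a separate uncompressed .txt file so it can be attached individually.

```shell
#!/bin/sh
# collect_diag OUTDIR
# Hypothetical helper: copies the requested logs and command outputs into
# OUTDIR as separate plain-text files, one per bugzilla attachment.
collect_diag() {
    out=$1
    mkdir -p "$out" || return 1

    # Assumed default locations; adjust if the system differs.
    # /var/log/messages is copied whole, which includes the last boot onward.
    for f in /var/log/Xorg.0.log /var/log/Xorg.0.log.old \
             /etc/X11/xorg.conf /var/log/messages; do
        if [ -r "$f" ]; then
            cp "$f" "$out/$(basename "$f").txt"
        fi
    done

    uname -a > "$out/uname.txt"

    # lsmod and lspci may be missing or restricted; skip them quietly if so.
    if command -v lsmod >/dev/null 2>&1; then
        lsmod > "$out/lsmod.txt" 2>&1 || true
    fi
    if command -v lspci >/dev/null 2>&1; then
        lspci -vvn > "$out/lspci.txt" 2>&1 || true
    fi
    return 0
}
```

Running "collect_diag /tmp/x-diag" as root after each configuration change (default, with "NoMTRR", with "noaccel") into a different directory keeps the log sets from the three invocations apart.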
Setting status to "NEEDINFO", awaiting results of troubleshooting and file attachments.
ping
Thanks for the update - I am rerunning the test from bug 113533 (start and stop X server forever) on rhel4 update1 beta, to see if more debugging information can be captured on the lock-up.
Some test results:

1) The v40z test machine locked up after a day of running:

while true; do
    init 5
    sleep 15
    init 3
    sleep 15
done

Couldn't get into HDT mode to get a register dump, sysrq frozen, etc.

2) Created an artificial test script (attached) to try to create the lock-up (not sure if it's triggering the same error). It locks up within a few minutes, both with the standard xorg.conf and with Option "NoMTRR". It did work for a few hours with Option "noaccel" also added.

The request is that the X server, as configured by RHEL 4/3, not lock up the system.
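Since the machine freezes hard and nothing can be captured afterwards, it may help to record each cycle on disk before switching runlevels. The sketch below is a variant of the runlevel loop above, not the attached test script. The function name cycle_runlevels, the CYCLE_LOG variable, and the default log path are all hypothetical.

```shell
#!/bin/sh
# cycle_runlevels COUNT DELAY [CMD]
# Hypothetical variant of the init 5 / init 3 loop: runs COUNT cycles and
# appends a line to a log file before each one, flushing it to disk so the
# last cycle started is still readable after a hard lock-up.
# CMD defaults to "init"; any other command can be substituted for a dry run.
cycle_runlevels() {
    count=$1
    delay=$2
    cmd=${3:-init}
    log=${CYCLE_LOG:-/var/tmp/x-cycle.log}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i start $(date '+%Y-%m-%d %H:%M:%S')" >> "$log"
        sync
        "$cmd" 5        # start the X server (runlevel 5)
        sleep "$delay"
        "$cmd" 3        # stop the X server (runlevel 3)
        sleep "$delay"
        i=$(( i + 1 ))
    done
    return 0
}
```

For example, "cycle_runlevels 5000 15" run as root from a text console would approximate about two days of the original loop, and the last line of the log after a freeze gives the cycle number and time at which it occurred.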
Created attachment 113687 [details] test shell script
Created attachment 113688 [details] xorg.conf file
When attaching "text" files to bugzilla, please select the "text/plain" mime-type, so that the file attachment is viewable in any standard web browser. TIA
Several pieces of information requested above in comment #5 are still missing from the report. Please re-review comment #5 and attach the remainder of the information requested above. We do not have Trident video hardware available to attempt to reproduce locally and diagnose, so the requested information is critically important before we can proceed any further with diagnosis. Setting status back to "NEEDINFO" and awaiting attachment of remainder of information requested above. Thanks in advance.
Updating "Summary" to reflect the real symptoms.
Created attachment 113986 [details] uname, lsmod, messages, lspci, xorg.conf file

Unfortunately the original test machine is gone. Here's information from a v20z (same trident chipset, exhibits same problem).
Please ensure all attachments are always attached as individual uncompressed file attachments that are web browser viewable. Thanks in advance.
(In reply to comment #2)
> The issue appears to be with the underlying vga driver for the vga chipset, and
> a newer version of the driver supposedly resolves both these issues.

Could you clarify this part for me? The system default driver for Trident hardware is the "trident" driver. There is a "vga" driver also, but it is for ancient 16 color 640x480 and lower standard VGA hardware from the early to mid '90s, and we never use it by default for any hardware.

The reason I seek clarification is that there is a different bug reported against our Fedora Core 4 OS release, which is caused by the X server module "libvgahw.a" being miscompiled. That bug is bug #161242, which affects a fairly large number of users with a variety of hardware, including Trident, under Fedora Core 4 Xorg, which is compiled with gcc4.

While I don't believe that bug is related to the problem you're experiencing here, as RHEL4 is compiled with gcc3 and the problem in FC4 is gcc4 specific, I thought I would get you to confirm that by "vga driver for the vga chipset" you actually meant "trident driver for Trident chipsets".