Bug 531111

Summary: REGRESSION : vmlinuz-2.6.31.5-96.fc12.i686 - intel 855GM xorg freezing
Product: [Fedora] Fedora
Component: xorg-x11-drv-intel
Version: 12
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: low
Reporter: Tony White <twhite>
Assignee: Adam Jackson <ajax>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ajax, awilliam, dkrawchuk, dougsland, gansalmon, imc, iny, itamar, itreppert, kernel-maint, mcepl, xgl-maint
Keywords: Triaged
Whiteboard: card_852GM/855GM
Doc Type: Bug Fix
Last Closed: 2010-12-04 03:40:36 UTC
Attachments (description: flags):
dmesg from the attachment 366900: none
lspci.txt from the attachment 366900: none
messages.txt from the attachment 366900: none
Xorg.0.log from the attachment 366900: none
xlock + Xorg backtraces: none
syslog saying that i915/0 and Xorg are hung: none

Description Tony White 2009-10-26 22:26:32 UTC
Description of problem:
vmlinuz-2.6.31.1-56.fc12.i686 was fine. No freezing experienced.
With vmlinuz-2.6.31.5-96.fc12.i686 the X server has frozen the kernel three times in one and a half hours today. All I was doing was surfing the web.
It renders the desktop unusable and magic sysrq key combinations do not result in anything.
All I can do is hold down the power key and force the power off.

Version-Release number of selected component (if applicable):
vmlinuz-2.6.31.5-96.fc12.i686

How reproducible:
Browse the web.

Steps to Reproduce:
1. Open web browser
2. Surf web
  
Actual results:
KDE is completely frozen. The mouse moves, but nothing on the desktop (taskbar, widgets), including any open windows, responds to activity such as mouse clicks or task switching.

Expected results:
No freeze or crash as vmlinuz-2.6.31.1-56.fc12.i686 provides.

Additional info:
I believe you'll find that the intel linux-next patches for the i915 driver have been merged into vmlinuz-2.6.31.5-96.fc12.i686.
I strongly recommend checking every difference in the i915 kernel driver between vmlinuz-2.6.31.1-56.fc12.i686 and vmlinuz-2.6.31.5-96.fc12.i686 for the regression.
The patches that lead up to the current stable mainline 2.6.31.4 from intel to the i915 driver are completely unusable here with an intel 855GM.
The intel developers refuse to acknowledge or are unwilling to test the problem.

Reporting upstream resulted in something like :
Them : We've fixed something similar, here try this patch.
Me : No, that didn't work. It was worse! Don't commit that to mainline.
Them : Please open a new report for your problem.
(A patch that made the regression worse was then committed upstream by the Intel Xorg developers.)

I suppose now I need to try to debug the mess they've created from a perfectly functioning driver. If I can, I will. Although I fail to see why they are being paid to develop a driver for hardware that they are not testing said driver on.
Intel xorg. Not impressed.

Comment 1 Adam Williamson 2009-11-09 01:46:14 UTC
*** Bug 532022 has been marked as a duplicate of this bug. ***

Comment 2 Adam Williamson 2009-11-09 01:47:42 UTC
For now we tend to keep bugs like this assigned to the X driver package, even though they're technically in the kernel; it's easier for the devs to track that way.

There's Xorg and dmesg logs with nasty segments in the dupe bug, Adam.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Matěj Cepl 2009-11-09 10:54:44 UTC
(In reply to comment #2)
> There's Xorg and dmesg logs with nasty segments in the dupe bug, Adam.

The dmesg log is attachment 367584.

Comment 4 Matěj Cepl 2009-11-09 10:57:09 UTC
Created attachment 368176
dmesg from the attachment 366900

Comment 5 Matěj Cepl 2009-11-09 10:57:19 UTC
Created attachment 368178
lspci.txt from the attachment 366900

Comment 6 Matěj Cepl 2009-11-09 10:58:04 UTC
Created attachment 368179
messages.txt from the attachment 366900

Comment 7 Matěj Cepl 2009-11-09 10:58:15 UTC
Created attachment 368180
Xorg.0.log from the attachment 366900

Comment 8 Tony White 2009-11-09 14:39:14 UTC
@Adam
I've been trying to do this to get a back trace :
http://forums.fedoraforum.org/showthread.php?t=233504
however, even after typerlc's suggestion the only feedback I get is :
Waiting for the xserver to accept incomming connections..
..
..
..
..
This loops, and at that point all I can do is press ctrl + alt + del to stop it, which reboots the machine.
Any ideas?

Comment 9 Adam Williamson 2009-11-09 17:50:17 UTC
Disregarding typerlc's comment and script, can you try working from the upstream info?

http://www.x.org/wiki/Development/Documentation/ServerDebugging

Comment 10 Tony White 2009-11-09 20:48:49 UTC
@Adam
I am. Please see section "Debugging with one machine Version 1."
I don't have another machine and a serial cable to trap the trace over ssh with.

In Fedora, /usr/X11R6/bin/X does not exist; /usr/bin/X is a symlink to /usr/bin/Xorg.
The only step I skipped was ln -sf /tmp/Xdbg /usr/X11R6/bin/X; at that point I did sudo cp Xdbg /usr/bin/Xorg instead.
I can't be sure whether making that symlink to the same file in /tmp would be much different.
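To make the path question above concrete, here is a minimal Python sketch that reports which of the X binary paths mentioned in this comment exist and what they resolve to. The function name and the candidate list are illustrative assumptions, not part of any upstream debugging procedure.

```python
import os

# Candidate paths are the ones named in this comment; other systems may differ.
CANDIDATES = ["/usr/X11R6/bin/X", "/usr/bin/X", "/usr/bin/Xorg"]


def resolve_x_binaries(candidates=CANDIDATES):
    """Map each existing candidate path to the real file it resolves to."""
    resolved = {}
    for path in candidates:
        # lexists() is also true for dangling symlinks, so they get reported too.
        if os.path.lexists(path):
            resolved[path] = os.path.realpath(path)
    return resolved


if __name__ == "__main__":
    for path, target in resolve_x_binaries().items():
        kind = "symlink" if os.path.islink(path) else "file"
        print(f"{path} ({kind}) -> {target}")
```

Running this before and after swapping in a debug build should show whether the display manager would still start the original binary.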

Please, pointing people to that entry on the xorg wiki and from the fedora wiki is going to put so many people off trying to get stack traces from the xserver. It's put me off wanting to try to do it and I can be bullishly persistent but I still actually want to try solve this issue. I really need clear and correct instructions on how to create a stack trace.

What's bothering me slightly is the fact that this is a kernel driver issue, so a stack trace from the xserver may even turn out to provide nothing useful. I live in hope.

If I really, really have no other option, I will go beg/borrow/steal a serial cable and spend a day trying to debug it over ssh, but that is going to be a massive pain in the ass and there is a lot of room for failure because I do not understand gdb. I'll try if I find the time.

Comment 11 Adam Williamson 2009-11-09 21:02:16 UTC
The difference between 'kernel driver' and 'X driver' is much smaller than it used to be.

I know the current instructions aren't ideal, but I personally haven't had a HUGE amount of time over the last week to write better ones :/

Matěj, can you perhaps see what Tony is doing wrong or give him some better instructions?

Comment 12 Tony White 2009-11-09 21:29:18 UTC
Thanks Adam. I reeeeeeeally wanna try to help to fix this. :)

Comment 13 Tony White 2009-11-12 16:52:20 UTC
More on this. With kernel 2.6.32-rc6 from kernel.org, compiled with just a make oldconfig and answering no or module to any new options, Xorg no longer freezes the kernel, but I get something like a ghost display.
Meaning:
The KDE desktop loads and I can click and launch apps from any taskbar shortcut, but nothing appears on screen. The app does not appear to have been launched, yet the disk activity LED does its usual bit as if it had been. When I then move the mouse cursor around the screen, the cursor changes to the hand or I-beam in certain places. If I click on the taskbar where I would normally see the entry for the running app (but do not), as if to minimise the window, nothing happens on screen. When I move the mouse back to the places where the cursor previously changed to the hand and I-beam, the cursor no longer changes. If I then click on the taskbar again to restore this hidden app that I cannot see and move the mouse, the cursor changes to the hand and I-beam in exactly the same places.
So the X server is taking input but only refreshing the display for the mouse cursor.
So it only looks like it has crashed. It is just not working properly.

I think a video would be helpful. I'll see what I can do.

Anyway, the woes of the 855GM continue with the xorg-evdev update, which now makes the X server default to 800x600 for this 1024x768 graphics chip. The external display resolution (VGA output) is also completely wrong.

The Intel 855GM needs to default to 1024 x 768 resolution for both the primary (LVDS) and the secondary (VGA) displays. The 855GM cannot do more than 2x 1024 x 768. It can do 1280 x 1024 on the secondary display, but only if the primary display is disabled. The Windows driver for this chip forbids 1280 x 1024 but selects the default 1024 x 768 when the external monitor display is activated ("Intel mirror technology" or something).
Anyway, an 800 x 600 default on a laptop screen that is 1024 x 768 is completely wrong and the 1024 x 768 display option has completely disappeared from system-config-display.

Comment 14 Bug Zapper 2009-11-16 14:22:58 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Adam Williamson 2009-11-17 07:28:23 UTC
Tony, per-chipset resolution defaults are nonsensical; the important factor is the native resolution of the _display_, which varies from system to system even if they have the same graphics hardware. This problem could not have been caused by an evdev update: evdev is for input devices and has nothing to do with graphics rendering. It must have been a kernel or xorg-x11-drv-intel change. Anyway, that's a separate bug, so please file it separately, preferably after you've identified when it started happening. Thanks.

Comment 16 Adam Williamson 2009-11-17 07:29:04 UTC
Can you test with 'nomodeset' to see if it's any better with KMS disabled?
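When comparing the two boots, it helps to confirm that the parameter actually reached the kernel. The following Python sketch checks the running kernel's command line for it; the helper names are illustrative, and reading /proc/cmdline assumes a Linux system.

```python
def has_kernel_param(cmdline, name):
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        print(f"could not read kernel command line: {exc}")
    else:
        if has_kernel_param(cmdline, "nomodeset"):
            print("nomodeset is present: KMS was disabled for this boot")
        else:
            print("nomodeset is absent from the kernel command line")
```

Matching whole tokens rather than substrings avoids a false positive from an unrelated parameter that merely contains the same text.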

Comment 17 Tony White 2009-11-17 14:50:40 UTC
Yeah, it is a separate bug. Will try.

Comment 18 Tony White 2009-11-17 15:32:50 UTC
nomodeset = black screen of death.
When plymouth drops, a black screen appears, the fan spins up with lots of noise and x does not load. It just sits there. Can't reboot, ctrl + alt + del or ctrl + alt + backspace. It completely locks the machine up.

To add to this report: if I boot without nomodeset, it gets to the desktop, but windows are ghosting as described before. If I then press ctrl + alt + backspace, X crashes and the result is the black screen of death as well.

Adam, the display resolution is related to the evdev update because the notebook is plugged into a portbar (mini dock), which is a secondary method of power/charging and also provides a replicated auxiliary video output.
I don't know why, but when it's unplugged the primary display comes up at 1024 x 768. That is a different bug, though, which I'll report separately as you suggest.

Comment 19 Ian Collier 2009-11-17 22:58:47 UTC
I'd just like to echo that nomodeset = black screen of death here too.
[Intel Corporation 82852/855GM Integrated Graphics Device [8086:3582] (rev 02)]
That was not a good introduction to F12...

I've been using F12 for just under two hours now (since removing "nomodeset" from grub.conf*) and it hasn't crashed, but I don't find this bug report very encouraging.

* It was in there because I'd been trying an F11 kernel on F10, and F10's xorg Intel driver doesn't understand KMS.

Comment 20 Adam Williamson 2009-11-17 23:18:01 UTC
To clarify, we don't care much if the nomodeset path is bad; it's deprecated code that will go away in the foreseeable future. I was suggesting it more as a diagnostic tactic than as a 'fix' (to see if the issue was KMS-related).

Comment 21 Ian Collier 2009-11-20 00:03:34 UTC
The problem with nomodeset going away is it's the first thing I try when there's a graphics bug.

And when I say bug I mean stuff such as bug 502913, bug 489907, bug 539367, bug 533450, etc (not limited to Intel either - see bug 528191, bug 508558, bug 505769 for ATI Radeon problems).

KMS is not ready yet - please fix the other stuff before taking it away.  Besides which, locking the machine solid is a bug whether or not it's officially supported.

Comment 22 Adam Williamson 2009-11-20 00:11:30 UTC
It's not my decision to make; I'm just telling you the basis on which triage is done.

Comment 23 Ian Collier 2009-12-11 11:55:42 UTC
Created attachment 377712
xlock + Xorg backtraces

I've experienced two types of freeze since I installed F12.  One of these happened while I was scrolling a Mozilla SeaMonkey window with a playing video in it (while using dd to copy a large amount of data from one USB disk to another in the background) and killed the system so dead that Alt+SysRq+b didn't even work.  This one happened twice in one evening.

The other happens intermittently while either xlock or xscreensaver is active and displaying 3D graphics.  The machine is still alive when both X and the screensaver freeze and can be logged into remotely, but it essentially has to be rebooted in order to regain control of the console.  This happened once two weeks ago and once today.  I am attaching some info (chiefly gdb backtraces) about the state of both xlock and Xorg when the hang occurred today.  I also have some "INFO: task i915/0:103 blocked for more than 120 seconds" messages in the syslog which are similar but not identical to the attachment "messages.txt" already attached to this bug.
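For anyone collecting the same evidence, here is a minimal Python sketch that pulls hung-task warnings out of a syslog. The pattern is modelled only on the message quoted above; the function names are illustrative, and the log path in the usage note is an assumption.

```python
import re
from collections import Counter

# Modelled on the message quoted above:
#   "INFO: task i915/0:103 blocked for more than 120 seconds"
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Yield (task_name, pid, seconds) for each hung-task warning found."""
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            yield (
                match.group("name"),
                int(match.group("pid")),
                int(match.group("secs")),
            )


def summarize(lines):
    """Count hung-task warnings per task name."""
    return Counter(name for name, _pid, _secs in find_hung_tasks(lines))
```

A typical use would be `summarize(open("/var/log/messages"))`, assuming that is where the kernel messages are logged, to see at a glance whether i915/0, Xorg, or both were reported as blocked.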

Comment 24 Ian Collier 2009-12-11 12:01:20 UTC
Created attachment 377714
syslog saying that i915/0 and Xorg are hung

And these are the messages.  By the way I was running kernel-2.6.31.5-127.fc12.i686 because I hadn't rebooted after the most recent update.  Will try kernel-2.6.31.6-162.fc12.i686 which so far has managed to run "xlock -mode molecule" a few times without hanging.

Comment 25 Tony White 2009-12-12 14:32:04 UTC
According to the merge list for the drm-intel branch, this is supposed to be fixed now in 2.6.32:

http://www.pubbs.net/kernel/200909/118476/

> 8xx works again, since the regression with GEM's introduction back in .27

Can anyone confirm? I haven't got the time or the disc space here.

Comment 26 Ian Collier 2009-12-13 03:07:33 UTC
Earlier I wrote:
> Will try kernel-2.6.31.6-162.fc12.i686

Sadly 2.6.31.6-166.fc12 (plus patches for bug 489907) still hangs (moebiusgears from xscreensaver this time).

Comment 27 Bug Zapper 2010-11-04 09:05:33 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 28 Bug Zapper 2010-12-04 03:40:36 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.