Bug 689968

Summary: [RV380] hang when X shuts down (regression)
Product: [Fedora] Fedora
Reporter: Pierre Ossman <ossman>
Component: xorg-x11-drv-ati
Assignee: Jérôme Glisse <jglisse>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 14
CC: airlied, jglisse, mcepl, xgl-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard: [cat:modesetting]
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2012-08-16 13:32:25 UTC
Type: ---
Attachments:

Description   Flags
-------------------
dmesg.log     none
messages      none
Xorg.0.log    none

Description Pierre Ossman 2011-03-22 21:42:52 UTC
Whenever X terminates, the machine locks up hard. It doesn't matter if I try to turn off the machine, suspend or just kill X, the result is roughly the same.

The machine is completely hung and cannot even be reached over the network. Nothing in Xorg.0.log or messages when the machine is back up.

Holding the power button for 4 seconds generally works, except for suspend, where you have to pull the power to the machine.


This problem only occurs with xorg-x11-drv-ati-6.13.1-0.4.20100705git37b348059.fc14.i686. Downgrading to xorg-x11-drv-ati-6.13.1-0.3.20100705git37b348059.fc14.i686 solves things.

For some reason I only got the new version a few weeks ago, even though Koji claims it was built back in October. Odd...


Hardware:

01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)]

Comment 1 Matěj Cepl 2011-03-23 10:14:51 UTC
(In reply to comment #0)
> The machine is completely hung and cannot even be reached over the network.
> Nothing in Xorg.0.log or messages when the machine is back up.

Even when you boot to runlevel 3? (Add the number 3 to the kernel command line.)

Otherwise, please add drm.debug=0x04 to the kernel command line, restart the computer, and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log) from running in runlevel 3,
* output of the dmesg command (anytime before the termination of X), and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.
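
For readers unfamiliar with the drm.debug parameter requested above: it is a bitmask of debug categories. The following is a minimal sketch, not kernel code; the helper name describe_drm_debug is hypothetical, and only the three lowest bits (core, driver, KMS) are named because the meaning of higher bits has varied between kernel versions.

```python
# Hypothetical helper: name the categories enabled by a drm.debug bitmask.
# Only the three lowest bits are named here; higher bits have changed
# meaning across kernel versions, so they are reported by value only.
DRM_DEBUG_BITS = {0x01: "core", 0x02: "driver", 0x04: "kms"}


def describe_drm_debug(mask):
    """Return a list with one entry per bit set in mask, lowest bit first."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(DRM_DEBUG_BITS.get(bit, "bit %#x (version-dependent)" % bit))
        bit <<= 1
    return names


print(describe_drm_debug(0x04))
```

With 0x04 only the KMS (modesetting) category is enabled; a value such as 0xffff additionally switches on core and driver messages plus whatever the higher bits mean on the running kernel.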

Comment 2 Pierre Ossman 2011-03-31 19:20:17 UTC
Ehm... not sure I understand you here. If I boot runlevel 3 then there won't be any X, so no Xorg.*.log.

I am getting hangs with the "good" driver now as well. It seems it has the bug too, just that the newer one is more easily provoked. On that theory, starting just X (no clients), even with the "bad" driver, does not provoke a hang. So it seems it needs to do some actual work before things go south...

I'll attach some files from runlevel 5 with that debug flag. Hopefully something makes it into the log files...

Comment 3 Pierre Ossman 2011-03-31 19:28:15 UTC
Created attachment 489204 [details]
dmesg.log

Comment 4 Pierre Ossman 2011-03-31 19:28:38 UTC
Created attachment 489205 [details]
messages

Comment 5 Pierre Ossman 2011-03-31 19:28:54 UTC
Created attachment 489206 [details]
Xorg.0.log

Comment 6 Matěj Cepl 2011-04-01 05:45:25 UTC
(In reply to comment #2)
> Ehm... not sure I understand you here. If I boot runlevel 3 then there won't be
> any X, so no Xorg.*.log.

You can get X in runlevel 3 by running the startx command as a normal user, with the advantage that you get back to the command line and do not get locked out of the whole system. I am sorry for not explaining this completely.

Comment 7 Pierre Ossman 2011-04-01 10:29:55 UTC
I've been debugging the system remotely over ssh, so messing up the local console hasn't been an issue.

I did manage to run one test this morning though, and that was recompiling the mesa driver package without gallium. And after that I could not reproduce the hang. So it seems the issue is in the R300 Gallium driver.

I'll try netconsole when I get back home and see if I can get something before it locks up.

Comment 8 Pierre Ossman 2011-05-02 19:16:21 UTC
Sorry, got distracted and then completely forgot about this issue.

The machine has been running just fine using the classic driver for a month now. Not a single hang. Went back to gallium today to do some more testing.

First, I upgraded the system fully so I'm running on a new kernel and newer Xorg now. No updates of mesa or the DDX though.

I set up netconsole and disabled NetworkManager's suspend script (so that I could see things as the machine is going down). I've also kept drm.debug=0x04. I then tried to do some suspends.

Most of the time, the machine locks up with the display running and valid output still on there. Nothing whatsoever is sent over netconsole.

Twice I got it to suspend, but it locked up on resume instead. I got this in netconsole going down:

[  184.676547] PM: Syncing filesystems ... done.
[  184.677857] PM: Preparing system for mem sleep
[  184.691980] [drm:drm_crtc_helper_set_config], 
[  184.691985] [drm:drm_crtc_helper_set_config], crtc: f64e4000 7 fb: f6715978 connectors: f658f850 num_connectors: 1 (x, y) (0, 0)
[  184.692023] [drm:drm_crtc_helper_set_config], setting connector 13 crtc to f64e4000
[  184.692040] [drm:drm_crtc_helper_set_config], 
[  184.692043] [drm:drm_crtc_helper_set_config], crtc: f64e3000 8 fb: f6715978 connectors: f658f860 num_connectors: 0 (x, y) (0, 0)
[  184.692051] [drm:drm_crtc_helper_set_config], crtc has no fb, full mode set
[  184.692054] [drm:drm_crtc_helper_set_config], setting connector 13 crtc to f64e4000
[  184.692067] [drm:drm_crtc_helper_set_config], 
[  184.692069] [drm:drm_crtc_helper_set_config], crtc: f64e4000 7 fb: f6715978 connectors: f658f850 num_connectors: 1 (x, y) (0, 0)
[  184.692077] [drm:drm_crtc_helper_set_config], setting connector 13 crtc to f64e4000
[  184.692503] Freezing user space processes ... (elapsed 0.01 seconds) done.
[  184.703073] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[  184.714033] PM: Entering mem sleep
[  184.714073] Suspending console(s) (use no_console_suspend to debug)

Nothing coming up unfortunately.

Comment 9 Pierre Ossman 2011-05-02 19:29:16 UTC
Just doing something simple, like stopping the X server through "telinit 3", results in the same hang with everything left on screen. Nothing comes over netconsole at that point either.

So I tried cranking up the debug level to 0xffff (sidenote: which crashed the nouveau driver on the receiving machine. graphics drivers really love me :/). These are the last few lines that seem to be the same for every hang:

[  161.461834] [drm:drm_ioctl], pid=1526, cmd=0xc0206466, nr=0x66, dev 0xe200, auth=1
[  161.461973] [drm:drm_ioctl], pid=1526, cmd=0x40086409, nr=0x09, dev 0xe200, auth=1
[  161.462345] [drm:drm_ioctl], pid=1526, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1
[  161.462385] [drm:drm_ioctl], pid=1526, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1
[  161.462429] [drm:drm_ioctl], pid=1526, cmd=0x40086409, nr=0x09, dev 0xe200, auth=1
[  161.462462] [drm:drm_ioctl], pid=1526, cmd=0x40086409, nr=0x09, dev 0xe200, auth=1
[  161.462533] [drm:drm_ioctl], pid=1526, cmd=0x40086409, nr=0x09, dev 0xe200, auth=1
[  161.462580] [drm:drm_ioctl], pid=1526, cmd=0xc0206466, nr=0x66, dev 0xe200, auth=1
[  161.462663] [drm:drm_ioctl], pid=1526, cmd=0xc01c64a3, nr=0xa3, dev 0xe200, auth=1
[  161.463707] [drm:drm_ioctl], pid=1526, cmd=0x641f, nr=0x1f, dev 0xe200, auth=1
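
The cmd values in the log above can be split into their fields to see which ioctls were issued last. The sketch below assumes the generic Linux ioctl number layout used on x86 (bits 0-7 number, bits 8-15 type, bits 16-29 argument size, bits 30-31 direction); the function name decode_ioctl is hypothetical.

```python
# Sketch: split a Linux ioctl command number into its fields.
# Assumes the generic layout used on x86:
#   bits 0-7 nr, bits 8-15 type, bits 16-29 size, bits 30-31 direction.
_DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(cmd):
    """Return the nr, type character, argument size and direction of cmd."""
    return {
        "nr": cmd & 0xFF,
        "type": chr((cmd >> 8) & 0xFF),
        "size": (cmd >> 16) & 0x3FFF,
        "dir": _DIRECTIONS[(cmd >> 30) & 0x3],
    }


# The distinct cmd values seen in the log excerpt.
for cmd in (0xC0206466, 0x40086409, 0xC01C64A3, 0x641F):
    print(hex(cmd), decode_ioctl(cmd))
```

All four decode to type 'd', the DRM ioctl type. The final entry, 0x641f, carries no argument and has nr=0x1f, which as far as I can tell from drm.h is the drop-master ioctl; that would match X releasing control of the device just before the hang.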

Comment 10 Pierre Ossman 2011-05-02 19:57:15 UTC
Problem seems to be related to releasing control with GLX stuff active. If I kill xbmc, X will shut down momentarily and you see the console flicker by. I do not seem to be able to hang things this way. GLX commands still being executed and referencing freed memory or something nasty like that perhaps?

Comment 11 Pierre Ossman 2011-05-03 16:03:23 UTC
Dänzer asked me to test mismatched combinations of the DRI driver between the X server and the client application. This is the result:

Xorg    xbmc      Result
------------------------
Mesa    Gallium   Hang
Gallium Mesa      No hang

Comment 12 Pierre Ossman 2011-05-04 19:52:20 UTC
Did some printk debugging of the kernel, and it makes it all the way through drm_dropmaster_ioctl() before it locks up. Not sure where to continue hunting at that point.

Comment 13 Fedora End Of Life 2012-08-16 13:32:28 UTC
This message is a notice that Fedora 14 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 14. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained.  At this time, all open bugs with a Fedora 'version'
of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this 
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen 
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we were unable to fix it before Fedora 14 reached end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" (top right of this page) and open it against that 
version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping