Bug 524368

Summary: [r600] When running in KMS mode, machine randomly locks up
Product: [Fedora] Fedora
Reporter: Kevin DeKorte <kdekorte>
Component: xorg-x11-drv-ati
Assignee: Dave Airlie <airlied>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: low
Version: rawhide
CC: awilliam, jglisse, mattdm, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-10-01 14:17:27 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                    Flags
Xorg log with bt at bottom                     none
/var/log/messages while running in KMS mode    none

Description Kevin DeKorte 2009-09-19 14:12:43 UTC
Description of problem:
When logging in with KMS enabled and running the desktop, the machine occasionally locks up and can no longer be reached by ssh. Setting radeon.modeset=0 on the kernel command line makes these random lockups go away, but then 3D apps crash the machine (Bug 524367).
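
For anyone triaging a similar report, a minimal sketch of how to confirm which mode a boot actually used is to look for the radeon.modeset parameter in the kernel command line. The helper name radeon_modeset is hypothetical, and treating the last occurrence as the effective one is an assumption:

```python
def radeon_modeset(cmdline: str):
    """Return the radeon.modeset value found in a kernel command line.

    Returns None when the parameter is absent, in which case the
    driver default applies (this sketch does not guess that default).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.modeset" and sep:
            # Assumption: if the parameter is repeated, the last one wins.
            value = val
    return value


# Example command lines (illustrative strings, not taken from the report).
print(radeon_modeset("ro root=/dev/sda1 rhgb quiet radeon.modeset=0"))
print(radeon_modeset("ro root=/dev/sda1 rhgb quiet"))
```

On a running Linux system the string to pass in can be read from /proc/cmdline.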

Version-Release number of selected component (if applicable):
Latest as of Sept 18, 2009

How reproducible:
Random

Steps to Reproduce:
1. Use the desktop in normal ways. Sometimes the desktop locks up right as menus appear, sometimes after 1-2 hours of usage.
  
Actual results:
Random lockups

Expected results:
Should not lock up

Additional info:

Card is an rv635 - Asus Silent Magic 3650

Comment 1 Kevin DeKorte 2009-09-30 18:10:56 UTC
Upgraded to 

kernel-2.6.31.1-56.fc12.x86_64
xorg-x11-drv-ati-6.13.0-0.6.20090929git7968e1fb8.fc12.x86_64

And I still get random lockups in KMS mode. Usually within a few minutes of boot. Switching off KMS results in everything working ok, although several 3d apps based on clutter run quite poorly in non-KMS mode and run better in KMS mode until the machine locks up.

Please assign to airlied, as this bug will cause many issues with F12 and may need to be marked as a blocker for F12.

01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3600 Series (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. Device 01da
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Memory at fe9e0000 (64-bit, non-prefetchable) [size=64K]
	I/O ports at d000 [size=256]
	[virtual] Expansion ROM at fe900000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel modules: radeon

Comment 2 Adam Williamson 2009-09-30 18:59:23 UTC
it can't be considered a blocker unless we know it affects more than one or a couple of devices, but setting high priority.

Can you please provide /var/log/Xorg.0.log and /var/log/messages from after a lock up? (Obviously, reboot to runlevel 3 to get the Xorg.0.log from after the lock).

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Adam Williamson 2009-09-30 18:59:35 UTC
uh, obviously I meant severity not priority.

Comment 4 Kevin DeKorte 2009-09-30 19:41:17 UTC
From previous testing /var/log/messages and /var/log/Xorg.0.log.old contain nothing, but I will retest to verify. When it locks up I can't ssh and the machine is just dead, so I'm not sure that the logs are getting written.

Comment 5 Adam Williamson 2009-09-30 19:54:13 UTC
Sure, but we need to see them just in case the last few messages before the hang are useful. In some cases even the negative evidence that _no_ messages made it to the relevant logs when the system hung may be useful to a developer in identifying the problem (i.e. it lets them know for sure that the problem is _not_ one which would have allowed some logs to be written).

Comment 6 Kevin DeKorte 2009-09-30 20:43:24 UTC
Created attachment 363237 [details]
Xorg log with bt at bottom

Comment 7 Kevin DeKorte 2009-09-30 20:53:47 UTC
Created attachment 363238 [details]
/var/log/messages while running in KMS mode

Comment 8 Kevin DeKorte 2009-09-30 21:05:07 UTC
Also, just a note on the setup

Intel Q6600 CPU, 4GB of RAM, 64bit mode
Dual head setup

DVI-0 - 1680x1050 display
DVI-1 - 1280x1024 display (this is right of DVI-0)

Comment 9 Kevin DeKorte 2009-10-01 13:50:53 UTC
I tried the following things and all resulted in hangs:

1. boot with a single display instead of dual
2. boot with mem=2g
3. boot 32bit live cd - desktop-i386-20090930.16.iso
             with this one I also got some initial cursor corruption

I chatted with airlied over IRC and he also had me try
EXANoDownloadFromScreen
EXANoUploadToScreen
EXANoComposite
AccelDFS off

I was still able to either hang the machine or get X stuck in a tight loop

bt when X was stuck in a tight loop

(gdb) bt
#0  0x0000003d9eed9717 in ioctl () from /lib64/libc.so.6
#1  0x00007f2295fd3203 in drmIoctl () from /usr/lib64/libdrm.so.2
#2  0x00007f2295fd344c in drmCommandWriteRead () from /usr/lib64/libdrm.so.2
#3  0x00007f22956c6f59 in ?? () from /usr/lib64/libdrm_radeon.so.1
#4  0x00007f22956c7035 in ?? () from /usr/lib64/libdrm_radeon.so.1
#5  0x00007f2295985966 in xf86EnableDisableFBAccess ()
   from /usr/lib64/xorg/modules/drivers/radeon_drv.so
#6  0x00007f22959859d3 in xf86EnableDisableFBAccess ()
   from /usr/lib64/xorg/modules/drivers/radeon_drv.so
#7  0x00007f2295983592 in xf86EnableDisableFBAccess ()
   from /usr/lib64/xorg/modules/drivers/radeon_drv.so
#8  0x00007f2294c8a5f5 in ?? () from /usr/lib64/xorg/modules/libexa.so
#9  0x00007f2294c8b20a in ?? () from /usr/lib64/xorg/modules/libexa.so
#10 0x00000000004d220b in ?? ()
#11 0x00000000005635a7 in ?? ()
#12 0x0000000000563683 in miCompositeRects ()
#13 0x00000000004cbf34 in ?? ()
#14 0x000000000042c5dc in ?? ()
#15 0x0000000000421c6a in _start ()


And again booting with radeon.modeset=0, I don't get any lockups
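
To compare a backtrace like the one above against tracebacks in other reports, a rough sketch is to reduce each gdb frame line to its symbol and library. The names FRAME_RE, parse_frames and signature are hypothetical, and the sample reuses a few frames from the backtrace above. Frames whose "from ..." part wraps onto the next line are reported with no library, and unresolved "??" frames are dropped from the signature:

```python
import re

# Matches a gdb "bt" frame line such as:
#   #1  0x00007f2295fd3203 in drmIoctl () from /usr/lib64/libdrm.so.2
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \(\)(?: from (\S+))?")


def parse_frames(bt_text: str):
    """Return (frame number, symbol, library or None) for each frame line."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames


def signature(frames):
    """Resolved symbol names, innermost frame first; '??' frames are skipped."""
    return [sym for _, sym, _ in frames if sym != "??"]


# A few frames copied from the backtrace in this comment.
sample = """\
#0  0x0000003d9eed9717 in ioctl () from /lib64/libc.so.6
#1  0x00007f2295fd3203 in drmIoctl () from /usr/lib64/libdrm.so.2
#3  0x00007f22956c6f59 in ?? () from /usr/lib64/libdrm_radeon.so.1
#12 0x0000000000563683 in miCompositeRects ()
"""

print(signature(parse_frames(sample)))
```

Two reports whose signatures match frame for frame are candidates for being the same bug, though symbol names are only as reliable as the debug information installed when the backtrace was taken.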

Comment 10 Kevin DeKorte 2009-10-01 13:53:26 UTC
Also, in KMS mode I have an extra connector in xrandr.

In non-KMS mode xrandr looks like this:

Screen 0: minimum 320 x 200, current 2960 x 1050, maximum 3600 x 1680
DVI-1 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 450mm x 280mm
   1680x1050      60.0*+   59.9  
   1280x1024      75.0     60.0  
   1024x768       75.0     70.1     60.0  
   832x624        74.6  
   800x600        72.2     75.0     60.3     56.2  
   640x480        75.0     72.8     66.7     59.9  
   720x400        70.1  
DVI-0 connected 1280x1024+1680+0 (normal left inverted right x axis y axis) 376mm x 301mm
   1280x1024      60.0*+
   1024x768       60.0  
   800x600        60.3  
   640x480        59.9  
   720x400        70.1  


In KMS mode I have an extra line "DIN not connected". The card does have an s-video out port on it, between the two DVI ports.
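
A small sketch of how the two xrandr outputs could be compared mechanically, listing connectors that appear in one mode but not the other. The names connectors and extra_connectors are hypothetical. The non-KMS sample uses the connector lines quoted above; the full KMS output is not in this report, so the KMS sample below is an assumed stand-in that adds a DIN line in the "disconnected" form xrandr normally prints:

```python
import re

# Connector lines start in column 0 with "<name> connected" or
# "<name> disconnected"; mode lines are indented and are ignored.
CONNECTOR_RE = re.compile(r"^(\S+) (connected|disconnected)\b")


def connectors(xrandr_text: str):
    """Map connector name to its reported state."""
    found = {}
    for line in xrandr_text.splitlines():
        m = CONNECTOR_RE.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found


def extra_connectors(baseline: str, other: str):
    """Connector names present in `other` but missing from `baseline`."""
    return sorted(set(connectors(other)) - set(connectors(baseline)))


NON_KMS = """\
Screen 0: minimum 320 x 200, current 2960 x 1050, maximum 3600 x 1680
DVI-1 connected 1680x1050+0+0 (normal left inverted right x axis y axis) 450mm x 280mm
   1680x1050      60.0*+   59.9
DVI-0 connected 1280x1024+1680+0 (normal left inverted right x axis y axis) 376mm x 301mm
   1280x1024      60.0*+
"""

# Assumed stand-in for the KMS output, which is not included in the report.
KMS = NON_KMS + "DIN disconnected (normal left inverted right x axis y axis)\n"

print(extra_connectors(NON_KMS, KMS))
```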

Comment 11 Kevin DeKorte 2009-10-01 13:55:21 UTC
I can pretty much always lock up X by opening a gnome-terminal (I use the GNOME desktop), getting some text in it (like dmesg output), and then rapidly scrolling the text or resizing the window.

Comment 12 Matthew Miller 2009-10-01 14:04:56 UTC
Kevin, Adam -- I'm getting basically the same thing (same traceback) on my iMac running rawhide (M76XT [Mobility Radeon HD 2600 XT]). However, my entire system isn't locking up, just X and the console. I can move the mouse and that updates, but no clicking, and the keyboard is entirely dead -- but I can ssh in just fine.

So this looks like a dupe of bug #517625 (which has several other reporters too), except for the addition of killing the network.

Comment 13 Kevin DeKorte 2009-10-01 14:12:27 UTC
Matt, I think I am actually running into two issues here. One where X gets stuck in a loop and another that kills the entire machine.

BTW, I did some additional testing with other window managers: icewm locked up when opening xterm, and fluxbox locked up on start before it finished drawing its menu bar.

Comment 14 Matthew Miller 2009-10-01 14:17:27 UTC
In that case, marking this as a dupe of the other one.

*** This bug has been marked as a duplicate of bug 517625 ***