Bug 490291

Summary: kernel-2.6.29-0.237.rc7.git4.fc11 recursive fault in radeon_read_ring_rptr
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: kernel
Assignee: Dave Airlie <airlied>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 11
CC: jglisse, kernel-maint
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-10-15 14:16:49 UTC
Attachments:
dmesg from 2.6.29-0.237.rc7.git4.fc11 terminated by protection fault (flags: none)
traces from X server restart with the current configuration (flags: none)

Description Michal Jaegermann 2009-03-14 21:20:41 UTC
Created attachment 335228
dmesg from 2.6.29-0.237.rc7.git4.fc11 terminated by protection fault

Description of problem:

I got "general protection fault: 0000 [#1] SMP" with:

RIP: 0010:[<ffffffffa0048083>]  [<ffffffffa0048083>] radeon_read_ring_rptr+0x3b/0x50 [radeon]

Full dmesg output is attached.  It is terminated with "Fixing recursive fault but reboot is needed!" so I rebooted.
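
For readers unfamiliar with the oops format: in "radeon_read_ring_rptr+0x3b/0x50 [radeon]" the first hex number is the offset of the faulting instruction inside the function, the second is the function's length, and the bracketed name is the module. The following is only a minimal sketch of pulling those fields out of the RIP line quoted above; the helper name `parse_rip` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Hypothetical helper (not from the report): matches the
# "[<address>] symbol+0xOFFSET/0xSIZE [module]" part of a RIP line.
RIP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_rip(line):
    """Return the decoded fields of an oops RIP line, or None if no match."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "address": addr,
        "symbol": m.group("sym"),
        "offset": off,
        "size": int(m.group("size"), 16),
        "module": m.group("mod"),
        # Start of the function = faulting address minus offset.
        "function_start": addr - off,
    }


line = ("RIP: 0010:[<ffffffffa0048083>]  [<ffffffffa0048083>] "
        "radeon_read_ring_rptr+0x3b/0x50 [radeon]")
info = parse_rip(line)
print(info["symbol"], hex(info["offset"]), hex(info["size"]), info["module"])
```

The offset (0x3b of 0x50) places the fault inside the body of radeon_read_ring_rptr rather than at its entry; the sketch does not say anything about why the access faulted.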

This happened on an X server restart.  After a reboot the same operation
did not fault the first time; instead I got:

X:2822 freeing invalid memtype e0102000-e0112000
X:2822 freeing invalid memtype e0112000-e0122000
X:2822 freeing invalid memtype e0122000-e0132000
X:2822 freeing invalid memtype e0132000-e0142000
X:2822 freeing invalid memtype e0142000-e0152000
X:2822 freeing invalid memtype e0152000-e0162000
X:2822 freeing invalid memtype e0162000-e0172000
X:2822 freeing invalid memtype e0172000-e0182000
X:2822 freeing invalid memtype e0182000-e0192000
X:2822 freeing invalid memtype e0192000-e01a2000
X:2822 freeing invalid memtype e01a2000-e01b2000
X:2822 freeing invalid memtype e01b2000-e01c2000
X:2822 freeing invalid memtype e01c2000-e01d2000
X:2822 freeing invalid memtype e01d2000-e01e2000
X:2822 freeing invalid memtype e01e2000-e01f2000
X:2822 freeing invalid memtype e01f2000-e0202000
X:2822 freeing invalid memtype e0202000-e0212000
X:2822 freeing invalid memtype e0212000-e0222000
X:2822 freeing invalid memtype e0222000-e0232000
X:2822 freeing invalid memtype e0232000-e0242000
X:2822 freeing invalid memtype e0242000-e0252000
X:2822 freeing invalid memtype e0252000-e0262000
X:2822 freeing invalid memtype e0262000-e0272000
X:2822 freeing invalid memtype e0272000-e0282000
X:2822 freeing invalid memtype e0282000-e0292000
X:2822 freeing invalid memtype e0292000-e02a2000
X:2822 freeing invalid memtype e02a2000-e02b2000
X:2822 freeing invalid memtype e02b2000-e02c2000
X:2822 freeing invalid memtype e02c2000-e02d2000
X:2822 freeing invalid memtype e02d2000-e02e2000
X:2822 freeing invalid memtype e02e2000-e02f2000
X:2822 freeing invalid memtype e02f2000-e0302000

Another server restart produced the same pile of "freeing invalid" messages and the same general protection fault as above.
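
The 32 "freeing invalid memtype" lines above each cover 0x10000 bytes (64 KiB) and are back to back, so together they span e0102000-e0302000, i.e. 2 MiB. The following is only a rough sketch of how such lines could be summarized from a dmesg capture; the function names `parse_ranges` and `coalesce` are made up for illustration, and the sample uses just the first three lines from this report.

```python
import re

# Hypothetical helpers (not from the report) for summarizing
# "freeing invalid memtype START-END" lines in dmesg output.
MEMTYPE_RE = re.compile(
    r"(?P<comm>\S+):(?P<pid>\d+) freeing invalid memtype "
    r"(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+)"
)


def parse_ranges(text):
    """Return a list of (start, end) integer pairs found in the text."""
    return [
        (int(m.group("start"), 16), int(m.group("end"), 16))
        for m in MEMTYPE_RE.finditer(text)
    ]


def coalesce(ranges):
    """Merge ranges that touch or overlap into larger contiguous ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


sample = """\
X:2822 freeing invalid memtype e0102000-e0112000
X:2822 freeing invalid memtype e0112000-e0122000
X:2822 freeing invalid memtype e0122000-e0132000
"""

ranges = parse_ranges(sample)
for start, end in coalesce(ranges):
    print(f"{start:x}-{end:x} ({(end - start) // 1024} KiB)")
```

Run against the full list, the same merge would collapse all 32 entries into a single range, which makes it easier to compare one restart against another.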

In both cases KMS was _not_ used.


Version-Release number of selected component (if applicable):
2.6.29-0.237.rc7.git4.fc11.x86_64

How reproducible:
Not every time, but it can be repeated.

Additional info:
ATI Technologies Inc R300 AD [Radeon 9500 Pro] graphics card.

Comment 1 Michal Jaegermann 2009-03-14 21:33:39 UTC
With 'nomodeset' dropped from the kernel command line, I have so far been unable to repeat that fault. OTOH, on a reboot the monitor became totally unusable (no sync) until it was power cycled.
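
When comparing runs with and without 'nomodeset', it can help to confirm what the running kernel was actually booted with. The following is a small sketch, assuming a Linux system where the boot parameters are readable from /proc/cmdline; the helper name `has_boot_param` is hypothetical.

```python
def has_boot_param(cmdline, name):
    """Return True if the bare parameter `name` appears on the command line.

    `cmdline` is the whitespace-separated kernel command line; only an
    exact token match (or `name=value`) counts, so "nomodeset" does not
    match a longer token that merely contains it.
    """
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# Example with a made-up command line; on a real system one would call
# has_boot_param(read_cmdline(), "nomodeset") instead.
example = "ro root=/dev/sda1 rhgb quiet nomodeset"
print(has_boot_param(example, "nomodeset"))
```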

Comment 2 Michal Jaegermann 2009-03-31 20:50:52 UTC
I retried with kernel 2.6.29-21.fc11.x86_64, xorg-x11-server-Xorg-1.6.0-16.fc11,
and xorg-x11-drv-ati-6.12.0-2.fc11.  Results are still as described in the
report.  That means a bunch of "X:2779 freeing invalid memtype ..." messages followed
by a recursive fault in radeon_read_ring_rptr on a server restart, or an unusable monitor after a reboot if "nomodeset" was not used.

The X server is truly dead after such an experiment, and nothing can be done from a local console.  Even from a remote login, the X process remains an unkillable zombie.

Comment 3 Michal Jaegermann 2009-04-08 21:02:11 UTC
Created attachment 338800
traces from X server restart with the current configuration

I tried the same experiment (and a few times in the meantime) with kernel-2.6.29.1-54.fc11.x86_64, xorg-x11-server-Xorg-1.6.0-17.fc11 and xorg-x11-drv-ati-6.12.1-9.fc11.  This time I do not see that recursive fault any longer.  OTOH a series of "freeing invalid memtype" followed by a long list of "calling reserve_ram_pages_type" is still there.  Is that expected?

A relevant fragment of dmesg output is attached.

Comment 4 Bug Zapper 2009-06-09 12:14:13 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 5 Jérôme Glisse 2009-10-14 10:51:48 UTC
Can you test with fedora 12 livecd and report whether that works?

Comment 6 Michal Jaegermann 2009-10-14 18:34:07 UTC
> Can you test with fedora 12 livecd ...

Not really.  OTOH, I have not seen that problem for a long time, and whatever faults exist in the current rawhide, this is not one of them.  A dmesg for the current rawhide kernel on this hardware can be found, for example, here:
https://bugzilla.redhat.com/attachment.cgi?id=364786

Comment 7 Jérôme Glisse 2009-10-15 14:16:49 UTC
OK, so closing this bug. Reopen it if you experience a similar problem with Fedora 12.