Bug 500086

Summary: X segfault on XO-1 system booted with rawhide-xo build 20090510
Product: [Fedora] Fedora Reporter: Mikus Grinbergs <mikus>
Component: xorg-x11-drv-ati Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: chris-rhbugs, dcantrell, kmcmartin, mcepl, mcepl, pbrobinson, sebastian, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: i586   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-12 22:13:34 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 446452, 461806    
Attachments:
Description | Flags
Output from most recent (automatic) attempt to start X. | none
/var/log/messages | none

Description Mikus Grinbergs 2009-05-10 20:16:57 UTC
Created attachment 343299 [details]
Output from most recent (automatic) attempt to start X.

Description of problem:  When I booted rawhide-xo 20090510 on an XO-1 system, the boot process stalled because it could not start X.  Prior to the stall, the text console output "flashed" repeatedly as the system attempted multiple times to start X.


Version-Release number of selected component (if applicable): 1.6.1


How reproducible:  Did not try with multiple XO-1s.


Steps to Reproduce:
1.  Copy "installation image" of ~cjb/rawhide-xo 20090510.img to NAND on XO-1 system (using 'copy-nand' at ok prompt).
2.  Boot XO (with 'check' button on front panel pressed).

  
Actual results:  Never saw any of the screen contents that use X to display.

Expected results:  Would see "user logon screen".


Additional info:  Am primarily creating this bug ticket in order to have a place to store log output from the system that did not successfully start X.

Comment 1 Mikus Grinbergs 2009-05-10 20:20:02 UTC
Created attachment 343300 [details]
/var/log/messages

For what it's worth - copy of 'messages' output of system on which X did not start.

Comment 2 Peter Robinson 2009-05-10 20:49:16 UTC
I'm seeing exactly the same thing on a build I did today as well. This is an X regression on the XO that has appeared in the last couple of days. I'm going to add this as a blocker for F11.

Comment 3 Chris Ball 2009-05-10 23:22:55 UTC
Next step is probably to bisect the last few xorg-x11-server-Xorg.i586 RPMs here:

http://koji.fedoraproject.org/koji/packageinfo?packageID=63

in order to know whether it's a server change, and which particular RPM introduced the crash.

Comment 4 Chris Ball 2009-05-11 01:18:46 UTC
> (II) Cannot locate a core pointer device.

This is surprising.  Could it be related to the segfault?

I tried disabling GLX by moving /usr/lib/dri/swrast_dri.so out of the way; we get the same segfault with miCreateScreenResources() in the trace.

Comment 5 Chris Ball 2009-05-11 04:59:37 UTC
Mikus points out that "Cannot locate a core pointer device." was present in previous working builds; we're using an xorg.conf entry instead.

Comment 6 Chris Ball 2009-05-12 00:37:16 UTC
Here's a proper gdb backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00458d54 in exaCreatePixmap (pScreen=0xa055a68, w=0, h=0, depth=16, 
    usage_hint=0) at exa.c:323
323			pExaPixmap->driverPriv = pExaScr->info->CreatePixmap2(pScreen, w, h, depth, usage_hint, bpp);
(gdb) bt
#0  0x00458d54 in exaCreatePixmap (pScreen=0xa055a68, w=0, h=0, depth=16, 
    usage_hint=0) at exa.c:323
#1  0x0811ba79 in miCreateScreenResources (pScreen=0xa055a68)
    at miscrinit.c:153
#2  0x00458177 in exaCreateScreenResources (pScreen=0xa055a68) at exa.c:716
#3  0x080e7221 in xf86CrtcCreateScreenResources (screen=0xa055a68)
    at xf86Crtc.c:698
#4  0x0806b983 in main (argc=1, argv=0xbfb79b44, envp=0xbfb79b4c) at main.c:326

Comment 7 Chris Ball 2009-05-12 01:03:44 UTC
Bisected.  xorg-x11-server-1.6.1-7.fc11 works, xorg-x11-server-1.6.1-8.fc11 (built by airlied) doesn't.

The changelog entry is:

* Thu Apr 23 2009 Dave Airlie <airlied> 1.6.1-8
- xserver-1.6.1-exa-create-pixmap2.patch - add support for tiling create pixmap hook - need to fix firefox on ati rs690 crashes

X maintainers, please revert or -- if there's time -- let me know how Geode can avoid this new code path, and I can make a quick Geode release.

Comment 8 Kyle McMartin 2009-05-12 02:17:28 UTC
As I've pointed out to cjb on IRC, the geode driver never sets this function pointer, so it should be zero-filled and this code path shouldn't be executed. That looks like a bug in the geode driver. (The server could possibly be at fault instead, but that seems unlikely.)

Comment 9 Chris Ball 2009-05-12 05:17:31 UTC
Kyle worked it out: the driver was callocing sizeof() the EXA struct as fixed at driver compile time, so when airlied made the struct larger in the server, the allocation was too short and the new function pointer lay in garbage beyond it, which made it test as valid. He has a patch to use exaDriverAlloc() for allocation instead. I'll try to merge and test it, release upstream geode, release a new geode driver RPM, and point people at it here for tagging into F11 final in a couple of hours.

Thanks for the excellent help, all.
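
For anyone reading this later, the allocation mismatch described in this comment can be sketched roughly as below. This is only an illustration, not the geode or EXA source: OldRec, NewRec, CreatePixmapHook and the three functions are hypothetical stand-ins, the 0xAA fill simulates whatever heap contents happen to follow a too-short allocation, and it assumes (as the driver did) that all-zero bytes read back as a null function pointer.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* size_t, offsetof */
#include <string.h>   /* memcpy, memset */

/* Hypothetical stand-ins, NOT the real ExaDriverRec. OldRec is the layout the
 * driver was compiled against; NewRec is the layout the newer server uses,
 * with one hook appended at the end. */
typedef void *(*CreatePixmapHook)(int w, int h);

typedef struct {
    int exa_major;
    int exa_minor;
    CreatePixmapHook CreatePixmap;
} OldRec;

typedef struct {
    int exa_major;
    int exa_minor;
    CreatePixmapHook CreatePixmap;
    CreatePixmapHook CreatePixmap2;   /* field added by the server update */
} NewRec;

static_assert(sizeof(NewRec) > sizeof(OldRec), "new layout must be larger");
static_assert(offsetof(NewRec, CreatePixmap2) >= sizeof(OldRec),
              "new hook must lie past the end of the old layout");

/* Returns 1 if a server reading the record with the NEW layout would see the
 * new hook as set (any non-zero byte), 0 otherwise. The bytes are inspected
 * directly so that no garbage pointer value is ever formed. */
static int new_hook_looks_set(const unsigned char *rec)
{
    unsigned char bytes[sizeof(CreatePixmapHook)];
    size_t i;

    memcpy(bytes, rec + offsetof(NewRec, CreatePixmap2), sizeof bytes);
    for (i = 0; i < sizeof bytes; i++)
        if (bytes[i] != 0)
            return 1;
    return 0;
}

/* Stale driver: only sizeof(OldRec) bytes are zeroed, as calloc() with the
 * compile-time size would do. The rest of the arena stands in for the heap
 * garbage that follows the short allocation. */
int stale_driver_alloc_sees_hook(void)
{
    unsigned char arena[sizeof(NewRec)];

    memset(arena, 0xAA, sizeof arena);   /* simulated heap garbage */
    memset(arena, 0, sizeof(OldRec));    /* driver-side calloc, old size */
    return new_hook_looks_set(arena);
}

/* Fixed driver: the allocation is sized by the server, so the whole new
 * layout is zeroed and the new hook reads as unset. */
int server_alloc_sees_hook(void)
{
    unsigned char arena[sizeof(NewRec)];

    memset(arena, 0xAA, sizeof arena);   /* same simulated garbage */
    memset(arena, 0, sizeof(NewRec));    /* server-side allocation, new size */
    return new_hook_looks_set(arena);
}
```

Here stale_driver_alloc_sees_hook() returns 1, which corresponds to the server calling through a garbage CreatePixmap2 pointer, while server_alloc_sees_hook() returns 0. Letting the server size the allocation means the driver no longer depends on the struct layout it was compiled against.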

Comment 10 Chris Ball 2009-05-12 06:31:08 UTC
I've made a new geode 2.11.2 release and tested it working, but have lost my ACL to xorg-x11-drv-geode CVS, I think due to losing provenpackager.  I've already put the tarball into new-sources; could someone else apply the following CVS patch and tag into F11, please?  Thanks!

cvs diff: Diffing .
Index: .cvsignore
===================================================================
RCS file: /cvs/pkgs/rpms/xorg-x11-drv-geode/devel/.cvsignore,v
retrieving revision 1.5
diff -u -r1.5 .cvsignore
--- .cvsignore	16 Feb 2009 21:23:50 -0000	1.5
+++ .cvsignore	12 May 2009 06:28:21 -0000
@@ -1 +1 @@
-xf86-video-geode-2.11.1.tar.bz2
+xf86-video-geode-2.11.2.tar.bz2
Index: sources
===================================================================
RCS file: /cvs/pkgs/rpms/xorg-x11-drv-geode/devel/sources,v
retrieving revision 1.7
diff -u -r1.7 sources
--- sources	16 Feb 2009 21:23:50 -0000	1.7
+++ sources	12 May 2009 06:28:21 -0000
@@ -1 +1 @@
-6e00dd248ac5de89ab4764954ea74a96  xf86-video-geode-2.11.1.tar.bz2
+4c652ecba772f705296b8e52d746857c  xf86-video-geode-2.11.2.tar.bz2
Index: xorg-x11-drv-geode.spec
===================================================================
RCS file: /cvs/pkgs/rpms/xorg-x11-drv-geode/devel/xorg-x11-drv-geode.spec,v
retrieving revision 1.9
diff -u -r1.9 xorg-x11-drv-geode.spec
--- xorg-x11-drv-geode.spec	26 Feb 2009 10:48:34 -0000	1.9
+++ xorg-x11-drv-geode.spec	12 May 2009 06:28:21 -0000
@@ -4,8 +4,8 @@
 
 Summary:   Xorg X11 AMD Geode video driver
 Name:      xorg-x11-drv-geode
-Version:   2.11.1
-Release:   2%{?dist}
+Version:   2.11.2
+Release:   1%{?dist}
 URL:       http://www.x.org/wiki/AMDGeodeDriver
 Source0:   http://xorg.freedesktop.org/releases/individual/driver/xf86-video-geode-%{version}.tar.bz2
 License:   MIT
@@ -60,6 +60,9 @@
 %{driverdir}/ztv_drv.so
 
 %changelog
+* Tue May 12 2009 Chris Ball <cjb> 2.11.2-1
+- fix crasher bug due to EXA ABI change: RHBZ #500086
+
 * Thu Feb 26 2009 Fedora Release Engineering <rel-eng.org> - 2.11.1-2
 - Rebuilt for https://fedoraproject.org/wiki/Fedora_11_Mass_Rebuild

Comment 11 Chris Ball 2009-05-12 18:05:32 UTC
Kyle made the build, and I've filed a rel-eng ticket for inclusion in F11:

https://fedorahosted.org/rel-eng/ticket/1791

Comment 12 Matěj Cepl 2009-05-12 21:58:03 UTC
There is nothing to triage here.

Switching to ASSIGNED so that developers have responsibility to do whatever they want to do with it.

Comment 13 Chris Ball 2009-05-12 22:06:09 UTC
I think this can be closed -- the build with the fix has been tagged into f11-final.  If you'd like me to verify that the correct RPM makes it into the final image, go ahead and leave this open, else we can close it now.

Thanks.