Bug 677842

Summary: [abrt] mutter-2.91.6-4.fc15: ureg_src: Process /usr/bin/mutter was killed by signal 11 (SIGSEGV) - gcc buggy when building 32-bit with -fno-omit-frame-pointer
Product: [Fedora] Fedora
Reporter: John Watzke <watzkej>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: rawhide
CC: airlied, ajax, a.mani.cms, awilliam, bruno, clydekunkel7734, d2kxweb, fabian.deutsch, hoyang, jakub, jglisse, jlaska, mamers.sdtb, maxamillion, mclasen, mike.cloaked, otaylor, pbrobinson, pfpschneider, rbergero, robatino, v.plessky, walters
Hardware: i686
OS: Unspecified
Whiteboard: abrt_hash:34deef696f7cdda47fccde7f28d6df0c10e3e614 RejectedBlocker AcceptedNTH
Doc Type: Bug Fix
Last Closed: 2011-02-25 19:15:25 UTC
Bug Blocks: 657617
Attachments (description, flags):
File: backtrace (flags: none)
Xorg.0.log from comment 45 (flags: none)
Xorg.0.log for mutter crash in Radeon M300 graphics (flags: none)
Error output from mutter crash in VT (flags: none)

Description John Watzke 2011-02-16 03:31:21 UTC
abrt version: 1.1.17
architecture: i686
Attached file: backtrace, 48301 bytes
cmdline: mutter --mutter-plugins=libgnome-shell
comment: This appears to happen every time I log in.
component: mutter
Attached file: coredump, 91901952 bytes
crash_function: ureg_src
executable: /usr/bin/mutter
kernel: 2.6.38-0.rc4.git7.1.fc15.i686
package: mutter-2.91.6-4.fc15
rating: 4
reason: Process /usr/bin/mutter was killed by signal 11 (SIGSEGV)
release: Fedora release 15 (Rawhide)
time: 1297826569
uid: 500

How to reproduce
-----
1. Startup TC2 and login to gnome
2. Crash

Comment 1 John Watzke 2011-02-16 03:31:23 UTC
Created attachment 479012 [details]
File: backtrace

Comment 2 John Watzke 2011-02-16 03:36:04 UTC
I'm confident that my card works in gnome3 as it was working in TC1 and the gnome3 test day live image.

Comment 3 John Watzke 2011-02-16 03:49:23 UTC
I think this would qualify as an Alpha blocker... correct me if I'm wrong.

Comment 4 John Watzke 2011-02-16 03:50:22 UTC
Video card:

01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3650

Comment 5 Adam Williamson 2011-02-16 05:22:17 UTC
john: if it affects everyone, yes. if it only affects your configuration, probably not. we'll see if this happens to all tc2 testers, I'm going to test it shortly.

Comment 6 Adam Williamson 2011-02-16 06:01:00 UTC
I just booted TC2 and this didn't happen. I have crashes for notification-daemon and gnome-user-share, but Shell is up and running and there's no crash for mutter.

Comment 7 James Laska 2011-02-16 15:27:35 UTC
I'm no longer seeing this issue after applying TC2 and related bodhi updates.

@John ... are you still seeing this issue with the latest bodhi Fedora 15 updates?

Comment 8 John Watzke 2011-02-16 19:50:04 UTC
I had the latest Bodhi updates as of 2011-02-15 22:31:21 EST which is when I submitted the bug.  I don't think there have been any new releases since then.  I still had the problem after applying the 2/15 Bodhi updates.  I can certainly try a fresh install, immediately apply the updates (or do so during install), and then try to login and see if that fixes things.

I know that when I applied last night's Bodhi updates the First Login module actually ran.  Maybe whatever broke that also put my machine in a horrible unrecoverable state.

Comment 9 John Watzke 2011-02-17 03:54:06 UTC
So I did a full reinstall of TC2 and also selected the Update-Testing repo during install so I got the latest updates.  First Boot came up like it was supposed to but I still got the crash.  I'm going to try this on the x86_64 DVD as well.  I have several gigs of RAM and usually run x86_64 but I thought I'd give i686 a try this time around.

Comment 10 John Watzke 2011-02-18 02:48:01 UTC
I'm posting this from F15 after using the x86_64 DVD.  Is it possible that mutter is having a problem specifically with ATI cards and i686 using the PAE kernel?  Fedora installs the PAE kernel by default so I can certainly reinstall the i686 DVD again and see if the problem goes away when I install and run the non-PAE kernel.

Comment 11 GoinEasy9 2011-02-18 03:13:49 UTC
I have a Nvidia Geforce 9800GT and have the same "reason: Process /usr/bin/mutter was killed by signal 11 (SIGSEGV)".  
I installed successfully from the TC2 Alpha DVD, but, after logging in, screen remains with just wallpaper, no panels.  The abrt notification flashes (yes flashes, as does any open window on the screen, until the flashing disappears).
Abrt notifications include: mutter, notification-daemon and gnome-user-share.  I also saw a couple of kernel oopses.
When I get back to the machine, I'll see if I can extract the messages file and attach it.

Comment 12 John Watzke 2011-02-18 03:38:27 UTC
Bug 678418 might possibly be a dup but I'll let the bug zapping experts decide.

Comment 13 John Watzke 2011-02-18 03:46:05 UTC
I tried switching to a non-PAE kernel and it didn't make a difference.  I'm positive nothing is wrong with the media I have either.  Both the sha256sum on the iso and a media check are fine.

[jwatzke@jwatzke Desktop]$ checkisomd5 /dev/sr0
The media check is complete, the result is: PASS.
It is OK to use this media.

However, the x86_64 DVD works.  The other reports all seem to be related to i686 but it isn't only ATI.
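
The ISO checksum comparison mentioned in this comment can be sketched in Python. This is only an illustration: the helper names `sha256_of_file` and `matches_published_checksum` are hypothetical, and the expected digest has to come from the checksum file published alongside the image, which is not reproduced in this report.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte ISO never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare the file's digest with a published one (hypothetical helper)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A matching digest only shows that the download is intact; checkisomd5, as used above, checks the burned media separately.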

Comment 14 Andre Robatino 2011-02-18 17:59:36 UTC
Fedora is _supposed_ to install the PAE kernel when it's supported, but that hasn't been happening lately - see bug 676578.

Comment 15 Robyn Bergeron 2011-02-18 18:03:33 UTC
Per 2011-02-18 Alpha blocker meeting -
not enough information to accept or reject, need input from desktop SIG.

Comment 16 Robyn Bergeron 2011-02-18 18:05:03 UTC
#action desktop-sig - input needed to determine impact of 677842
#action desktop-sig - input needed to determine impact of 677371

Comment 17 James Laska 2011-02-18 18:13:11 UTC
(In reply to comment #16)
> #action desktop-sig - input needed to determine impact of 677842
> #action desktop-sig - input needed to determine impact of 677371

Jerome ... any thoughts on your end?

Comment 18 Clyde E. Kunkel 2011-02-18 21:10:13 UTC
Saw one instance with F15 TC1, but none since updating through TC2.
RV600 card, ATI driver

Comment 19 Mike C 2011-02-18 21:19:13 UTC
I just got more data at bug 678418 which seems to be a duplicate of this one.

Comment 20 Adam Williamson 2011-02-19 07:37:22 UTC
*** Bug 678418 has been marked as a duplicate of this bug. ***

Comment 21 Hongqing Yang 2011-02-21 04:44:29 UTC
Just tested it on TC2; it does not reproduce.
hardware info:
03:00.0 VGA compatible controller: ATI Technologies Inc RV620 [ATI FireGL V3700]

Comment 22 James Laska 2011-02-21 16:18:13 UTC
Dave Airlie provided some feedback on the users that would be impacted by this issue ...

On Sat, 2011-02-19 at 06:12 +1000, Dave Airlie wrote:
> On Fri, 2011-02-18 at 11:50 -0800, Adam Williamson wrote:
> > Hey, folks. At the blocker review meeting today we agreed that we were
> > not in a position to decide on the status of these potential blockers:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=677842
> > https://bugzilla.redhat.com/show_bug.cgi?id=677371
> > 
> > they apparently cause an unusable system as Shell repeatedly crashes and
> > fallback mode does not kick in. We are unsure of the extent of the
> > impact; several different reporters seem to have hit these (including
> > the Chinese install testers, though they didn't add a comment to the
> > bug, they mentioned it in their test report).
> 
> Looks like all post r600 ATI cards would be affected, something must
> have changed in mutter to trigger this, the quickest solution I can see
> at the moment is to rebase mesa and pray. If you want a targeted fix
> it'll take longer to actually work out why its going so horribly wrong.

Comment 23 Mike C 2011-02-21 16:38:16 UTC
Is there any identifiable change that occurred between the first graphics test day iso and the recent nightlies/tc2?

In my case for bug 678418 the same machine was going into gnome3 (gnome shell) and working without any major problems during the first test day - is it possible that some systemd changes recently have messed up the system in my (and the other) cases listed?

Comment 24 Adam Williamson 2011-02-21 18:09:28 UTC
a change in systemd certainly wouldn't be the first thing I'd look at. Dave suggested a change in mutter as a likely cause. The Test Day image would have used 2.91.6-2 . The current build is 2.91.6-4 . -3 and -4 were both rebuilds; the first with gcc 4.6, the second against GTK+ 3. Clutter, which Mutter depends on, went from 1.6.0 at the time of the Test Day to 1.6.4 at present, so that's another possibility.

Comment 25 Mike C 2011-02-21 19:59:22 UTC
I booted last night's nightly - and running the live iso I downgraded clutter to the same version as in the test day - when I restarted prefdm.service without /etc/gdm/custom.conf it did exactly the same - so that seems not to be the issue.

I tried to downgrade mutter but there are deps - I can use rpm to force it but it won't complete the login to gnome3 - it has the background but gives nothing else.

I am guessing it may need gtk3 installing first - I will try that in the running live iso - and see if I can downgrade mutter and try again - will report back.

Comment 26 Peter Robinson 2011-02-21 20:11:19 UTC
> I am guessing it may need gtk3 installing first - I will try that in the
> running live iso - and see if I can downgrade mutter and try again - will
> report back.

you won't be able to downgrade gtk3 as there's just way too many deps. So you'd either need to do a scratch build of the old mutter against the new gtk3 or we need to class it as a regression and treat it appropriately.

Comment 27 Mike C 2011-02-21 20:21:11 UTC
What does "treat it appropriately" imply here?

Comment 28 Mike C 2011-02-21 20:48:59 UTC
If someone could and would do a scratch build of mutter 2.91.6-2 against the new gtk3 then I would be happy to test it in the livecd iso - I don't know how to do the build.

Comment 29 Jérôme Glisse 2011-02-21 20:54:40 UTC
So as Dave said, this likely affects all HD2XXX, HD3XXX, HD4XXX, HD5XXX (and
HD6XXX) GPUs. I have a couple of ideas for a possible culprit, but if master
works the easiest solution is a rebase; I don't know whether that can be
considered a good solution.

Comment 30 Adam Williamson 2011-02-21 20:57:59 UTC
mike c: that wouldn't make any sense. 2.91.6-4 *is* a build of 2.91.6-2 against the latest gtk3. No code was changed in -3 or -4, they were both simple rebuilds of the same code.

jerome: it would be much better to get a targeted fix at this point than 'rebase to master and pray' - even if a rebase fixes this bug there's no telling what else it might break. can you try and come up with a fix? is there anything anyone can provide to help with that? do you have appropriate test hardware available?

Comment 31 Mike C 2011-02-21 21:20:31 UTC
OK Adam - I did not realise - should have checked the koji info which does have that explicitly! - however I have the hardware if there is any testing that can be done to help move this forward.

Comment 32 James Laska 2011-02-22 19:02:03 UTC
(In reply to comment #29)
> So as Dave said likely affect all HD2XXX,HD3XXX,HD4XXX,HD5XXX,(HD6XXX) gpu i
> have a couple idea for possible culprit but if master works the easiest
> solution is a rebase dunno if this can be considered as a good solution.

Given the large number of testers this would impact, and since there are no reasonable workarounds for this issue, I believe we have to consider this as an Alpha release [1] blocker impacting the following 2 criteria ...

   "It must be possible to run the default web browser and a terminal 
   application from the default desktop environment. The web browser 
   must be able to download files, load extensions, and log into FAS"
   
   "The installed system must be able to download and install updates 
    with yum and PackageKit"

+1 from me on making this an F15Alpha blocker.

[1] https://fedoraproject.org/wiki/Fedora_15_Alpha_Release_Criteria

Comment 33 Adam Williamson 2011-02-22 19:03:48 UTC
I'm also +1, I think the hardware impact is significant enough that we should fix this.

Comment 34 John Watzke 2011-02-22 19:21:28 UTC
I also have the hardware (both a HD3650 and an HD3400) so I'd be happy to test any test fixes on both of the systems if you create such a test fix.

Comment 35 Mike C 2011-02-22 19:49:51 UTC
This is not restricted to Radeon cards - I have just tried to do a test using
the current test day graphics iso on the following hardware:
http://www.smolts.org/client/show/pub_90235bed-08f2-4a23-ad0a-49cfb124f8f7

I am getting this same problem as for this bz - but this hardware is:
nVidia Corporation G86M [Quadro FX 360M]

Comment 36 Adam Williamson 2011-02-22 19:59:08 UTC
That does not mean it's the same bug. "Mutter crashed" is a pretty vague bug. Can you please attach the backtrace? It's almost certainly a different crash.

Comment 37 Mike C 2011-02-22 20:09:54 UTC
Well in /var/spool/abrt there are several entries and one contains the reason which has the same line as in the above - copying from that file:

[root@localhost ~]# cat /var/spool/abrt/ccpp-1298421417-1647/reason 
Process /usr/bin/mutter was killed by signal 11 (SIGSEGV)

I will try to install abrt-cli in this machine and then try to get a crash report - but this is looking the same as the other machine that I have been testing (Radeon) - symptoms as well as the abrt line above.
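
The manual check above (reading the `reason` file in each crash directory) can be sketched as a small Python script. The spool layout is taken from this comment only; the function names are hypothetical, and treating the directory name as `ccpp-<epoch seconds>-<pid>` is an assumption inferred from the names seen in this report, not a documented abrt format.

```python
import os
from datetime import datetime, timezone


def list_crash_reasons(spool_dir="/var/spool/abrt"):
    """Return (directory name, reason text) for each crash dir that has a reason file."""
    results = []
    for name in sorted(os.listdir(spool_dir)):
        reason_path = os.path.join(spool_dir, name, "reason")
        if not os.path.isfile(reason_path):
            continue  # skip lock files and incomplete dumps
        with open(reason_path, encoding="utf-8", errors="replace") as handle:
            results.append((name, handle.read().strip()))
    return results


def crash_time_from_name(name):
    """Assumption: names look like ccpp-<epoch>-<pid>; return the UTC time or None."""
    parts = name.split("-")
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    return datetime.fromtimestamp(int(parts[1]), tz=timezone.utc)
```

As noted later in the thread, the reason line only says that mutter received SIGSEGV; it does not show whether two crashes share a backtrace.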

Comment 38 Mike C 2011-02-22 20:13:02 UTC
The list of crashes related to comment #37 is:


[root@localhost ~]# abrt-cli -l
0.
	UID        : 42
	UUID       : b844a8ef27d4e048bf46b9d2a33603fbb4bb1229
	Package    : gnome-settings-daemon-2.91.9-4.fc15
	Executable : /usr/libexec/gnome-settings-daemon
	Crash Time : Tue 22 Feb 2011 07:37:56 PM EST
	Crash Count: 2
	Hostname   : localhost.localdomain
1.
	UID        : 500
	UUID       : 1dacde1c51ec252934bbcfd19c7329b300ca4b54
	Package    : gnome-user-share-2.30.2-4.fc15
	Executable : /usr/libexec/gnome-user-share
	Crash Time : Tue 22 Feb 2011 07:38:31 PM EST
	Crash Count: 3
	Hostname   : localhost.localdomain
2.
	UID        : 500
	UUID       : 5ba203fd13e890f0ead895774aadb8429f625464
	Package    : mutter-2.91.6-4.fc15
	Executable : /usr/bin/mutter
	Crash Time : Tue 22 Feb 2011 07:38:34 PM EST
	Crash Count: 3
	Hostname   : localhost.localdomain
3.
	UID        : 0
	UUID       : dd4f502dacaf9e3d469a530de51607564b225ad6
	Package    : xorg-x11-server-Xorg-1.9.99.1-4.20101201.fc15
	Executable : /usr/bin/Xorg
	Crash Time : Tue 22 Feb 2011 07:37:22 PM EST
	Crash Count: 1
	Hostname   : localhost.localdomain
4.
	UID        : 0
	UUID       : af03d8297f778515e4b9d7671ce0948f39c0bc50
	Package    : xorg-x11-server-Xorg-1.9.99.1-4.20101201.fc15
	Executable : /usr/bin/Xorg
	Crash Time : Tue 22 Feb 2011 07:38:16 PM EST
	Crash Count: 1
	Hostname   : localhost.localdomain
5.
	UID        : 500
	UUID       : 3e72b2d7
	Package    : gnome-shell-2.91.6-6.fc15
	Executable : /usr/bin/gnome-shell
	Crash Time : Tue 22 Feb 2011 07:38:43 PM EST
	Crash Count: 1
	Hostname   : localhost.localdomain

I will try to get an abrt report - but crossing my fingers that the machine does not kernel panic!
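
For picking the mutter entries out of a listing like the one above, a minimal Python sketch follows. It assumes only the layout visible in this comment (an index line such as `2.` followed by `Key : value` lines); `parse_abrt_list` and `crashes_for` are hypothetical helper names, not part of abrt.

```python
import re


def parse_abrt_list(text):
    """Split `abrt-cli -l` style output into one dict per crash entry."""
    records = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d+\.", stripped):
            # An index line such as "2." starts a new record.
            current = {}
            records.append(current)
        elif current is not None and ":" in stripped:
            # Split on the first colon only, so times like 07:38:34 stay intact.
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return records


def crashes_for(records, executable):
    """Return the records whose Executable field equals `executable`."""
    return [r for r in records if r.get("Executable") == executable]
```

Filtering on `/usr/bin/mutter` yields the UUID whose prefix is then passed to `abrt-cli -r`, as done in the next comment.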

Comment 39 Mike C 2011-02-22 20:21:20 UTC
Having run the command abrt-cli -r 5ba it tries to download 94 debug files - but then fails with:


>> Downloading (15 of 94) pango-debuginfo-1.28.3-2.fc15. : 88 %
>> Downloading (15 of 94) pango-debuginfo-1.28.3-2.fc15. : 98 %
>> Downloading (15 of 94) pango-debuginfo-1.28.3-2.fc15. : 100 %
>> Extracting cpio from /var/run/abrt/tmp-3683-1298423672/pango-debuginfo-1.28.3-2.fc15.i686.rpm
>> Can't extract package: /var/run/abrt/tmp-3683-1298423672/pango-debuginfo-1.28.3-2.fc15.i686.rpm
>> Unpacking failed, aborting download...
>> Complete!
>! abrt-debuginfo-install exited with 2

>> Generating backtrace
>! Can't create lock file '/var/spool/abrt/ccpp-1298421417-1647.lock': Read-only file system

dbus call returned error: 'org.freedesktop.DBus.Error.NoReply'

Please can someone suggest an alternate way to try and get a backtrace from the crash?
Thanks

Comment 40 Jérôme Glisse 2011-02-22 20:23:44 UTC
Could the people affected please test whether the mesa package at
http://koji.fedoraproject.org/koji/taskinfo?taskID=2858301

fixes the issue?

Comment 41 Adam Williamson 2011-02-22 20:29:59 UTC
mike: "Process /usr/bin/mutter was killed by signal 11 (SIGSEGV)" just means "Mutter crashed".

you can try copying the abrt files to an installed system and running abrt-gui on them there, I guess?

Comment 42 Mike C 2011-02-22 20:33:46 UTC
Jerome: I am happy to test this right now but I don't know how to get at the rpm for that scratch build?

Adam: I don't have any installed f15 systems!

Comment 43 Mike C 2011-02-22 20:41:27 UTC
Jerome: can you email me the rpm for your build 2858301? Then I will be able to test it almost immediately.

Thanks.

Comment 44 Jérôme Glisse 2011-02-22 21:07:48 UTC
I put what should be enough there
http://people.freedesktop.org/~glisse/fixmutter/

Comment 45 Mike C 2011-02-22 21:11:24 UTC
OK I will test now and report back...

Comment 46 Mike C 2011-02-22 21:22:24 UTC
I installed the two scratch-build mesa packages from comment #44 locally into the running live system, moved /etc/gdm/custom.conf out of the way, and restarted X via systemctl start prefdm.service. Unfortunately I get the same failure: X begins, I am offered a login by the greeter, and gnome 3 appears to begin, but then I get the flashing window with the mutter crash. After going back to a VT and exploring /var/spool/abrt/:

cat /var/spool/abrt/ccpp-1298427375-2115/reason 
Process /usr/bin/mutter was killed by signal 11 (SIGSEGV)

Appears again. This is the same hardware as in comment #35

Comment 47 Mike C 2011-02-22 21:25:00 UTC
I will be able to run any more suggested tests to help diagnose this for about half an hour - then I will be away for about 20 hours but should be able to test again in the evening UK time (Wednesday).

Comment 48 Mike C 2011-02-22 21:28:40 UTC
If the Xorg.0.log file is any help I can upload it as an attachment?

Comment 49 Adam Williamson 2011-02-22 21:30:25 UTC
jerome, don't pay too much attention to mike's result, he has an NVIDIA and probably a different bug.

Radeon users hitting this bug, can you please test the updated packages? Thanks.

Comment 50 Mike C 2011-02-22 21:32:25 UTC
Created attachment 480254 [details]
Xorg.0.log from comment 45

Xorg log from after crashed mutter using scratch builds of mesa

Comment 51 Mike C 2011-02-22 21:33:35 UTC
Adam: I can test the same files in a radeon graphics laptop - will test that now.

Comment 52 Mike C 2011-02-22 21:44:38 UTC
I have just run the same test on hardware:
http://www.smolts.org/client/show/pub_f0b5804b-af65-4379-8405-e14760df29b8

and get the same crash - again I will upload the Xorg log

Comment 53 Mike C 2011-02-22 21:46:13 UTC
Created attachment 480263 [details]
Xorg.0.log for mutter crash in Radeon M300 graphics

Comment 54 Mike C 2011-02-22 21:49:29 UTC
Attachment in #53 applies when running Jerome's two scratch builds in the running live test day iso from yesterday.

Comment 55 Jérôme Glisse 2011-02-22 22:54:11 UTC
Mike, you seem to have a different segfault than the others. I tested my build on various GPUs and mutter works with it. I will do more tests with nvidia hardware tomorrow.

Comment 56 Dennis Martin Herbers 2011-02-23 01:08:59 UTC
Jerome's updated Mesa packages did not fix the bug for me. This is using an AMD Radeon HD 4850 1GB and the Fedora 15 Alpha RC1 LiveCD, booting into init 3 (no Xserver), installing the two updated packages with rpm --force -i and start the Xserver with init 5. It still crashes after I log into GDM. I don't think it's related, but maybe it is... if I boot the LiveCD with basic video mode (vesa), I get the GDM loop bug that has been around about two weeks ago that many people also suffered from.

Comment 57 John Watzke 2011-02-23 01:57:31 UTC
Jerome's mesa updates didn't work for me either.  I still get the crashes.  I'm guessing they are the same problem and not another problem being uncovered but I didn't verify that the trace is identical.  If needed I can do that (perhaps tomorrow).

Comment 58 Clyde E. Kunkel 2011-02-23 03:14:09 UTC
mutter fell over on rawhide with #44 mesa updates. abrt created a new bug: 679634.


$ lspci | grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc RV630 [Radeon HD 2600 Series]

Comment 59 Fabian Deutsch 2011-02-23 08:16:26 UTC
This also happens to me with the current nightly (mutter-2.91.6-4) and a nVidia G86 / GeForce 8500 GT.

Comment 60 mario rathinho 2011-02-23 08:27:27 UTC
This bug affects me too, since the Fedora 15 branch. I hope that the next nightly (with new Mutter and Gnome3 builds) will bring a solution. I'm running on ATI HD Series. Jérôme's Mesa packages did not fix the bug on my machine.

Comment 61 John Watzke 2011-02-23 13:03:47 UTC
@Clyde, your bug report is actually a different signal so it is possible that the new mesa packages had some effect but also happened to uncover another issue.

This is encouraging.  Let me go try to capture some more info on the mutter crash with the new mesa packages with abrt-cli and see if the resulting bug is the same as Clyde's or something different.

Comment 62 Peter F. Patel-Schneider 2011-02-23 13:47:38 UTC
I'm also seeing crashes from mutter on my Thinkpad T60p (ATI Technologies Inc M56GL [ATI Mobility FireGL V5200]) and my desktop (with Nvidia hardware).  There seem to be a lot of machines with similar problems.

Comment 63 John Watzke 2011-02-23 14:51:37 UTC
Okay, so I did some more tests.  Using Jerome's mesa packages and all the latest updates from Updates-Testing including a new Clutter.  I still get the crashes.  When working with abrt-cli, I ended up submitting two crashes for mutter.  The first matched as a duplicate to this bug and the other matched to a new bug 679809.  I apologize if 679809 ends in a dup but I thought it would be better to submit it in the hopes that it might contain more helpful information.

Comment 64 Clyde E. Kunkel 2011-02-23 15:07:27 UTC
FWIW, on same machine where failing in rawhide, Fedora 14 working with no probs.

Radeon HD 2600 Series

Comment 65 Adam Williamson 2011-02-23 17:31:18 UTC
So I'm noticing something interesting here: we have a lot of people at the Radeon Test Day with r600+ hardware reporting success.

I did pull a newer kernel build into the test day live image. I'm wondering if somehow that fixes this.

Can everyone who's experiencing this issue please test the Test Day live image - see https://fedoraproject.org/wiki/Test_Day:2011-02-23_Radeon#Live_image for the download link - and report back if they can get into Shell with that? Thanks.

Comment 66 Jérôme Glisse 2011-02-23 18:44:38 UTC
The issue seems to be related to -fno-omit-frame-pointer with gcc 4.6: a local build without that flag works properly. I guess a debug build would also work.

Comment 67 John Watzke 2011-02-23 19:15:01 UTC
@Jerome, Would that explanation account for why the x86_64 version works perfectly fine?  Perhaps all of this is due to compiler-generated differences?

Comment 68 Dennis Martin Herbers 2011-02-23 19:37:22 UTC
The GFX Test Day live image (32bit) also does not work for me (RV770 / HD 4850 1GB). I think I had the problem with a 64bit build a week ago, but I might retest if other people report success with 64bit.

Comment 69 Adam Williamson 2011-02-23 20:02:07 UTC
the 32bit/64bit difference looks like it may be the one, and also has blocker implications (if 64bit works it makes this less of a blocker); if people can test 64bit and see if that works it'd help.

Comment 70 Clyde E. Kunkel 2011-02-23 20:10:37 UTC
The live test image does go into the gnome shell for me (RV600).  F15A RC1 does not; however I can force it with desktop-effects. There are two differences that I see.  First, of the suspect programs, only the kernel and clutter differ:

[liveuser@localhost ~]$ uname -r
2.6.38-0.rc5.git5.1.fc15.x86_64
[liveuser@localhost ~]$ rpm -q mutter clutter mesa-libGL mesa-dri-filesystem mesa-dri-drivers mesa-dri-llvmcore mesa-libGLU xorg-x11-drv-ati
mutter-2.91.6-4.fc15.x86_64
clutter-1.6.2-2.fc15.x86_64
mesa-libGL-7.10-0.26.fc15.x86_64
mesa-dri-filesystem-7.10-0.26.fc15.x86_64
mesa-dri-drivers-7.10-0.26.fc15.x86_64
mesa-dri-llvmcore-7.10-0.26.fc15.x86_64
mesa-libGLU-7.10-0.26.fc15.x86_64
xorg-x11-drv-ati-6.14.0-2.20110204gita27b5dbd9.fc15.x86_64
 
[kunkelc@P5K-EWIFI ~]$ uname -r
2.6.38-0.rc5.git7.1.fc15.x86_64
[kunkelc@P5K-EWIFI ~]$ rpm -q mutter clutter mesa-libGL mesa-dri-filesystem mesa-dri-drivers mesa-dri-llvmcore mesa-libGLU xorg-x11-drv-ati
mutter-2.91.6-4.fc15.x86_64
clutter-1.6.6-1.fc15.x86_64
mesa-libGL-7.10-0.26.fc15.x86_64
mesa-dri-filesystem-7.10-0.26.fc15.x86_64
mesa-dri-drivers-7.10-0.26.fc15.x86_64
mesa-dri-llvmcore-7.10-0.26.fc15.x86_64
mesa-libGLU-7.10-0.26.fc15.x86_64
xorg-x11-drv-ati-6.14.0-2.20110204gita27b5dbd9.fc15.x86_64

The next difference is that I have a home directory on a separate LV that has been in constant use since Fedora 10 days and God only knows what cruft is in the gnome dirs in my home directory and it also has a kde directory in it.  I installed F15A RC1 in a clean LV, but used my existing home directory.
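
The package comparison in this comment (two `rpm -q` listings that differ only in clutter) can be reproduced mechanically. A minimal Python sketch, assuming each line has the `name-version-release.arch` form shown above; `split_nvr` and `differing_packages` are hypothetical helper names.

```python
def split_nvr(package):
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    # Package names may contain hyphens, so split from the right.
    name, version, release = package.rsplit("-", 2)
    return name, f"{version}-{release}"


def differing_packages(first, second):
    """Map package name -> (version in first, version in second) where they differ."""
    a = dict(split_nvr(p) for p in first)
    b = dict(split_nvr(p) for p in second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Fed the two listings from this comment, it should report only clutter (1.6.2-2 on the live image versus 1.6.6-1 on the installed system); the kernel difference comes from `uname -r` and is not covered by this sketch.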

Comment 71 Adam Williamson 2011-02-23 20:27:44 UTC
so you didn't hit this bug in either case, with the x86-64 stuff.

Comment 72 Clyde E. Kunkel 2011-02-23 20:37:37 UTC
I did in rawhide, and still do in rawhide.  Still cannot automatically go into gnome shell in F15A RC1.

Comment 73 Adam Williamson 2011-02-23 20:47:11 UTC
is your rawhide 32-bit or 64-bit?

the detection problem is something else, i'm trying to focus on this particular bug.

Comment 74 Mike C 2011-02-23 20:48:50 UTC
I am downloading the x86-64 iso from the test day and will test it shortly with the radeon M300 machine I have. 

I just tested the current nightly desktop-i386-20110223.05.iso in this machine and it still fails the same way and won't go into gnome shell. In this test I also yum update'ed clutter and it made no difference.

Comment 75 Clyde E. Kunkel 2011-02-23 21:04:20 UTC
Rawhide = x86_64

Comment 76 Mike C 2011-02-23 21:22:01 UTC
OK - two things - one is that the machine is not capable of running x86_64 - so can't test that (I should have looked earlier)

Secondly - the retrace server script worked fine apparently but I don't know where the backtrace is?  The last part of the terminal output is:


Connecting to retrace server @ ssl://retrace01.fedoraproject.org:443... OK
Sending request... OK
Receiving response... OK
----------
HTTP/1.1 201 Created
Date: Wed, 23 Feb 2011 21:12:16 GMT
Server: Apache/2.2.15 (Red Hat)
Content-Length: 0
X-Task-Id: 987361686
X-Task-Password: kKrbqnJGWa5mdHApxLK23Extjuo21jLX
AppTime: D=48105738
AppServer: retrace01.fedoraproject.org
Connection: close
Content-Type: text/plain

----------
Cleanup... OK

This test is with a laptop with a Radeon M300 graphics chip - I booted into runlevel 3, and then moved aside /etc/gdm/custom.conf - then systemctl start prefdm.service which gave the gdm greeter and liveuser login option.
Selecting liveuser starts to run gnome3 but the failure described in the early part of this report was seen. Did ctrl-alt-2 to get a VT, then systemctl stop prefdm.service and then checked the /var/spool/abrt directory for the crash directory.

Having identified it I then pulled in the upload.py script, made it executable and ran 
./upload.py /var/spool/abrt/ccpp-1298549157-1746 retrace01.fedoraproject.org

(this was connecting to the machine via ssh so I could copy the terminal output)

Can you see the output at the retrace server from the above information, or do I need to do something else first?
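
The task ID and password needed for any follow-up request are in the response headers shown above. A minimal Python sketch for pulling them out of the captured terminal output; `parse_retrace_response` is a hypothetical helper, and the follow-up status request itself is not sketched because its exact form is not given in this report.

```python
def parse_retrace_response(text):
    """Return (status code, headers) from terminal output containing an HTTP response.

    Header names are lower-cased. Lines before the 'HTTP/...' status line are ignored.
    """
    status_code = None
    headers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 201 Created".
            status_code = int(line.split()[1])
        elif status_code is not None and ":" in line:
            # Split on the first colon only so date values stay intact.
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return status_code, headers
```

The two values of interest are then `headers["x-task-id"]` and `headers["x-task-password"]`.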

Comment 77 Mike C 2011-02-23 21:23:58 UTC
I should have added that the last test and running the retrace server script was for the mutter crash using gfx_test_week_20110221_i386.iso

Comment 78 Adam Williamson 2011-02-23 21:36:31 UTC
mike: read the instructions on the wiki page. you have to send it crafted wget commands to check status and get the backtrace back, currently. it doesn't seem to be set up right to work with f15 reports atm, though. I hope abrt team will fix that.

Comment 79 Mike C 2011-02-23 21:39:46 UTC
Adam: OK - I have to go now but if the script can be fixed I can rerun this test and get a backtrace tomorrow or the next day. I presume that if it works properly then provided I know the structure of the required wget commands I can get at the file and then upload to this bz? Are the changes needed only at the server end or will the python script need amending also?

Comment 80 Mike C 2011-02-23 21:45:44 UTC
OK yes I also ran a check with the special wget command and it showed that the status was "X-Task-Status: FINISHED_FAILURE" which is presumably what you had when you tested with the data I sent at #76

If you or someone from abrt can indicate that the server runs with f15 crashes I will be able to retest almost any evening in the coming days.

Comment 81 Dennis Martin Herbers 2011-02-23 22:13:45 UTC
I can confirm that while the 32bit GFX Test Day LiveCD failed, the 64bit one works (RV770/HD4850 1GB). I have not tried the Alpha RC1 64bit, though...

Comment 82 Dennis Martin Herbers 2011-02-23 22:33:24 UTC
After installing the GFX Test Day LiveCD to hard disk and doing a full yum update, it doesn't launch Gnome Shell again after login, but there isn't any "xy has crashed" showing up at all so it might be a whole different bug...

Comment 83 mario rathinho 2011-02-23 23:18:04 UTC
+1 @adam no chance with the GFX Test Day LiveCD here.. the current nightly doesn't help :(

Comment 84 Dennis Martin Herbers 2011-02-23 23:20:23 UTC
@mario: please specify 32bit/64bit. 64bit seems to work.

Comment 85 mario rathinho 2011-02-23 23:28:11 UTC
@Dennis - running on 32bit and

lspci | grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc M92 LP [Mobility Radeon HD 4300 Series]

Comment 86 John Watzke 2011-02-24 04:43:19 UTC
@Adam, I got both the i686 and x86_64 gfx test ISOs.  There's good news and bad news.  The good news is that the i686 version didn't crash but instead went to the fallback gnome.  The bad news is that x86_64 didn't work and also fell back to the fallback gnome.

So it looks like x86_64 image has taken a step back from what I've currently got installed.

Comment 87 Adam Williamson 2011-02-24 04:54:20 UTC
just do a logout / login cycle or run gnome-shell --replace . there's a known bug with the script that tests for shell support taking too long.

Comment 88 John Watzke 2011-02-24 06:59:05 UTC
@Adam, You're correct.  I just remembered that there's that issue where gnome-shell sometimes doesn't start on first login.

So, scratch comment #86.  When I boot with the x86_64 LiveCD, I get the fallback when I first login so I logout/login and then gnome-shell comes up nicely.  However, in the i686 LiveCD, it doesn't matter how many times I logout/login.  Gnome-shell just refuses to start.  I didn't know about running gnome-shell --replace.  I'm exhausted for the night so I might try that tomorrow on the i686 LiveCD to see if that somehow makes a difference over the logout/login.

Comment 89 v.plessky 2011-02-24 07:36:16 UTC
Hello all,

I was testing Rawhide since 2011-02-15.
Mutter was crashing on two machines - one with NVIDIA, another with ATI video
Filed two bug reports:

Bug 677810 - Fedora Rawhide LiveCD - crashes on startup @ mutter
https://bugzilla.redhat.com/show_bug.cgi?id=677810

Bug 677931 - [RS480] Fedora Rawhide LiveCD - crashes on startup @ mutter : in r300_dri.so
https://bugzilla.redhat.com/show_bug.cgi?id=677931

Both bug reports have attachments with: 
* session-errors 
* /var/log/messages 
* Xorg.0.log 

Hope this helps to catch the bug in mutter.

Comment 90 John Watzke 2011-02-25 02:52:33 UTC
(In reply to comment #88)
> I didn't know about running gnome-shell --replace.  I'm exhausted for the
> night so I might try that tomorrow on the i686 LiveCD to see if that 
> somehow makes a difference over the logout/login.

Just tried 'gnome-shell --replace' on the i686 LiveCD since it kept falling back and mutter promptly crashed.  I'll await the next things to try.

Comment 91 Adam Williamson 2011-02-25 04:40:46 UTC
so we rejected this as blocker due to the 32-bit only nature of the bug, but now we've slipped, I'd like to propose it as NTH so we can take a fix if X/gcc teams can come up with one soon. We do ship the 32-bit image and people use it, so we should probably fix this if we can.

Comment 92 v.plessky 2011-02-25 06:51:20 UTC
Yesterday I installed the OpenBox WM, and GNOME starts well via a GNOME/OpenBox session.
Maybe this is a good workaround for 32-bit images?
Adding OpenBox to Desktop-* would allow testing GNOME.

Comment 93 Adam Williamson 2011-02-25 07:28:51 UTC
not really a great workaround, no. the whole point is to test the main intended desktop, not a workaround mode.

Comment 94 v.plessky 2011-02-25 08:03:06 UTC
My understanding of GFX Test Days
https://fedoraproject.org/wiki/Test_Day:2011-02-23_Radeon
(and also for Nouveau, Intel video)
is that during those days people test different video adapters with latest X server and video drivers.
That's why I decided to participate.

To test glxgears, you don't need mutter :)
Same is valid about dpms, XVideo extension and Rotate.

I switched to LXDE on low-power (3-4 years old) computers.
And I believe it is enough for 90% of users (Firefox browser, file manager, image viewer)
Really don't understand why it's needed to test "gnome3 - ALL Radeon cards should support Shell"
Well, of course all DEs should be able to work correctly.
But Gnome3 should not be a minimum requirement for Xorg testing.

BTW: I didn't know what is "mutter" before GFX Test Day/nightly-compose ISO image test, and filing 1st bug report against it.

Comment 95 Adam Williamson 2011-02-25 08:06:55 UTC
I was talking about Alpha release criteria. The X Test Days are over now.

Comment 96 v.plessky 2011-02-25 08:26:26 UTC
When are the bugs reported during X Test Days supposed to be fixed?
If those bugs are fixed, mutter/GNOME3 would work fine.

Then there is no need for a GNOME3/mutter work-around.

Comment 97 Mike C 2011-02-25 10:01:24 UTC
(In reply to comment #91)
> so we rejected this as blocker due to the 32-bit only nature of the bug, but
> now we've slipped, I'd like to propose it as NTH so we can take a fix if X/gcc
> teams can come up with one soon. We do ship the 32-bit image and people use it,
> so we should probably fix this if we can.

I think a little differently - it will likely hurt quite a lot of people who are intending to run f15 on machines which cannot run x86_64 - and if so it will hurt Fedora so it would be much better if this was fixed before final release.  Or were you suggesting this might not be fixed for alpha but certainly fixed for final?

Comment 98 Adam Williamson 2011-02-25 16:18:26 UTC
Mike: yes.

plessky: please stop filling up Bugzilla with unnecessary discussion, ask questions and so on on a mailing list or IRC. I'm not going to answer your questions as it's only going to clutter up this report further and make it harder to work on. Sorry.

Comment 99 Bruno Wolff III 2011-02-25 18:13:16 UTC
From alpha blocker meeting:
#agreed 677842 remains non-blocker, is accepted as NTH for Alpha

Comment 100 Mike C 2011-02-25 18:33:03 UTC
I just did a little more exploration - I booted the gfx graphics live test day iso to runlevel 3, and moved /etc/gdm/custom.conf aside by going to VT2 and logging in as root.

Then went to VT 1 and logged in as liveuser, and then ran startx and logged in to the graphical gnome3 and waited for the mutter crash.

Going back to VT2 I stopped prefdm.service as root, and then went back to VT1

Now there appears a whole bunch of lines in VT1 which I had not seen previously because of the sequence of steps I used to try to start gnome3.

Anyway it was too much to write out, and I can't see the key lines in the log files - but what seemed to be potentially useful were:

Gtk message: failed to load module "pk-gtk-module"
and
mutter warning: could not load library  user-lib-mutter-plugins/libgnome-shell.so
undefined symbol gjs-context-maybe-gc

and 
window manager warning - log level 16 - AT-SPI accessibility bus not found - using session bus

I had to copy these by hand as more stuff started scrolling into the console - and I could not get a photo of the screen with the camera I have available.

If this is any help, or if other lines are worth quoting from this let me know - or if there is a way to capture the output from gnome3/mutter that generated this then I am happy to do it and upload the resulting file. However I am on the edge of my knowledge with this!

Comment 101 Adam Williamson 2011-02-25 18:44:30 UTC
So, it seems fairly clear this is an issue in the compile toolchain. Re-assigning to gcc. Jakub, see comment #66 - "Issue seems to be related with -fno-omit-frame-pointer & gcc 4.6 local build without that flags works properly. I guess debug build would also works". The scenario here is that when booting a 32-bit image on affected systems, clutter crashes on login, but booting a 64-bit image, it works. The bug seems to be in the compilation on 32-bit with -fno-omit-frame-pointer parameter set.

We have a ticking clock here - Alpha RC2 will likely be built tonight. If a fix for this is available and tested by then it will go in, if not, it won't. So can X and toolchain devs please get together and come up with a fix today if possible? It'd be really good to have this fixed by Alpha.

I'm reasonably sure this bug or a very close facsimile also affects some NVIDIA hardware: see https://bugzilla.redhat.com/show_bug.cgi?id=677810 , where again, testers report x86-64 working, i686 crashing in mutter. So our hardware impact appears to be more than just Radeon r600+ now, and may potentially include hardware that really doesn't support x86-64. Haven't yet looked into Intel cases. But there's clearly something wrong here.

Comment 102 Adam Williamson 2011-02-25 18:45:43 UTC
mike: we really don't need any more investigation, thanks. we have a clear handle on the issue here, it is a compile-time bug affecting 32-bit compiles only. messages you see are incidental and have nothing to do with this bug. the backtrace from abrt is sufficient to identify and diagnose it.

Comment 103 Mike C 2011-02-25 18:48:44 UTC
Created attachment 481063 [details]
Error output from mutter crash in VT

Comment 104 Mike C 2011-02-25 18:50:05 UTC
OK sorry Adam - I sent before seeing your comments - #103 is from running 
"startx 2>gnome-mutter.txt"

Maybe it helps confirm your suspicions?
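
For anyone who wants to capture the same kind of output, the redirection quoted above can be wrapped in a small helper. This is only a sketch: the function name capture_stderr is a hypothetical helper invented here, and the log file name is simply the one used in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): run a command, leave its
# stdout where it is, and write everything it prints on stderr to a file.
capture_stderr() {
    log_file="$1"    # file that receives stderr
    shift            # the remaining arguments are the command to run
    "$@" 2>"$log_file"
}

# Usage matching this comment (run from a text console, not inside X):
#   capture_stderr gnome-mutter.txt startx
```

The helper returns the exit status of the command it ran, so a crash can still be detected by the caller.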

Comment 105 Jakub Jelinek 2011-02-25 19:15:25 UTC

*** This bug has been marked as a duplicate of bug 679924 ***

Comment 106 Adam Williamson 2011-02-25 21:08:48 UTC
There's now a candidate fix for this! Can anyone affected please test:

http://koji.fedoraproject.org/koji/buildinfo?buildID=230450

*quickly*? Thanks. Obviously you must test the 32-bit build not the 64-bit one :) We need test results in the next few hours to get this into the Alpha. Thanks again.

Comment 107 Dennis Martin Herbers 2011-02-25 21:22:19 UTC
Good news :-)

I can confirm this fixes the problem! I used the Fedora 15 Alpha RC1 32-bit LiveCD. AMD Radeon HD 4850 1GB (RV770).

Comment 108 Adam Williamson 2011-02-25 21:55:15 UTC
that's awesome. we really need people to add karma to the update:

https://admin.fedoraproject.org/updates/mesa-7.10-0.27.fc15

log in using the blue 'log in' link on the left, with a FAS account, or else your karma won't count. thanks a lot!

Comment 109 v.plessky 2011-02-26 07:55:57 UTC
I logged in.
But where is the link to latest image with mesa update?

Comment 110 Mike C 2011-02-26 10:02:22 UTC
Looking at the timestamps I guess mesa-7.10-0.27.fc15 did not quite make it into the alpha build?  However, I presume that the next graphics test days will have ISOs that include the fix now - and it is very nice to see this resolved!

Comment 111 v.plessky 2011-02-26 12:31:43 UTC
Downloaded and successfully booted with desktop-i386-20110226.02.iso image (from http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/ ) on Acer 6935G (NVIDIA-Nouveau).
This LiveCD has Mesa 7.10-0.26.fc15.i686

'+': GNOME desktop starts, no mutter crash
'-':  glxgears crash
      gnome-control-center crash (Select "Network settings", repeatable)

Comment 112 Adam Williamson 2011-02-26 15:04:09 UTC
"Looking at the timestamps I guess mesa-7.10-0.27.fc15 did not quite make it
into the alpha build?"

You presume wrong.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 113 Mike C 2011-02-26 15:39:10 UTC
Am delighted to stand corrected!  Great - I'll check out the release later today.

Comment 114 Mike C 2011-02-26 19:23:35 UTC
Beautiful - F15 alpha RC2 works like a charm - really great to see this resolved.

Comment 115 Owen Taylor 2011-05-25 21:48:11 UTC
*** Bug 679809 has been marked as a duplicate of this bug. ***