Description of problem:
canberra-gtk-play crashes at desktop startup, presumably trying to play the gnome login jingle.
Version-Release number of selected component (if applicable):
libcanberra-0.11-8.fc11
How reproducible:
every time
Steps to Reproduce:
1. boot F11 Beta and login to gdm
Actual results:
Bug Buddy dialog appears
Expected results:
jingle to play and no crash
Created attachment 337199 [details] canberra-gtk-play-bugreport.txt this is the info saved by bug-buddy. I am running inside qemu-kvm fwiw.
Uh. Strange issue. Any chance you can get me a stack trace?
I can try to obtain it provided it'll still be reproducible after pulling today's updates. There is also a shitload of selinux denials during logging in related to pulseaudio, so this could be related.
Created attachment 337232 [details] Bug Buddy report with appropriate debuginfo packages installed This happens with up-to-date rawhide as well. SELinux denials are gone, they must have been unrelated.
Given that both cases where this happened are in a VM, and I cannot make the slightest sense of this, I am tempted to say that this is a bug in kvm in some way. Maybe we should CC someone from the kvm folks?
Hmm, this is probably related to PTHREAD_PRIO_INHERIT in some way.
It seems that when PTHREAD_PRIO_INHERIT is set for a mutex sometimes pthread_mutex_unlock() fails for no apparent reason. Changing PA to not use PTHREAD_PRIO_INHERIT makes the problem go away, as it seems. Reassigning to kernel.
Created attachment 337239 [details] Patch that makes the problem go away Pulseaudio packages for i386 with the said patch included are available here: http://belegdol.fedorapeople.org/pulse They're i386 instead of i586 since I didn't know how to build i586 ones on my Fedora 10 x86_64, but I hope it does not matter here.
So this is a pulseaudio bug then?
This patch only disables use of PTHREAD_PRIO_INHERIT. Lennart also suspected some problems with kvm, as both I and Jens are seeing the issue inside kvm virtual machine.
(In reply to comment #9) > So this is a pulseaudio bug then? No, I am pretty sure this is unrelated to PA. That's why I reassigned this to the kernel. Might be a bug in KVM, otherwise in the kernel or in nptl. No clue.
A partial "workaround" is to remove bug-buddy ;) - that at least stops metacity crashes locking up the desktop every time there is a desktop sound event or beep.
I am not sure if the severity of this bug is appreciated - it basically means that fedora 11 desktop will not run on qemu/kvm.
For reference, this still happens with a fresh install of preview release.
Is there a simple test program available that demonstrates this problem?
totem dies on startup as well, but I'm not sure if for the same reason.
There isn't going to be any progress on this bug without a test program that can be used to reproduce the problem.
Created attachment 341873 [details] Updated bug-buddy report This happens as a result of running: /usr/bin/canberra-gtk-play --id="desktop-login" --description="GNOME Login"
Created attachment 341874 [details] Totem bug-buddy report And this one is just for starting totem. I know these aren't dedicated test cases, but at least the problem is reproducible 100% of the time.
I guess pressing Tab inside a terminal is not good enough. How to echo BELL inside a shell or from C?
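As a possible answer to the question above: the bell is the ASCII BEL control character (0x07, written `'\a'` in C), so from a shell `printf '\a'` should do it, and from C it is enough to write that byte to the terminal. The sketch below assumes the terminal turns BEL into a desktop sound event, which is what pressing Tab in gnome-terminal appears to do; the helper name `ring_bell` is made up.

```c
#include <stdio.h>

/* Hypothetical helper: write the ASCII BEL character (0x07) to `out`
 * and flush so the terminal sees it immediately.
 * Returns 0 on success, -1 on failure. */
int ring_bell(FILE *out)
{
    if (fputc('\a', out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling `ring_bell(stdout)` from `main()` inside a terminal in the guest should then exercise the same sound path as the Tab key.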
I'm not convinced. What hardware is this on? Are you using VT or SVM?
You mean the host? Core 2 Duo T7200.
I cannot reproduce this using current rawhide for both guest and host on a Core 2 Q6600 based system. Is this still an issue with all updates?
It is, but I'm using F-10 as a host.
Created attachment 342006 [details] Metacity backtrace Turns out that this bug can cause metacity to lock up as well: http://thread.gmane.org/gmane.linux.redhat.fedora.devel/111520
Created attachment 342008 [details] Metacity backtrace
What kernel version on the F-10 host? Does 2.6.29 make it go away?
kernel-2.6.27.21-170.2.56.fc10.x86_64. kernel-2.6.29.1-42.fc10.x86_64 from updates-testing does not help.
Sigh, given this doesn't appear to happen with F-11, I'm going to go out on a limb and say it's not the kernel. The changes between the F-10 and F-11 2.6.29 kernels are minimal. Mark, Avi, any thoughts what the cause could be?
My wild guess is that it could be kvm.
(In reply to comment #29) > Sigh, given this doesn't appear to happen with F-11 Erm, who said it is not happening with F11: I can reproduce with both F10 and F11 hosts.
(In reply to comment #23) > I cannot reproduce this using current rawhide for both guest and host on a Core > 2 Q6600 based system. Is this still an issue with all updates? Well I can: rawhide-i386 host and guest on a Dell Precision 390.
I presume Justin's box is 64-bit, sigh. 32-bit specific? Nice.
http://kyle.fedorapeople.org/pthread_prio_inherit_test.c Can you try running this in your guest? Should build with: kyle@ihatethathostname ~ $ gcc -D_XOPEN_SOURCE=500 -lpthread -o pthread_prio_inherit_test pthread_prio_inherit_test.c Thanks, Kyle
(In reply to comment #34) > http://kyle.fedorapeople.org/pthread_prio_inherit_test.c Yes, runs fine I think: $ ./pthread_prio_inherit_test lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... lock... acquired... released... :
A few more data points: I can't reproduce with rawhide-i386 live image guests FWIW. I have reproduced on F-10 x86_64 and F-11 i386 hosts (both with i386 guest). I will try testing a 64bit guest later.
I find it very easy to reproduce now (I mean even after removing bug-buddy;) by just pressing Tab a few times in gnome-terminal and I see metacity restart which causes the gnome-terminal to lose focus. (But I guess if one doesn't get the initial metacity lockup on a fresh guest install say then one can't reproduce.)
(In reply to comment #36) > I will try testing a 64bit guest later. Anaconda has been preventing me from doing this yet unfortunately...
Created attachment 342559 [details] test program with two threads at different priorities This might be a more realistic test program. Compile the same as the previous one: gcc -o pmutex -O2 -lpthread -D_XOPEN_SOURCE=500 pmutex.c
Thanks, pmutex.c also seems fine: : parent acquired...child acquired...released released parent acquired...child acquired...released released parent acquired...child acquired...released released parent acquired...child acquired...released released parent acquired...child acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released child acquired...parent acquired...released released :
Okay - finally got an x86_64 rawhide guest and I can't reproduce any crashes there yet. So marking this bug as i386 arch.
I should probably have added I can still reproduce this on i386 rawhide guest.
*** Bug 493801 has been marked as a duplicate of this bug. ***
Is this happening on both Intel and AMD processors or just one flavor? (In reply to comment #40) > Thanks, pmutex.c also seems fine: > > : > parent acquired...child acquired...released > released > parent acquired...child acquired...released > released Strange, when I run it I get: parent acquired...released child acquired...released parent acquired...released child acquired...released parent acquired...released child acquired...released
okay, just reproduced this in a 32 bit KVM guest on a 64 bit host (all latest rawhide)
this happens with 32 bit SMP and UP guests
chatting to avi on IRC, he suspects it might be related to a mismatch of cpuid features in the host and guest:
host: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority
guest: fpu de pse tsc msr pae mce cx8 apic pge cmov pat mmx fxsr sse sse2 up pni hypervisor
cx16 is missing in the guest
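Comparing two flag lists by eye is error-prone (a naive substring search would, for instance, find "ss" inside "sse"). The sketch below does a whole-token comparison of the host and guest lists quoted above; the helper names and the two string constants are made up for illustration, and the flag strings are copied from this comment rather than read from /proc/cpuinfo.

```c
#include <stdio.h>
#include <string.h>

/* Flag lists copied from the comment above. */
static const char HOST_FLAGS[] =
    "fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm "
    "constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl "
    "vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi "
    "flexpriority";
static const char GUEST_FLAGS[] =
    "fpu de pse tsc msr pae mce cx8 apic pge cmov pat mmx fxsr sse sse2 up "
    "pni hypervisor";

/* Return 1 if `name` occurs in `flags` as a whole space-separated token. */
int has_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || (p[-1] == ' ');
        int ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends)
            return 1;
        p++;   /* partial match, e.g. "ss" inside "sse": keep looking */
    }
    return 0;
}

/* Print every flag present in `host` but absent from `guest`;
 * returns how many were printed. */
int print_missing_flags(const char *host, const char *guest)
{
    int missing = 0;
    const char *p = host;

    while (*p != '\0') {
        size_t n = strcspn(p, " ");
        char name[32];

        if (n > 0 && n < sizeof name) {
            memcpy(name, p, n);
            name[n] = '\0';
            if (!has_flag(guest, name)) {
                printf("%s\n", name);
                missing++;
            }
        }
        p += n;
        while (*p == ' ')
            p++;
    }
    return missing;
}
```

Running `print_missing_flags(HOST_FLAGS, GUEST_FLAGS)` shows that cx16 is only one of a number of host flags the guest does not see, which is worth keeping in mind before blaming any single one.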
This sounds awfully likely (cx16 being cmpxchg16b), although I can't imagine how... glibc's libpthread doesn't use cmpxchg16b as near as I can tell, nor does qemu-kvm (and, indeed, it only appears to be emulated when building qemu for target x86_64?) cheers, Kyle
(In reply to comment #44) > Is this happening on both Intel and AMD processors or just one flavor? I have only tested on Intel (and don't have any AMD currently). (In reply to comment #46) > cx16 is missing in the guest So that changed between F10 and F11? (Nothing to do with move to kernel-PAE.i686 in f11?)
There's a thought... does using kernel.i586 in F-11 help at all?
Seems like it does. With 2.6.29.2-126.fc11.i586 the problem seems to go away, while it still happens with the corresponding PAE kernel. I'm running F-10 x86_64 as a host if that matters.
Same here: host is x86_64 f10, with the non-PAE kernel 2.6.29.2-126.fc11.i586 the problem is gone. With the PAE kernel same problem here. Physical CPU is AMD 4050e.
Well, now - that narrows things down a bit
Important to note the PAE kernel is i686, so it's i586/non-PAE vs. i686/PAE
Obviously, if someone had time to build and try with an i686/non-PAE kernel, that would help a lot
Suggestions from avi:
- strace the failing tasks, look for errors on the futex ops
- try playing with the clocksource
Mark, I'll do a build and update this bug with a link to the rpms.
http://koji.fedoraproject.org/koji/taskinfo?taskID=1348053 ^- is the scratch build. Should be cooked in an hour or so.
The kernel linked to in comment #57 seems to work correctly.
Well shit... I wonder how long this has been broken, do i686-PAE F-10 kernels work? What about the vanilla i686 flavour on F-10? (We only killed the separate i686 flavour for F-11...) My guess is this is a kvm bug though. :/
Sorry for the silence: (In reply to comment #59) > does i686-PAE F-10 kernels work? Naively I tried testing f10 kernel-PAE's on f11 but the boot hangs loading syslog... I will try kernel-PAE on an f10 guest tomorrow. > What about the vanilla i686 flavour on F-10? I am pretty sure that works ok, as does kernel.i586.
Avi suggests this futex fix might help: http://lkml.org/lkml/2009/5/18/225
http://koji.fedoraproject.org/koji/taskinfo?taskID=1361292 please try this scratch build which contains markmc's fix.
(In reply to comment #62) > http://koji.fedoraproject.org/koji/taskinfo?taskID=1361292 > > please try this scratch build which contains markmc's fix. Thanks Kyle; doesn't seem to help, though
Yeah, the kernel linked to in comment #62 does not help with this problem.
Also, it seems there are some issues with PA (?) when using the plain i586 kernel anyway. It is impossible to e.g. make rhythmbox play an audio file, it'll loop the first few dozen milliseconds infinitely. With the PAE kernel, though, any app trying to play sound through PA will crash, so this might be unrelated. pulseaudio -vvvv says something about a possible alsa bug.
Created attachment 344519 [details] pulseaudio log Output of pulseaudio -vvvv running for a few moments. Lennart, is this related or rather a separate issue?
Answering to myself: it seems like these issues are unrelated, bugs #475236 and #497392 have more info.
As Linus points out in the comments on that patch, it's... crap. A better one from Thomas Gleixner is at: http://lkml.org/lkml/diff/2009/5/18/370/1 Please try a new scratch build at: http://koji.fedoraproject.org/koji/taskinfo?taskID=1361745 (As an aside, I really hope this fixes it, otherwise there's a whole slew of futex fixes to try and backport.)
While I'm at it, can you guys try the 2.6.30-rc$n kernels as they come out in dist-f12? That would help narrow this down when that patch from tglx gets upstream, if it isn't the culprit.
(Ah btw I worked out why Live is ok - of course it is using kernel.i586!;)
(In reply to comment #68) > Please try a new scratch build at: > http://koji.fedoraproject.org/koji/taskinfo?taskID=1361745 Same for me - I still see metacity crashing (after removing bug-buddy).
I tried kernel-2.6.30-0.81.rc5.git1.fc12 too and it seems to crash/lock up for me even more. gdm login locks up quite soon and gnome-terminal immediately. Rhythmbox crashed as soon as I played something.
I am just noting (and also wondering why) this bug was removed from the blocker list.
(In reply to comment #60) > I will try kernel-PAE on an f10 guest tomorrow. I tested kernel-PAE-2.6.27.21-170.2.56.fc10 guest without any problems - looks fine to me.
Reassigning to kvm.
Er, sorry, perhaps there's been some confusion. The kernels I posted through this have been for testing as guests, not hosts, since that is where the problem would be, given it's hanging in futex code and not somewhere else. That said, I've got another build which attempts to disable the feature bit for PAE, who knows if it will help... but it should be booted as the host kernel. http://koji.fedoraproject.org/koji/taskinfo?taskID=1362223
(In reply to comment #76) > The kernels I posted through this have been for testing as guests, not hosts. No confusion - all my results above today are for PAE guests on f10 x86_64 host.
(In reply to comment #75) > Reassigning to kvm. Um, why? (The kvm package doesn't even exist in F-11)
The kernels from comment #76 do not help either.
(In reply to comment #73) > I am just noting (and also wondering why) this bug was removed from the blocker > list. Fair question, the reasoning is:
1) It doesn't affect anaconda installs; they use an i586 kernel
2) It doesn't affect live installs; they also use an i586 kernel
3) So, this only affects people trying to use the desktop in a 32 bit KVM guest. Not a large enough class of users to block the release, and the workaround is to replace kernel-PAE.i686 with kernel.i586
4) We aren't remotely close to figuring out what the problem here is, so we'd be talking about delaying the release indefinitely
That's not to say this isn't a very serious bug. It certainly is.
Tested the F10 PAE kernel, it is broken. Jens, you reported it works. Can you retest?
It's a shadow mmu problem. futex_init() dereferences a NULL pointer, expecting it to fault, but it doesn't. This disables most futex ops.
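One way to see the effect from inside a guest, without waiting for a desktop application to crash, might be to ask the kernel for a priority-inheritance futex operation directly. The sketch below is a userspace probe, not the kernel's futex_init(); it assumes (this is not stated in the thread) that a kernel whose boot-time probe went wrong answers PI futex requests with ENOSYS. The function name `probe_pi_futex` is made up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical probe: try to take and release a PI futex via raw syscalls.
 * Returns  1 if both operations worked,
 *          0 if the kernel reported ENOSYS (PI futex ops unavailable),
 *         -1 on any other error. */
int probe_pi_futex(void)
{
    uint32_t word = 0;   /* 0 means "unlocked" for a PI futex */

    /* Uncontended trylock: the kernel stores our thread ID in the word. */
    if (syscall(SYS_futex, &word, FUTEX_TRYLOCK_PI, 0, NULL, NULL, 0) == -1)
        return errno == ENOSYS ? 0 : -1;

    /* Release it again so the word goes back to 0. */
    if (syscall(SYS_futex, &word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0) == -1)
        return errno == ENOSYS ? 0 : -1;

    return 1;
}
```

A result of 0 on the PAE guest and 1 on the i586 guest would be consistent with the explanation in this comment; any other combination would suggest the assumption above does not hold.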
(In reply to comment #81) > Tested the F10 PAE kernel, it is broken. > Jens, you reported it works. Can you retest? Hmm, what is the correct way to test? :) I am running the latest f10 kernel-PAE-2.6.27.21-170.2.56.fc10 and don't see metacity crash when I tab complete in gnome-terminal, but I guess there is a more technically correct way to test. ;)
I just ran that kernel and canberra-gtk-play crashed on me. For example 'canberra-gtk-play -i 0' should crash.
Hmm dunno, for me I hear the login sound theme jingle when I start my desktop session.
looks like kvm_flush_tlb() is the culprit.
Okay, the kernel is originally mapped at low addresses, and then moved to PAGE_OFFSET. While this is done, pdpte[0] == pdpte[3] in order to have identical mappings. Later, the kernel drops pdpte[0] to unmap low addresses and tells the cpu by flushing the tlb. However, the kvm paravirt tlb flush doesn't check pdptrs (they aren't really part of the tlb, but are reloaded as a side effect of the mov cr3 instruction). So the low addresses remain mapped, and the futex test fails.
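The sequence described above can be modelled in a few lines. This is a toy simulation only, not kvm or kernel code: the struct, the function names and the PDPTE values are invented, and PAGE_OFFSET is assumed to be 0xC0000000 (the usual 3G/1G split). It relies on two facts about PAE paging: bits 31:30 of a linear address select one of four PDPT entries, and bit 0 of an entry is the present bit.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u   /* assumed 3G/1G split */

/* Under PAE, bits 31:30 of a linear address pick one of four PDPTEs. */
unsigned pdpt_index(uint32_t vaddr)
{
    return vaddr >> 30;
}

struct toy_cpu {
    uint64_t pdpt_in_memory[4];   /* what the guest kernel wrote */
    uint64_t pdptr_cached[4];     /* what the (virtual) cpu actually uses */
};

/* A real mov cr3 reloads the cached PDPTEs as a side effect. */
void mov_cr3(struct toy_cpu *cpu)
{
    for (int i = 0; i < 4; i++)
        cpu->pdptr_cached[i] = cpu->pdpt_in_memory[i];
}

/* Models the flush described above: the tlb is flushed, but the
 * cached PDPTEs are left untouched. */
void flush_without_reload(struct toy_cpu *cpu)
{
    (void)cpu;
}

/* An address is reachable only if its cached PDPTE has the present bit. */
int is_mapped(const struct toy_cpu *cpu, uint32_t vaddr)
{
    return (int)(cpu->pdptr_cached[pdpt_index(vaddr)] & 1u);
}

/* Replay the boot sequence with the given flush; returns 1 if address 0
 * is still mapped afterwards (i.e. a NULL dereference would not fault). */
int null_still_mapped_after(void (*flush)(struct toy_cpu *))
{
    struct toy_cpu cpu = { {0}, {0} };

    cpu.pdpt_in_memory[3] = 0x1000u | 1u;           /* kernel mapping, present */
    cpu.pdpt_in_memory[0] = cpu.pdpt_in_memory[3];  /* temporary low alias */
    mov_cr3(&cpu);

    cpu.pdpt_in_memory[0] = 0;                      /* kernel drops the alias */
    flush(&cpu);

    return is_mapped(&cpu, 0);
}
```

In this model `null_still_mapped_after(flush_without_reload)` yields 1 and `null_still_mapped_after(mov_cr3)` yields 0, which mirrors why a NULL dereference that is expected to fault does not.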
Created attachment 345100 [details] host kernel fix Please test the attached patch. Apply to host kernel!
http://koji.fedoraproject.org/koji/taskinfo?taskID=1370732 please test the scratch build found here. Thanks! Kyle
Will this kernel work on F-10 host? If not, could you please provide a patched F-10 kernel as well?
It should, yes. Just remove it afterwards to get back on the 2.6.27 track.
The issue is still present with F-10 host running the kernel from comment #89 and guest running the PAE kernel.
It worked for me. Please provide:
- host uname -a
- guest uname -a
- host /proc/cpuinfo
- what you are doing to test, exactly
(In reply to comment #93) > It worked for me. > > Please provide: > - host uname -a Linux snowball 2.6.29.3-157.bz492838.fc11.x86_64 #1 SMP Fri May 22 11:35:33 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux > - guest uname -a Linux localhost.localdomain 2.6.29.3-155.fc11.i686.PAE #1 SMP Wed May 20 17:31:09 EDT 2009 i686 i686 i386 GNU/Linux > - host /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz stepping : 6 cpu MHz : 1000.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow bogomips : 3990.58 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz stepping : 6 cpu MHz : 1000.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow bogomips : 3989.82 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: > - what you are doing to test, exactly Three things: - canberra-gtk-play crashes at log-in - totem crashes at startup - rhythmbox crashes when trying to play a file
Avi, are you testing with F11 host? Maybe the newer kvm there has an influence on this problem?
Ok, I tested 2.6.30 (which worked) and the wrong F11 guest kernel (which also worked). So there's an additional bug in there.
There is; we need to reload the PDPTEs when cr4 is reloaded.
Created attachment 345263 [details] patch to reload cr3 when cr4 is reloaded additional host kernel fix attached. Kyle, please spin a new test kernel with this patch in addition to the previous one.
I hate this bug.
http://koji.fedoraproject.org/koji/taskinfo?taskID=1375088 please test the new kernel available here. Avi, the diff didn't apply as kvm_mmu_reset_context is split in git head... I hope the patch is still correct...?
The patch is still correct (assuming it still adds the new lines to the end of kvm_set_cr4()).
(In reply to comment #100) > http://koji.fedoraproject.org/koji/taskinfo?taskID=1375088 Aha that seems to fix it for me! :) I tested rawhide-i386 guest with above kernel.i586 host and haven't seen any sound crashes yet. Maybe someone else can also confirm?
Looks like it works with F-10 x86_64 host as well. Congrats, Avi.
Created attachment 345298 [details] replacement for second patch Attached patch replaces my previous second patch. Should be functionally identical but adheres more closely to the spec.
http://koji.fedoraproject.org/koji/taskinfo?taskID=1375672 new scratch build with the replacement to the second patch. Let me know if this is the one we want to put in F-10/F-11. cheers, Kyle
It is. Marcelo reviewed it, and I am going to upstream it shortly. Thank you for flying this bug report, we hope you will select us again for your next crash.
[fwiw, I finally figured out that I did have a VT machine and could reproduce this. Confirmed it fixes it for me as well.] Great, I've committed this for F-11, hopefully it's not too late to tag for release. thanks for your help Avi, Jens, Julian.