Bug 152386
Summary: gij/eclipse sets SCHED_RR scheduling policy, hogs machine.
Product: [Fedora] Fedora
Component: gcc
Version: 4
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: taj <taj>
Assignee: Jakub Jelinek <jakub>
CC: andrej, dledford, dwmw2, eclipse-bugs, ekanter, fedora, girish_panchal, greenrd, hugo.mey, jonathan.taylor, jsutton1027, mandhro, manjunathan_py, marcus, mlists, m.sazynski, overholt, pramod_nic, sacntct, selinux, srini.listmail, stemeri, taj, tromey, veliks, wtogami
Fixed In Version: gcc-4.0.1-1
Doc Type: Bug Fix
Last Closed: 2005-07-27 20:49:52 UTC
Bug Blocks: 136451

Description
taj
2005-03-29 04:59:18 UTC
Created attachment 112409 [details]
netdump showTask and several showPc's
Andrew, has your team seen anything like this?

Looks like perhaps another report of this on the fedora-devel-java-list.
https://www.redhat.com/archives/fedora-devel-java-list/2005-March/msg00265.html
To summarize the mailing list thread, Eclipse runs fine if invoked under "strace -ff". A gcj/gij bug then?

It looks like there is another report of this issue. I think it's the same bug at least. If you don't jump out of X before it hangs, sysrq will appear to not work.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=154532

I just confirmed that this bug is indeed unique to user root. Now 1226 kernel with eclipse 3.1.0fc-0.M5.14. This laptop is just a scrapper on the side so I had been using root. The bug is not reproducible from a normal user account.

Any news here from the kernel side of things? We're at a loss as to how gij could be causing this. Andrew Haley said that the only way he can imagine gij causing it is if it was forking too fast. Could that be what's happening here?

*** Bug 154532 has been marked as a duplicate of this bug. ***

*** Bug 157649 has been marked as a duplicate of this bug. ***

Does it still occur if you run with selinux disabled? Can some of the reporters who've seen this please try? I don't have a second box on which to try this ATM.

Booting with selinux=0 does not get past the problem. Now using:
glibc 2.3.5-6
kernel 2.6.11-1.1290_FC4
eclipse 3.1.0_fc-0.M6.15
/proc/cmdline: root=/dev/hda2 selinux=0

Changed arch to all. I have reproduced this on x86_64 before, as dledford has.

Severity should be changed to high if this is going to ship in FC4 without a workaround in the eclipse front script. Something like:

    [ -x /usr/bin/id ] || return
    [ `/usr/bin/id -u` -eq 0 ] && barf

The Eclipse script workaround is not going to be used.
https://www.redhat.com/archives/fedora-devel-java-list/2005-May/msg00050.html

Changing priority to high. Adding a workaround to the script is bad because it will only hide the problem.
The real cause must be found and corrected.

*** Bug 157083 has been marked as a duplicate of this bug. ***

*** Bug 160455 has been marked as a duplicate of this bug. ***

Dear Team, I too face the same problem in the latest stable version of Fedora Core 4 (13th June 2005 release).

Details:
Kernel: 2.6.11-1.1369_FC4smp
gij (GNU libgij): 4.0.0 20050519 (Red Hat 4.0.0-8)
Native Eclipse: powered by GCJ, based on Eclipse 3.1M6

Reproducible as root user only. As a normal user it works fine.

Created attachment 115580 [details]
eclipse startup log
Created attachment 115581 [details]
strace eclipse > strace-eclipse.txt 2>&1 &
Also found this bug in the FC4 release on i386 (Athlon XP 1.6), logged on as root.

*** Bug 160848 has been marked as a duplicate of this bug. ***

This has happened 3 times out of 3 for me: it gets to the "welcome" screen and hangs the box. Do you guys have enough test cases, or do you want me to repeat the logging that others have used?

Me too, I'm using a Dell Latitude D500 + Intel I855GM and official FC4. Note that it also happened with official Eclipse 3.0.2 (linux-gtk) from Eclipse.org running with official JDK1.5.0_03 from java.sun.com. I'm running eclipse as root.

Sorry, the bug does not occur with the official Eclipse + JDK. I removed the Java development packages, but it seems that /usr/bin/java and gij were not completely removed. My stupidity.

*** Bug 161165 has been marked as a duplicate of this bug. ***

I'm going to try to get together with David Woodhouse this week to try to find out what's going wrong.

SysRq : Show Regs
Pid: 4953, comm: java
EIP: 0060:[<c01108f2>] CPU: 0
EIP is at get_offset_tsc+0x2/0x17
EFLAGS: 00000282 Not tainted (2.6.11-1.1369_FC4smp)
EAX: 4b80a015 EBX: 00925f64 ECX: 00000000 EDX: 0000037f
ESI: b60ff220 EDI: 7fffffff EBP: 00925f64 DS: 007b ES: 007b
CR0: 8005003b CR2: afe80ee8 CR3: 2efe3520 CR4: 000006f0
 [<c010880c>] do_gettimeofday+0x20/0xd0
 [<c0125d6f>] sys_gettimeofday+0x14/0x58
 [<c0104025>] syscall_call+0x7/0xb

SysRq : Show Regs
Pid: 4942, comm: java
EIP: 0073:[<b76a26c0>] CPU: 1
EIP is at 0xb76a26c0
ESP: 007b:b3efd23c EFLAGS: 00000206 Not tainted (2.6.11-1.1369_FC4smp)
EAX: 00cefa60 EBX: b34bd6cc ECX: 00000008 EDX: 00000000
ESI: fffda1d1 EDI: 7fffffff EBP: b3efd278 DS: 007b ES: 007b
CR0: 8005003b CR2: af481eb8 CR3: 2efe3520 CR4: 000006f0

SysRq : Show Regs
Pid: 4942, comm: java
EIP: 0060:[<c0103ff5>] CPU: 1
EIP is at system_call+0x9/0x32
EFLAGS: 00000282 Not tainted (2.6.11-1.1369_FC4smp)
EAX: 0000004e EBX: b3efd220 ECX: 00000000 EDX: b7d4f954
ESI: fffd9ee9 EDI: 7fffffff EBP: b3efd238 DS: 007b ES: 007b
CR0: 8005003b CR2: af481eb8 CR3: 2efe3520 CR4: 000006f0

eclipse S 00000001 2312 4918 4194 4929 (NOTLB)
cd0b2f5c 00000086 c011963a 00000001 00000000 00000007 00000002 00000020
0015abfd 00000432 c01b93d7 00002820 ef736144 ef736020 c035ac20 c1608160
00162e54 00000432 00000000 cd0b2000 ffffffff 00000004 00000004 ef736020
Call Trace:
 [<c011963a>] do_page_fault+0x269/0x6a7
 [<c01b93d7>] avc_has_perm+0x4e/0x67
 [<c01ba480>] task_has_perm+0x2a/0x2e
 [<c0125027>] do_wait+0x264/0x3a2
 [<c011d054>] scheduler_tick+0x23b/0x414
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c0125211>] sys_wait4+0x35/0x39
 [<c0104025>] syscall_call+0x7/0xb

java S 00000040 2248 4929 4918 4930 (NOTLB)
c5ba5eb8 00200082 00000002 00000040 00200246 0000002c 00000093 00000020
6ac0c8f5 00000439 000000d0 00161c81 ef7da164 ef7da040 ef6cc550 c1608160
6ac1a6ab 00000439 00000000 c5ba5000 c1608ac0 00000000 c5ba5ea8 ef6cc550
Call Trace:
 [<c0307f0f>] schedule_timeout+0xcd/0x100
 [<c01349cf>] add_wait_queue+0xf/0x30
 [<c02cc1cb>] tcp_poll+0x13c/0x1a2
 [<c017387d>] do_select+0x2a7/0x342
 [<c0173441>] __pollwait+0x0/0x97
 [<c0173b0f>] sys_select+0x1df/0x38d
 [<c0161e49>] vfs_read+0x10a/0x10e
 [<c0104025>] syscall_call+0x7/0xb

java S C0450A24 2692 4930 4918 4937 4929 (NOTLB)
dae13eb4 00000082 00000001 c0450a24 00000000 00000080 dae13e9c 00000020
24fae2ec 00000439 ef7da040 000121b1 ef795674 ef795550 efe83550 c1610160
2501ae8e 00000439 00000001 dae13000 ef7da040 c0402200 ef7da040 c1608160
Call Trace:
 [<c0307f0f>] schedule_timeout+0xcd/0x100
 [<c01565f2>] find_extend_vma+0x12/0x4f
 [<c0137aa2>] get_futex_key+0x38/0x133
 [<c0138085>] unqueue_me+0x69/0xaa
 [<c01349cf>] add_wait_queue+0xf/0x30
 [<c01382b4>] futex_wait+0x1ee/0x21b
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c013856d>] do_futex+0x4b/0x7c
 [<c01385ee>] sys_futex+0x50/0x108
 [<c0106641>] do_IRQ+0x55/0x86
 [<c01167d0>] smp_apic_timer_interrupt+0xb6/0xce
 [<c0104025>] syscall_call+0x7/0xb

java S 00000246 2812 4937 4918 4938 4930 (NOTLB)
ecd64eb4 00000082 c01539e1 00000246 0f3dd067 00000000 c92507c0 00000020
2330dd1f 00000439 ef7da040 000008b1 ef45e144 ef45e020 ef795550 c1610160
233ef748 00000439 00000001 ecd64000 c1610ac0 00000000 c01037cf ef795550
Call Trace:
 [<c01539e1>] do_anonymous_page+0x5c/0x196
 [<c01037cf>] setup_sigcontext+0xe3/0x122
 [<c0307f0f>] schedule_timeout+0xcd/0x100
 [<c01565f2>] find_extend_vma+0x12/0x4f
 [<c0137aa2>] get_futex_key+0x38/0x133
 [<c01349cf>] add_wait_queue+0xf/0x30
 [<c01382b4>] futex_wait+0x1ee/0x21b
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c013856d>] do_futex+0x4b/0x7c
 [<c01385ee>] sys_futex+0x50/0x108
 [<c01035f5>] sys_sigreturn+0xc3/0xd4
 [<c0104025>] syscall_call+0x7/0xb

java S 00000246 2496 4938 4918 4942 4937 (NOTLB)
c66fceb4 00000082 00000001 00000246 00000040 c66fcebc c66fcefc 00000020
2330beb9 00000439 c010a9f4 000012b3 eff3c694 eff3c570 ef7da040 c1608160
233e6a27 00000439 00000000 c66fc000 c1608ac0 00000000 c01037cf ef7da040
Call Trace:
 [<c010a9f4>] convert_fxsr_to_user+0xe6/0x17b
 [<c01037cf>] setup_sigcontext+0xe3/0x122
 [<c0307f0f>] schedule_timeout+0xcd/0x100
 [<c01565f2>] find_extend_vma+0x12/0x4f
 [<c0137aa2>] get_futex_key+0x38/0x133
 [<c01349cf>] add_wait_queue+0xf/0x30
 [<c01382b4>] futex_wait+0x1ee/0x21b
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c013856d>] do_futex+0x4b/0x7c
 [<c01385ee>] sys_futex+0x50/0x108
 [<c01035f5>] sys_sigreturn+0xc3/0xd4
 [<c0104025>] syscall_call+0x7/0xb

java R running 3464 4942 4918 4953 4938 (NOTLB)
java R running 2900 4953 4918 4954 4942 (NOTLB)
java R running 3484 4954 4918 4955 4953 (NOTLB)

java S EF754340 3484 4955 4918 4956 4954 (NOTLB)
c4ca7eb4 00000082 c01539e1 ef754340 117e1067 00000000 c4ca6400 00000020
88bfd0b6 00000439 c0153d16 00000671 ef62dbc4 ef62daa0 efa3baa0 c1608160
88c1eaeb 00000439 00000000 c4ca7000 c1608ac0 00000000 c4ca7ed4 efa3baa0
Call Trace:
 [<c01539e1>] do_anonymous_page+0x5c/0x196
 [<c0153d16>] do_no_page+0x1fb/0x30b
 [<c0307ef2>] schedule_timeout+0xb0/0x100
 [<c01565f2>] find_extend_vma+0x12/0x4f
 [<c012a546>] process_timeout+0x0/0x5
 [<c01382b4>] futex_wait+0x1ee/0x21b
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c013856d>] do_futex+0x4b/0x7c
 [<c01d6dbe>] copy_from_user+0x42/0x84
 [<c01385ee>] sys_futex+0x50/0x108
 [<c0104025>] syscall_call+0x7/0xb

java S EF754340 3484 4956 4918 4955 (NOTLB)
d17b5eb4 00000082 c01539e1 ef754340 11037067 00000000 d114b408 00000020
88bfd0b6 00000439 c0153d16 00002f02 efbc9bc4 efbc9aa0 ef62daa0 c1608160
88c1aa7e 00000439 00000000 d17b5000 c1608ac0 00000000 d17b5ed4 ef62daa0
Call Trace:
 [<c01539e1>] do_anonymous_page+0x5c/0x196
 [<c0153d16>] do_no_page+0x1fb/0x30b
 [<c0307ef2>] schedule_timeout+0xb0/0x100
 [<c01565f2>] find_extend_vma+0x12/0x4f
 [<c012a546>] process_timeout+0x0/0x5
 [<c01382b4>] futex_wait+0x1ee/0x21b
 [<c011d22d>] default_wake_function+0x0/0xc
 [<c013856d>] do_futex+0x4b/0x7c
 [<c01d6dbe>] copy_from_user+0x42/0x84
 [<c01385ee>] sys_futex+0x50/0x108
 [<c0104025>] syscall_call+0x7/0xb

This 'fixes' it....

--- kernel/sched.c.orig 2005-06-29 10:43:27.000000000 +0100
+++ kernel/sched.c 2005-06-29 10:43:31.000000000 +0100
@@ -3436,6 +3436,7 @@ static int do_sched_setscheduler(pid_t p
 	int retval;
 	struct sched_param lparam;
 	struct task_struct *p;
+	char comm[sizeof(current->comm)];
 
 	if (!param || pid < 0)
 		return -EINVAL;
@@ -3447,8 +3448,14 @@ static int do_sched_setscheduler(pid_t p
 		read_unlock_irq(&tasklist_lock);
 		return -ESRCH;
 	}
-	retval = sched_setscheduler(p, policy, &lparam);
+	get_task_comm(comm, p);
+	if (strcmp(comm, "java"))
+		retval = sched_setscheduler(p, policy, &lparam);
+	else
+		retval = -EPERM;
 	read_unlock_irq(&tasklist_lock);
+	printk("Attempt to set scheduler prio for pid %d (%s) to policy %d, prio %d. Retval %d\n",
+	       pid, comm, policy, lparam.sched_priority, retval);
 	return retval;
 }

Making it also send SIGBUS when a Java process tries this results in the following...

Program received signal SIGBUS, Bus error.
[Switching to Thread -1270875216 (LWP 3284)]
0x00892402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00892402 in __kernel_vsyscall ()
#1  0x0027f429 in sched_setscheduler () from /lib/libc.so.6
#2  0x001c23ab in pthread_setschedparam () from /lib/libpthread.so.0
#3  0x011490d6 in _Jv_ThreadSetPriority () from /usr/lib/libgcj.so.6
#4  0x00e8ec60 in java::lang::Thread::setPriority () from /usr/lib/libgcj.so.6
#5  0x04cf3dea in org::eclipse::core::internal::jobs::Worker::run () from /usr/lib/eclipse/plugins/org.eclipse.core.runtime_3.1.0.jar.so
#6  0x00e8f241 in _Jv_ThreadRun () from /usr/lib/libgcj.so.6
#7  0x011491a4 in _Jv_ThreadUnRegister () from /usr/lib/libgcj.so.6
#8  0x012067cf in GC_start_routine () from /usr/lib/libgcj.so.6
#9  0x001c1b80 in start_thread () from /lib/libpthread.so.0
#10 0x00298dee in clone () from /lib/libc.so.6

For some bizarre reason we're setting SCHED_RR when we do java::lang::Thread::setPriority (). I cannot begin to imagine why anyone ever thought this might be a good idea.

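A note on why only root sees the hang, with a small probe. On Linux an unprivileged process is normally refused SCHED_RR with EPERM, and the libgcj code shown in the backtrace ignores the return value, so for ordinary users the request fails quietly. As root it succeeds, and a busy real-time thread can then starve ordinary SCHED_OTHER tasks. The sketch below is not taken from libgcj: `try_policy` and `report` are hypothetical helpers, and it calls `sched_setscheduler` directly (the libc call in frame #1 of the backtrace) instead of `pthread_setschedparam`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: ask the kernel to run the calling thread under
 * `policy` at priority `prio`, then restore the previous settings.
 * Returns 0 if the request was accepted, otherwise the errno value. */
int try_policy(int policy, int prio)
{
    struct sched_param old_param, new_param;
    int old_policy = sched_getscheduler(0);

    if (old_policy < 0 || sched_getparam(0, &old_param) != 0)
        return errno;

    memset(&new_param, 0, sizeof new_param);
    new_param.sched_priority = prio;
    if (sched_setscheduler(0, policy, &new_param) != 0)
        return errno;

    /* Do not leave the caller running as a real-time task. */
    sched_setscheduler(0, old_policy, &old_param);
    return 0;
}

/* Hypothetical helper: print what the probe found for one policy. */
void report(const char *name, int policy, int prio)
{
    int rc = try_policy(policy, prio);
    printf("%s prio %d: %s\n", name, prio, rc ? strerror(rc) : "accepted");
}
```

Called from a `main` as `report("SCHED_RR", SCHED_RR, 1)`, this would be expected to report a permission error for an ordinary user and "accepted" for root, which is consistent with the reports here that the hang is reproducible only as root.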
2005-06-29  Andrew Haley  <aph>

	* posix-threads.cc (_Jv_ThreadSetPriority): Use SCHED_OTHER
	(regular, non-realtime scheduling), not SCHED_RR (realtime,
	round-robin).

Index: posix-threads.cc
===================================================================
RCS file: /cvs/gcc/gcc/libjava/posix-threads.cc,v
retrieving revision 1.36
diff -u -p -r1.36 posix-threads.cc
--- posix-threads.cc 16 Feb 2005 04:16:06 -0000 1.36
+++ posix-threads.cc 29 Jun 2005 10:55:06 -0000
@@ -343,7 +343,7 @@ _Jv_ThreadSetPriority (_Jv_Thread_t *dat
       struct sched_param param;
 
       param.sched_priority = prio;
-      pthread_setschedparam (data->thread, SCHED_RR, &param);
+      pthread_setschedparam (data->thread, SCHED_OTHER, &param);
     }
 #endif
 }

FC-4 gcc errata is waiting for the 4.0.1 release and the subsequent churn of patches delayed until 4.0.1 gets released. I'll include this then. Andrew, do you plan to commit this to gcc-4_0-branch once it reopens?

I'll commit it to 4.0 branch and to HEAD. David, should I take it that SCHED_OTHER is the right thing to use?

(In reply to comment #35)
> David, should I take it that SCHED_OTHER is the right thing to use?

Yes.

The problem seems to be fixed with latest kernel 2.6.12-1.1387_FC4 :)

Seems to be fixed? How can that be?

Sorry, I really have no idea. It just works. And log5factor seems to finally work as well. And kernel seems to not randomly hang anymore. That's all.

Any minor change in scheduling could cause it to appear to work. Whatever it is that the high-priority thread is busy-waiting for, if it happens _before_ the offending thread starts hogging the CPU, then all will appear to work.

Appear to work? Isn't this still fixed?

No, it's not fixed. The gcc branch is frozen, and we can't check in the fix until it's thawed. Soon, I hope.

So I guess that as soon as gcc 4.0.1 ships, every FC4 package should be recompiled, shouldn't it?

No, there's no need to recompile packages.
The errant code is in a shared library, so appears only once. Shared libraries are great!

(In reply to comment #38)
> Seems to be fixed? How can that be?

I can confirm that kernel 2.6.12-1.1387_FC4 doesn't fix the problem.

Is there some end user workaround for this? Eclipse doesn't have any means of authenticating a different user in order to install updates, so it can't be updated through its built-in update functionality at the moment.

(In reply to comment #46)
> Is there some end user workaround for this?

You could try an LD_PRELOAD hack which overrides sched_setscheduler, or just run it in gdb with a breakpoint on sched_setscheduler and subvert it manually.

I'm very surprised that a workaround is required. Surely running Eclipse as root is simply a mistake. Apparently this is being done just to allow Eclipse to use its self-updating functionality. I'm vaguely surprised that this works at all, and that it hasn't been disabled as I think Firefox's has.

Just to add to the list... I was experiencing the hang every time I ran any version of Eclipse (typically after seeing the Welcome screen). I used yum to remove the shipped java, jessie and jre packages. I then installed from jre-1_5_0_02-linux-i586-rpm.bin, which I had hanging around from Sun. Then I created a link to /usr/java/jre1.5.0_02/bin/java in /usr/bin/. Now I'm running Eclipse 3.1 (Build id: I20050627-1435) and all is sweetness and light (it seems faster than 3.0 on Fedora Core 3). I'm running the 2.6.11-1.1369_FC4 kernel. Please keep up the good work to fix this; I'm just confirming that it's possible to work around this fatal issue.

Ah, the Eclipse updater. I certainly prefer all updates to be done using the operating system's package management system -- whatever that happens to be. However, I understand that people want to use the Eclipse updater. Rest assured that we will fix this, and soon.

Just for the record: all updates to the SDK itself _should_ be done with the system package management system.
The updater itself can't be disabled because people want to use plugins that we don't ship, and we include a patch to enable them to install painlessly in their home directories (~/.eclipse/<some stuff>).

*** Bug 162598 has been marked as a duplicate of this bug. ***

Patch now in upstream gcc.

Fix in gcc-4.0.1-1 in rawhide.

Any chance to see it as an update for FC4?

Nice work dwmw2. The bug as originally reported is no longer reproducible in rawhide while using gcc-4.0.1-4. Leaving as open, as some are reporting the issue is still pending in FC4. Switching version to fc4 as rawhide is no longer seeing the bug. Switching status to need info to reflect comment #56 - "Any chance to see it as an update for FC4?"

The FC4 update has been pushed, thanks Jakub!
http://www.redhat.com/archives/fedora-announce-list/2005-July/msg00124.html
http://www.redhat.com/archives/fedora-announce-list/2005-July/msg00125.html
http://www.redhat.com/archives/fedora-announce-list/2005-July/msg00131.html

IMHO, this can be closed now.

All known issues regarding this bug as first submitted have been resolved upstream and are in current releases of FC4 and Rawhide. If you have further problems, please open a new bug. For example, root user updates (comment #49).

*** Bug 172979 has been marked as a duplicate of this bug. ***

*** Bug 160793 has been marked as a duplicate of this bug. ***
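
For reference, the LD_PRELOAD workaround suggested in reply to comment #46 could look roughly like the sketch below. Nothing like this was attached to the bug: `demote_policy` is a hypothetical helper, and the shim simply turns any request for a real-time policy into SCHED_OTHER with priority 0 before passing it to the kernel. Whether `pthread_setschedparam` in a given glibc actually goes through the interposable `sched_setscheduler` symbol is an assumption; the backtrace in this bug shows it did on the FC4 glibc, but other versions may issue the system call internally and bypass the shim.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the real-time policies onto SCHED_OTHER and
 * leave every other policy unchanged. */
int demote_policy(int policy)
{
    return (policy == SCHED_RR || policy == SCHED_FIFO) ? SCHED_OTHER : policy;
}

/* Interposed replacement for the libc wrapper. When the requested policy
 * is demoted, the priority is forced to 0, the only value Linux accepts
 * for SCHED_OTHER; otherwise the caller's parameters pass through. */
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param)
{
    struct sched_param normal = { .sched_priority = 0 };
    int wanted = demote_policy(policy);
    const struct sched_param *effective = (wanted == policy) ? param : &normal;

    /* Invoke the system call directly so we do not call ourselves. */
    return (int)syscall(SYS_sched_setscheduler, pid, wanted, effective);
}
```

Assuming the file is saved as nort.c (a made-up name), it could be built with `gcc -shared -fPIC -o nort.so nort.c` and used as `LD_PRELOAD=./nort.so eclipse`. Updating to the fixed gcc/libgcj packages remains the real fix.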