Bug 161483
Summary: | Please include _Jv_CondWait patch [was: Eclipse CVS checkouts hang at final synchronization] | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Robin Green <greenrd> | ||||||||||
Component: | gcc | Assignee: | Jakub Jelinek <jakub> | ||||||||||
Status: | CLOSED ERRATA | QA Contact: | |||||||||||
Severity: | high | Docs Contact: | |||||||||||
Priority: | medium | ||||||||||||
Version: | rawhide | CC: | aph, eclipse-bugs, jakub, mckinlay, tadej.j, tromey | ||||||||||
Target Milestone: | --- | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | i686 | ||||||||||||
OS: | Linux | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | 4.0.2-8.fc4 | Doc Type: | Bug Fix | ||||||||||
Doc Text: | Story Points: | --- | |||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2005-11-26 11:57:14 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Attachments: |
|
Description
Robin Green
2005-06-23 17:30:17 UTC
I'm experiencing this now as well. gcc*-4.0.1-3 eclipse*-3.1.0_fc-6 x86 On my machine, an oprofile run done when eclipse seemed to be hanging, shows activity concentrated in interpreted code: # opreport -l -t 1 'image:*gcj*' CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % app name symbol name 3193 26.0547 libgcj.so.6.0.0 _Jv_InterpMethod::run(void*, ffi_raw*) 928 7.5724 libgcj.so.6.0.0 _Jv_CondWait(_Jv_ConditionVariable_t*, _Jv_Mutex_t*, long long, int) 880 7.1807 libgcj.so.6.0.0 java::lang::Object::wait(long long, int) 796 6.4953 libgcj.so.6.0.0 _Jv_platform_gettimeofday() 769 6.2750 libgcj.so.6.0.0 __i686.get_pc_thunk.bx 564 4.6022 libgcj.so.6.0.0 GC_mark_from 448 3.6557 libgcj.so.6.0.0 ffi_closure_raw_SYSV 412 3.3619 libgcj.so.6.0.0 java::lang::System::currentTimeMillis() 240 1.9584 org.eclipse.core.runtime_3.1.0.jar.so org::eclipse::core::internal::jobs::Semaphore::acquire(long long) 183 1.4933 libgcj.so.6.0.0 .plt 149 1.2158 libgcj.so.6.0.0 _Jv_equalUtf8Consts(_Jv_Utf8Const const*, _Jv_Utf8Const const*) 147 1.1995 libgcj.so.6.0.0 ffi_raw_call 129 1.0526 libgcj.so.6.0.0 _Jv_MonitorEnter 125 1.0200 libgcj.so.6.0.0 java::lang::String::compareTo(java::lang::String*) Unfortunately, due to the lack of a SIGQUIT handler for gij, I am not sure how to easily determine what code is being executed. (The cacao VM has a SIGQUIT handler, but it completes the CVS synchronization in a reasonable amount of time [although still slowly].) [Removing bogus bug alias, which was inserted by a buggy firefox plugin.] After fixing bug 163969, I was able to get a clearer picture of what was going on with oprofile. Again, I sampled the hang for a few minutes - I didn't wait for it to complete. I was doing other stuff at the time with my machine, but according to top, the vast majority of the time was spent in gij. However, the oprofile report for the entire system starts like this: CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % image name app name symbol name 560026 71.5807 no-vmlinux no-vmlinux (no symbols) 19564 2.5006 libgklayout.so libgklayout.so (no symbols) 18942 2.4211 anon (tgid:13508 range:0x8b2000-0x8b3000) gij (no symbols) 12236 1.5640 libmozjs.so libmozjs.so (no symbols) 8833 1.1290 libxpconnect.so libxpconnect.so (no symbols) 8651 1.1057 anon (tgid:4986 range:0x88cb000-0x88d0000) Xorg (no symbols) 8498 1.0862 libgcc_s.so.1 libgcc_s.so.1 __moddi3 8306 1.0616 libgcc_s.so.1 libgcc_s.so.1 __divdi3 7345 0.9388 libxpcom.so libxpcom.so (no symbols) 6085 0.7778 anon (tgid:4986 range:0x88dc000-0x8911000) Xorg (no symbols) 6032 0.7710 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_CondWait(_Jv_ConditionVariable_t*, _Jv_Mutex_t*, long long, int) 5885 0.7522 libgcj.so.6.0.0 libgcj.so.6.0.0 java::lang::Object::wait(long long, int) 5401 0.6903 anon (tgid:4986 range:0x87ce000-0x87ef000) Xorg (no symbols) 5236 0.6692 libpython2.4.so.1.0 libpython2.4.so.1.0 (no symbols) 5160 0.6595 libpthread-2.3.90.so libpthread-2.3.90.so pthread_mutex_lock 5152 0.6585 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_platform_gettimeofday() I suspect that gij is generating a vast number of syscalls, which is causing the kernel to burn up most of the CPU time. Now to isolate the gij parts of the profile: CPU: CPU with timer interrupt, speed 0 MHz (estimated) Profiling through timer interrupt samples % image name app name symbol name 18942 38.5737 anon (tgid:13508 range:0x8b2000-0x8b3000) gij (no symbols) 6032 12.2836 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_CondWait(_Jv_ConditionVariable_t*, _Jv_Mutex_t*, long long, int) 5885 11.9843 libgcj.so.6.0.0 libgcj.so.6.0.0 java::lang::Object::wait(long long, int) 5152 10.4916 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_platform_gettimeofday() 3032 6.1744 libgcj.so.6.0.0 libgcj.so.6.0.0 __i686.get_pc_thunk.bx 2182 4.4434 libgcj.so.6.0.0 libgcj.so.6.0.0 java::lang::System::currentTimeMillis() 1421 2.8937 org.eclipse.core.runtime_3.1.0.jar.so org.eclipse.core.runtime_3.1.0.jar.so org::eclipse::core::internal::jobs::Semaphore ::acquire(long long) 1387 2.8245 libgcj.so.6.0.0 libgcj.so.6.0.0 .plt 910 1.8531 libgcj.so.6.0.0 libgcj.so.6.0.0 java::lang::Object::wait(long long) 735 1.4968 libgcj.so.6.0.0 libgcj.so.6.0.0 GC_mark_from 318 0.6476 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_InterpMethod::run(void*, ffi_raw*) 110 0.2240 libgcj.so.6.0.0 libgcj.so.6.0.0 _Jv_LookupInterfaceMethodIdx Clearly, a major part of the time the CPU is in the gij process, is spent in locking code. (What is that anonymous map with 38.6% of the CPU time, though? Could be just an artifact of an oprofile reporting bug, I suppose.) So, could be: 1. libgcj's locking code being too slow 2. eclipse calling libgcj's locking code a ridiculously high number of times or 3. A combination of 1 and 2. I'm going to try and look into possibility #2. Oddly, I can't duplicate this at all. My checkout ends with U org.eclipse.emf/tests/org.eclipse.emf.tests-feature/feature.properties U org.eclipse.emf/tests/org.eclipse.emf.tests-feature/feature.xml U org.eclipse.emf/tests/org.eclipse.emf.tests-feature/license.html cvs checkout: Updating org.eclipse.emf/tests/org.eclipse.emf.tests-feature/rootfiles U org.eclipse.emf/tests/org.eclipse.emf.tests-feature/rootfiles/epl-v10.html U org.eclipse.emf/tests/org.eclipse.emf.tests-feature/rootfiles/notice.html ok (took 2:01.000) *** Ah, spoke too soon. Exiting Eclipse doesn't terminate. Backtrace is: #0 0x0711d182 in java::lang::Object::wait (this=@a83e8d70, timeout=9223372036854774393, nanos=0) at ./sysdep/locks.h:43 #1 0x07133081 in java.lang.Object.wait(long) (this=@a83e8d70, timeout=9223372036854774393) at ../../../libjava/java/lang/Object.java:449 #2 0xb44ea9ae in org::eclipse::core::internal::jobs::Semaphore::acquire () from /usr/lib/eclipse/plugins/org.eclipse.core.runtime_3.1.0.jar.so #3 0xb44e9996 in org::eclipse::core::internal::jobs::OrderedLock::doAcquire () from /usr/lib/eclipse/plugins/org.eclipse.core.runtime_3.1.0.jar.so #4 0xb44e96e2 in org::eclipse::core::internal::jobs::OrderedLock::acquire () from /usr/lib/eclipse/plugins/org.eclipse.core.runtime_3.1.0.jar.so #5 0xb44e966a in org::eclipse::core::internal::jobs::OrderedLock::acquire () from /usr/lib/eclipse/plugins/org.eclipse.core.runtime_3.1.0.jar.so #6 0xa88f5fda in org::eclipse::team::internal::ccvs::core::resources::EclipseSynchronizer::beginOperation () from /usr/lib/eclipse/plugins/org.eclipse.team.cvs.core_3.1.0/cvs.jar.so #7 0xa88f4a56 in org::eclipse::team::internal::ccvs::core::resources::EclipseSynchronizer::getFolderSync () from /usr/lib/eclipse/plugins/org.eclipse.team.cvs.core_3.1.0/cvs.jar.so #8 0xa88f2c88 in org::eclipse::team::internal::ccvs::core::resources::EclipseFolder::isCVSFolder () from /usr/lib/eclipse/plugins/org.eclipse.team.cvs.core_3.1.0/cvs.jar.so #9 0xa88f2faf in org::eclipse::team::internal::ccvs::core::resources::EclipseFolder::isIgnored () from /usr/lib/eclipse/plugins/org.eclipse.team.cvs.core_3.1.0/cvs.jar.so #10 0xa88c7b7c in org::eclipse::team::internal::ccvs::core::CVSSyncTreeSubscriber::isSupervised () from /usr/lib/eclipse/plugins/org.eclipse.team.cvs.core_3.1.0/cvs.jar.so #11 0xadce3909 in org::eclipse::team::core::variants::ResourceVariantTreeSubscriber::members () from /usr/lib/eclipse/plugins/org.eclipse.team.core_3.1.0/team.jar.so #12 0xadcf7be4 in org::eclipse::team::internal::core::subscribers::SubscriberChangeSetCollector$EventHandler::handleChange () from /usr/lib/eclipse/plugins/org.eclipse.team.core_3.1.0/team.jar.so This is only on rawhide stuff. 3.1M6 that we shipped with FC4 doesn't have this problem. Try checking out GNU Classpath as the example. This happens with or without the corresponding org.eclipse.team.cvs.core_3.1.0.jar.{so,db}. FWIW, I found that disabling: Preferences -> CVS -> Synchronize/Compare -> Consider file contents in comparisons avoids this deadlock. I doubt this info will help fix the real problem but it's a workaround for those using native eclipse. I've been working on this. It appears that the patch for Eclipse bug 89416 is what brings this libgcj issue to light. I'm trying to create a test case for the libgcj guys, but it's a very time-consuming process. This is the only bug that is standing in the way of Eclipse 3.1 hitting FC4, IMO. If I take off CVS Decorators (Window->Preferences->Team->CVS->Label Decorations) - all of them - I can get GNU Classpath to check out. It appears this problem goes away when using a natively-compiled (with -O2) org.eclipse.core.resources_3.1.0.jar. aot-compile-rpm for FC4 did not use -fjni as an argument for the native compilation of jars. This resulted in certain jars (o.e.c.resources, o.e.swt.gtk.linux, etc.) not being used BC-compiled. This can be verified by running eclipse with -vmargs -verbose and looking in the output for "(bytecode)" (minus the quotes). I've built a new version in rawhide (-15) which I'll test and then hopefully we can get things back-ported to FC4 soon. Unfortunately it turns out that the problem isn't fixed by -O2 compilation. Andrew has noticed that when running Eclipse using the gij -verbose option, the problem does not occur. Our best guess at this point is that this is a timing issue/race condition, as subtle changes to the environment (inserting debug print statements, etc) also make the problem come and go. We've also noticed that it happens all the time on a hyperthreaded machine, but only sometimes on a single-threaded CPU. Repeating what I said at [1]: How to set things up to figure out WTF is going on with CVS checkouts in Eclipse with libgcj -- Andrew Overholt Last modified: 2005-10-13 . install Eclipse 3.1.1 from rawhide yum --enablerepo=development install eclipse-pde-devel . start it using the Sun (or other proprietary) JVM mkdir -p ~/workspaces eclipse -vm <path to JVM>/bin/java -data ~/workspaces/cvs3.1.1 . import the source for the plug-ins we need: Import->External Plug-ins and Fragments . check "Projects with source folders" . next . pick org.eclipse.team.cvs.core, hit "Add ->" . hit "Required Plug-ins ->" . finish . create ant build files for each foreach(org.eclipse.ant.core, org.eclipse.core.resources, org.eclipse.core.runtime, org.eclipse.core.variables, org.eclipse.team.core, org.eclipse.team.cvs.core, org.eclipse.update.configurator): . right-click on plugin.xml . PDE Tools->Create Ant Build File . right-click on build.xml . Run As -> Ant Build... . check "build.update.jar", "build.jars" (should be checked), "clean", "refresh" . change the order to be: clean, build.jars, build.update.jar, refresh . hit Apply and then Run . in another terminal, create a directory to store the native bits of the above jars mkdir -p ~/eclipse/cvsissues/nativebits . get scripts to make nativifying easy (attached) cd ~/eclipse/cvsissues/nativebits wget http://overholt.ca/nativifyjar.sh wget http://overholt.ca/all.sh . make the jar and db files for the above jars point to your modified ones: pushd /usr/lib/gcj/eclipse sudo mv org.eclipse.ant.core_3.1.1.jar.db{,.bak} sudo mv org.eclipse.core.resources_3.1.0.jar.db{,.bak} sudo mv org.eclipse.core.runtime_3.1.1.jar.db{,.bak} sudo mv org.eclipse.core.variables_3.1.0.jar.db{,.bak} sudo mv org.eclipse.team.core_3.1.1.jar.db{,.bak} sudo mv org.eclipse.team.cvs.core_3.1.1.jar.db{,.bak} sudo mv org.eclipse.update.configurator_3.1.0.jar.db{,.bak} sudo ln -s ~/eclipse/cvsissues/nativebits/org.eclipse.ant.core_3.1.1.jar.db sudo ln -s ~/eclipse/cvsissues/nativebits/org.eclipse.core.resources_3.1.0.jar.db sudo ln -s ~/eclipse/cvsissues/nativebits/org.eclipse.core.runtime_3.1.1.jar.db sudo ln -s \ ~/eclipse/cvsissues/nativebits/org.eclipse.core.variables_3.1.0.jar.db sudo ln -s ~/eclipse/cvsissues/nativebits/org.eclipse.team.core_3.1.1.jar.db sudo ln -s \ ~/eclipse/cvsissues/nativebits/org.eclipse.team.cvs.core_3.1.1.jar.db sudo ln -s \ ~/eclipse/cvsissues/nativebits/org.eclipse.update.configurator_3.1.0.jar.db popd pushd /usr/share/eclipse/plugins sudo mv org.eclipse.ant.core_3.1.1.jar /tmp sudo mv org.eclipse.core.resources_3.1.0.jar /tmp sudo mv org.eclipse.core.runtime_3.1.1.jar /tmp sudo mv org.eclipse.core.variables_3.1.0.jar /tmp sudo mv org.eclipse.team.core_3.1.1.jar /tmp sudo mv org.eclipse.team.cvs.core_3.1.1.jar /tmp sudo mv org.eclipse.update.configurator_3.1.0.jar /tmp sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.ant.core/org.eclipse.ant.core_3.1.1.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.core.resources/org.eclipse.core.resources_3.1.0.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.core.runtime/org.eclipse.core.runtime_3.1.1.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.core.variables/org.eclipse.core.variables_3.1.0.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.team.core/org.eclipse.team.core_3.1.1.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.team.cvs.core/org.eclipse.team.cvs.core_3.1.1.jar sudo ln -s \ ~/workspaces/cvs3.1.1/org.eclipse.update.configurator/org.eclipse.update.configurator_3.1.0.jar popd . make changes in Eclipse as you see fit, re-run the Ant build for the encompassing plug-in each time (right-click on build.xml -> Run As -> Ant Build), re-run the native part for either all jars (all.sh) or just the one you changed: either ./all.sh sudo rebuild-gcj-db OR one or more of: ./nativifyjar.sh org.eclipse.ant.core 3.1.1 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.core.resources 3.1.0 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.core.runtime 3.1.1 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.core.variables 3.1.0 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.team.core 3.1.1 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.team.cvs.core 3.1.1 ~/workspaces/cvs3.1.1 ./nativifyjar.sh org.eclipse.update.configurator 3.1.0 ~/workspaces/cvs3.1.1 followed by: sudo rebuild-gcj-db . enable tracing if you want by editing <plugin>/.options and then tacking a -debug ~/workspaces/cvs3.1.1/<plugin>/.options onto the end of the eclipse launch command . run native eclipse: rm -rf ~/workspaces/testCVS eclipse -data ~/workspaces/testCVS [-debug ~/workspaces/cvs3.1.1/<plugin>.options] . NOTE: you should also be able to run it from within Eclipse but I can't figure out how to get the classes in <project>/bin (which are theoretically the same that are in the generated jars) to be mapped to their corresponding .jar.so (in Eclipse, set up a run profile that runs an Eclipse application with gij (you can set up an installed JRE for java-gcj-compat pretty easily by using /usr/lib/jvm as the location in the Installed JREs dialog)) [1] https://www.redhat.com/archives/fedora-devel-java-list/2005-October/msg00025.html Created attachment 119942 [details]
nativifyjar.sh
Created attachment 119943 [details]
all.sh
After much looking into this, I am confident that the time is indeed being spent in org::eclipse::core::internal::jobs::Semaphore::acquire as others above have seen with backtraces and oprofile. This is a synchronized method and AFAICT, there are three different threads competing: "Updating Change Sets for CVS Workspace", "Checking out 'classpath'", and "Decoration Calculation". We aren't dead-locking, but the lock acquiring and releasing in org::eclipse::team::internal::ccvs::core::resources::EclipseSynchronizer::beginOperation and org::eclipse::team::internal::ccvs::core::resources::EclipseSynchronizer::endOperation is somehow messed up. We're holding the locks way too long. Figuring out why is what I'm currently attempting to do. There were times when I thought we were mistakenly returning files as being dirty when they weren't but after adding some sorting code (to compare output between runs with the Sun JVM and with our stuff) to org::eclipse::team::internal::ccvs::core::resources::EclipseSynchronizer::beginOperation, I can say for sure that we are not. Another dead end I went down was seeing if the setTeamPrivateMember call (I can't remember where this was) was outside the IWorkspaceRunnable and affecting things but after modifying the code to set that at Resource creation time, the problem persisted so it's not there, either. The calls to System.currentTimeMillis() are not causing delays, either. I write this all just for posterity :) (In reply to comment #14) > How to set things up to figure out WTF is going on with CVS checkouts in > Eclipse with libgcj -- Andrew Overholt > > Last modified: 2005-10-13 > > . install Eclipse 3.1.1 from rawhide > > yum --enablerepo=development install eclipse-pde-devel > > . start it using the Sun (or other proprietary) JVM > > mkdir -p ~/workspaces > eclipse -vm <path to JVM>/bin/java -data ~/workspaces/cvs3.1.1 Here, you're better off using another Eclipse distribution (ie. download the upstream .tar.gz) since we're going to replace some jars and stuff later on. Sorry for the confusion. Created attachment 121089 [details]
statements to help trigger the bug
This patch contains some commented out statements that I added for debugging
purposes. When I thought this was fixed on my laptop, re-adding these lines
and following the steps in my earlier comments (about how to hack on this from
within Eclipse and rebuild the jar and jar.so, etc.) exposed the problem again.
The problem is always present on my HT x86 box.
(In reply to comment #19) > This patch contains some commented out statements that I added for debugging > purposes. This patch can be applied like this: expand org.eclipse.team.cvs.core expand src expand org.eclipse.team.internal.ccvs.core.resources right-click on EclipseSynchronizer.java -> Team -> Apply patch enter the location of the patch file change "Ignore leading path name segments" to 5 hit "Guess" on maximum fuzz factor the entries in the patch should now not have red exclamation points beside them hit finish Created attachment 121453 [details]
Script to rebuild ant plugins
This script may be useful. It quickly rebuilds an Eclipse plugin and installs
it, allowing the programmer easily to make experimental changes.
I have verified that Andrew's patch does indeed fix this issue. What a relief! Thank you very much, Andrew! gcc-4.0.2-8.fc4 should appear in fc4 testing updates RSN (has been already built and the update request filed). (In reply to comment #24) > gcc-4.0.2-8.fc4 should appear in fc4 testing updates RSN (has been already built > and the update request filed). I have verified that this gcc build does indeed fix this bug. Thanks! Confirming this is fixed in updates-testing with the following package versions: gcc-4.0.2-8.fc4 eclipse-platform-3.1.1-1jpp_1fc.FC4.11 |