Bug 179002
| Summary: | update failed - /usr/bin/rebuild-gcj-db hung! | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Neal Becker <ndbecker2> |
| Component: | kernel | Assignee: | Dave Jones <davej> |
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Priority: | high |
| Version: | rawhide | Hardware: | x86_64 |
| OS: | Linux | Doc Type: | Bug Fix |
| Last Closed: | 2006-02-03 20:28:16 UTC | | |
| CC: | alan.krause, aph, gczarcinski, green, ianburrell, ivg231, pfrields, redhatbugs, tromey, wtogami | | |

Description
Neal Becker
2006-01-26 13:36:03 UTC
Can you please post the output of: thread apply all bt in gdb?

thread apply all bt

(gdb) thread apply all bt

Thread 2 (Thread 1084229984 (LWP 30832)):
#0 0x0000003ce6409436 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000034716e68d2 in _Jv_CondWait () from /usr/lib/../lib64/libgcj.so.7
#2 0x00000034716d2d04 in gnu::gcj::runtime::FinalizerThread::run () from /usr/lib/../lib64/libgcj.so.7
#3 0x00000034716e050a in _Jv_ThreadRun () from /usr/lib/../lib64/libgcj.so.7
#4 0x00000034716e6461 in _Jv_ThreadRegister () from /usr/lib/../lib64/libgcj.so.7
#5 0x0000003471b64279 in GC_start_routine () from /usr/lib/../lib64/libgcj.so.7
#6 0x0000003ce640615a in start_thread () from /lib64/libpthread.so.0
#7 0x0000003ce5bc92bd in clone () from /lib64/libc.so.6
#8 0x0000000000000000 in ?? ()

Thread 1 (Thread 47048294863472 (LWP 30831)):
#0 0x0000003ce640b01d in sem_wait () from /lib64/libpthread.so.0
#1 0x0000003471b64c93 in GC_stop_world () from /usr/lib/../lib64/libgcj.so.7
#2 0x0000003471b5729b in GC_stopped_mark () from /usr/lib/../lib64/libgcj.so.7
#3 0x0000003471b575db in GC_try_to_collect_inner () from /usr/lib/../lib64/libgcj.so.7
#4 0x0000003471b57840 in GC_collect_or_expand () from /usr/lib/../lib64/libgcj.so.7
#5 0x0000003471b57d3b in GC_allocobj () from /usr/lib/../lib64/libgcj.so.7
#6 0x0000003471b5bc90 in GC_generic_malloc_inner () from /usr/lib/../lib64/libgcj.so.7
#7 0x0000003471b5c6d7 in GC_generic_malloc_many () from /usr/lib/../lib64/libgcj.so.7
#8 0x0000003471b64714 in GC_local_gcj_malloc () from /usr/lib/../lib64/libgcj.so.7
#9 0x00000034716aaae8 in _Jv_AllocObjectNoFinalizer () from /usr/lib/../lib64/libgcj.so.7
#10 0x00000034716fa341 in gnu::gcj::runtime::PersistentByteMap::addBytes () from /usr/lib/../lib64/libgcj.so.7
#11 0x00000034716fa657 in gnu::gcj::runtime::PersistentByteMap::put () from /usr/lib/../lib64/libgcj.so.7
#12 0x00000034716f9e25 in gnu::gcj::runtime::PersistentByteMap::putAll () from /usr/lib/../lib64/libgcj.so.7
#13 0x0000000000404592 in ?? ()
#14 0x00000034716d33e3 in gnu::java::lang::MainThread::call_main () from /usr/lib/../lib64/libgcj.so.7
#15 0x00000034716e050a in _Jv_ThreadRun () from /usr/lib/../lib64/libgcj.so.7
#16 0x00000034716abcdf in _Jv_RunMain () from /usr/lib/../lib64/libgcj.so.7
#17 0x0000003ce5b1cde4 in __libc_start_main () from /lib64/libc.so.6
#18 0x0000000000402c49 in ?? ()
#19 0x00007fffff85dbe8 in ?? ()
#20 0x0000000000000000 in ?? ()

Also, can you attach the output of: ps auwx

The ps output you posted isn't full-width so I can't see the actual commands being run.

Created attachment 123714 [details]
sudo cat /proc/31020/cmdline
I don't see how that command line can come from rebuild-gcj-db. Can you post the full log of: ps auwx instead?

Created attachment 123716 [details]
ps auxwwf
OK, the output is still getting cut off. I see the problem in Comment #4 though. cmdline is null-delimited so you were only printing argv[0]. Can you post the output of:

sudo cat /proc/<gcj-dbtool proc id>/cmdline | tr '\0' '\n'

Created attachment 123717 [details]
sudo cat /proc/<gcj-dbtool proc id>/cmdline | tr '\0' '\n'
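Note on the point above about cmdline being null-delimited: the following is a minimal sketch, not something posted in this thread. The function names print_args and count_args are made up for illustration, and the demo uses a stand-in file with the same layout as /proc/<pid>/cmdline so that no particular process has to be running.

```shell
#!/bin/sh
# print_args FILE: print each NUL-terminated argument in FILE on its own line.
# FILE is expected to have the layout of /proc/<pid>/cmdline.
print_args() {
    tr '\0' '\n' < "$1"
}

# count_args FILE: count the arguments by counting the NUL terminators.
count_args() {
    tr -cd '\0' < "$1" | wc -c | tr -d ' '
}

# Stand-in for a real cmdline file (hypothetical contents).
demo=$(mktemp)
printf 'gcj-dbtool\000-m\000classmap.db\000' > "$demo"
print_args "$demo"
count_args "$demo"
rm -f "$demo"
```

Anything that treats the file as a C string stops at the first NUL, which would explain seeing only argv[0]; translating the NULs first makes every argument visible.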
Maybe one problem is that the command line really IS truncated.

Yes, that could well be. Try running gcj-dbtool manually, like this:

/usr/bin/gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj-4.1.0/classmap.db /usr/lib6

Do you get the same hang?

No. (wasn't sure if you intended to put /usr/lib64/gcj-4.1.0/classmap.db 2 times):

[nbecker@nbecker4 ~]$ sudo gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib6
java.io.FileNotFoundException: /usr/lib6 (No such file or directory)
   at gnu.java.nio.channels.FileChannelImpl.open (libgcj.so.7)
   at gnu.java.nio.channels.FileChannelImpl.<init> (libgcj.so.7)
   at java.io.FileInputStream.<init> (libgcj.so.7)
   at gnu.gcj.runtime.PersistentByteMap.<init> (libgcj.so.7)
   at gnu.gcj.tools.gcj_dbtool.Main.main (gcj-dbtool)
[nbecker@nbecker4 ~]$ sudo gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj-4.1.0/classmap.db /usr/lib6
java.io.FileNotFoundException: /usr/lib6 (No such file or directory)
   at gnu.java.nio.channels.FileChannelImpl.open (libgcj.so.7)
   at gnu.java.nio.channels.FileChannelImpl.<init> (libgcj.so.7)
   at java.io.FileInputStream.<init> (libgcj.so.7)
   at gnu.gcj.runtime.PersistentByteMap.<init> (libgcj.so.7)
   at gnu.gcj.tools.gcj_dbtool.Main.main (gcj-dbtool)

Yes, I meant to put it twice; the first is an argument to the -m option meaning "merge into this destination database" but since we're adding to the existing classmap.db database we want to merge it too. Try:

sudo gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj/jakarta-commons-el/jakarta-commons-el-1.0.jar.db /usr/lib6

Then try running the full command line (including the truncated one) standalone (you can get the full commandline with: sudo cat /proc/<gcj-dbtool proc id>/cmdline | tr '\0' ' ').

Created attachment 123718 [details]
echo $( cat stuff ) > stuff2
What is this the output of? I meant try running this manually: sudo /usr/bin/gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj/jakarta-commons-el/jakarta-commons-el-1.0.jar.db /usr/lib64/gcj/xalan-j2/xsltc-2.6.0.jar.db /usr/lib64/gcj/xalan-j2/xalan-j2-2.6.0.jar.db /usr/lib64/gcj/xalan-j2/xalan-j2-samples.jar.db /usr/lib64/gcj/regexp/regexp-1.3.jar.db /usr/lib64/gcj/eclipse-bugzilla/bugs.jar.db /usr/lib64/gcj/eclipse-bugzilla/bugzilla.jar.db /usr/lib64/gcj/gnu-crypto/gnu-crypto-sasl-jdk1.4.jar.db /usr/lib64/gcj/gnu-crypto/gnu-crypto-jce-jdk1.4.jar.db /usr/lib64/gcj/gnu-crypto/gnu-crypto-2.0.1.jar.db /usr/lib64/gcj/mx4j/mx4j-remote-boa-3.0.1.jar.db /usr/lib64/gcj/mx4j/mx4j-3.0.1.jar.db /usr/lib64/gcj/mx4j/mx4j-tools-3.0.1.jar.db /usr/lib64/gcj/jakarta-commons-beanutils/jakarta-commons-beanutils-1.7.0.jar.db /usr/lib64/gcj/eclipse-changelog/changelog.jar.db /usr/lib64/gcj/tomcat5/servlets-ssi.renametojar.db /usr/lib64/gcj/tomcat5/servlets-invoker.jar.db /usr/lib64/gcj/tomcat5/jasper5-compiler-5.0.30.jar.db /usr/lib64/gcj/tomcat5/naming-java.jar.db /usr/lib64/gcj/tomcat5/servletapi5-5.0.30.jar.db /usr/lib64/gcj/tomcat5/catalina-optional.jar.db /usr/lib64/gcj/tomcat5/jasper5-runtime-5.0.30.jar.db /usr/lib64/gcj/tomcat5/naming-resources.jar.db /usr/lib64/gcj/tomcat5/catalina-cluster.jar.db /usr/lib64/gcj/tomcat5/catalina.jar.db /usr/lib64/gcj/tomcat5/bootstrap.jar.db /usr/lib64/gcj/tomcat5/tomcat-util.jar.db /usr/lib64/gcj/tomcat5/tomcat-jk2.jar.db /usr/lib64/gcj/tomcat5/servlets-cgi.renametojar.db /usr/lib64/gcj/tomcat5/servlets-common.jar.db /usr/lib64/gcj/tomcat5/tomcat-coyote.jar.db /usr/lib64/gcj/tomcat5/catalina-ant-5.0.30.jar.db /usr/lib64/gcj/tomcat5/servlets-webdav.jar.db /usr/lib64/gcj/tomcat5/naming-common.jar.db /usr/lib64/gcj/tomcat5/servlets-default.jar.db /usr/lib64/gcj/tomcat5/tomcat-http11.jar.db /usr/lib64/gcj/tomcat5/jspapi-5.0.30.jar.db /usr/lib64/gcj/tomcat5/naming-factory.jar.db 
/usr/lib64/gcj/xerces-j2/xerces-j2-2.6.2.jar.db /usr/lib64/gcj/xerces-j2/xerces-j2-samples.jar.db /usr/lib64/gcj/jsch/jsch-0.1.18.jar.db /usr/lib64/gcj/lucene/lucene-demos-1.4.3.jar.db /usr/lib64/gcj/lucene/lucene-1.4.3.jar.db /usr/lib64/gcj/jakarta-commons-collections/jakarta-commons-collections-3.1.jar.db /usr/lib64/gcj/ant/ant-jsch-1.6.5.jar.db /usr/lib64/gcj/ant/ant-jdepend-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-log4j-1.6.5.jar.db /usr/lib64/gcj/ant/ant-commons-logging-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-bsf-1.6.5.jar.db /usr/lib64/gcj/ant/ant-junit-1.6.5.jar.db /usr/lib64/gcj/ant/ant-antlr-1.6.5.jar.db /usr/lib64/gcj/ant/ant-nodeps-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-oro-1.6.5.jar.db /usr/lib64/gcj/ant/ant-javamail-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-regexp-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-bcel-1.6.5.jar.db /usr/lib64/gcj/ant/ant-launcher-1.6.5.jar.db /usr/lib64/gcj/ant/ant-1.6.5.jar.db /usr/lib64/gcj/ant/ant-apache-resolver-1.6.5.jar.db /usr/lib64/gcj/ant/ant-trax-1.6.5.jar.db /usr/lib64/gcj/ant/ant-swing-1.6.5.jar.db /usr/lib64/gcj/xml-commons/xml-commons-which-1.0.jar.db /usr/lib64/gcj/xml-commons/xml-commons-apis-1.0.jar.db /usr/lib64/gcj/geronimo-specs/spec-jms-1.1-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-j2ee-deployment-1.1-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-j2ee-jacc-1.0-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-ejb-2.1-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-j2ee-management-1.0-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-j2ee-connector-1.5-rc2.jar.db /usr/lib64/gcj/geronimo-specs/spec-jta-1.0.1B-rc2.jar.db /usr/lib64/gcj/java_cup/java_cup-runtime-0.10.jar.db /usr/lib64/gcj/java_cup/java_cup-0.10.jar.db /usr/lib64/gcj/eclipse-pydev/pydev.jar.db /usr/lib64/gcj/eclipse-pydev/pydev-debug.jar.db /usr/lib64/gcj/jakarta-commons-logging/jakarta-commons-logging-1.0.4.jar.db /usr/lib64/gcj/eclipse/org.eclipse.update.core_3.1.1.jar.db /usr/lib64/gcj/eclipse/org.eclipse.debug.ui_3.1.1.jar.db 
/usr/lib64/gcj/eclipse/org.eclipse.ui.views_3.1.1.jar.db /usr/lib64/gcj/eclipse/org.eclipse.core.runtime.compatibility_3.1.0.jar.db /usr/lib64/gcj/eclipse/org.eclipse.ui.workbench_3.1.1.jar.db /usr/lib6

Yes, I did run that command manually. I did sudo cat /proc/<gcj-dbtool procid>/cmdline | tr '\0' ' ' > stuff, then sudo $( cat stuff ). I verified that the content of 'stuff' is exactly correct, by doing echo $( cat stuff ) > stuff2 and attaching this result.

I see what you were trying to do. I think you meant to do this:

sudo $( cat stuff ) > stuff2

sudo echo $( cat stuff ) just prints the contents of stuff, all on one line.

sudo $( cat stuff )
java.io.FileNotFoundException: /usr/lib6 (No such file or directory)
   at gnu.java.nio.channels.FileChannelImpl.open (libgcj.so.7)
   at gnu.java.nio.channels.FileChannelImpl.<init> (libgcj.so.7)
   at java.io.FileInputStream.<init> (libgcj.so.7)
   at gnu.gcj.runtime.PersistentByteMap.<init> (libgcj.so.7)
   at gnu.gcj.tools.gcj_dbtool.Main.main (gcj-dbtool)

It does not hang. Notice it _did_ see the end of the command.

I don't understand how this could act differently than when run from the shell script. Can you turn stuff into a shell script and run it by hand to see if it hangs?

Can you also post the output of: find /usr/lib64/gcj -follow -name '*.db'

Created attachment 123721 [details]
modified script
Created attachment 123722 [details]
stderr from script
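Note on the earlier exchange about echo $( cat stuff ) versus $( cat stuff ): the sketch below illustrates the difference under stated assumptions. It is not taken from the thread; printf stands in for gcj-dbtool and the file contents are invented so the example can run anywhere.

```shell
#!/bin/sh
# Stand-in for the saved command line; printf plays the role of gcj-dbtool.
stuff=$(mktemp)
printf '%s' 'printf <%s> one two ' > "$stuff"

# Substitution used as arguments to echo: the words are printed, nothing runs.
printed=$(echo $(cat "$stuff"))

# Substitution in command position: the first word is executed as the command
# and the remaining words become its arguments.
executed=$($(cat "$stuff"))

echo "printed:  $printed"
echo "executed: $executed"
rm -f "$stuff"
```

One caveat with this approach in general: once the NUL separators have been turned into spaces, the shell splits on whitespace, so an argument that itself contained a space would no longer round-trip. That does not appear to matter for the paths shown in this report.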
OK, so the -n 1 solves the problem? Did you try adding that to your system rebuild-gcj-db and re-installing the packages that were failing?

No! What I showed is that it hangs on _1 particular file_ and that I could _duplicate that result running on just that file_ and that _I ran rpm --verify to show there is nothing wrong with that file_.

Maybe there was a Bugzilla mid-air collision? All I see is the "modified script" attachment and the "stderr from script" attachment.

Created attachment 123723 [details]
strace
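Note: one way to narrow a hang down to a single .db file, in the spirit of the one-file-at-a-time run described in this thread, is to put a time limit on each invocation. This is a rough sketch only. The function name find_hanging is hypothetical, it relies on the GNU coreutils timeout command, and it is not the contents of the attached modified script.

```shell
#!/bin/sh
# find_hanging DIR LIMIT CMD [ARG...]
# Runs CMD once per *.db file found under DIR (the file is appended as the
# last argument) and prints every file for which CMD was still running
# after LIMIT seconds.
find_hanging() {
    dir=$1
    limit=$2
    shift 2
    find "$dir" -follow -name '*.db' | sort | while IFS= read -r db; do
        # stdin comes from /dev/null so CMD cannot swallow the file list
        timeout "$limit" "$@" "$db" </dev/null >/dev/null 2>&1
        status=$?
        # GNU timeout exits with 124 when it had to stop the command
        if [ "$status" -eq 124 ]; then
            printf '%s\n' "$db"
        fi
    done
}

# Hypothetical usage against a scratch copy of the database:
#   find_hanging /usr/lib64/gcj 60 gcj-dbtool -m /tmp/classmap.db /tmp/classmap.db
```

A process that is stuck in a way that ignores SIGTERM would need timeout's -k option as well; that is left out here to keep the sketch short.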
Sorry must have lost some text there. I found it hangs on this command:

/usr/bin/gcj-dbtool -m /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj-4.1.0/classmap.db /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db

Then I did:

rpm -qf /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db
eclipse-cdt-3.0.1-1jpp_3fc

Then I did:

rpm --verify eclipse-cdt
[silence]

So it reproducibly fails here, and there's nothing wrong with that file. Is there maybe some kind of file lock? gdb seemed to show it's stuck waiting on a semaphore.

Can you also post the results of

gcj-dbtool -t /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db

gcj-dbtool -t /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db
[silence] No output

Check the error status right after running the command:

gcj-dbtool -t /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db
echo $?

[root@nbecker4 nbecker]# gcj-dbtool -t /usr/lib64/gcj/eclipse-cdt/cdtdebugcore.jar.db || echo "byte me"
[nothing]
[root@nbecker4 nbecker]# echo $?
0

Are you running Eclipse on your system while you are doing these tests? Can you post the results of: /usr/sbin/lsof | grep db$

Created attachment 123726 [details]
strace -f
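Note on checking the error status, as requested above: a small sketch of the two idioms used in that exchange (echo $? and ||). The helper name check is made up for illustration, and true and false stand in for the real gcj-dbtool -t call.

```shell
#!/bin/sh
# check CMD [ARG...]: run CMD quietly and report its exit status in words.
check() {
    "$@" >/dev/null 2>&1
    status=$?   # capture immediately; the next command would overwrite $?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed with status $status"
    fi
}

check true
check false

# "||" runs its right-hand side only when the left-hand side fails,
# so silence from such a line means the command succeeded.
true  || echo "not reached"
false || echo "reached"
```

Read this way, the silent || echo line and the 0 from echo $? in the transcript both point to gcj-dbtool -t exiting successfully on that file.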
sudo /usr/sbin/lsof | grep db$
kopete 2739 nbecker 16u REG 253,0 197632 10296060 /home/nbecker/.kde/share/apps/kopete/kopete_statistics-0.1.db

OK, great. Just a little more information, then I'll hand it off to Andrew Haley:

rpm -qf `which gcj-dbtool`
gcj-dbtool --version

I'll need copies of the .db files that are affected.

[nbecker@nbecker4 ~]$ rpm -qf `which gcj-dbtool`
libgcj-4.1.0-0.16
libgcj-4.1.0-0.16
[nbecker@nbecker4 ~]$ gcj-dbtool --version
gcj-dbtool (GNU libgcj) 4.1.0 20060121 (Red Hat 4.1.0-0.16)
Copyright 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Created attachment 123731 [details]
db files
I can't reproduce this problem. I'm using FC5 Rawhide,

gcj-dbtool (GNU libgcj) 4.1.0 20060121 (Red Hat 4.1.0-0.16)
kernel 2.6.15-1.1869_FC5

Whenever I do the db merge it completes.

Neal, did you try copying the files, untarring into /tmp and doing the merge command there? Is the result any different when you don't merge the in-system copies?

Also, can you please post your /proc/cpuinfo

Did you try this on x86_64? I just noticed that beagle, a mono app, is showing what looks at least superficially like the same kind of symptom. Maybe this is a problem with thread synchronization.

Yes, I should have said:

[root@zorro tmp]# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 4
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 8
cpu MHz : 2002.645
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow
bogomips : 4014.71
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

I agree that this is probably a concurrency problem. That's why I need to know your CPU config.

I tried untarring into /tmp and running from there as you suggest and I get the same hang.

cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 36
model name : AMD Turion(tm) 64 Mobile Technology ML-37
stepping : 2
cpu MHz : 800.000
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips : 1595.23
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

I wonder if we've been looking in entirely the wrong place. I'm now guessing that this is either a pthreads or kernel bug. I wonder if there's some way to test signals and threads on this platform.

My WAG (wild assed guess) is this has to do with glibc-2.3.90, and is related to threads.

It IS the kernel! I just tried with vanilla 2.6.15 (built with fedora .config) and it does NOT HANG.

possibly related? https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179228

*** Bug 179293 has been marked as a duplicate of this bug. ***

*** Bug 179435 has been marked as a duplicate of this bug. ***

I propose we close this as a dup of bug 179228. The stack traces from gdb suggest this. And 179228 has been assigned to Ingo, while this hasn't been assigned to anybody in particular.

*** This bug has been marked as a duplicate of 179228 ***

*** Bug 180908 has been marked as a duplicate of this bug. ***