Created attachment 497881
Crash traceback
Description of problem:
Running gnucash yields the splash screen, then a segfault. The last message in the splash window is "gnucash/import-export/aqbanking".
Version-Release number of selected component (if applicable):
gnucash-2.4.5-2.fc16.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run gnucash
2. Sweep up the core dump
Actual results:
Segmentation violation, crash, no application window
Expected results:
Normal gnucash window showing the dire state of my finances.
Additional info:
If you could fix my finances too that would be extra cool.
This crash appears to be somewhere in the gnutls/libgcrypt stack, and I do note the gnutls is different between F-15 and F-16. Moving there for now.
This also happens to me in Fedora 14, but only if there are open reports. If I delete .gnucash/, gnucash starts fine, but as soon as I open a report, gnucash crashes. On Fedora 15 gnucash seems to be working fine.
Removing .gnucash changes nothing for me; I still get a segfault on startup.
Does this still persist? gnucash works fine for me in F-16, although I haven't spun up a rawhide VM yet.
As of yesterday's rawhide, yes, the problem is still there.
*** Bug 742202 has been marked as a duplicate of this bug. ***
From the duplicated bug: "Tomas Mraz 2011-10-06 02:21:23 EDT I suspect gnucash does something wrong at startup perhaps it tries to initialize the gnutls multiple times simultaneously or something similar." gnucash itself doesn't use gnutls. Moving to gwenhywfar.
So, some debugging: The gnutls/gcrypt initialization is done from gwenhywfar, which is brought in by AQBanking. This *is* actually done twice during gnucash setup. First, when gnucash scans for its modules, it dlopen()s the module and checks it for symbols. This calls the initialization constructor in gwenhywfar. However, the module is then closed, calling the destructor. It's then opened again when the module is fully initialized. Reading the gwenhywfar code, it looks like it won't call the destructor for gnutls if it didn't think it initialized itself correctly. Can anyone who's seeing this attach the output of "GWEN_LOGLEVEL=debug gnucash"?
*** Bug 744310 has been marked as a duplicate of this bug. ***
GWEN_LOGLEVEL=debug gives me:
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 250: Initializing I18N module
6:2011/10/07 14-04-23:gwen(52702):i18n.c: 199: Real locale is [en_US.utf8]
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 254: Initializing InetAddr module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 258: Initializing Socket module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 262: Initializing Libloader module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 266: Initializing Crypt3 module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 270: Initializing Process module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 274: Initializing Plugin module
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 278: Initializing DataBase IO module
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 544: Plugin type "dbio" registered
6:2011/10/07 14-04-23:gwen(52702):dbio.c: 106: Adding plugin path [/usr/lib64/gwenhywfar/plugins/60/dbio]
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 282: Initializing ConfigMgr module
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 544: Plugin type "configmgr" registered
6:2011/10/07 14-04-23:gwen(52702):configmgr.c: 80: Adding plugin path [/usr/lib64/gwenhywfar/plugins/60/configmgr]
7:2011/10/07 14-04-23:gwen(52702):gwenhywfar.c: 286: Initializing CryptToken2 module
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 544: Plugin type "ct" registered
6:2011/10/07 14-04-23:gwen(52702):ctplugin.c: 65: Adding plugin path [/usr/lib64/gwenhywfar/plugins/60/ct]
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 574: Plugin type "ct" unregistered
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 574: Plugin type "configmgr" unregistered
6:2011/10/07 14-04-23:gwen(52702):plugin.c: 574: Plugin type "dbio" unregistered
Segmentation fault (core dumped)
That doesn't shed any light on it, alas. I'm assuming if you break on gnutls_global_init(), it's only called twice - once from gnc_module_get_info very early, and then when it crashes?
Also, given that multiple people seem to be able to reproduce this (and I can't):
- Is there anything unusual in your setup? (weird environment variables, unusual authentication methods, etc)
- Are you using the online account access in GnuCash?
Yep. First call from gnc_module_system_refresh(), second from somewhere else in libgnc-module.so (I don't have all the debuginfo on the system, can fix that if it would help).
I don't *think* I have anything that weird in my setup. I have no strange auth methods and am not using online account access. That said, something must clearly be different somewhere, since it doesn't hit everybody.
One other debugging aid may be installing the gnutls debuginfo and stepping through gnutls_global_deinit to see if something looks like it's going awry there.
Here's what's happening:
* gnutls is loaded and initialized when libgncmod-aqbanking.so is loaded
* gnutls initializes libgcrypt
* when initializing libgcrypt, gnutls passes in pointers to gnutls mutex callback functions
* during initialization, libgcrypt uses gnutls mutex management functions to create a mutex
* this creates private data about the mutex inside gnutls
* gnutls is unloaded when libgncmod-aqbanking.so is unloaded
* However, LIBGCRYPT IS NOT UNLOADED, because it is linked directly against gnucash rather than loaded dynamically with libgncmod-aqbanking.so
* gnutls is loaded and initialized again later when libgncmod-aqbanking.so is loaded again
* gnutls initializes libgcrypt again
* but libgcrypt was never loaded and so it thinks it's still initialized
* but the private data associated with the mutex that libgcrypt created and still has is no longer valid, because gnutls's private data was erased when it was unloaded
* bam, segfault when libgcrypt tries to use the mutex
Easiest fix: link gnucash directly against gnutls and call gnutls_global_init() before loading any modules.
Whoops, in the third bullet from the end, I should have said, "libgcrypt was never UNloaded".
Created attachment 527065
patch to link against libgnutls
Actually, it's even easier than that. You don't need to modify the source code to call gnutls_global_init. You just need to link against libgnutls when compiling gnucash so that it doesn't get unloaded when aqbanking gets unloaded. The attached patch does this.
Moving this ticket back to gnucash, since it's a gnucash shared-library loading/unloading thing that's causing the issue and a gnucash patch (attached to my last comment) is needed to fix it.
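The failure sequence described in the comments above, and why keeping gnutls loaded avoids it, can be sketched with a toy model. This is only an illustration in Python, not the real C code: the class names `GcryptModel` and `GnutlsInstance`, the `StaleReference` exception (standing in for the segfault), and the `simulate` helper are all hypothetical, and the model assumes the behaviour stated in the thread (libgcrypt stays loaded and skips re-initialization, while gnutls loses its private mutex data on unload).

```python
class StaleReference(Exception):
    """Stands in for the segfault: a call through data that no longer exists."""


class GcryptModel:
    """Hypothetical stand-in for libgcrypt, which stays loaded for the whole process."""

    def __init__(self):
        self.initialized = False
        self.callbacks = None
        self.mutex_handle = None

    def init(self, callbacks):
        if self.initialized:
            # Assumption from the thread: a second init keeps the old state.
            return
        self.callbacks = callbacks
        self.mutex_handle = callbacks.mutex_create()
        self.initialized = True

    def use_mutex(self):
        self.callbacks.mutex_lock(self.mutex_handle)


class GnutlsInstance:
    """Hypothetical stand-in for one loaded copy of gnutls."""

    def __init__(self, gcrypt):
        self.loaded = True
        self.mutexes = {}      # private per-load data
        gcrypt.init(self)      # global init passes our callbacks to gcrypt

    def mutex_create(self):
        handle = len(self.mutexes) + 1
        self.mutexes[handle] = "unlocked"
        return handle

    def mutex_lock(self, handle):
        if not self.loaded or handle not in self.mutexes:
            raise StaleReference("mutex data belonged to an unloaded gnutls")
        self.mutexes[handle] = "locked"

    def unload(self):
        self.mutexes.clear()
        self.loaded = False


def simulate(keep_gnutls_loaded):
    gcrypt = GcryptModel()
    first = GnutlsInstance(gcrypt)      # module scan loads aqbanking -> gnutls
    if not keep_gnutls_loaded:
        first.unload()                  # scan finished, module closed
        GnutlsInstance(gcrypt)          # real load; gcrypt.init() is a no-op
    try:
        gcrypt.use_mutex()
    except StaleReference:
        return "crash"
    return "ok"
```

In this model `simulate(False)` follows the load, unload, load path and hits the stale handle, while `simulate(True)` corresponds to linking gnucash against libgnutls so the first copy never goes away.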
Just adding a "me too" here. I haven't done any serious debugging on this, but I was using gnucash on F15 with no problem, and it broke when I upgraded to F16 Alpha last month. If there are any additional data points that I could give to help with this one, let me know.
+1 to Jonathan's patch. I applied it and rebuilt, and no more segfault for me. Thanks!
I'd question why GnuCash is linking against libgcrypt directly? Can someone pass this patch (or at least this bug report) upstream?
I doubt GC is linking against libgcrypt directly. It's linking against another shared library that links against libgcrypt.
(In reply to comment #22)
> I'd question why GnuCash is linking against libgcrypt directly?
LD_DEBUG=all shows initialization goes: libgncmod-gnome-utils -> libgnome-keyring -> libgcrypt, in a brief check here. This does imply a more generic issue with the load-all-modules, unload-all-modules, load-all-modules-again method that could pop up again later. Of course, most libraries that these modules use don't have initialization side effects.
Reading this, it seems like gnutls_global_deinit() should call the inverse of gnutls_crypto_init() ... it doesn't. (Possibly because such a function doesn't exist.)
(In reply to comment #25)
> Reading this, it seems like gnutls_global_deinit() should call the inverse of
> gnutls_crypto_init() ... it doesn't. (Possibly because such a function doesn't
> exist.)
Unfortunately it's not that simple. There actually appear to be significant architectural issues in the way that gcrypt and gnutls interact with each other. For example, as noted previously, when gnutls initializes gcrypt, it passes in a set of callbacks inside gnutls for gcrypt to use. These callbacks are global to gcrypt, i.e., there's only one set of callbacks for all the things calling into gcrypt. But what if something else besides gnutls wants to initialize gcrypt with its own callbacks? It can't... they both can't exist in the same program at the same time. And this isn't even visible to the caller... If gnutls initializes gcrypt with its own callbacks, and then something else initializes gcrypt with its callbacks, the latter initialization will "succeed" and the caller won't know that his callbacks aren't actually going to be used.
Similarly, gnutls can't just uninitialize gcrypt, because there may be something other than gnutls using gcrypt, and if gcrypt is uninitialized that other code will break. This could be avoided by requiring different instances of gcrypt to be instantiated with different static data for anyone who links against it, but I'm not even sure if that can be enforced by the shared library itself, as opposed to requiring that whoever is doing the linking explicitly request it. Furthermore, doing that would cause significant architectural issues of its own.
Bill, if you want to wade into these shark-infested waters to try and figure out just how gnutls and gcrypt are related to each other and how all this should be structured and implemented, I wish you the best of luck, but in the meantime, I hope you'll just patch gnucash so it doesn't keep crashing on people. :-)
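The "silent success" problem with process-global callbacks described above can be illustrated with a small sketch. It is a Python toy, not libgcrypt's API: `GlobalCallbackRegistry` and its methods are hypothetical names, and the first-registration-wins behaviour is taken from the comment rather than verified against the library.

```python
class GlobalCallbackRegistry:
    """Hypothetical model of a library with one process-wide callback set."""

    def __init__(self):
        self._callbacks = None

    def set_callbacks(self, callbacks):
        # Assumed behaviour from the comment: only the first caller's
        # callbacks are stored, but every caller is told it succeeded.
        if self._callbacks is None:
            self._callbacks = callbacks
        return True

    def lock(self):
        # Whoever registered first is the one that actually gets called.
        return self._callbacks["lock"]()


registry = GlobalCallbackRegistry()
gnutls_ok = registry.set_callbacks({"lock": lambda: "gnutls lock"})
other_ok = registry.set_callbacks({"lock": lambda: "other lock"})
active = registry.lock()
```

Both registrations report success, yet `active` comes from the first caller's callbacks, so the second caller has no way to notice that its own callbacks are never used.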
Oh, I can patch gnucash, it just seems like a hack, and also doesn't necessarily explain why a large number of people hit this reliably, and others (like me) don't hit it at all.
(In reply to comment #27)
> Oh, I can patch gnucash, it just seems like a hack,
Yeah, but the stuff gnucash does to load / unload / load modules again is also a bit of a hack that certainly pushes the boundaries of the dynamic loading system, so think of it as one hack compensating for the breakage caused by another one. :-)
> and also doesn't
> necessarily explain why a large number of people hit this reliably, and others
> (like me) don't hit it at all.
It's memory-management-dependent, so it depends on exactly what executes when gnucash starts up and in what order. You may be running it on a different architecture, or with different perl modules, or with different versions of shared libraries, or with different gnucash preferences that cause load-time behavior to change, etc., etc.
(In reply to comment #28)
> > and also doesn't
> > necessarily explain why a large number of people hit this reliably, and others
> > (like me) don't hit it at all.
>
> It's memory-management-dependent, so it depends on exactly what executes when
> gnucash starts up and in what order. You may be running it on a different
> architecture, or with different perl modules, or with different versions of
> shared libraries, or with different gnucash preferences that cause load-time
> behavior to change, etc., etc.
It shouldn't be, though - if the error is because libgcrypt is being brought in by gnucash itself via DSO dependencies, it's going to be linked in before module scanning, no matter what.
(In reply to comment #29)
> It shouldn't be, though - if the error is because libgcrypt is being brought in
> by gnucash itself via DSO dependencies, it's going to be linked in before
> module scanning, no matter what.
When gcrypt is loaded into memory is not the issue. The issue is that gcrypt caches references to data within gnutls when gcrypt is initialized by gnutls, and then gnutls is unloaded and those references become invalid.
From testing, it's prelink dependent. Prelink does ahead-of-time linking, which causes gnutls to always get the same address space, so things just happen to work. I suspect that everyone who's seeing this isn't running prelink....
I am running prelink.
Created attachment 527492
slightly different patch
Here's what I'm intending to push and build. This changes GnuCash so that, on successfully scanning a module, it marks it as resident so it won't be dlclose()d/unloaded. This solves the problem here, and should cover any non-gnutls cases that might pop up later. Given that gnc_module_unload() actually *doesn't* call g_module_close(), it seems to be in line with what GnuCash is doing in the main module system.
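The effect of marking a scanned module as resident can be sketched as follows. This is a toy Python model rather than the attached patch: `ModuleLoader` and `start_up` are hypothetical names, and the "close is ignored once resident" rule mirrors how GLib documents g_module_make_resident(); whether the patch uses exactly that call is an assumption.

```python
class ModuleLoader:
    """Hypothetical reference-counting loader that records load/unload events."""

    def __init__(self):
        self.refcount = {}
        self.resident = set()
        self.events = []

    def open(self, name):
        if self.refcount.get(name, 0) == 0:
            self.events.append(("load", name))   # constructors would run here
        self.refcount[name] = self.refcount.get(name, 0) + 1

    def make_resident(self, name):
        self.resident.add(name)

    def close(self, name):
        if name in self.resident:
            return                               # close requests are ignored
        self.refcount[name] -= 1
        if self.refcount[name] == 0:
            self.events.append(("unload", name)) # destructors would run here


def start_up(loader, name, mark_resident):
    loader.open(name)                # scan: open the module, inspect symbols
    if mark_resident:
        loader.make_resident(name)   # the proposed change
    loader.close(name)               # end of scan
    loader.open(name)                # real initialization later on
    return loader.events
```

Without the resident mark the model records load, unload, load, which is the pattern that left libgcrypt holding stale gnutls data; with it, the module is loaded once and never unloaded.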
Your fix is obviously better than mine. Thanks for taking the time to come up with it!
gnucash-2.4.7-3.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/gnucash-2.4.7-3.fc16
Jonathan - thanks for the help in tracking the problem down.
Yay - gnucash works again! Thanks. There's just one other little problem: it reports that I spent all my money and can't afford to buy beer. I'd really rather it showed my bank balance as being rather higher and that college tuition payment already made. But I guess I should probably file a separate bug report for that one.
Package gnucash-2.4.7-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing gnucash-2.4.7-3.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14258
then log in and leave karma (feedback).
Package: gnucash-2.4.7-1.fc16
Architecture: x86_64
OS Release: Fedora release 16 (Verne)
Comment
-----
Start gnucash in a freshly installed F16beta.
FYI, gnucash-2.4.7-4.fc16 works for me.
gnucash-2.4.7-4.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.