Description of problem:
gkrellm crashes with a segmentation fault when the user is not in /etc/passwd.

Version-Release number of selected component (if applicable):
gkrellm-2.3.1-3.fc9.i386

How reproducible:
It works after a reboot, but after the first crash it crashes on every start.

Steps to Reproduce:
1. Log in (through gdm or ssh) to an LDAP (over SSL) account
2. Start gkrellm

Actual results:
The application crashes on start with "Segmentation fault". This appears in /var/log/messages:
gkrellm[9374]: segfault at 8 ip 010c28b9 sp bfce87b4 error 4 in libcrypto.so.0.9.8g[ff8000+137000]

Expected results:
Working application

Additional info:
Thanks for reporting this. Unfortunately I do not have a setup that allows me to reproduce it, so I will need some assistance from you to debug this.

Can you please do the following from a terminal as root:
yum install yum-utils gdb
debuginfo-install gkrellm

And then from a terminal as the user:
gdb gkrellm

You now get a prompt within gdb. At this prompt type:
run
<now wait for gkrellm to crash>
bt

And then copy and paste the output of the crash and of the bt command, please.
[pali@pali-pc ~]$ gdb gkrellm
GNU gdb Fedora (6.8-5.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/bin/gkrellm
[Thread debugging using libthread_db enabled]
[New Thread 0xb8053710 (LWP 5145)]

Program received signal SIGSEGV, Segmentation fault.
X509_STORE_add_lookup (v=<value optimized out>, m=<value optimized out>) at x509_lu.c:255
255         sk=v->get_cert_methods;
(gdb) bt
#0  X509_STORE_add_lookup (v=<value optimized out>, m=<value optimized out>) at x509_lu.c:255
#1  0x02f59030 in X509_STORE_load_locations (ctx=<value optimized out>, file=<value optimized out>, path=<value optimized out>) at x509_d2.c:97
#2  0x02e755fe in SSL_CTX_load_verify_locations (ctx=Could not find the frame base for "SSL_CTX_load_verify_locations".
) at ssl_lib.c:2475
#3  0x00ab27bb in ldap_int_tls_init_ctx () from /usr/lib/libnss_ldap.so.2
#4  0x00ab2dda in alloc_handle () from /usr/lib/libnss_ldap.so.2
#5  0x00ab2fe2 in ldap_int_tls_connect () from /usr/lib/libnss_ldap.so.2
#6  0x00ab3177 in ldap_int_tls_start () from /usr/lib/libnss_ldap.so.2
#7  0x00a923c3 in ldap_int_open_connection () from /usr/lib/libnss_ldap.so.2
#8  0x00aa2288 in ldap_new_connection () from /usr/lib/libnss_ldap.so.2
#9  0x00a921b1 in ldap_open_defconn () from /usr/lib/libnss_ldap.so.2
#10 0x00aa2d58 in ldap_send_initial_request () from /usr/lib/libnss_ldap.so.2
#11 0x00a98372 in ldap_sasl_bind () from /usr/lib/libnss_ldap.so.2
#12 0x00a98901 in ldap_simple_bind () from /usr/lib/libnss_ldap.so.2
#13 0x00a84276 in do_bind (ld=0x8a32e98, timelimit=145068216, dn=0xacf961 "cn=XXXXXXX,ou=YYY,dc=tmapy,dc=cz", pw=0xacf982 "ZZZZZZZZZZZZZZZZ", with_sasl=0) at ldap-nss.c:1851
#14 0x00a8688e in do_with_reconnect (base=0xacfa1d "ou=Users,dc=tmapy,dc=cz", scope=1, filter=0xbfd6c77c "(&(objectClass=posixAccount)(uid=pali))", attrs=0xad06e0, sizelimit=1, private=0xbfd6cfc4, search_func=0xa874d0 <do_search_s>) at ldap-nss.c:1697
#15 0x00a872be in _nss_ldap_search_s (args=0xbfd6d010, filterprot=0xad3ea0 "(&(objectClass=posixAccount)(uid=%s))", sel=LM_PASSWD, user_attrs=0x0, sizelimit=1, res=0xbfd6cfc4) at ldap-nss.c:3154
#16 0x00a87a87 in _nss_ldap_getbyname (args=0xbfd6d010, result=0xbfd6d0b4, buffer=0x8a26700 "hsqldb", buflen=1024, errnop=0xb80536d8, filterprot=0xad3ea0 "(&(objectClass=posixAccount)(uid=%s))", sel=LM_PASSWD, parser=0xa87cd0 <_nss_ldap_parse_pw>) at ldap-nss.c:3501
#17 0x00a881c0 in _nss_ldap_getpwnam_r (name=0xbfd6ef4c "pali", result=0xbfd6d0b4, buffer=0x8a26700 "hsqldb", buflen=1024, errnop=0xb80536d8) at ldap-pwd.c:245
#18 0x00c31782 in __getpwnam_r (name=<value optimized out>, resbuf=<value optimized out>, buffer=<value optimized out>, buflen=<value optimized out>, result=<value optimized out>) at ../nss/getXXbyYY_r.c:253
#19 0x0017caf6 in g_get_any_init_do () at gutils.c:1596
#20 0x0017e5ed in IA__g_get_home_dir () at gutils.c:1747
#21 0x0078599a in gtk_rc_add_initial_default_files () at gtkrc.c:540
#22 0x00786394 in _gtk_rc_init () at gtkrc.c:863
#23 0x007282e4 in do_post_parse_initialization (argc=Could not find the frame base for "do_post_parse_initialization".
) at gtkmain.c:681
#24 0x0072837f in post_parse_hook (context=Could not find the frame base for "post_parse_hook".
) at gtkmain.c:721
#25 0x00156f7d in IA__g_option_context_parse (context=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, error=<value optimized out>) at goption.c:1806
#26 0x0072866d in IA__gtk_parse_args (argc=Could not find the frame base for "IA__gtk_parse_args".
) at gtkmain.c:876
#27 0x007286e5 in IA__gtk_init_check (argc=Could not find the frame base for "IA__gtk_init_check".
) at gtkmain.c:912
#28 0x0072872f in IA__gtk_init (argc=Could not find the frame base for "IA__gtk_init".
) at gtkmain.c:950
#29 0x0805bbf2 in main (argc=1, argv=0xbfd6d464) at main.c:2062
(gdb)

hsqldb is the last account in /etc/passwd.

If you need more information, I will do my best to provide it.
Hmm, it's crashing deep in gtk, only a few lines of code into main, so this does not seem gkrellm-specific in any way. What happens when you start another gtk app from this account? I have the feeling this is a generic gtk bug.

Can you for example try, as root:
yum install gtkterm

And then as the troublesome user:
gtkterm

Thanks.
I have installed gtkterm and started it, and there was no crash. It looks stable, but I don't know what I can do with gtkterm to test it further.

I have tried other programs:
yum install gonvert.noarch gobby.i386 gmrun.i386 gimmix.i386 geany.i386 gajim.i386 galculator.i386

None of them crashed while running.
Hmm, I might have an idea what is causing this problem, and I've made an attempt at fixing it. Can you please try the rpms from this scratch build:
http://koji.fedoraproject.org/koji/taskinfo?taskID=616463
I have got it:
[pali@pali-pc ~]$ rpm -qa | grep gkrellm
gkrellm-2.3.1-4.fc9.i386
gkrellm-debuginfo-2.3.1-4.fc9.i386

but it still crashes every time. Do you want a backtrace again?

This is my idea of what might be causing the problem: LDAP information is not available anonymously; an LDAP bind is required. In the past I have had problems with this setting in some applications, but I don't remember the details (it was maybe on Solaris or previous Fedora/CentOS/RHEL versions).
Okay, I've now changed gkrellm so that the init code before calling gtk_init is identical to gtkterm's, so now it should no longer crash in gtk_init, or something really strange is going on. Can you give this version a spin:
http://koji.fedoraproject.org/koji/taskinfo?taskID=627839

And if that crashes again, can you do a backtrace again?

Thanks!
There is something wrong. When I click on
-> buildArch (gkrellm-2.3.1-4.fc10.src.rpm, i386)
which is:
http://koji.fedoraproject.org/koji/taskinfo?taskID=627842

I can see only the old binaries:

Output
build.log
root.log
state.log
gkrellm-2.3.1-4.fc9.i386.rpm
gkrellm-daemon-2.3.1-4.fc9.i386.rpm
gkrellm-debuginfo-2.3.1-4.fc9.i386.rpm
gkrellm-devel-2.3.1-4.fc9.i386.rpm

How can I get the new ones?
The binaries at that link are new ones, but I didn't bump the release compared to my previous test version / scratch build. Here is (part of) the output of rpm -qip on gkrellm.i386 from:
http://koji.fedoraproject.org/koji/taskinfo?taskID=627842

Name        : gkrellm        Relocations: (not relocatable)
Version     : 2.3.1          Vendor: Fedora Project
Release     : 4.fc9          Build Date: Sun 25 May 2008 02:24:54 PM CEST

Note the build date!
[pali@pali-pc ~]$ gdb gkrellm
GNU gdb Fedora (6.8-5.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/bin/gkrellm
[Thread debugging using libthread_db enabled]
[New Thread 0xb8094710 (LWP 29767)]

Program received signal SIGSEGV, Segmentation fault.
X509_STORE_add_lookup (v=<value optimized out>, m=<value optimized out>) at x509_lu.c:255
255         sk=v->get_cert_methods;
(gdb) bt
#0  X509_STORE_add_lookup (v=<value optimized out>, m=<value optimized out>) at x509_lu.c:255
#1  0x02f59030 in X509_STORE_load_locations (ctx=<value optimized out>, file=<value optimized out>, path=<value optimized out>) at x509_d2.c:97
#2  0x005e05fe in SSL_CTX_load_verify_locations (ctx=Could not find the frame base for "SSL_CTX_load_verify_locations".
) at ssl_lib.c:2475
#3  0x004f47bb in ldap_int_tls_init_ctx () from /usr/lib/libnss_ldap.so.2
#4  0x004f4dda in alloc_handle () from /usr/lib/libnss_ldap.so.2
#5  0x004f4fe2 in ldap_int_tls_connect () from /usr/lib/libnss_ldap.so.2
#6  0x004f5177 in ldap_int_tls_start () from /usr/lib/libnss_ldap.so.2
#7  0x004d43c3 in ldap_int_open_connection () from /usr/lib/libnss_ldap.so.2
#8  0x004e4288 in ldap_new_connection () from /usr/lib/libnss_ldap.so.2
#9  0x004d41b1 in ldap_open_defconn () from /usr/lib/libnss_ldap.so.2
#10 0x004e4d58 in ldap_send_initial_request () from /usr/lib/libnss_ldap.so.2
#11 0x004da372 in ldap_sasl_bind () from /usr/lib/libnss_ldap.so.2
#12 0x004da901 in ldap_simple_bind () from /usr/lib/libnss_ldap.so.2
#13 0x004c6276 in do_bind (ld=0x90848d8, timelimit=151694072, dn=0x511961 "cn=XXX,ou=YYY,dc=tmapy,dc=cz", pw=0x511982 "ZZZ", with_sasl=0) at ldap-nss.c:1851
#14 0x004c888e in do_with_reconnect (base=0x511a1d "ou=Users,dc=tmapy,dc=cz", scope=1, filter=0xbfbaeb5c "(&(objectClass=posixAccount)(uid=pali))", attrs=0x5126e0, sizelimit=1, private=0xbfbaf3a4, search_func=0x4c94d0 <do_search_s>) at ldap-nss.c:1697
#15 0x004c92be in _nss_ldap_search_s (args=0xbfbaf3f0, filterprot=0x515ea0 "(&(objectClass=posixAccount)(uid=%s))", sel=LM_PASSWD, user_attrs=0x0, sizelimit=1, res=0xbfbaf3a4) at ldap-nss.c:3154
#16 0x004c9a87 in _nss_ldap_getbyname (args=0xbfbaf3f0, result=0xbfbaf494, buffer=0x9078108 "gkrellmd", buflen=1024, errnop=0xb80946d8, filterprot=0x515ea0 "(&(objectClass=posixAccount)(uid=%s))", sel=LM_PASSWD, parser=0x4c9cd0 <_nss_ldap_parse_pw>) at ldap-nss.c:3501
#17 0x004ca1c0 in _nss_ldap_getpwnam_r (name=0xbfbb0eda "pali", result=0xbfbaf494, buffer=0x9078108 "gkrellmd", buflen=1024, errnop=0xb80946d8) at ldap-pwd.c:245
#18 0x00c31782 in __getpwnam_r (name=<value optimized out>, resbuf=<value optimized out>, buffer=<value optimized out>, buflen=<value optimized out>, result=<value optimized out>) at ../nss/getXXbyYY_r.c:253
#19 0x0088baf6 in g_get_any_init_do () at gutils.c:1596
#20 0x0088d5ed in IA__g_get_home_dir () at gutils.c:1747
#21 0x0554899a in gtk_rc_add_initial_default_files () at gtkrc.c:540
#22 0x05549394 in _gtk_rc_init () at gtkrc.c:863
#23 0x054eb2e4 in do_post_parse_initialization (argc=Could not find the frame base for "do_post_parse_initialization".
) at gtkmain.c:681
#24 0x054eb37f in post_parse_hook (context=Could not find the frame base for "post_parse_hook".
) at gtkmain.c:721
#25 0x00865f7d in IA__g_option_context_parse (context=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, error=<value optimized out>) at goption.c:1806
#26 0x054eb66d in IA__gtk_parse_args (argc=Could not find the frame base for "IA__gtk_parse_args".
) at gtkmain.c:876
#27 0x054eb6e5 in IA__gtk_init_check (argc=Could not find the frame base for "IA__gtk_init_check".
) at gtkmain.c:912
#28 0x054eb72f in IA__gtk_init (argc=Could not find the frame base for "IA__gtk_init".
) at gtkmain.c:950
#29 0x0805bbe5 in main (argc=1, argv=0xbfbaf844) at main.c:2069
I have just tried this: I added one line to /etc/passwd

pali:x:####:####:Pavel Lisy:/home/pali:/bin/bash

where #### are my UID and GID from LDAP, and gkrellm doesn't crash with it. A similar line in /etc/shadow isn't necessary. I hope this gives you a new direction for debugging.

When I comment it out

# pali:x:####:####:Pavel Lisy:/home/pali:/bin/bash

it crashes again.
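A note on why the local entry probably helps, as a hedged sketch: if /etc/nsswitch.conf lists "files" before "ldap" for passwd (an assumption; that file is not shown in this report), a user found in /etc/passwd is answered locally and the LDAP/TLS code path from the backtrace is never entered. The Python helper below uses a hypothetical name, `has_local_passwd_entry`, and only checks whether passwd-format text contains an active (not commented-out) entry for a user. All sample UIDs and GIDs are placeholders, since the report elides the real values as ####.

```python
def has_local_passwd_entry(passwd_text: str, username: str) -> bool:
    """Return True if passwd-format text has an active entry for username."""
    for raw_line in passwd_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        # The login name is the first of the colon-separated fields.
        if line.split(":", 1)[0] == username:
            return True
    return False


# Made-up sample data; UID/GID values are placeholders, not from the report.
active = (
    "hsqldb:x:96:96::/var/lib/hsqldb:/sbin/nologin\n"
    "pali:x:1000:1000:Pavel Lisy:/home/pali:/bin/bash\n"
)
commented = (
    "hsqldb:x:96:96::/var/lib/hsqldb:/sbin/nologin\n"
    "# pali:x:1000:1000:Pavel Lisy:/home/pali:/bin/bash\n"
)

print(has_local_passwd_entry(active, "pali"))
print(has_local_passwd_entry(commented, "pali"))
```

On a real system the text would come from reading /etc/passwd; the check says nothing about whether LDAP itself works, only whether the local file would answer first.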
Hmm, this is really weird, as with my latest changes the part of main() executed before gtk_init is identical for gtkterm and gkrellm. So the C code path that is walked before the crash is identical! Yet you still say that gtkterm works fine, even when you start it as the user pali, while on the same machine, started from the same terminal, gkrellm crashes?
I am not sure. The strange thing is that after a reboot gkrellm worked well for some time (maybe more than 1 hour), but then it crashed. After the first crash it crashes every time I try to start it again (until the next reboot).

I can start gtkterm and it runs. No crash. But I don't know if it could also crash after some time, because it is doing nothing. It ran for 30-60 min with no crash.
When gkrellm crashes when started, do other gtk apps like gtkterm still start?
ping:

(In reply to comment #14)
> When gkrellm crashes when started, do other gtk apps like gtkterm still start?
Yes, gtkterm still starts after gkrellm has crashed.

BTW, I don't understand what "ping:" means. Is it just a mistake?
ping is a way of saying: hello, are you still there?

Anyway, I'm out of clues as to what might be the cause. Given that the crash is happening deep in nss_ldap, I'm going to change the component to nss_ldap and hope that its maintainer has an idea what's happening here.
I can confirm this bug. I also use nss_ldap, and gkrellm is crashing in X509_STORE_add_lookup() for me.

I have compiled gkrellm myself and found that the executable gets linked against BOTH openssl and gnutls. It seems that this is causing the crash, since if I configure gkrellm not to use gnutls it works fine.

When gkrellm is linked against both gnutls and openssl it crashes like this:

(gdb) where
#0  0x00000000076aaf60 in X509_STORE_add_lookup (v=0x0, m=0x79322e0) at x509_lu.c:255
#1  0x00000000076a47b1 in X509_STORE_load_locations (ctx=0x0, file=0x1a601a0 "/etc/openldap/cacerts/ldap.spectr.lan.cert", path=0x1a60220 "/etc/openldap/cacerts") at x509_d2.c:90
#2  0x0000000000d9e75d in SSL_CTX_load_verify_locations () from /home/zap/rpm/BUILD/openssl-0.9.8g/libssl.so.7

Then I recompiled it to use ONLY openssl and put a breakpoint on X509_STORE_add_lookup. Running it gives this:

(gdb) where
#0  X509_STORE_add_lookup (v=0x1c588c0, m=0xa3d2e0) at x509_lu.c:250
#1  0x00000000007af7b1 in X509_STORE_load_locations (ctx=0x1c588c0, file=0x1c22890 "/etc/openldap/cacerts/ldap.spectr.lan.cert", path=0x1c22910 "/etc/openldap/cacerts") at x509_d2.c:90
#2  0x000000000014d75d in SSL_CTX_load_verify_locations () from /home/zap/rpm/BUILD/openssl-0.9.8g/libssl.so.7

As you can see, X509_STORE_load_locations() is called with a NULL context in the first case. I guess that's because libnss_ldap gets some functions from openssl and some from gnutls.

A simple fix is to patch configure not to link against gnutls. For this I have added "CONFIGURE_ARGS=--without-gnutls" to the make invocation in the %build section.
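One way to spot this kind of mixed linking is to scan `ldd` output for SSL-related libraries. The following is only a sketch: the function names `ssl_providers` and `mixes_openssl_and_gnutls` are hypothetical, and the sample `ldd` text (paths and load addresses) is made up for illustration rather than taken from an affected machine.

```python
import re

# Library names that provide, or emulate, the OpenSSL API.
SSL_LIB_PATTERN = re.compile(r"^(libssl|libcrypto|libgnutls(-openssl)?)\.so")


def ssl_providers(ldd_output: str) -> list:
    """Return the SSL-related shared library names listed in ldd output."""
    found = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first field is the library name (possibly given as a path).
        name = parts[0].rsplit("/", 1)[-1]
        if SSL_LIB_PATTERN.match(name) and name not in found:
            found.append(name)
    return found


def mixes_openssl_and_gnutls(ldd_output: str) -> bool:
    """True if both OpenSSL and GnuTLS libraries appear in the ldd output."""
    names = ssl_providers(ldd_output)
    has_openssl = any(n.startswith(("libssl", "libcrypto")) for n in names)
    has_gnutls = any(n.startswith("libgnutls") for n in names)
    return has_openssl and has_gnutls


# Illustrative sample only; not real output from this bug.
sample = """
    libgnutls-openssl.so.26 => /usr/lib64/libgnutls-openssl.so.26 (0x00007f0000000000)
    libgnutls.so.26 => /usr/lib64/libgnutls.so.26 (0x00007f0000100000)
    libssl.so.7 => /lib64/libssl.so.7 (0x00007f0000200000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f0000300000)
"""

print(ssl_providers(sample))
print(mixes_openssl_and_gnutls(sample))
```

Keep in mind that NSS modules such as libnss_ldap are loaded at run time, so `ldd` on the executable alone may not list libssl; the module's own dependencies would need the same check.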
The F-9 gkrellm-2.3.1-3.fc9 is not linked against openssl, only gnutls. Distributing a build of it linked with openssl would probably be a no go due to license incompatibilities.
Okay, I think I know what's going on here. gkrellm links to libgnutls-openssl, a special sub-library of gnutls that makes it easy to port openssl programs to gnutls, and which therefore provides symbols that are also found in openssl. nss_ldap is built against openssl, and when it gets loaded into the gkrellm process it starts using the openssl symbols from gnutls-openssl (and maybe even a few from openssl itself too) -> boom!

Possible solutions:
1) build gkrellm without ssl support
2) build gkrellm with openssl (after asking upstream about the license issue)

But this problem is bigger than gkrellm: any binary linked against gnutls-openssl and using nss will cause problems when nss_ldap is in use, so I tend to blame nss_ldap here. Luckily the affected list seems small:

[hans@localhost src]$ repoquery -q --whatrequires 'libgnutls-openssl.so.26()(64bit)'
mcabber-0:0.9.7-1.fc10.x86_64
gnutls-0:2.4.1-1.fc10.x86_64
gnutls-devel-0:2.4.1-1.fc10.x86_64
zoneminder-0:1.23.3-1.fc10.x86_64
gkrellm-0:2.3.1-4.fc10.x86_64

So the other 2 problematic programs are zoneminder and mcabber, neither of which is very popular.

I'll start a thread about this on fedora-development to see how we want to solve this in general.
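To make the symbol-collision theory concrete, here is a small sketch that finds names defined by more than one library. It assumes the input looks like `nm -D --defined-only` output (address, type letter, name); the helper names `defined_symbols` and `colliding_symbols` are hypothetical, and the two symbol listings are invented for illustration, not dumped from the real libssl or libgnutls-openssl.

```python
from collections import defaultdict


def defined_symbols(nm_output: str) -> set:
    """Extract defined symbol names from `nm -D --defined-only` style text."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected shape: <address> <type letter> <name>
        if len(parts) == 3 and parts[1] in {"T", "D", "B", "W"}:
            symbols.add(parts[2])
    return symbols


def colliding_symbols(exports: dict) -> dict:
    """Map each symbol defined by several libraries to those libraries."""
    owners = defaultdict(set)
    for lib, symbols in exports.items():
        for sym in symbols:
            owners[sym].add(lib)
    return {sym: sorted(libs) for sym, libs in owners.items() if len(libs) > 1}


# Invented listings for illustration only.
openssl_nm = """
0000000000001000 T SSL_CTX_new
0000000000001100 T SSL_CTX_load_verify_locations
0000000000001200 T SSL_connect
"""
compat_nm = """
0000000000002000 T SSL_CTX_new
0000000000002100 T SSL_connect
0000000000002200 T compat_only_helper
"""

exports = {
    "libssl": defined_symbols(openssl_nm),
    "libgnutls-openssl": defined_symbols(compat_nm),
}
print(colliding_symbols(exports))
```

If a process resolves a constructor-like symbol from one library and then passes the result to a function that exists only in the other, the second function receives a structure it does not understand, which would be consistent with the NULL context seen in the backtraces. Newer binutils may append version suffixes to names, which this sketch does not strip.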
Indeed, gkrellm in f9 is not linked directly but this does not stop it from crashing in libssl.so :-) In fact that's what made me think that it is linked against it...
IMHO, glibc's nss_ldap should be ported to use Mozilla NSS instead of OpenSSL as its security implementation, then this problem will just go away, and it would also help the CryptoConsolidation project.
(In reply to comment #22)
> IMHO, glibc's nss_ldap should be ported to use Mozilla NSS instead of OpenSSL
> as its security implementation, then this problem will just go away, and it
> would also help the CryptoConsolidation project.

Um, it's not actually part of glibc.

NSS has a direct dependency on libpthread, which makes it unsuitable for use inside of an nsswitch module (at least, it did the last time I heard anything about it).
For a data point, can people who are running into this check if the packages in
http://koji.fedoraproject.org/scratch/nalin/task_787844/
(available for a limited time only) prevent this error from occurring?

The workaround being used here is to link statically with libssl and libcrypto, similar to what the package already does with libldap and liblber, so it's less than ideal from a maintenance standpoint.
Yes, I can confirm it. It is working now! Thanks.
(In reply to comment #24)
> For a data point, can people who are running into this check if the packages in
> http://koji.fedoraproject.org/scratch/nalin/task_787844/ (available for a
> limited time only) prevent this error from occurring?
>
> The workaround being used here is to static link with libssl and libcrypto,
> similarly to how it already does to libldap and liblber, so it's less than
> ideal from a maintenance standpoint.

Thanks for doing this. Together with Pavel's confirmation, this proves that my theory about the cause is right.

Now the question is how to solve this. I'm waiting for confirmation from Spot, but it seems that I can make gkrellm use the real openssl without license issues, as openssl is considered a system library. That only solves this for gkrellm, of course. But as long as we can use the real openssl with GPL applications, mcabber and zoneminder could be fixed by using the real openssl too, especially as gnutls is GPLv3 now, so chances are that mcabber and zoneminder currently actually have a license issue. Then we can remove gnutls-openssl altogether to avoid this problem in the future.

So what do others think of just getting rid of gnutls-openssl?

Another problem is the openssl compat layer of nss. I will file a separate bug for that one, as it really needs fixing (we want to standardize on nss as the crypto lib in the future).
I've filed bugs against libnss-compat-ossl (bug 460305) and against libgnutls-openssl (bug 460310) asking them to change their openssl compatibility headers (and the matching libs) in such a way that there are no symbol collisions.
I've just gotten permission from upstream to distribute gkrellm linked to openssl, which will fix this, so I'm reassigning this to myself and doing a new gkrellm for devel and F-9, built against openssl.
gkrellm-2.3.1-5.fc9 has been submitted as an update for Fedora 9. http://admin.fedoraproject.org/updates/gkrellm-2.3.1-5.fc9
Ok, a fixed gkrellm has been submitted to updates-testing for F-9. I've set things up so that one positive feedback in bodhi will get it pushed to stable. So please test this and, if it works, add a positive vote to:
http://admin.fedoraproject.org/updates/gkrellm-2.3.1-5.fc9
gkrellm-2.3.1-5.fc9 has been pushed to the Fedora 9 stable repository. If problems still persist, please make note of it in this bug report.