Hide Forgot
Description of problem: During performance testing of Red Hat Directory Server we found server aborting from g_logv function. This is undesirable, as we would not want to see Directory Server abort due to configuration issue. Version-Release number of selected component (if applicable): glib2-2.26.1-3 How reproducible: consistently Abort occurs when g_logv is trying to log this error: g_logv (log_domain=0x7f82c40cde06 "GLib", log_level=<value optimized out>, format= 0x7f82c40d45d8 "Cannot create pipe main loop wake-up: %s\n", args1=0x7f82d7ffc290) at gmessages.c:557 And it aborts at this line 557. 539 if ((test_level & G_LOG_FLAG_FATAL) && !masquerade_fatal) 540 { [...] 556 #else /* !G_ENABLE_DEBUG || !SIGTRAP */ 557 abort (); 558 #endif /* !G_ENABLE_DEBUG || !SIGTRAP */ 559 #endif /* !G_OS_WIN32 */ The errno is 24 "Too many open files". (gdb) p errno $3 = 24 Stacktrace: Core was generated by `./ns-slapd -D /etc/dirsrv/slapd-ibm-x3655-01 -i /var/run/dirsrv/slapd-ibm-x3655'. Program terminated with signal 6, Aborted. #0 0x00007f6fa0c0b925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007f6fa0c0b925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00007f6fa0c0d105 in abort () at abort.c:92 #2 0x00007f6f704da98a in g_logv (log_domain=0x7f6f70531e06 "GLib", log_level=<value optimized out>, format= 0x7f6f705385d8 "Cannot create pipe main loop wake-up: %s\n", args1=0x7f6f71ddf290) at gmessages.c:557 #3 0x00007f6f704daa23 in g_log (log_domain=<value optimized out>, log_level=<value optimized out>, format=<value optimized out>) at gmessages.c:577 #4 0x00007f6f704d0ba0 in g_main_context_init_pipe (context=0x7f6ef80690b0) at gmain.c:520 #5 0x00007f6f704d0c53 in g_main_context_new () at gmain.c:615 #6 0x00007f6f704d0e1d in g_main_context_default () at gmain.c:653 #7 0x00007f6f903bbe51 in dbus_connection_setup_with_g_main (connection=0x7f6ef80674d0, context=0x0) at dbus-gmain.c:568 #8 0x00007f6f903bbf8a in dbus_g_bus_get (type=<value optimized out>, error=0x0) at dbus-gmain.c:763 #9 0x00007f6f905d6fc1 in create_manager (pamh=0x7f6ef80577e0, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:126 #10 do_auth (pamh=0x7f6ef80577e0, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:379 #11 pam_sm_authenticate (pamh=0x7f6ef80577e0, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:430 #12 0x00007f6f98a80cee in _pam_dispatch_aux (pamh=0x7f6ef80577e0, flags=0, choice=1) at pam_dispatch.c:110 #13 _pam_dispatch (pamh=0x7f6ef80577e0, flags=0, choice=1) at pam_dispatch.c:407 #14 0x00007f6f98a80600 in pam_authenticate (pamh=0x7f6ef80577e0, flags=0) at pam_auth.c:34 #15 0x00007f6f98c8fef4 in do_one_pam_auth (pb=0x299fac0, method=<value optimized out>, final_method=1, pam_service=<value optimized out>, map_ident_attr=<value optimized out>, fallback=0, pw_response_requested=1) at ldap/servers/plugins/pam_passthru/pam_ptimpl.c:314 #16 0x00007f6f98c9039a in pam_passthru_do_pam_auth (pb=0x299fac0, cfg=<value optimized out>) at ldap/servers/plugins/pam_passthru/pam_ptimpl.c:465 #17 0x00007f6f98c9111a in pam_passthru_bindpreop (pb=0x299fac0) at ldap/servers/plugins/pam_passthru/pam_ptpreop.c:433 #18 0x00007f6fa31a180a in plugin_call_func (list=0x2742210, operation=401, pb=0x299fac0, call_one=0) at ldap/servers/slapd/plugin.c:1453 #19 0x00007f6fa31a1a3f in plugin_call_list (pb=0x299fac0, whichfunction=401) at ldap/servers/slapd/plugin.c:1415 #20 plugin_call_plugins (pb=0x299fac0, whichfunction=401) at ldap/servers/slapd/plugin.c:398 #21 0x000000000040f6d2 in do_bind (pb=0x299fac0) at ldap/servers/slapd/bind.c:706 #22 0x0000000000414134 in connection_dispatch_operation () at ldap/servers/slapd/connection.c:569 #23 connection_threadmain () at ldap/servers/slapd/connection.c:2339 #24 0x00007f6fa15d1b53 in _pt_root (arg=0x29c01f0) at ../../../mozilla/nsprpub/pr/src/pthreads/ptthread.c:156 #25 0x00007f6fa0f749d1 in start_thread (arg=0x7f6f71de2700) at pthread_create.c:301 #26 0x00007f6fa0cc1a8d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 Manipulating limits did not prevent this error from occuring: echo "fs.file-max = 64000" >> /etc/sysctl.conf echo "* soft nofile 8192" >> /etc/security/limits.conf echo "* hard nofile 8192" >> /etc/security/limits.conf echo "ulimit -n 8192" >> /etc/profile Additional info:
Hi Colin, Do you think this problem is related to Bug 1003940 (reassigned to fprintd)? Once we get a patched fprintd (fprintd-0.1-20.git04fd09cfa.el6.x86_64 + http://cgit.freedesktop.org/libfprint/fprintd/commit/?id=00ac488cde0f9b916d58af2c1d4f1ae9e05293d0), could this bug be fixed, as well? Or is it a totally different issue? Thanks! --noriko
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Original summary: Directory Server crashes in g_logv logging "Cannot create pipe main loop wake-up" Directory Server (389-ds-base) uses pam module from its pam plug-in. When the pam passthrough authentication is enabled on the Directory Server, it calls pam_authenticate, in which pipe2 is called eventually. Stacktrace stopped in pipe2: Breakpoint 5, pipe2 () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) bt #0 pipe2 () at ../sysdeps/unix/syscall-template.S:82 #1 0x00007f56c98d2b04 in g_main_context_init_pipe (context=0x7f56c40188e0) at gmain.c:515 #2 0x00007f56c98d2c53 in g_main_context_new () at gmain.c:615 #3 0x00007f56c98d2e1d in g_main_context_default () at gmain.c:653 #4 0x00007f56ec229e51 in dbus_connection_setup_with_g_main () from /usr/lib64/libdbus-glib-1.so.2 #5 0x00007f56ec229f8a in dbus_g_bus_get () from /usr/lib64/libdbus-glib-1.so.2 #6 0x00007f56ec451fd1 in create_manager (pamh=0x7f56c4001130, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:126 #7 do_auth (pamh=0x7f56c4001130, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:379 #8 pam_sm_authenticate (pamh=0x7f56c4001130, flags=<value optimized out>, argc=0, argv=<value optimized out>) at pam_fprintd.c:430 #9 0x00007f56f2741cee in _pam_dispatch_aux (pamh=0x7f56c4001130, flags=0, choice=1) at pam_dispatch.c:110 #10 _pam_dispatch (pamh=0x7f56c4001130, flags=0, choice=1) at pam_dispatch.c:407 #11 0x00007f56f2741600 in pam_authenticate (pamh=0x7f56c4001130, flags=0) at pam_auth.c:34 #12 0x00007f56f2950ef4 in do_one_pam_auth (pb=0x152c9a0, method=<value optimized out>, final_method=1, pam_service=<value optimized out>, map_ident_attr=<value optimized out>, fallback=0, pw_response_requested=1) at ldap/servers/plugins/pam_passthru/pam_ptimpl.c:314 #13 0x00007f56f295139a in pam_passthru_do_pam_auth (pb=0x152c9a0, cfg=<value optimized out>) at ldap/servers/plugins/pam_passthru/pam_ptimpl.c:465 #14 0x00007f56f295211a in pam_passthru_bindpreop (pb=0x152c9a0) at ldap/servers/plugins/pam_passthru/pam_ptpreop.c:433 [...] When calling pipe2 in g_main_context_init_pipe, a flag O_CLOEXEC is passed to the system call, which is a close-on-exec flag. pam_authenticate execs "/sbin/unix_chkpwd". (gdb) l 515 (in g_main_context_init_pipe (context=0x7f56c40188e0) at gmain.c:515 -- GLIB - Library of useful routines for C programming) 510 if (context->wake_up_pipe[0] != -1) 511 return; 512 513 #ifdef HAVE_PIPE2 514 /* if this fails, we fall through and try pipe */ 515 pipe2 (context->wake_up_pipe, O_CLOEXEC); 516 #endif 517 if (context->wake_up_pipe[0] == -1) 518 { 519 if (pipe (context->wake_up_pipe) < 0) I followed the pam code in gdb but the pipe2 file descriptors were not closed with the close system call in the parent process (Directory server), which makes the Directory Server abort when it reaches the max file descriptor limit (1024, by default). Also, valgrind --track-fds=yes reports the file descriptor leaks: ==11184== Warning: invalid file descriptor 1024 in syscall pipe2() [...] ==11184== FILE DESCRIPTORS: 1008 open at exit. ==11184== Open file descriptor 1022: ==11184== at 0x3568EDBEC7: pipe2 (syscall-template.S:82) This "open at exit" is reported as many as the pam auth request is sent. (In my test 998 times.) I'm running the test using this Distro RHEL6.5-20131009.0 Server x86_64. This is the pam related package list on the system. # rpm -qa | egrep pam pam_passwdqc-1.0.5-6.el6.x86_64 pam_passwdqc-debuginfo-1.0.5-6.el6.x86_64 pam-devel-1.1.1-17.el6.x86_64 pam-debuginfo-1.1.1-17.el6.x86_64 fprintd-pam-0.1-21.git04fd09cfa.el6.x86_64 pam-1.1.1-17.el6.x86_64 This is the glib related package list on the system. # rpm -qa | egrep glib glibc-headers-2.12-1.130.el6.x86_64 glib2-debuginfo-2.26.1-3.el6.x86_64 glibc-debuginfo-common-2.12-1.130.el6.x86_64 glibc-common-2.12-1.130.el6.x86_64 glibc-2.12-1.130.el6.x86_64 dbus-glib-0.86-6.el6_4.x86_64 glibc-devel-2.12-1.130.el6.x86_64 glibc-debuginfo-2.12-1.130.el6.x86_64 glib2-2.26.1-3.el6.x86_64 glib2-devel-2.26.1-3.el6.x86_64 Please note that this leak is not observed if the following line is not in the config file. auth sufficient pam_fprintd.so
(In reply to Noriko Hosoi from comment #1) > Hi Colin, > Do you think this problem is related to Bug 1003940 (reassigned to fprintd)? > Once we get a patched fprintd (fprintd-0.1-20.git04fd09cfa.el6.x86_64 + > http://cgit.freedesktop.org/libfprint/fprintd/commit/ > ?id=00ac488cde0f9b916d58af2c1d4f1ae9e05293d0), could this bug be fixed, as > well? Most likely it would also be fixed by that patch, yes.
I will note once again that 389DS is to the best of my knowledge the only component in RHEL which performs multiple pam conversations per process, and this greatly increases its exposure to issues from attempting to dynamically unload libraries and reload them again.
I'll put this on needinfo until you can test the packages that are now in brew, as mentioned in bug 1003940.
I tested using this combination and found the fd leaks are still observed. # rpm -qa | egrep fprintd fprintd-0.1-21.git04fd09cfa.el6.x86_64 fprintd-debuginfo-0.1-21.git04fd09cfa.el6.x86_64 fprintd-pam-0.1-21.git04fd09cfa.el6.x86_64 Unfortunately, the answer was 'no'... > (In reply to Noriko Hosoi from comment #1) >> Hi Colin, >> Do you think this problem is related to Bug 1003940 (reassigned to fprintd)? >> Once we get a patched fprintd (fprintd-0.1-20.git04fd09cfa.el6.x86_64 + >> http://cgit.freedesktop.org/libfprint/fprintd/commit/ >> ?id=00ac488cde0f9b916d58af2c1d4f1ae9e05293d0), could this bug be fixed, as >> well? > Most likely it would also be fixed by that patch, yes.
(In reply to Noriko Hosoi from comment #7) > I tested using this combination and found the fd leaks are still observed. > # rpm -qa | egrep fprintd > fprintd-0.1-21.git04fd09cfa.el6.x86_64 > fprintd-debuginfo-0.1-21.git04fd09cfa.el6.x86_64 > fprintd-pam-0.1-21.git04fd09cfa.el6.x86_64 Hmm. We could change pam-fprintd.so to create a private DBus connection with a private GMainContext which it then tears down. That'd ensure we leak neither the DBus fd nor the main context wakeup pipe. But...why is this code getting run twice? My understanding of the -znodelete fix is that pam-fprintd.so along with all of its dependencies should remain in memory, and so we shouldn't keep creating main contexts.
Thank you for your response, Colin. (In reply to Colin Walters from comment #8) > (In reply to Noriko Hosoi from comment #7) > > I tested using this combination and found the fd leaks are still observed. > > # rpm -qa | egrep fprintd > > fprintd-0.1-21.git04fd09cfa.el6.x86_64 > > fprintd-debuginfo-0.1-21.git04fd09cfa.el6.x86_64 > > fprintd-pam-0.1-21.git04fd09cfa.el6.x86_64 > > Hmm. We could change pam-fprintd.so to create a private DBus connection > with a private GMainContext which it then tears down. That'd ensure we leak > neither the DBus fd nor the main context wakeup pipe. > > But...why is this code getting run twice? My understanding of the > -znodelete fix is that pam-fprintd.so along with all of its dependencies > should remain in memory, and so we shouldn't keep creating main contexts. Sorry, I think that's a part of pam implementation. We, Directory Sever, are just calling pam_authenticate for auth... We decided to recommend DS users not to include fprintd in the pam configuration, for now. So, we could lower the priority and severity. Could you please talk to our manager, Nathan Kinder, if it is needed? Thanks!
(In reply to Noriko Hosoi from comment #9) > We decided to recommend DS users not to include fprintd in the pam > configuration, for now. So, we could lower the priority and severity. > Could you please talk to our manager, Nathan Kinder, if it is needed? > Thanks! This is correct. There's no reason that Directory Server needs to use pam_fprintd.so, but it would be nice to not leak descriptors in case someone just points DS at the system-auth PAM service file (which includes pam_frintd.so). We will update our documentation saying "don't do that", but there is still potential that someone misses this note and doesn't find out that there is a problem until their production Directory Server stops working when it exhausts it's file descriptors. This isn't urgent, but it would be nice if we could do something about it for RHEL 6.6.
(In reply to Nathan Kinder from comment #10) > This is correct. There's no reason that Directory Server needs to use > pam_fprintd.so, but it would be nice to not leak descriptors in case someone > just points DS at the system-auth PAM service file (which includes > pam_frintd.so). Right. > We will update our documentation saying "don't do that", > but there is still potential that someone misses this note and doesn't find > out that there is a problem until their production Directory Server stops > working when it exhausts it's file descriptors. I think we should strive for a state where RHEL works "out of the box", and so I'll take a look at what we can do from the fprintd/glib sides.
Hi Bastien, I have a customer asking me if we are aware of this issue and wondering if so when it will be fixed. Since 6.6 Beta is out seems it is under consideration for a later release. Can you please provide me with an update of where we are with re: to updating the documentation? Thanks very much! Cynthia Baldwin, RHCE Senior Support Relationship Manager IRC: cbaldwin
*** This bug has been marked as a duplicate of bug 1024825 ***