Bug 253069 - cyclic dependency from getpwnam() in log rotation code
cyclic dependency from getpwnam() in log rotation code
Status: CLOSED CURRENTRELEASE
Product: Red Hat Directory Server
Classification: Red Hat
Component: Directory Server (Show other bugs)
6.21
All Linux
low Severity low
: ---
: ---
Assigned To: Noriko Hosoi
Viktor Ashirov
:
Depends On:
Blocks: 240316
  Show dependency treegraph
 
Reported: 2007-08-16 16:40 EDT by Ulf Weltman
Modified: 2016-05-06 10:47 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-06 10:47:05 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
cvs diff (ldapserver) (5.93 KB, patch)
2007-08-16 21:17 EDT, Noriko Hosoi
no flags Details | Diff
cvs commit message (1.08 KB, text/plain)
2007-08-16 22:13 EDT, Noriko Hosoi
no flags Details

  None (edit)
Description Ulf Weltman 2007-08-16 16:40:12 EDT
We've spotted a problem in DS 6.21 SP3, though I believe the same thing should
happen with DS 7.1 SP2 since it has the same fix:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173687

Here's the problem scenario:
1) A new connection is made to the Directory Server.  The main daemon loop calls
PR_Accept().
2) PR_Accept() returns an error causing an error log message to be printed. 
This may be more common on HP-UX, as every so often PR_Accept() error message is
printed in the error log regarding insufficient resources.  This can be caused
during heavy load and an impatient and "when the transport stack received RST
just after SYN was received .. [and] RST comes before accept()".   This is
unrelated, the point is that we're printing an error message from the main
accept loop.
3) The time has come to rotate the error log; before the PR_Accept() error
message can be printed, we must rotate the log.
4) With the 173687 fix, the error log rotation code calls getpwnam() to get the
passwd structure of the run-time user of the Directory Server in order to fix
the ownership before rotating.
5) nsswitch is configured to use ldap before files.
6) The entity handling the ldap backend for the name service (LDAP-UX in our
case) attempts to contact Directory Server to see if the local user exists in
the LDAP backend.  This connection hangs since the daemon loop is hung and not
accepting connections.
7) Any further connections to the Directory Server also hang since the daemon
loop is blocked, waiting on the error log rotation to complete.

There are workarounds:  extend the error log rotation period to delay the
problem, ensure the ldap backend is not used for the lookup of the
nsslapd-localuser, and set a timeout on the entity handling the ldap name
service lookup.

But calling getpwnam() when we're up and running seems like an invitation to
cyclical dependency problems since one common use of DS is as a name service.
Any ideas for a permanent fix?  Maybe we can stash the uid and gid of the
nsslapd-localuser in some global when we use getpwnam() at startup?
Comment 1 Noriko Hosoi 2007-08-16 21:17:05 EDT
Created attachment 161709 [details]
cvs diff (ldapserver)

Thanks, Ulf, for reporting the problem and the suggestion to fix the bug.
Following your advice, I'm moving the getpwnam to the startup time and
introducing slapdFrontendConfig->config_set_localuser to store the copy of
"struct pw", which is reused unless localuser is reset.  If you could review
the diffs, I'd appreciate it.

Fix description:
protect_db.c 
  -- instead of calling getpwnam, use slapdFrontendConfig->localuserinfo.
main.c
  -- set slapdFrontendConfig->localuserinfo if it's NULL.  
     If getpwnam(localuser) fails, we can capture it here.
log.c
  -- instead of calling getpwnam, use slapdFrontendConfig->localuserinfo.
libglobs.c
  -- set slapdFrontendConfig->localuserinfo in config_set_localuser, as well.
slap.h
  -- added localuserinfo to slapdFrontendConfig_t
Comment 2 Rich Megginson 2007-08-16 21:40:46 EDT
Ok, but please add the new member *localuserinfo as the last member of the
structure, to minimize binary incompatability.
Comment 3 Noriko Hosoi 2007-08-16 22:06:18 EDT
(In reply to comment #2)
> Ok, but please add the new member *localuserinfo as the last member of the
> structure, to minimize binary incompatability.
Thanks, Rich.  I've moved locauserinfo to the last of the structure.
Index: slap.h
===================================================================
RCS file: /cvs/dirsec/ldapserver/ldap/servers/slapd/slap.h,v
retrieving revision 1.24
diff -t -w -U2 -r1.24 slap.h
--- slap.h      3 Aug 2007 22:14:41 -0000       1.24
+++ slap.h      17 Aug 2007 02:03:43 -0000
@@ -1926,4 +1926,7 @@
   char *ldapi_search_base_dn;   /* base dn to search for mapped entries */
   char *ldapi_auto_dn_suffix;   /* suffix to be appended to auto gen DNs */
+#ifndef _WIN32
+  struct passwd *localuserinfo; /* userinfo of localuser */
+#endif /* _WIN32 */
 } slapdFrontendConfig_t;

And there was some mistakes in my comment in comment #1. (sorry.)
> I'm moving the getpwnam to the startup time and
introducing slapdFrontendConfig->localuserinfo to store the copy of
                                 ^^^^^^^^^^^^^
"struct pw", which is reused unless localuser is reset.
Comment 4 Noriko Hosoi 2007-08-16 22:13:25 EDT
Created attachment 161711 [details]
cvs commit message

Reviewed by Rich (Thank you!)

Checked in into HEAD.
Comment 5 Ulf Weltman 2007-08-17 13:51:03 EDT
Wow, thanks for the fast patch!!!  It looks good to me.
I will apply to our 6.21 branch and spin a patch.
Comment 6 Noriko Hosoi 2007-08-17 14:01:36 EDT
You are welcome, Ulf.

I've tested the change with verbose printout to the errors log with 1MB error
log size limit + rsearch.  Verified log lotations were made without any
problems.  But, I don't have the test env including the DS in name service
switch handy...  If possible, could you run the test and update us?  Thank you
in advance!!
Comment 7 Ulf Weltman 2007-08-17 17:31:41 EDT
I have tested with ldap name service configured to point to the same server, and
I caused the accept loop to print messages for the specific deadlock scenario.
Looks good!
Comment 8 Noriko Hosoi 2007-08-17 20:19:33 EDT
Thanks a lot, Ulf!  *relieved...* :)
Comment 9 Rich Megginson 2008-01-03 15:37:34 EST
Ulf, would you consider this bug to be verified in RHDS 8.0?
Comment 10 Ulf Weltman 2008-01-03 16:30:50 EST
Yes, looks fixed to me.

Note You need to log in before you can comment on or make changes to this bug.