Bug 453011 - Poor memberOf performance for group deletion
Summary: Poor memberOf performance for group deletion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: 389
Classification: Retired
Component: Server - memberOf Plug-in
Version: 1.1.1
Hardware: All
OS: Linux
low
low
Target Milestone: ---
Assignee: Nathan Kinder
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks: 249650 FDS112
TreeView+ depends on / blocked
 
Reported: 2008-06-26 16:07 UTC by Nathan Kinder
Modified: 2015-01-04 23:33 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-04 18:21:19 UTC
Embargoed:


Attachments (Terms of Use)
CVS Diffs (31.90 KB, patch)
2008-06-30 23:29 UTC, Nathan Kinder
no flags Details | Diff
Revised Diffs (32.58 KB, patch)
2008-07-01 18:13 UTC, Nathan Kinder
no flags Details | Diff
Re-Revised Diffs (32.80 KB, patch)
2008-07-01 20:56 UTC, Nathan Kinder
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0643 0 normal SHIPPED_LIVE ipa bug fix update 2008-08-04 18:20:50 UTC

Description Nathan Kinder 2008-06-26 16:07:51 UTC
I've observed some very poor performance when deleting groups that are part of a
nested hierarchy and using the memberOf plug-in.

For a looped grouping of n members, the following elapsed times for the
memberof_postop_del functions resulted (only 1 second resolution from log
timestamps):

   5 = 0 sec.
  10 = 4 sec.
  15 = 37 sec.
  20 = 216 sec.
  30 = 2723 sec.

Some further testing has shown that this isn't specific to their being a looped
grouping setup, but it is instead related to the number of superior and nested
groups relative to the group being deleted.  The worst performance is observed
when you delete the group in the middle of the nested chain, which is any group
when you have a nested loop.

The following times are from a straight line nesting of 31 groups with no loop
(g1->g2->...->g30->g31):

  del. group1 (30 nested, 0 superiors)= 17 sec.
  del. group16 (15 nested, 15 superiors) =  334 sec.
  del. group31 (0 nested, 30 superiors) = 0 sec.

This shows the problem is at it's worst when it has to traverse both up and down
the membership hierarchy.  It makes sense that the bottom group in the hierarchy
is very fast.  This group has no members, so it's treated just like a normal
user entry.

Comment 1 Nathan Kinder 2008-06-26 16:21:42 UTC
I traced through the code to find out why the performance is poor.  My notes on
this analysis are below.  These notes are wordy, and were mainly notes to
myself, so consider yourself warned.  The following was observed in a debugger
when deleting group5 in a 5 group nested setup:

We first get a pre-op copy of the entry being deleted (group3).  We then call
into memberof_del_dn_from_groups(), passing the dn of group3.  This ends up
calling the memberof_del_dn_type_callback() callback function for each entry who
has "member=group3", which should only be the group2 entry.  This callback
performs a mod on group2 that removes the "member=group3" value.

We then check if the entry being deleted has any member attributes.  If so, we
call memberof_del_attr_list() and pass in the dn of group3 and a copy of the
member attribute values, which is only the value of group4 in this case.  This
function will just go to the next value if the passed in dn and the value are
the same. This just ends up calling into memberof_mod_attr_list_r().

This function goes through each member attribute value that's passed into it,
calling into memberof_modop_one_r() for each, which just calls into
memberof_modop_one_replace_r().  The parameters passed into this function are
basically telling it to remove the "memberOf=group3" value from the group4
entry.  Before doing this though, we check if group4 has any member attributes.
 In our case, it does, so we start processing it, first checking if we've seen
this entry before to prevent a loop.  We fetch all of the member values from
group4 (only group5 in this case) and call memberof_mod_attr_list_r(), passing
it the dn of group3 and the stack for loop detection.

This ends up recursively calling into memberof_modop_one_replace_r() again, this
time passing in parameters to tell it to remove "memberOf=group3" from the
group5 entry.  We check if group5 has any member attributes, which it doesn't in
our case.  We make sure that the original group dn (group3) isn't the same as
the entry we are about to modify (group5).  Because we have groups listed in
stack, we end up doing a check to see if group5 is somehow a memberOf group3
through some other means (**** which doesn't make sense because group3 is being
completely deleted in our case ****).

We end up calling memberof_is_legit_member(), passing in the original group dn
and the group5 dn for op_to.  This function first tries to fetch the group3
entry, which doesn't exist anymore.  We don't have to do any work because of
this, so this function just cleans up and returns.

We end up calling memberof_is_member() to see if group5 is a member of group3
either directly or indirectly.  **** This seems pointless, and duplicated the
check we performed just previously.  How does this differ from
memberof_is_legit_member()? ****  We first check for direct membership with
memberof_is_direct_member(), which just tries to fetch the group (group3 in our
case) to see if it contains a member attribute for our entry.  In our case,
group3 doesn't exist, so it just returns.  We then do the more expensive
recursion check for indirect membership.

This indirect membership check first starts a new stack and pushes our member dn
onto it (group5).  We then do a search for "member=group5", which finds the
group4 entry.  We check if this match is the same as the value we are trying to
delete (group3), which it's not.  This causes us to recursively call
memberof_is_member_r() to see if group4 is a member of group3.  We push group4
onto the stack, then do a search for "member=group4" and find no matches.  We've
finished the chain here, so we end up returning to state that group5 is not a
member of group3 still, which we knew in the first place since this is a simple
delete.  We then finally delete the group3 memberOf value from the group5 entry.

At this point, we've left any inherited groups around in group5's memberOf
attribute, so we call memberof_test_membership_callback() to clean that up. 
This function goes through all of the memberOf values in the group5 entry, which
are group 1-4 (**** group3 should be gone already, but we're using an old copy
of the entry! ****).  These values are split up into two arrays (direct
membership and (other) candidate membership).  This determination is made by
memberof_is_direct_member(), which checks the group entry for a member attribute
equal to our entry.  We end up putting group4 into the membership array and all
others into the candidate array.  We then iterate through the direct membership
groups, checking to see if they are members of any of the candidate groups. 
This check is also done by memberof_is_direct_member(). **** Couldn't we just
trace up through our direct groups to generate a list of all inherited groups? 
We could then just replace all of our memberOf attributes with these values. 
The bad thing here is we are working off of our memberOf values when we should
only use member values.  A function that just generates an array of superior
groupdn's for a passed in  dn would be useful here (and elsewhere). ****  In our
case, group4 is not a member of any of the candidate groups.  This causes us to
loop through the remaining candidate list, calling memberof_del_one() for each
candidate to remove it's value from group5.

The first value we have to deal with here is group2.  This ends up calling into
memberof_modop_one_replace_r() to do the work.  We check if group5 has any
members we need to recurse into, which it doesn't.  We then end up calling into
memberof_is_member() to see if group5 is a member of group2 **** This is
unnecessary as we've already checked this! ****.  This calls the recursive
version of this function, which finds all entries with "member=group5", which is
only group4.  We recurse and check for all entries who match "member=group4",
which is no one, so we hit the end of the chain and exit.  We then finally
remove the "memberOf=group2" value from group5.  We then call
memberof_test_membership_callback() to clean up inherited groups that may have
been left around.

Like before, this is an old copy of the entry, so we have group1, group2, and
group4 listed as memberOf values in group5.  We do the whole split into members
and candidates again and end up calling memberof_del_one() to remove group2 ****
we already removed this, but we're using an old copy of the entry! ****  This
ends up putting us back in memberof_modop_one_replace_r() to remove
"memberOf=group2" from group5.  We end up going through all of the
memberof_is_member() stuff again and end up doing the delete of
"memberOf=group2" from group5 again.  This ends up with a rc=16, but that
doesn't cause us any harm.

We then try to fixup inherited groups again with
memberof_test_membership_callback(), this time with only group1 and group4
listed in our memberOf values.  This is actually accurate by chance even though
we're using an old copy of the entry.  We do the same stuff described above, and
call memberof_del_one() to remove the "memberOf=group1" value from the group5 entry.

We go into memberof_modop_one_replace_r() as normal, and end up calling into
memberof_is_member() to check if group5 is a member of group1.  This traces
through member attributes to find out what we already knew.  We then finally
delete the memberOf=group1 value from group5, then we all into
memberof_test_membership_callback() with the old copy of the entry.

This old copy of the entry has memberOf values for group1 and group4 still.  It
determines that we need to remove the "memberOf=group1" value, even though it
doesn't exist.  We end up back in memberOf_modop_one_replace_r() to do the
deletion, but we first go through the unnecessary indirect membership check by
calling memberof_is_member().  We then attempt to delete the "memberOf=group1"
value from group5, but it's already gone since we removed it before.  We then go
back into the check for abandoned groups by calling
memberof_test_membership_callback(), this time with the entry being correct and
only having a memberOf value of group4.  This function determines that it has
nothing to do this time, so it just returns.

At this point, we are done with the changes needed to the group5 entry, so a
bunch of the recursive calls clean up and return.  We end up back in one of our
memberof_test_membership_callback() calls, but this call thinks that we still
have more memberOf values in our candidate list to delete.  This ends up trying
to delete the "memberOf=group1" value again, with all of it's fixup and check
calls just like we did above.

We end up returning up a few more frames we recursed into, until we reach the
previous memberof_test_membership_callback() call.  This frame thinks we still
have to delete the group1 and group3 memberOf values from the group5 entry, so
it goes through the attempt to delete those non-existent values and the check
operations. We then return up through some frames until we land in our previous
call to memberof_modop_one_replace_r() that was attempting to remove the
"memberOf=group3" value from group4.

This call finished recursing through the group4 entry, so it goes through it's
processing that it does for non-group entries, which is to check if group4 is a
member of group3 still with memberof_is_member(), perform the actual modify
operation, then do the test membership call to fix up abandoned groups.  This
fixup will have to deal with group1, group2, and group3 calues to clean up, so
it ends up doing the exact same stuff that we had to do for the group5 entry. 
Once we are done with group4, the job is finished and we release the lock.

Comment 2 Nathan Kinder 2008-06-30 23:29:24 UTC
Created attachment 310628 [details]
CVS Diffs

I still want to stress these changes under replication load, but I thought I'd
get them attached here ahead of time.  The basic approach of this fix is to
just regenerate the entire memberOf attribute when we're updating a member
entry for a group deletion or add modification.  Using this method is much less
error prone than trying to be smart and handle only the changes needed for
every case.

With these changes, I've found that I can delete a group that is a part of a
100 group deep loop and the fixup is finished in 2 seconds on my machine. 
Attempting to do this previously made ns-slapd consume nearly all of my CPU for
3-4 hours before I killed it.

I'll descibe the changes in more detail after I finish stressing memberOf
updates with replication.

Comment 3 Andrey Ivanov 2008-07-01 08:58:33 UTC
After such a major change of the algorithm it would be also nice to re-verify
that the previous bugs of memberOf (memory leaks/inconsistencies) do not
reappear in the new version. Though i think that this simplification tends to
reduce the complexity of the code and eliminates the origin of these bugs.

Comment 4 Nathan Kinder 2008-07-01 14:45:56 UTC
(In reply to comment #3)
> After such a major change of the algorithm it would be also nice to re-verify
> that the previous bugs of memberOf (memory leaks/inconsistencies) do not
> reappear in the new version. Though i think that this simplification tends to
> reduce the complexity of the code and eliminates the origin of these bugs.

Yep, I've already re-verified all of those issues as well as extensive valgrind
testing.  You are correct about these changes eliminating the complexity of the
code, and with it, the source of most of the previous bugs.



Comment 5 Rich Megginson 2008-07-01 16:58:35 UTC
Looks good.

In the case of recursive membership detected -
https://bugzilla.redhat.com/attachment.cgi?id=310628&action=diff#ldap/servers/plugins/memberof/memberof.c_sec11

Do we print out a detailed message somewhere e.g. group member dn [%s] is
recursively included in group [%s] - please edit your group configuration ?

That is, it would be nice to tell admins which member is recursive in which group.

Comment 6 Noriko Hosoi 2008-07-01 17:22:11 UTC
Looks good to me, too!

It'd be nice, if you could add a short comment on how the entire procedure is
protected from other threads!

Comment 7 Andrey Ivanov 2008-07-01 17:33:05 UTC
There is something that i don't quite like about the memberOf plug-in as it is
today - it changes the 'modifiersName' attribute to the plug-in name (and
'modifyTimestamp'). I know it should be like that but if we could have an option
in the plug-in config to disable writing into these operational atributes it
would be a nice feature (it would simplify the auditing of who was the last
person to touch the entry).

Comment 8 Rich Megginson 2008-07-01 17:42:41 UTC
(In reply to comment #7)
> There is something that i don't quite like about the memberOf plug-in as it is
> today - it changes the 'modifiersName' attribute to the plug-in name (and
> 'modifyTimestamp'). I know it should be like that but if we could have an option
> in the plug-in config to disable writing into these operational atributes it
> would be a nice feature (it would simplify the auditing of who was the last
> person to touch the entry).

We have this issue in general with operations that invoke other internal
operations - see https://bugzilla.redhat.com/show_bug.cgi?id=249327

I'd rather solve this issue in a general purpose way than to do something
specific for memberOf.

Comment 9 Nathan Kinder 2008-07-01 18:01:00 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > There is something that i don't quite like about the memberOf plug-in as it is
> > today - it changes the 'modifiersName' attribute to the plug-in name (and
> > 'modifyTimestamp'). I know it should be like that but if we could have an option
> > in the plug-in config to disable writing into these operational atributes it
> > would be a nice feature (it would simplify the auditing of who was the last
> > person to touch the entry).
> 
> We have this issue in general with operations that invoke other internal
> operations - see https://bugzilla.redhat.com/show_bug.cgi?id=249327
> 
> I'd rather solve this issue in a general purpose way than to do something
> specific for memberOf.

Agreed.  This is a general server issue that needs to be addressed in a general
purpose way.

Andrey - Would you open a bug to track this side issue?

Comment 10 Nathan Kinder 2008-07-01 18:13:54 UTC
Created attachment 310698 [details]
Revised Diffs

Here's a minor revision to the previous patch.	The only changes from the
previous patch are a bit of refactoring of a structure (added memberof_ prefix)
and a change in the log-level for a message about detecting recursive groups.

Comment 11 Nathan Kinder 2008-07-01 18:25:04 UTC
(In reply to comment #5)
> Looks good.
> 
> In the case of recursive membership detected -
>
https://bugzilla.redhat.com/attachment.cgi?id=310628&action=diff#ldap/servers/plugins/memberof/memberof.c_sec11
> 
> Do we print out a detailed message somewhere e.g. group member dn [%s] is
> recursively included in group [%s] - please edit your group configuration ?
> 
> That is, it would be nice to tell admins which member is recursive in which group.

I'll have to see if there's an easy way to do that.  We basically keep a linked
list that we just add groups to as we process them, but we don't stash any info
away about how we came across that group the first time around.  I can't think
of an easy way to get that data without adding it to the struct we use for the
linked list, but I'll take a look to see if there's some other way.

Comment 12 Rich Megginson 2008-07-01 18:49:18 UTC
Ok.  I just worry about trying to support this, if a user complains that "the
error log says group X has a recursive definition but I don't know where to
begin looking".

Comment 13 Nathan Kinder 2008-07-01 20:49:24 UTC
(In reply to comment #12)
> Ok.  I just worry about trying to support this, if a user complains that "the
> error log says group X has a recursive definition but I don't know where to
> begin looking".

I don't see an easy way to deal with this in the existing code, particularly in
memberof_get_groups_callback() since we use a Slapi_ValueSet instead of the
plug-in's own linked list struct that we could add a parentGroup member to.

It should be fairly easy for someone to find the source of the recursive
grouping from the information we currently log though.  We log the DN of the
group we encounter twice, so one could search upwards in the group hierarchy by
searching for "member=<group>", then searching for "member=<match>" where
"<match>" is each  matching group from the first search.  If this chain is
followed, you will eventually run into the original group again, which would
allow you to determine where the grouping recurses.


Comment 14 Nathan Kinder 2008-07-01 20:56:56 UTC
Created attachment 310721 [details]
Re-Revised Diffs

Rich's question made me think about one of the log messages about detecting
recursion.  I've modified the log message to say that a possible group
recursion was detected in one spot since that code can be hit any case where we
try to process a group a second time.  This can happen when an entry is a
member of a group through multiple paths.  The other two spots where we print
about detecting a recursive group can remain since they only get triggered by
true group recursion.

Comment 15 Nathan Kinder 2008-07-01 22:14:42 UTC
My replication stress tests didn't expose any problems.

I wanted to expand a bit on the description of the new behaviour.  The old code
did many individual modify operations.  We basically did a separate operation
for every memberOf value change in an entry.  This gives the entry a
half-completed membership result if a client tries to query for membership when
a memberOf operation is in progress (which is very likely with the poor
performance issues).

The new code basically figures out all of the memberOf values that should be
present in an entry by checking member values throughout the tree.  Once this
list is generated, a single replace operation is performed to replace any
previous memberOf values in an entry.  The approach of using a single modify
operation per member entry is one of the reasons that this approach performs
well.  It also prevents any clients from getting a partially updated entry when
a memberOf operation is in progress.  This method also has the side effect of
fixing up member entries as membership changes related to that entry are made. 
This will fix up any membership inconsistency that previously existed in entries.

Comment 16 Nathan Kinder 2008-07-01 22:33:42 UTC
Checking in ldap/ldif/template-dse.ldif.in;
/cvs/dirsec/ldapserver/ldap/ldif/template-dse.ldif.in,v  <--  template-dse.ldif.in
new revision: 1.9; previous revision: 1.8
done
Checking in ldap/servers/plugins/memberof/memberof.c;
/cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof.c,v  <--  memberof.c
new revision: 1.11; previous revision: 1.10
done
Checking in ldap/servers/plugins/memberof/memberof.h;
/cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof.h,v  <--  memberof.h
new revision: 1.2; previous revision: 1.1
done
Checking in ldap/servers/plugins/memberof/memberof_config.c;
/cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof_config.c,v  <-- 
memberof_config.c
new revision: 1.2; previous revision: 1.1
done

Comment 20 errata-xmlrpc 2008-08-04 18:21:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0643.html


Note You need to log in before you can comment on or make changes to this bug.