I've observed some very poor performance when deleting groups that are part of a nested hierarchy and using the memberOf plug-in. For a looped grouping of n members, the following elapsed times for the memberof_postop_del functions resulted (only 1 second resolution from log timestamps): 5 = 0 sec. 10 = 4 sec. 15 = 37 sec. 20 = 216 sec. 30 = 2723 sec. Some further testing has shown that this isn't specific to their being a looped grouping setup, but it is instead related to the number of superior and nested groups relative to the group being deleted. The worst performance is observed when you delete the group in the middle of the nested chain, which is any group when you have a nested loop. The following times are from a straight line nesting of 31 groups with no loop (g1->g2->...->g30->g31): del. group1 (30 nested, 0 superiors)= 17 sec. del. group16 (15 nested, 15 superiors) = 334 sec. del. group31 (0 nested, 30 superiors) = 0 sec. This shows the problem is at it's worst when it has to traverse both up and down the membership hierarchy. It makes sense that the bottom group in the hierarchy is very fast. This group has no members, so it's treated just like a normal user entry.
I traced through the code to find out why the performance is poor. My notes on this analysis are below. These notes are wordy, and were mainly notes to myself, so consider yourself warned. The following was observed in a debugger when deleting group5 in a 5 group nested setup: We first get a pre-op copy of the entry being deleted (group3). We then call into memberof_del_dn_from_groups(), passing the dn of group3. This ends up calling the memberof_del_dn_type_callback() callback function for each entry who has "member=group3", which should only be the group2 entry. This callback performs a mod on group2 that removes the "member=group3" value. We then check if the entry being deleted has any member attributes. If so, we call memberof_del_attr_list() and pass in the dn of group3 and a copy of the member attribute values, which is only the value of group4 in this case. This function will just go to the next value if the passed in dn and the value are the same. This just ends up calling into memberof_mod_attr_list_r(). This function goes through each member attribute value that's passed into it, calling into memberof_modop_one_r() for each, which just calls into memberof_modop_one_replace_r(). The parameters passed into this function are basically telling it to remove the "memberOf=group3" value from the group4 entry. Before doing this though, we check if group4 has any member attributes. In our case, it does, so we start processing it, first checking if we've seen this entry before to prevent a loop. We fetch all of the member values from group4 (only group5 in this case) and call memberof_mod_attr_list_r(), passing it the dn of group3 and the stack for loop detection. This ends up recursively calling into memberof_modop_one_replace_r() again, this time passing in parameters to tell it to remove "memberOf=group3" from the group5 entry. We check if group5 has any member attributes, which it doesn't in our case. We make sure that the original group dn (group3) isn't the same as the entry we are about to modify (group5). Because we have groups listed in stack, we end up doing a check to see if group5 is somehow a memberOf group3 through some other means (**** which doesn't make sense because group3 is being completely deleted in our case ****). We end up calling memberof_is_legit_member(), passing in the original group dn and the group5 dn for op_to. This function first tries to fetch the group3 entry, which doesn't exist anymore. We don't have to do any work because of this, so this function just cleans up and returns. We end up calling memberof_is_member() to see if group5 is a member of group3 either directly or indirectly. **** This seems pointless, and duplicated the check we performed just previously. How does this differ from memberof_is_legit_member()? **** We first check for direct membership with memberof_is_direct_member(), which just tries to fetch the group (group3 in our case) to see if it contains a member attribute for our entry. In our case, group3 doesn't exist, so it just returns. We then do the more expensive recursion check for indirect membership. This indirect membership check first starts a new stack and pushes our member dn onto it (group5). We then do a search for "member=group5", which finds the group4 entry. We check if this match is the same as the value we are trying to delete (group3), which it's not. This causes us to recursively call memberof_is_member_r() to see if group4 is a member of group3. We push group4 onto the stack, then do a search for "member=group4" and find no matches. We've finished the chain here, so we end up returning to state that group5 is not a member of group3 still, which we knew in the first place since this is a simple delete. We then finally delete the group3 memberOf value from the group5 entry. At this point, we've left any inherited groups around in group5's memberOf attribute, so we call memberof_test_membership_callback() to clean that up. This function goes through all of the memberOf values in the group5 entry, which are group 1-4 (**** group3 should be gone already, but we're using an old copy of the entry! ****). These values are split up into two arrays (direct membership and (other) candidate membership). This determination is made by memberof_is_direct_member(), which checks the group entry for a member attribute equal to our entry. We end up putting group4 into the membership array and all others into the candidate array. We then iterate through the direct membership groups, checking to see if they are members of any of the candidate groups. This check is also done by memberof_is_direct_member(). **** Couldn't we just trace up through our direct groups to generate a list of all inherited groups? We could then just replace all of our memberOf attributes with these values. The bad thing here is we are working off of our memberOf values when we should only use member values. A function that just generates an array of superior groupdn's for a passed in dn would be useful here (and elsewhere). **** In our case, group4 is not a member of any of the candidate groups. This causes us to loop through the remaining candidate list, calling memberof_del_one() for each candidate to remove it's value from group5. The first value we have to deal with here is group2. This ends up calling into memberof_modop_one_replace_r() to do the work. We check if group5 has any members we need to recurse into, which it doesn't. We then end up calling into memberof_is_member() to see if group5 is a member of group2 **** This is unnecessary as we've already checked this! ****. This calls the recursive version of this function, which finds all entries with "member=group5", which is only group4. We recurse and check for all entries who match "member=group4", which is no one, so we hit the end of the chain and exit. We then finally remove the "memberOf=group2" value from group5. We then call memberof_test_membership_callback() to clean up inherited groups that may have been left around. Like before, this is an old copy of the entry, so we have group1, group2, and group4 listed as memberOf values in group5. We do the whole split into members and candidates again and end up calling memberof_del_one() to remove group2 **** we already removed this, but we're using an old copy of the entry! **** This ends up putting us back in memberof_modop_one_replace_r() to remove "memberOf=group2" from group5. We end up going through all of the memberof_is_member() stuff again and end up doing the delete of "memberOf=group2" from group5 again. This ends up with a rc=16, but that doesn't cause us any harm. We then try to fixup inherited groups again with memberof_test_membership_callback(), this time with only group1 and group4 listed in our memberOf values. This is actually accurate by chance even though we're using an old copy of the entry. We do the same stuff described above, and call memberof_del_one() to remove the "memberOf=group1" value from the group5 entry. We go into memberof_modop_one_replace_r() as normal, and end up calling into memberof_is_member() to check if group5 is a member of group1. This traces through member attributes to find out what we already knew. We then finally delete the memberOf=group1 value from group5, then we all into memberof_test_membership_callback() with the old copy of the entry. This old copy of the entry has memberOf values for group1 and group4 still. It determines that we need to remove the "memberOf=group1" value, even though it doesn't exist. We end up back in memberOf_modop_one_replace_r() to do the deletion, but we first go through the unnecessary indirect membership check by calling memberof_is_member(). We then attempt to delete the "memberOf=group1" value from group5, but it's already gone since we removed it before. We then go back into the check for abandoned groups by calling memberof_test_membership_callback(), this time with the entry being correct and only having a memberOf value of group4. This function determines that it has nothing to do this time, so it just returns. At this point, we are done with the changes needed to the group5 entry, so a bunch of the recursive calls clean up and return. We end up back in one of our memberof_test_membership_callback() calls, but this call thinks that we still have more memberOf values in our candidate list to delete. This ends up trying to delete the "memberOf=group1" value again, with all of it's fixup and check calls just like we did above. We end up returning up a few more frames we recursed into, until we reach the previous memberof_test_membership_callback() call. This frame thinks we still have to delete the group1 and group3 memberOf values from the group5 entry, so it goes through the attempt to delete those non-existent values and the check operations. We then return up through some frames until we land in our previous call to memberof_modop_one_replace_r() that was attempting to remove the "memberOf=group3" value from group4. This call finished recursing through the group4 entry, so it goes through it's processing that it does for non-group entries, which is to check if group4 is a member of group3 still with memberof_is_member(), perform the actual modify operation, then do the test membership call to fix up abandoned groups. This fixup will have to deal with group1, group2, and group3 calues to clean up, so it ends up doing the exact same stuff that we had to do for the group5 entry. Once we are done with group4, the job is finished and we release the lock.
Created attachment 310628 [details] CVS Diffs I still want to stress these changes under replication load, but I thought I'd get them attached here ahead of time. The basic approach of this fix is to just regenerate the entire memberOf attribute when we're updating a member entry for a group deletion or add modification. Using this method is much less error prone than trying to be smart and handle only the changes needed for every case. With these changes, I've found that I can delete a group that is a part of a 100 group deep loop and the fixup is finished in 2 seconds on my machine. Attempting to do this previously made ns-slapd consume nearly all of my CPU for 3-4 hours before I killed it. I'll descibe the changes in more detail after I finish stressing memberOf updates with replication.
After such a major change of the algorithm it would be also nice to re-verify that the previous bugs of memberOf (memory leaks/inconsistencies) do not reappear in the new version. Though i think that this simplification tends to reduce the complexity of the code and eliminates the origin of these bugs.
(In reply to comment #3) > After such a major change of the algorithm it would be also nice to re-verify > that the previous bugs of memberOf (memory leaks/inconsistencies) do not > reappear in the new version. Though i think that this simplification tends to > reduce the complexity of the code and eliminates the origin of these bugs. Yep, I've already re-verified all of those issues as well as extensive valgrind testing. You are correct about these changes eliminating the complexity of the code, and with it, the source of most of the previous bugs.
Looks good. In the case of recursive membership detected - https://bugzilla.redhat.com/attachment.cgi?id=310628&action=diff#ldap/servers/plugins/memberof/memberof.c_sec11 Do we print out a detailed message somewhere e.g. group member dn [%s] is recursively included in group [%s] - please edit your group configuration ? That is, it would be nice to tell admins which member is recursive in which group.
Looks good to me, too! It'd be nice, if you could add a short comment on how the entire procedure is protected from other threads!
There is something that i don't quite like about the memberOf plug-in as it is today - it changes the 'modifiersName' attribute to the plug-in name (and 'modifyTimestamp'). I know it should be like that but if we could have an option in the plug-in config to disable writing into these operational atributes it would be a nice feature (it would simplify the auditing of who was the last person to touch the entry).
(In reply to comment #7) > There is something that i don't quite like about the memberOf plug-in as it is > today - it changes the 'modifiersName' attribute to the plug-in name (and > 'modifyTimestamp'). I know it should be like that but if we could have an option > in the plug-in config to disable writing into these operational atributes it > would be a nice feature (it would simplify the auditing of who was the last > person to touch the entry). We have this issue in general with operations that invoke other internal operations - see https://bugzilla.redhat.com/show_bug.cgi?id=249327 I'd rather solve this issue in a general purpose way than to do something specific for memberOf.
(In reply to comment #8) > (In reply to comment #7) > > There is something that i don't quite like about the memberOf plug-in as it is > > today - it changes the 'modifiersName' attribute to the plug-in name (and > > 'modifyTimestamp'). I know it should be like that but if we could have an option > > in the plug-in config to disable writing into these operational atributes it > > would be a nice feature (it would simplify the auditing of who was the last > > person to touch the entry). > > We have this issue in general with operations that invoke other internal > operations - see https://bugzilla.redhat.com/show_bug.cgi?id=249327 > > I'd rather solve this issue in a general purpose way than to do something > specific for memberOf. Agreed. This is a general server issue that needs to be addressed in a general purpose way. Andrey - Would you open a bug to track this side issue?
Created attachment 310698 [details] Revised Diffs Here's a minor revision to the previous patch. The only changes from the previous patch are a bit of refactoring of a structure (added memberof_ prefix) and a change in the log-level for a message about detecting recursive groups.
(In reply to comment #5) > Looks good. > > In the case of recursive membership detected - > https://bugzilla.redhat.com/attachment.cgi?id=310628&action=diff#ldap/servers/plugins/memberof/memberof.c_sec11 > > Do we print out a detailed message somewhere e.g. group member dn [%s] is > recursively included in group [%s] - please edit your group configuration ? > > That is, it would be nice to tell admins which member is recursive in which group. I'll have to see if there's an easy way to do that. We basically keep a linked list that we just add groups to as we process them, but we don't stash any info away about how we came across that group the first time around. I can't think of an easy way to get that data without adding it to the struct we use for the linked list, but I'll take a look to see if there's some other way.
Ok. I just worry about trying to support this, if a user complains that "the error log says group X has a recursive definition but I don't know where to begin looking".
(In reply to comment #12) > Ok. I just worry about trying to support this, if a user complains that "the > error log says group X has a recursive definition but I don't know where to > begin looking". I don't see an easy way to deal with this in the existing code, particularly in memberof_get_groups_callback() since we use a Slapi_ValueSet instead of the plug-in's own linked list struct that we could add a parentGroup member to. It should be fairly easy for someone to find the source of the recursive grouping from the information we currently log though. We log the DN of the group we encounter twice, so one could search upwards in the group hierarchy by searching for "member=<group>", then searching for "member=<match>" where "<match>" is each matching group from the first search. If this chain is followed, you will eventually run into the original group again, which would allow you to determine where the grouping recurses.
Created attachment 310721 [details] Re-Revised Diffs Rich's question made me think about one of the log messages about detecting recursion. I've modified the log message to say that a possible group recursion was detected in one spot since that code can be hit any case where we try to process a group a second time. This can happen when an entry is a member of a group through multiple paths. The other two spots where we print about detecting a recursive group can remain since they only get triggered by true group recursion.
My replication stress tests didn't expose any problems. I wanted to expand a bit on the description of the new behaviour. The old code did many individual modify operations. We basically did a separate operation for every memberOf value change in an entry. This gives the entry a half-completed membership result if a client tries to query for membership when a memberOf operation is in progress (which is very likely with the poor performance issues). The new code basically figures out all of the memberOf values that should be present in an entry by checking member values throughout the tree. Once this list is generated, a single replace operation is performed to replace any previous memberOf values in an entry. The approach of using a single modify operation per member entry is one of the reasons that this approach performs well. It also prevents any clients from getting a partially updated entry when a memberOf operation is in progress. This method also has the side effect of fixing up member entries as membership changes related to that entry are made. This will fix up any membership inconsistency that previously existed in entries.
Checking in ldap/ldif/template-dse.ldif.in; /cvs/dirsec/ldapserver/ldap/ldif/template-dse.ldif.in,v <-- template-dse.ldif.in new revision: 1.9; previous revision: 1.8 done Checking in ldap/servers/plugins/memberof/memberof.c; /cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof.c,v <-- memberof.c new revision: 1.11; previous revision: 1.10 done Checking in ldap/servers/plugins/memberof/memberof.h; /cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof.h,v <-- memberof.h new revision: 1.2; previous revision: 1.1 done Checking in ldap/servers/plugins/memberof/memberof_config.c; /cvs/dirsec/ldapserver/ldap/servers/plugins/memberof/memberof_config.c,v <-- memberof_config.c new revision: 1.2; previous revision: 1.1 done
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0643.html