Bug 1378254 - User and Group List on large LDAP data sets overloads Keystone
Summary: User and Group List on large LDAP data sets overloads Keystone
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-keystone
Version: 8.0 (Liberty)
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: John Dennis
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-21 23:55 UTC by Sean Lee
Modified: 2020-09-07 21:44 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-24 17:29:07 UTC
Target Upstream Version:



Description Sean Lee 2016-09-21 23:55:17 UTC
In environments with a large number of LDAP users/groups (>50k entries), "openstack user/group list --domain MyAD" takes several minutes to return. Although the actual LDAP query takes only a few seconds, Keystone spends several minutes mapping each user to a public_id in the keystone.id_mapping table.

It's a known issue, and there is a patch upstream that uses a cache to speed up subsequent lookups:

https://bugs.launchpad.net/keystone/+bug/1582585
https://bugs.launchpad.net/mos/+bug/1496840

If a customer's LDAP entries already contain globally unique attributes (e.g. employeeGUID, homegrownUUID, etc.), there should be an option to disable ID mapping completely. I'd like to propose the following patch:


--- /usr/lib/python2.7/site-packages/keystone/common/config.py.a
+++ /usr/lib/python2.7/site-packages/keystone/common/config.py.b
@@ -133,6 +133,14 @@
                          'values specific to the domain need to be specified '
                          'in this manner. This feature is disabled by '
                          'default; set to true to enable.'),
+        cfg.BoolOpt('disable_domain_id_mapping',
+                    default=False,
+                    help='Domain ID mapping is resource intensive '
+                         'and time consuming. If globally unique '
+                         'attributes are available, set user_id_attribute and '
+                         'group_id_attribute to globally unique attributes, '
+                         'and set this option to true to disable domain id '
+                         'mapping.'),
         cfg.BoolOpt('domain_configurations_from_database',
                     default=False,
                     help='Extract the domain specific configuration options '

--- /usr/lib/python2.7/site-packages/keystone/identity/core.py.a
+++ /usr/lib/python2.7/site-packages/keystone/identity/core.py.b
@@ -572,7 +572,7 @@
         """
         conf = CONF.identity
 
-        if not self._needs_post_processing(driver):
+        if not self._needs_post_processing(driver) or conf.disable_domain_id_mapping is True:
             # a classic case would be when running with a single SQL driver
             return ref
 

--- /etc/keystone/keystone.conf.a
+++ /etc/keystone/keystone.conf.b
@@ -819,6 +819,12 @@
 # if domain_specific_drivers_enabled is set to true. (string value)
 #domain_config_dir = /etc/keystone/domains
 
+# Domain ID mapping is resource intensive and time consuming. If globally
+# unique attributes are available, set user_id_attribute and 
+# group_id_attribute to globally unique attributes, and set this option to
+# true to disable domain id mapping. (boolean value)
+#disable_domain_id_mapping = false
+
 # Entrypoint for the identity backend driver in the keystone.identity
 # namespace. Supplied drivers are ldap and sql. (string value)
 #driver = sql
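
For readers who want to see what the proposed flag short-circuits, here is a minimal, self-contained sketch. It is not Keystone code: the in-memory dict stands in for the keystone.id_mapping table, and the names public_id_for and post_process are hypothetical. Hashing the (domain, type, local ID) tuple with SHA-256 is an assumption made for illustration only.

```python
import hashlib

# Hypothetical stand-in for the keystone.id_mapping table:
# (domain_id, entity_type, local_id) -> public_id
id_mapping_table = {}


def public_id_for(domain_id, local_id, entity_type="user"):
    """Look up or create a public ID; one table access per entity."""
    key = (domain_id, entity_type, local_id)
    if key not in id_mapping_table:
        # Assumption: the public ID is a SHA-256 hex digest of the key parts.
        digest = hashlib.sha256("".join(key).encode("utf-8")).hexdigest()
        id_mapping_table[key] = digest
    return id_mapping_table[key]


def post_process(refs, domain_id, disable_domain_id_mapping=False):
    """Map local IDs to public IDs unless mapping is disabled."""
    if disable_domain_id_mapping:
        # Mirrors the early return in the patch: refs pass through untouched.
        return refs
    return [dict(ref, id=public_id_for(domain_id, ref["id"])) for ref in refs]


users = [{"id": "guid-%05d" % i, "name": "user%d" % i} for i in range(3)]
mapped = post_process(users, "MyAD")
unmapped = post_process(users, "MyAD", disable_domain_id_mapping=True)
```

With mapping enabled, every entity in the result costs one lookup (and possibly one insert) in the mapping table, which is where the time goes on a 50k+ entry list. With the flag set, the list comes back exactly as the LDAP driver returned it.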

Comment 2 Sean Lee 2016-09-22 00:11:34 UTC
Without ID mapping:

# time openstack user list --domain MyAD

real	0m19.821s
user	0m12.711s
sys	0m0.215s

# time openstack group list --domain MyAD

real	0m11.525s
user	0m7.999s
sys	0m0.165s

(>70k users ; >20k groups)

With ID mapping, queries would take 3 to 5 minutes.

Comment 3 Adam Young 2016-09-22 17:18:20 UTC
Would it make more sense to disable user and group lists for LDAP backends?  

Many LDAP servers limit the number of results to 200 or so to avoid getting a denial-of-service attack on each query.  This is how other places worked around this issue.  I don't know if it is possible to put a window-size command like that into the LDAP query from the Keystone side, though.

It seems to me that user and group list are not useful functions with large datasets.  Disabling the ID mapping, while making it somewhat less painful, would have serious repercussions elsewhere in the OpenStack projects, as UserIds might then expand beyond the size of the columns used to store them.
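
The column-size concern can be illustrated with a short sketch. The 64-character column width and the example DN below are assumptions chosen for illustration, not values taken from any particular OpenStack schema; fits_column is a hypothetical helper.

```python
import hashlib

# Assumption: a consuming project stores user IDs in a 64-character column.
COLUMN_WIDTH = 64


def fits_column(user_id, width=COLUMN_WIDTH):
    """Return True if the ID can be stored without truncation."""
    return len(user_id) <= width


# Hypothetical raw LDAP value used directly as the user ID (no mapping).
raw_id = ("CN=Some Very Long Display Name,OU=Engineering,"
          "OU=Contractors,DC=corp,DC=example,DC=com")

# A SHA-256 hex digest has a fixed length regardless of the input size.
mapped_id = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()
```

A hashed ID always has the same length, whereas an unmapped attribute value is only as short as the directory happens to make it.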

Comment 4 Nathan Kinder 2016-09-22 20:20:31 UTC
(In reply to Adam Young from comment #3)
> Would it make more sense to disable user and group lists for LDAP backends?  
> 
> Many LDAP servers limit the number of results to 200 or so to avoid getting
> a denial-of-service attack on each query.  This is how other places worked
> around this issue.  I don't know if it is possible to put a window-size
> command like that into the LDAP query from the Keystone side, though.

Exceeding an LDAP sizelimit is treated like an error (and is what the customer was first encountering).  I don't think we can rely on using an LDAP sizelimit.

> 
> It seems to me that user and group list are not useful functions with large
> datasets.  Disabling the ID mapping, while making it somewhat less painful,
> would have serious repercussions elsewhere in the OpenStack projects, as
> UserIds might then expand beyond the size of the columns used to store them.

I definitely agree with this.  User enumeration is not valuable with large numbers of users.  What I am not sure of is how Horizon and CloudForms will deal with getting no users back from a functionality standpoint.  Ideally, they would have the ability to search for users via filters to limit the results instead of trying to list them all.  While this is something that can be explored for those front-end applications, Keystone will still need to allow user-list requests that can filter results.  The user-list API call does have a "name" request parameter that it says can filter the results, but it's unclear if it uses that as a substring search value or not.

I see a few possibilities (that are not mutually exclusive):

--------------------------------------------------------------------------
1 - Use a patch similar to what was proposed in this bug to improve the performance with the way the Keystone API and front-end applications work today.

2 - Allow Keystone to disable user-list operations that don't use a filter (perhaps by just returning 0 results, or some response that indicates that it refuses to service the request).  This includes ensuring that the filter functionality allows substring searching for users.

3 - Modify frontend applications (Horizon & CloudForms) to use filtered searches for listing users.
--------------------------------------------------------------------------


It seems like option 1 is viable in the short term, while options 2 and 3 are better medium-term solutions.

Comment 5 Adam Young 2016-11-01 14:32:20 UTC
The outcome of the design summit discussion was to allow two new configuration-driven options in the Keystone Identity backend.

1.  Disable listing users for a given identity source.  This will not be LDAP specific, but will be per domain-specific backend, and thus LDAP will be treated separately from SQL.

2.  Even with the above option set, allow filtered listing of users if the LDAP backend is configured with a Filter.  This should support role assignments for users by searching for users when called from Horizon, CloudForms, etc.
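
One possible reading of these two options, sketched in plain Python. Everything here is hypothetical: the option name allow_unfiltered_list, the exception, and the case-insensitive substring match are assumptions for illustration, not the behavior or configuration that was eventually implemented in Keystone.

```python
class UnfilteredListDisabled(Exception):
    """Raised when a full listing is requested but disabled for the backend."""


def list_users(users, name_contains=None, allow_unfiltered_list=True):
    """Return users, refusing unfiltered listings when they are disabled."""
    if name_contains is None:
        if not allow_unfiltered_list:
            raise UnfilteredListDisabled(
                "listing all users is disabled for this identity source")
        return list(users)
    # Assumption: the filter is a case-insensitive substring match on name.
    needle = name_contains.lower()
    return [u for u in users if needle in u["name"].lower()]


directory = [
    {"id": "1", "name": "alice"},
    {"id": "2", "name": "bob"},
    {"id": "3", "name": "Alicia"},
]

# Filtered search still works with full listing disabled.
matches = list_users(directory, name_contains="ali",
                     allow_unfiltered_list=False)

try:
    list_users(directory, allow_unfiltered_list=False)
    refused = False
except UnfilteredListDisabled:
    refused = True
```

A front end such as Horizon could then search for the user it needs for a role assignment without ever requesting the full list.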

