Bug 846333 - IPA system acting lethargic (slow, unresponsive and at times unusable) when running command: "ipa sudorule-add-user sudo_rulename --groups=usergroupname"
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Rob Crittenden
QA Contact: Michael Gregg
Depends On:
Blocks:
Reported: 2012-08-07 09:28 EDT by baiesi
Modified: 2014-08-05 07:18 EDT
CC List: 5 users

See Also:
Fixed In Version: ipa-3.2.1-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 05:37:57 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description baiesi 2012-08-07 09:28:43 EDT
Issue:
The IPA system becomes lethargic (slow, unresponsive, and at times unusable) after running the command: "ipa sudorule-add-user sudo_rulename --groups=usergroupname"

Overview:
I am setting up a test environment on some higher-end machines to run authentication, administration, and runtime load against IPA. Once the environment is set up, the plan is to run these tasks 24/7 at various load levels to test the reliability of the IPA system under test. While building the sudo configuration for IPA, the system becomes slow and then unresponsive and unusable; in effect, the setup script is causing an unintentional denial of service (DoS). Is it possible that the number of users in my user groups is causing the problem, and would reducing it from 1000 to 100 per group work around the issue temporarily so I can continue? Can an administrator raise the query limits to stop the "ERROR: limits exceeded for this query" messages?

Task Script:
The script that wreaks havoc on the IPA server builds the sudo objects needed to configure the system. It is a single-threaded Python script that calls the IPA CLI in the sequence defined below. The command that stalls and generates errors is "ipa sudorule-add-user". It originally listed 5 user groups on one command line, so I reduced that to 1 group per call and ran it 5 separate times, once per user group. It made no difference.

rpm -qi 389-ds-base
Name        : 389-ds-base                  Relocations: (not relocatable)
Version     : 1.2.10.2                          Vendor: Red Hat, Inc.
Release     : 19.el6_3                      Build Date: Wed 20 Jun 2012 05:11:42 PM EDT
Install Date: Wed 01 Aug 2012 11:00:01 AM EDT      Build Host: x86-008.build.bos.redhat.com
Group       : System Environment/Daemons    Source RPM: 389-ds-base-1.2.10.2-19.el6_3.src.rpm
Size        : 4854889                          License: GPLv2 with exceptions
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://port389.org/
Summary     : 389 Directory Server (base)

-Env Preconditions:
10K users exist
10 User groups exist  (1k per group)
1550 sudo commands (/usr/bin/*)
100 sudo groups

Script that builds the sudo rules via the CLI:
for each rule, 1 to 10:
-Add SudoRule
-Add SudoRule Hosts (qty2)
-Add SudoRule UserGroups1
-Add SudoRule UserGroups2
-Add SudoRule UserGroups3
-Add SudoRule UserGroups4
-Add SudoRule UserGroups5
-Add SudoRule Allow Command Groups (qty5)
-Add SudoRule Deny Command Groups (qty5)
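The loop above can be sketched in Python, since the report describes the real script only as a single-threaded Python script calling the IPA CLI. This is a minimal sketch, not the reporter's script: all rule, host, user-group, and sudo command group names are hypothetical placeholders, it assumes the CLI accepts multi-valued options (`--hosts`, `--groups`, `--sudocmdgroups`) by repeating the option, and it assumes each rule gets its own 5 allow and 5 deny command groups (which would account for the 100 sudo groups in the preconditions). By default it only prints the commands; pass `--run` to execute them.

```python
import shlex
import subprocess
import sys
from typing import List

# Hypothetical names: the report does not list the real rule, host, or group names.
NUM_RULES = 10
HOSTS = ["client1.testrelm.com", "client2.testrelm.com"]  # qty 2
USER_GROUPS = [f"usergroup{n}" for n in range(1, 6)]       # UserGroups1..5


def build_commands(rule_index: int, one_group_per_call: bool = True) -> List[List[str]]:
    """Return the ipa CLI invocations that build one sudo rule, in order."""
    rule = f"sudo_rule{rule_index}"
    cmds = [
        ["ipa", "sudorule-add", rule],
        ["ipa", "sudorule-add-host", rule] + [f"--hosts={h}" for h in HOSTS],
    ]
    if one_group_per_call:
        # Variant tried in the report: one user group per sudorule-add-user call.
        cmds += [["ipa", "sudorule-add-user", rule, f"--groups={g}"] for g in USER_GROUPS]
    else:
        # Original variant: all five user groups in a single call.
        cmds.append(["ipa", "sudorule-add-user", rule] + [f"--groups={g}" for g in USER_GROUPS])
    # Assumption: each rule gets its own 5 allow and 5 deny sudo command groups.
    for kind in ("allow", "deny"):
        groups = [f"--sudocmdgroups={kind}_cmds_{rule_index}_{n}" for n in range(1, 6)]
        cmds.append(["ipa", f"sudorule-add-{kind}-command", rule] + groups)
    return cmds


def main(run: bool) -> None:
    # Single-threaded and strictly sequential, like the script in the report.
    for i in range(1, NUM_RULES + 1):
        for cmd in build_commands(i):
            print(shlex.join(cmd))
            if run:
                subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(run="--run" in sys.argv[1:])
```

Printing first and executing only on request makes it easy to compare the exact command sequence against what the server logs show before loading the server again.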


Is it Repeatable:
Yes, consistently.

Symptoms:
-Sudo cli client script:
Once the script starts, it inevitably runs into problems with the "ipa sudorule-add-user" command. A call can stall for up to 7 minutes, and "ERROR: limits exceeded for this query" messages are generated. I never got past creating 7 sudo rules because it was taking too long and errors kept appearing, so I terminated the script.
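To quantify the stalls, each CLI call can be timed and checked for the error text. A minimal Python sketch, assuming the error string appears on stdout or stderr exactly as quoted in this report; the function and class names are hypothetical, and the 15-minute timeout is an arbitrary choice above the 7-minute stalls observed.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List

LIMITS_MSG = "limits exceeded for this query"


@dataclass
class CallResult:
    cmd: List[str]
    returncode: int
    elapsed_s: float
    limits_exceeded: bool


def timed_call(cmd: List[str], timeout_s: float = 900.0) -> CallResult:
    """Run one CLI command, measure wall-clock time, and flag the limits error.

    Raises subprocess.TimeoutExpired if the command runs longer than timeout_s.
    """
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    elapsed = time.monotonic() - start
    combined = proc.stdout + proc.stderr
    return CallResult(cmd, proc.returncode, elapsed, LIMITS_MSG in combined)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python timed_call.py ipa sudorule-add-user sudo_rulename --groups=usergroupname
    r = timed_call(sys.argv[1:])
    print(f"rc={r.returncode} elapsed={r.elapsed_s:.1f}s limits_exceeded={r.limits_exceeded}")
```

Logging these results per call would show whether the stalls and the limits errors hit every sudorule-add-user call or only some of them.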

-IPA User Interface:
Once the script starts, the UI inevitably becomes unusable. I have been getting UI dialogs reporting "limits exceeded for this query" and internal server errors. At this point the UI is inoperable.

-Kinit:
In this state kinit cannot reach a KDC, so I cannot authenticate: kinit: Cannot contact any KDC for realm 'TESTRELM.COM' while getting initial credentials.

-CPU:
Once the script starts, the ns-slapd process jumps to 100% CPU and beyond on both the target IPA master and the replica server:
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
20151 dirsrv    20   0 3069m 650m  20m S 99.2  4.1 172:11.83 ns-slapd    

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                    
1114 dirsrv    20   0 3069m 404m  20m S 589.6  2.5 125:13.71 ns-slapd                                   
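By default, top reports %CPU relative to a single CPU, so on the 16-logical-CPU machines listed under Test Hardware the ceiling is 1600%. A small Python sketch that pulls the %CPU column from the two lines above and converts it into a share of total machine capacity, assuming top's default column layout and that both servers have 16 logical CPUs (the helper names are hypothetical):

```python
from typing import NamedTuple


class TopRow(NamedTuple):
    pid: int
    user: str
    cpu_pct: float
    command: str


def parse_top_row(line: str) -> TopRow:
    # Default columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    f = line.split()
    return TopRow(pid=int(f[0]), user=f[1], cpu_pct=float(f[8]), command=f[11])


def machine_share(cpu_pct: float, logical_cpus: int = 16) -> float:
    """Fraction of total CPU capacity; per-process %CPU can reach 100 * logical_cpus."""
    return cpu_pct / (100.0 * logical_cpus)


ROWS = [
    "20151 dirsrv    20   0 3069m 650m  20m S 99.2  4.1 172:11.83 ns-slapd",
    "1114 dirsrv    20   0 3069m 404m  20m S 589.6  2.5 125:13.71 ns-slapd",
]

for row in map(parse_top_row, ROWS):
    print(f"{row.command} pid {row.pid}: {row.cpu_pct}% = {row.cpu_pct / 100:.1f} CPUs, "
          f"{machine_share(row.cpu_pct):.1%} of the machine")
```

On that reading, the second ns-slapd was keeping roughly 5.9 logical CPUs busy, a bit over a third of the machine, while the first was saturating about one.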

-IPA Master DirSrv Error Log Snippet:
[06/Aug/2012:14:47:56 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:14:47:56 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:14:47:56 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:14:50:51 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:14:53:02 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Timed out sending update operation to consumer (uniqueid c258d222-dc2f11e1-b897d99e-aafdf81b, CSN 501fe82f000400040000): Timeout.
[06/Aug/2012:14:58:04 -0400] - repl5_inc_waitfor_async_results timed out waiting for responses: 4437 4795
[06/Aug/2012:15:00:04 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Warning: unable to send endReplication extended operation (Timed out)
[06/Aug/2012:15:00:27 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: An invalid name was supplied (Hostname cannot be canonicalized)) errno 110 (Connection timed out)
[06/Aug/2012:15:00:27 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:00:27 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: An invalid name was supplied (Hostname cannot be canonicalized))
[06/Aug/2012:15:00:28 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:15:03:33 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:03:33 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:03:33 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:15:06:45 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 110 (Connection timed out)
[06/Aug/2012:15:06:45 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP server)
[06/Aug/2012:15:11:53 -0400] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM.COM')) errno 115 (Operation now in progress)
[06/Aug/2012:15:11:53 -0400] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: error -2 (Local error)
[06/Aug/2012:15:11:53 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure.  Minor code may provide more information (Cannot contact any KDC for realm 'TESTRELM.COM'))
...
...
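The replication agreement to sti-high-2 repeatedly fails and then resumes its GSSAPI bind. A minimal Python sketch that extracts those events from 389-ds error log lines and pairs each failure with the next "resumed" message to estimate outage windows. It assumes the bracketed timestamp format shown in the snippet above; the sample is an abridged copy of lines from that snippet, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
TS_FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 06/Aug/2012:14:47:56 -0400
FAILED = "Replication bind with GSSAPI auth failed"
RESUMED = "Replication bind with GSSAPI auth resumed"

Event = Tuple[datetime, str]


def replication_events(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, 'failed' | 'resumed') for each replication bind message."""
    events: List[Event] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips lines without a leading [timestamp]
        msg = m.group("msg")
        if FAILED in msg:
            kind = "failed"
        elif RESUMED in msg:
            kind = "resumed"
        else:
            continue
        events.append((datetime.strptime(m.group("ts"), TS_FMT), kind))
    return events


def outage_windows(events: List[Event]) -> List[Tuple[datetime, datetime]]:
    """Pair the first failure of each run with the next 'resumed'.

    A failure still unresolved at the end of the input is not reported.
    """
    windows = []
    start = None
    for ts, kind in events:
        if kind == "failed" and start is None:
            start = ts
        elif kind == "resumed" and start is not None:
            windows.append((start, ts))
            start = None
    return windows


SAMPLE_LOG = """\
[06/Aug/2012:14:47:56 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
[06/Aug/2012:14:50:51 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:15:00:27 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -2 (Local error)
[06/Aug/2012:15:00:28 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth resumed
[06/Aug/2012:15:03:33 -0400] NSMMReplicationPlugin - agmt="cn=meTosti-high-2.testrelm.com" (sti-high-2:389): Replication bind with GSSAPI auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
"""

for start, end in outage_windows(replication_events(SAMPLE_LOG.splitlines())):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S} ({(end - start).total_seconds():.0f} s)")
```

On the abridged sample this finds a 175-second window starting at 14:47:56 and a 1-second window at 15:00:27, with the 15:03:33 failure still open at the end of the sample.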

Work Around the issues:
If there is no config setting that resolves this issue, I will try to work around it by reducing the number of users in each user group from 1000 to 100 and trying again.


Test Hardware:
Ipa Server 1&2 = Linux sti-high-1.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
Ipa Client 1&2 = Linux sti-high-3.testrelm.com 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Stepping:              2
CPU MHz:               1596.000
BogoMIPS:              4787.82
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15

Memory:
             total       used       free     shared    buffers     cached
Mem:      16316084    4678128   11637956          0     199440    2701480
-/+ buffers/cache:    1777208   14538876
Swap:      8224760          0    8224760
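In this `free` output (values in KiB), the `-/+ buffers/cache` row is derived from the `Mem:` row: buffers and page cache are subtracted from "used" and added to "free", because the kernel can reclaim them. A quick Python check that reproduces that row from the numbers above (the helper name is hypothetical):

```python
from typing import Tuple


def minus_plus_buffers_cache(used: int, free: int, buffers: int, cached: int) -> Tuple[int, int]:
    """Return (used, free) excluding buffers and page cache, as in free's '-/+ buffers/cache' row."""
    return used - buffers - cached, free + buffers + cached


# Mem: row from the report (KiB): total, used, free, shared, buffers, cached
TOTAL, USED, FREE, SHARED, BUFFERS, CACHED = 16316084, 4678128, 11637956, 0, 199440, 2701480

app_used, app_free = minus_plus_buffers_cache(USED, FREE, BUFFERS, CACHED)
print(f"-/+ buffers/cache: {app_used} {app_free}")
print(f"application memory in use: {app_used / 1024**2:.2f} GiB of {TOTAL / 1024**2:.2f} GiB")
```

This matches the reported row: about 1.7 GiB of roughly 15.6 GiB was in use by processes when this snapshot was taken.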
Comment 2 Dmitri Pal 2012-08-07 09:53:15 EDT
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/2978
Comment 3 Kaleem 2012-08-13 04:26:31 EDT
I am also seeing similar behaviour with "ipa migrate-ds" with 10000 users and 12 groups. I encountered the following error message:

"ipa: ERROR: limits exceeded for this query". 

extract from automation log:
===========================

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Migration 10000 users and 12 groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [11:05:51] ::  slapd pid : 3479
:: [11:05:51] ::  Before migration free memory : 1011798016
:: [11:05:51] ::  Before migration slapd VmRSS : 31932kB
:: [11:05:51] ::  Before migration slapd VmHWM : 43176kB
:: [11:05:52] ::  ======================= Migration started: Tue Aug  7 11:05:52 EDT 2012 ========================
:: [11:05:52] ::  EXECUTING: time -p echo ######### | ipa migrate-ds --with-compat ldap://f17-ipa4.testrelm.com:389
ipa: ERROR: limits exceeded for this query
:: [   FAIL   ] :: Migration did not complete successfully.
:: [11:07:30] ::  ======================= Migration finished: Tue Aug  7 11:07:30 EDT 2012 ========================
:: [11:07:30] ::  slapd pid : 3479
:: [11:07:30] ::  After migration free memory : 720154624
:: [11:07:31] ::  After migration slapd VmRSS : 47196kB
:: [11:07:31] ::  After migration slapd VmHWM : 47196kB

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
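The automation above records slapd's VmRSS (current resident set size) and VmHWM (peak resident set size) before and after the migration; between the two samples VmRSS grew from 31932 kB to 47196 kB (+15264 kB) over a 98-second run. The log does not show how those values are collected. On Linux they are available in /proc/<pid>/status, and a minimal Python sketch of reading them could look like this (the function names are hypothetical):

```python
import os
from typing import Dict, Iterable

VM_FIELDS = ("VmRSS", "VmHWM")


def parse_vm_fields(status_text: str, fields: Iterable[str] = VM_FIELDS) -> Dict[str, int]:
    """Extract the requested Vm* fields (in kB) from the text of /proc/<pid>/status."""
    wanted = set(fields)
    result: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            result[key] = int(value.split()[0])  # value looks like "\t   31932 kB"
    return result


def read_vm_fields(pid: int) -> Dict[str, int]:
    """Linux only: read VmRSS/VmHWM for a running process such as ns-slapd."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_fields(fh.read())


if __name__ == "__main__":
    # Demonstrate on this process; for slapd, pass its pid (e.g. from `pidof ns-slapd`).
    if os.path.exists(f"/proc/{os.getpid()}/status"):
        print(read_vm_fields(os.getpid()))
```

Sampling both fields matters because VmHWM keeps the peak: a short spike during migration shows up there even if VmRSS has dropped by the time the "after" sample is taken.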

I have uploaded access/error logs of dirsrv to http://10.65.201.199/ds_logs/
Comment 8 Rob Crittenden 2012-11-26 13:27:49 EST
The expectation is that this will be resolved by 389-ds 1.3 with transactions enabled.

Fixed upstream by enabling transactions by default.

master: f1f1b4e7f2e9c1838ad7ec76002b78ca0c2a3c46
Comment 12 Michael Gregg 2014-02-17 19:14:59 EST
Verified against ipa-server-3.3.3-17.el7.x86_64

I have a system that has been set up to run the sti tests.

The machine has 10k users, split into 10 user groups. There are 1127 sudo commands from /bin.

The machine has been running since Friday, with user-add and user-mod commands running periodically.


On Monday the system seems to be performing fairly well:

[root@nu1 shm]# time ipa user-find uub3218
--------------
1 user matched
--------------
  User login: uub3218
  First name: f3218
  Last name: l3218
  Home directory: /home/uub3218
  Login shell: /bin/sh
  Email address: uub3218@testrelm.test
  UID: 1365604252
  GID: 1365604252
  Account disabled: False
  Password: False
  Kerberos keys available: False
----------------------------
Number of entries returned 1
----------------------------

real	0m0.571s
user	0m0.320s
sys	0m0.046s
Comment 13 Ludek Smid 2014-06-13 05:37:57 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
