Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1075132

Summary: Excessive LDAP calls by ipa-sam during file operations to samba file share on freeipa master cause high CPU and slow performance
Product: Red Hat Enterprise Linux 7 Reporter: Dmitri Pal <dpal>
Component: ipaAssignee: Martin Kosek <mkosek>
Status: CLOSED CURRENTRELEASE QA Contact: Namita Soman <nsoman>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.0CC: abokovoy, asn, devel, mkosek, rcritten, spoore
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: ipa-3.3.3-22.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1074314 Environment:
Last Closed: 2014-06-13 13:04:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1074314    
Bug Blocks: 1073810    

Comment 2 Dmitri Pal 2014-03-11 20:07:23 UTC
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/4241

Comment 5 Scott Poore 2014-03-14 20:25:13 UTC
Need to get confirmation that this is a valid verification test:

ON VERSION ipa-server-3.3.3-19.el7.x86_64:

/[root@master freeipa]# rm -rf /bz1075132/*
[root@master freeipa]# net cache flush
[root@master freeipa]# net conf setparm global 'log level' 10
[root@master freeipa]# for i in /var/log/samba/log.*; do > $i; done
[root@master freeipa]# cd /source/freeipa
[root@master freeipa]# time echo -e 'recurse\nprompt n\nmput *'|\
>     smbclient -k //master.ipa1.example.test/share > /tmp/smbclient.out 2>&1

real	0m55.811s
user	0m0.493s
sys	0m0.665s
[root@master freeipa]# grep ldap_search /var/log/samba/* 2>/dev/null | wc -l
9778

Large number of ldap_search entries in logs.

Now, I upgrade ipa-server to newer/fixed version:

ON VERSION ipa-server-3.3.3-25.el7.x86_64:

[root@master freeipa]# rm -rf /bz1075132/*
[root@master freeipa]# net cache flush
Couldn't delete entry! key = IDMAP/SID2XID/S-1-5-2
Couldn't delete entry! key = IDMAP/SID2XID/S-1-5-32-545
Couldn't delete entry! key = IDMAP/SID2XID/S-1-1-0
[root@master freeipa]# net conf setparm global 'log level' 10
[root@master freeipa]# for i in /var/log/samba/log.*; do > $i; done
[root@master freeipa]# cd /source/freeipa
[root@master freeipa]# time echo -e 'recurse\nprompt n\nmput *'|\
>     smbclient -k //master.ipa1.example.test/share > /tmp/smbclient.out 2>&1

real	0m50.562s
user	0m0.508s
sys	0m0.676s
[root@master freeipa]# grep ldap_search /var/log/samba/* 2>/dev/null | wc -l
98

About 1% of the number of ldap_search entries found.  Running time not much different but, I'm running it on a limited VM.

Comment 6 Jason Woods 2014-03-14 20:46:23 UTC
Hi Scott,
CPU was very high maybe that's easier to measure? Or not heh. For me the slow performance was as a direct result of the CPU hitting the roof.
Maybe iostat while running test?
For me after the patch the CPU never went above 5% where before it was mostly 80-100% on a single core.
Jason

Comment 7 Scott Poore 2014-03-14 21:56:42 UTC
Jason,

How large a sample for file transfer did you use?  

Did the sample included files owned by many different users?

CPU didn't spike when I tried this on the 3.3.3-19 version.  My sample for the transfer was a clone of the freeipa git repo.  That was only 60MB though I think and I only had one non-root user own it with a different group.

Maybe that's my problem?  Too small a sample and too few users/groups to lookup?

Thanks,
Scott

Comment 8 Jason Woods 2014-03-16 19:58:53 UTC
Hi Scott,

Maybe 3.3.3 proved slapd? I was on 3.0.0-37 EL6_4 and slapd was using lots of CPU processing the LDAP queries.

I was using Mac OS X 10.9 client copying 200+MB of various git repo data.
Slapd on server would spike and be constantly high CPU 80-100.

To be fair I guess it can be said if LDAP queries are 1% of previous it's enough to justify some (even if minimal) improvement to performance.

Thanks for all the help integrating the patch!

Jason

Comment 9 Jason Woods 2014-03-16 19:59:38 UTC
"...3.3.3 improved** slapd"

Comment 10 Martin Kosek 2014-03-17 08:43:00 UTC
I asked Alexander about reproduction of this bug last week and he advised running:

# time for i in `seq 0 150` ; do echo  "lookupnames admin" ; done | rpcclient -k `hostname` >/dev/null

The shoud be a difference (roughly 3-4 times more) in time spent in ns-slapd and sssd in the older case.

As for your current reproduction scenario, I will let Alexander to decide.

Comment 11 Jason Woods 2014-03-17 09:52:05 UTC
It could also be noted that I was connected as a user which was a member of 2 groups - each group a member of another group - so 4 groups.
user - group1,group2,group1_parent,group2_parent

Comment 12 Alexander Bokovoy 2014-03-17 14:06:23 UTC
Scott,

I think what you can use is my performance testing script. 

# cat testperformance.sh 
#!/bin/bash
REALM=MY-REALM
net cache flush
net conf setparm global 'log level' 10
for i in /var/log/dirsrv/slapd-$REALM/access /var/log/samba/log.*; do > $i; done

USERCRED="admin%Test1234"
DOMSID=`net getdomainsid`
COUNTER=0
while [  $COUNTER -lt 80 ]; do
	echo "lookupsid ${DOMSID}-$((1000 + COUNTER))"
	echo "lookupnames user${COUNTER} users${COUNTER}"
	let COUNTER=COUNTER+1 
done | rpcclient -U${USERCRED} `hostname` >/dev/null

net conf setparm global 'log level' 1
echo -n "Samba ldap_search operations:"
grep ldap_search /var/log/samba/* 2>/dev/null | wc -l
# Sleep a bit because DS pushes logs with a delay
sleep 5
echo -n "DS ldap_search operations:"
grep SRCH /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l


# kinit admin
....
# for i in `seq 1 30` ; do ipa user-add --first="Test" --last="User ${i}" user${i}  ; done
....
# for i in `seq 1 30` ; do ipa group-add --desc="Test Group ${i}" users${i}  ; done
# time sh testperformance.sh

With this test you would have 30 users and 30 groups and ask rpcclient to lookup SIDs for groups and users.

Comment 13 Scott Poore 2014-03-17 16:03:05 UTC
Again, I don't see a big difference in performance but, I do see the big search  count difference:

Test with version ipa-server-3.3.3-19.el7.x86_64:

[root@master ~]# time sh testperformance.sh 
Samba ldap_search operations:800
DS ldap_search operations:0

real	0m9.411s
user	0m0.360s
sys	0m0.180s

Test with version ipa-server-3.3.3-25.el7.x86_64:

[root@master ~]# time sh testperformance.sh 
Couldn't delete entry! key = IDMAP/SID2XID/S-1-5-2
Couldn't delete entry! key = IDMAP/SID2XID/S-1-5-32-545
Couldn't delete entry! key = IDMAP/SID2XID/S-1-1-0
Samba ldap_search operations:8
DS ldap_search operations:0

real	0m8.965s
user	0m0.342s
sys	0m0.192s

It should be noted that the above test for -19 release was done after an rpm downgrade from -25.  I'll reload my VM and start from scratch to get a clean run to see if anything differs.

Thanks,
Scott

Comment 14 Alexander Bokovoy 2014-03-17 16:21:12 UTC
Ok. Make sure you adjust 'sleep 5' to some usable value so that DS has chance to flush its logs to the file. It depends on your VM, in general it may tak around 5-10 seconds before you'll get right data in the access log.

Comment 15 Scott Poore 2014-03-17 21:56:17 UTC
Ok, with the rebuild I'm seeing different results.  I don't know if my previous results were maybe due to other configurations and/or uninstall/re-install executions.  

Now, I'm seeing similar results regardless.  

What exactly should I see with this test on RHEL7?   faster response times?  fewer log entries?   right?

ipa-server-3.3.3-19.el7.x86_64:

# time sh testperformance.sh 
Samba ldap_search operations:16
DS ldap_search operations:979

real	0m38.769s
user	0m0.350s
sys	0m0.281s


ipa-server-3.3.3-22.el7.x86_64:

# time sh testperformance.sh 
Samba ldap_search operations:14
DS ldap_search operations:1002

real	0m39.079s
user	0m0.287s
sys	0m0.270s

Note above I have set the sleep delay to 30 seconds to try to make sure I catch anything I might have missed.

Now, that being said, I re-ran the smbclient tests and those still do look the same.  Some kind of caching going on with the other test?  Or is that test just a generic perf test and not specific to this bug?

ipa-server-3.3.3-19.el7.x86_64:

# time echo -e 'recurse\nprompt n\nmput *'|\
>     smbclient -k //$(hostname)/share > /tmp/smbclient.out 2>&1
real	0m26.538s
user	0m0.351s
sys	0m0.509s

# grep ldap_search /var/log/samba/* 2>/dev/null | wc -l
10912

# grep SRCH /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l
4568


ipa-server-3.3.3-22.el7.x86_64:

# time echo -e 'recurse\nprompt n\nmput *'|\
>     smbclient -k //$(hostname)/share > /tmp/smbclient.out 2>&1
real	0m21.921s
user	0m0.341s
sys	0m0.542s

# grep ldap_search /var/log/samba/* 2>/dev/null | wc -l
82

# grep SRCH /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l
51

And yes, I waited between the greps.  So, What am I seeing above with the rpcclient tests showing about the same results regardless of patch?

Thanks,
Scott

Comment 16 Alexander Bokovoy 2014-03-17 22:41:09 UTC
Ok, so what we are seeing in rpcclient tests is probably an influence of the caching in the Samba layers.

When copying files there are a lot of calls that attempt to set and verify permissions and calculate access masks, etc. They lead to fetching domain SID and other operations which are not really about name2sid/sid2name/sid2id operations so having multiple requests towards the identity store is expected.

However, since your file store is largerly fixed, there are probably not many identities who own the files and thus, less to resolve. Samba caches most of these requests and adding caching of idmap in ipasam helps to reduce number of repetitive calls in access masks and setting permissions, I guess. After all, you'd need to perform it for every file that is copied which multiplies the number of idmap requests.

Comment 17 Scott Poore 2014-03-18 00:38:03 UTC
If the rpcclient tests are being affected by some samba caching, can we use those to verify this bug or its fix?

If smbclient tests are making calls that are irrelevant here, why is there such a drastic difference in count between the two system running the different versions?

Thanks,
Scott

Comment 18 Scott Poore 2014-03-18 17:50:30 UTC
After some review by dev, it was determined that we cannot at this time use rpcclient for the test.

"with the patch POSIX ID related queries are cached and rpclient does not know about POSIX IDs"

So, I'm doing the verification with smbclient.

Comment 19 Scott Poore 2014-03-18 17:52:20 UTC
First, the reproduced results:

[root@cloud-qe-18 ~]# # add user/group
[root@cloud-qe-18 ~]# service sssd stop; rm -rf /var/lib/sss/{db,mc}/*; service sssd start
Redirecting to /bin/systemctl stop  sssd.service
Redirecting to /bin/systemctl start  sssd.service
[root@cloud-qe-18 ~]# kdestroy -A ; echo Secret123 | kinit admin
Password for admin: 
[root@cloud-qe-18 ~]# ipa user-del bz1075132
ipa: ERROR: bz1075132: user not found
[root@cloud-qe-18 ~]# ipa user-add bz1075132 --first=f --last=l 
----------------------
Added user "bz1075132"
----------------------
  User login: bz1075132
  First name: f
  Last name: l
  Full name: f l
  Display name: f l
  Initials: fl
  Home directory: /home/bz1075132
  GECOS: f l
  Login shell: /bin/sh
  Kerberos principal: bz1075132
  Email address: bz1075132
  UID: 1810200254
  GID: 1810200254
  Password: False
  Member of groups: ipausers
  Kerberos keys available: False
[root@cloud-qe-18 ~]# echo -e 'redhat\nredhat' |ipa passwd bz1075132
----------------------------------------------------
Changed password for "bz1075132"
----------------------------------------------------
[root@cloud-qe-18 ~]# echo -e 'redhat\nSecret123\nSecret123'|kinit bz1075132
Password for bz1075132: 
Password expired.  You must change it now.
Enter new password: 
Enter it again: 
[root@cloud-qe-18 ~]# BZUID=$(getent passwd bz1075132|cut -f3 -d:)
[root@cloud-qe-18 ~]# 
[root@cloud-qe-18 ~]# # create source dir
[root@cloud-qe-18 ~]# rm -rf /tmp/source
[root@cloud-qe-18 ~]# mkdir /tmp/source
[root@cloud-qe-18 ~]# cd /tmp/source
[root@cloud-qe-18 source]# cd /usr/bin
[root@cloud-qe-18 bin]# find . ! -type l -exec cp {} /tmp/source \;
cp: omitting directory ‘.’
[root@cloud-qe-18 bin]# chown -R bz1075132:bz1075132 /tmp/source
[root@cloud-qe-18 bin]# 
[root@cloud-qe-18 bin]# # create target dir
[root@cloud-qe-18 bin]# rm -rf /tmp/target
[root@cloud-qe-18 bin]# mkdir /tmp/target
[root@cloud-qe-18 bin]# chcon -t samba_share_t /tmp/target
[root@cloud-qe-18 bin]# chown bz1075132:bz1075132 /tmp/target
[root@cloud-qe-18 bin]# 
[root@cloud-qe-18 bin]# # setup samba share
[root@cloud-qe-18 bin]# net conf delshare 'share'
[root@cloud-qe-18 bin]# net conf setparm 'share' 'comment' 'Trust test share'
[root@cloud-qe-18 bin]# net conf setparm 'share' 'read only' 'no'
[root@cloud-qe-18 bin]# net conf setparm 'share' 'path' '/tmp/target'
[root@cloud-qe-18 bin]# 
[root@cloud-qe-18 bin]# # setup for logging
[root@cloud-qe-18 bin]# I1=$(hostname -d|sed 's/\./-/g')
[root@cloud-qe-18 bin]# INSTANCE=${I1^^}
[root@cloud-qe-18 bin]# net cache flush
[root@cloud-qe-18 bin]# net conf setparm global 'log level' 10
[root@cloud-qe-18 bin]# for i in /var/log/dirsrv/slapd-$INSTANCE/access /var/log/samba/log.*; do > $i; done
[root@cloud-qe-18 bin]# 
[root@cloud-qe-18 bin]# # transfer files
[root@cloud-qe-18 bin]# cd /tmp/source
[root@cloud-qe-18 source]# echo 'Secret123'|kinit bz1075132
Password for bz1075132: 
[root@cloud-qe-18 source]# echo -e 'recurse\nprompt n\nmput *'| smbclient -k //$(hostname)/share > /dev/null 2>&1
[root@cloud-qe-18 source]# 
[root@cloud-qe-18 source]# # check log counts
[root@cloud-qe-18 source]# sleep 30
[root@cloud-qe-18 source]# FILECOUNT=$(find /tmp/target|wc -l)
[root@cloud-qe-18 source]# FILECOUNT=$(( FILECOUNT += 10 ))
[root@cloud-qe-18 source]# LOGCOUNT1=$(grep ldap_search.*$BZUID /var/log/samba/* 2>/dev/null | wc -l)
[root@cloud-qe-18 source]# LOGCOUNT2=$(grep SRCH.*$BZUID /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l)
[root@cloud-qe-18 source]# 
[root@cloud-qe-18 source]# echo "$FILECOUNT $LOGCOUNT1 $LOGCOUNT2"
1146 2272 2272
[root@cloud-qe-18 source]# 
[root@cloud-qe-18 source]# if [ $LOGCOUNT1 -gt $FILECOUNT ]; then
>     echo FAIL
> else
>     echo PASS
> fi
FAIL

I increase the FILECOUNT check variable by 10 above to give some room for the test  in case of unexpected log entries.

Comment 20 Scott Poore 2014-03-18 18:08:13 UTC
Verified.

Version ::

ipa-server-3.3.3-22.el7.x86_64

RESULTS ::

[root@cloud-qe-6 ~]# # add user/group
[root@cloud-qe-6 ~]# service sssd stop; rm -rf /var/lib/sss/{db,mc}/*; service sssd start
Redirecting to /bin/systemctl stop  sssd.service
Redirecting to /bin/systemctl start  sssd.service
[root@cloud-qe-6 ~]# kdestroy -A ; echo Secret123 | kinit admin
Password for admin: 
[root@cloud-qe-6 ~]# ipa user-del bz1075132
ipa: ERROR: bz1075132: user not found
[root@cloud-qe-6 ~]# ipa user-add bz1075132 --first=f --last=l 
----------------------
Added user "bz1075132"
----------------------
  User login: bz1075132
  First name: f
  Last name: l
  Full name: f l
  Display name: f l
  Initials: fl
  Home directory: /home/bz1075132
  GECOS: f l
  Login shell: /bin/sh
  Kerberos principal: bz1075132
  Email address: bz1075132
  UID: 597800254
  GID: 597800254
  Password: False
  Member of groups: ipausers
  Kerberos keys available: False
[root@cloud-qe-6 ~]# echo -e 'redhat\nredhat' |ipa passwd bz1075132
----------------------------------------------------
Changed password for "bz1075132"
----------------------------------------------------
[root@cloud-qe-6 ~]# echo -e 'redhat\nSecret123\nSecret123'|kinit bz1075132
Password for bz1075132: 
Password expired.  You must change it now.
Enter new password: 
Enter it again: 
[root@cloud-qe-6 ~]# BZUID=$(getent passwd bz1075132|cut -f3 -d:)
[root@cloud-qe-6 ~]# 
[root@cloud-qe-6 ~]# # create source dir
[root@cloud-qe-6 ~]# rm -rf /tmp/source
[root@cloud-qe-6 ~]# mkdir /tmp/source
[root@cloud-qe-6 ~]# cd /tmp/source
[root@cloud-qe-6 source]# cd /usr/bin
[root@cloud-qe-6 bin]# find . ! -type l -exec cp {} /tmp/source \;
cp: omitting directory ‘.’
[root@cloud-qe-6 bin]# chown -R bz1075132:bz1075132 /tmp/source
[root@cloud-qe-6 bin]# 
[root@cloud-qe-6 bin]# # create target dir
[root@cloud-qe-6 bin]# rm -rf /tmp/target
[root@cloud-qe-6 bin]# mkdir /tmp/target
[root@cloud-qe-6 bin]# chcon -t samba_share_t /tmp/target
[root@cloud-qe-6 bin]# chown bz1075132:bz1075132 /tmp/target
[root@cloud-qe-6 bin]# 
[root@cloud-qe-6 bin]# # setup samba share
[root@cloud-qe-6 bin]# net conf delshare 'share'
[root@cloud-qe-6 bin]# net conf setparm 'share' 'comment' 'Trust test share'
[root@cloud-qe-6 bin]# net conf setparm 'share' 'read only' 'no'
[root@cloud-qe-6 bin]# net conf setparm 'share' 'path' '/tmp/target'
[root@cloud-qe-6 bin]# 
[root@cloud-qe-6 bin]# # setup for logging
[root@cloud-qe-6 bin]# I1=$(hostname -d|sed 's/\./-/g')
[root@cloud-qe-6 bin]# INSTANCE=${I1^^}
[root@cloud-qe-6 bin]# net cache flush
[root@cloud-qe-6 bin]# net conf setparm global 'log level' 10
[root@cloud-qe-6 bin]# for i in /var/log/dirsrv/slapd-$INSTANCE/access /var/log/samba/log.*; do > $i; done
[root@cloud-qe-6 bin]# 
[root@cloud-qe-6 bin]# # transfer files
[root@cloud-qe-6 bin]# cd /tmp/source
[root@cloud-qe-6 source]# echo 'Secret123'|kinit bz1075132
Password for bz1075132: 
[root@cloud-qe-6 source]# echo -e 'recurse\nprompt n\nmput *'| smbclient -k //$(hostname)/share > /dev/null 2>&1
[root@cloud-qe-6 source]# 
[root@cloud-qe-6 source]# # check log counts
[root@cloud-qe-6 source]# sleep 30
[root@cloud-qe-6 source]# FILECOUNT=$(find /tmp/target|wc -l)
[root@cloud-qe-6 source]# FILECOUNT=$(( FILECOUNT += 10 ))
[root@cloud-qe-6 source]# LOGCOUNT1=$(grep ldap_search.*$BZUID /var/log/samba/* 2>/dev/null | wc -l)
[root@cloud-qe-6 source]# LOGCOUNT2=$(grep SRCH.*$BZUID /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l)
[root@cloud-qe-6 source]# 
[root@cloud-qe-6 source]# echo "$FILECOUNT $LOGCOUNT1 $LOGCOUNT2"
1155 1145 1145
[root@cloud-qe-6 source]# 
[root@cloud-qe-6 source]# if [ $LOGCOUNT1 -gt $FILECOUNT ]; then
>     echo FAIL
> else
>     echo PASS
> fi
PASS

And it should be noted that when the same test is run with admin:admins instead of new user:group, results were more dramatic:

Unpatched:

[root@cloud-qe-18 source]# BZUID=$(getent passwd admin|cut -f3 -d:)
[root@cloud-qe-18 source]# LOGCOUNT1=$(grep ldap_search.*$BZUID /var/log/samba/* 2>/dev/null | wc -l)
[root@cloud-qe-18 source]# LOGCOUNT2=$(grep SRCH.*$BZUID /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l)
[root@cloud-qe-18 source]# 
[root@cloud-qe-18 source]# echo "$LOGCOUNT1 $LOGCOUNT2"
2272 2272
[root@cloud-qe-18 source]# find /tmp/target|wc -l
1136

Patched:

[root@cloud-qe-6 source]# BZUID=$(getent passwd admin|cut -f3 -d:)
[root@cloud-qe-6 source]# LOGCOUNT1=$(grep ldap_search.*$BZUID /var/log/samba/* 2>/dev/null | wc -l)
[root@cloud-qe-6 source]# LOGCOUNT2=$(grep SRCH.*$BZUID /var/log/dirsrv/slapd-*/access 2>/dev/null | wc -l)
[root@cloud-qe-6 source]# echo "$LOGCOUNT1 $LOGCOUNT2"
0 0
[root@cloud-qe-6 source]# find /tmp/target|wc -l
1145

Comment 21 Ludek Smid 2014-06-13 13:04:01 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.