Bug 1118057

Summary: online import crashes server if using verbose error logging
Product: Red Hat Enterprise Linux 7 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-baseAssignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 7.0CC: mreynolds, nkinder, rmeggins
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.3.1-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-03-05 09:36:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Noriko Hosoi 2014-07-09 22:43:43 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47827

If you enable "trace function calls" error logging, and perform an online import(import task) the server will crash.  The logging changes the internal timing, and exposes a race condition between the task event queue and the main import thread.

Comment 2 Sankar Ramalingam 2015-01-20 20:12:40 UTC
Running ldapmodify to change the nsslapd-errorlog-level is taking more than 10 secs to complete the operation.

[root@sparks ~]# time ldapmodify -x -p 1289 -h localhost -D "cn=Directory manager" -w Secret123 << EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 1
EOF

modifying entry "cn=config"

real	0m12.268s
user	0m0.001s
sys	0m0.009s

After changing the error log level to 1(trace function calls), the error logs printing "pagedresults timeout" error continuously.
[20/Jan/2015:15:10:49 -0500] - <-- pagedresults_is_timedout: -
[20/Jan/2015:15:10:49 -0500] - --> pagedresults_is_timedout
[20/Jan/2015:15:10:49 -0500] - <-- pagedresults_is_timedout: -
[20/Jan/2015:15:10:49 -0500] - --> pagedresults_is_timedout

Comment 3 mreynolds 2015-01-20 21:06:34 UTC
(In reply to Sankar Ramalingam from comment #2)
> Running ldapmodify to change the nsslapd-errorlog-level is taking more than
> 10 secs to complete the operation.
> 
> [root@sparks ~]# time ldapmodify -x -p 1289 -h localhost -D "cn=Directory
> manager" -w Secret123 << EOF
> dn: cn=config
> changetype: modify
> replace: nsslapd-errorlog-level
> nsslapd-errorlog-level: 1
> EOF
> 
> modifying entry "cn=config"
> 
> real	0m12.268s
> user	0m0.001s
> sys	0m0.009s

This is expected, as the log level is "trace function calls".  It is a very expensise log level.  As it's being set, it starts logging right away(before the result is returned to the client).

> 
> After changing the error log level to 1(trace function calls), the error
> logs printing "pagedresults timeout" error continuously.
> [20/Jan/2015:15:10:49 -0500] - <-- pagedresults_is_timedout: -
> [20/Jan/2015:15:10:49 -0500] - --> pagedresults_is_timedout
> [20/Jan/2015:15:10:49 -0500] - <-- pagedresults_is_timedout: -
> [20/Jan/2015:15:10:49 -0500] - --> pagedresults_is_timedout

Expected as well.

Comment 4 Sankar Ramalingam 2015-01-21 09:41:04 UTC
I tried adding a new suffux and few new users to the directory server instance which is configured with errorlog-level: 1. No response from the server for a long time. Its hung for more than 10 mins, then I killed the scripts.

This server is no longer responds to any ldap operation. Then, I tried restarting the server instance. But, no response.

[root@sparks MMR_WINSYNC]# ./AddEntry.sh Users 1389 "ou=people,dc=passsync,dc=com" wwwwwwta 999 localhost
AddinAdding 999 Users to 
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

./AddSuffix.sh "dc=testsnew,dc=com" 1389
adding new entry "cn=testsnew1389,cn=ldbm database,cn=plugins,cn=config"
ldap_result: Can't contact LDAP server (-1)

Comment 5 mreynolds 2015-01-21 14:15:22 UTC
Did the server crash, or is it just unresponisve?

If it crashed by just adding new users then you found a new bug.  Please gather the core file/stacktrace, and open a new bug.

If it's just unrepsonsive, then "trace function calls" is too expensive to run on your particular system.  So try a different log level, like plugin logging:

dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 65536

Comment 8 Sankar Ramalingam 2015-01-25 05:53:13 UTC
[root@intel-s3eb1-03 slapd-M1]# time ldapmodify -x -p 1189 -h localhost -D "cn=Directory manager" -w Secret123 << EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 1
EOF

modifying entry "cn=config"


[root@intel-s3eb1-03 slapd-M1]# cat /tmp/testusr1.ldif 
dn: uid=mmrtestctt333,ou=people,dc=passsync,dc=com
telephoneNumber: 98989813339
mail: mmrtestctt333
uid: mmrtestctt333
givenName: mmrtestctt333
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetorgperson
objectclass: posixAccount
sn: mmrtestctt333
cn: mmrtestctt333
homeDirectory: /home/mmrtestctt333
loginShell: /bin/bash
userPassword: Secret123
uidNumber: 500333
gidNumber: 500333

[root@intel-s3eb1-03 slapd-M1]# ldif2db.pl -Z M1 -D "cn=Directory Manager" -w Secret123 -n passsync1189 -i /tmp/testusr1.ldif 
Successfully added task entry "cn=import_2015_1_25_0_33_32, cn=import, cn=tasks, cn=config"

[25/Jan/2015:00:33:32 -0500] conn=18 fd=64 slot=64 connection from 2620:52:0:102f:230:48ff:fece:f634 to 2620:52:0:102f:230:48ff:fece:f634
[25/Jan/2015:00:33:32 -0500] conn=18 op=0 EXT oid="1.3.6.1.4.1.1466.20037" name="startTLS"
[25/Jan/2015:00:33:32 -0500] conn=18 op=0 RESULT err=0 tag=120 nentries=0 etime=0
[25/Jan/2015:00:33:32 -0500] conn=18 TLS1.2 128-bit AES
[25/Jan/2015:00:33:32 -0500] conn=18 op=1 BIND dn="cn=Directory Manager" method=128 version=3
[25/Jan/2015:00:33:33 -0500] conn=18 op=1 RESULT err=0 tag=97 nentries=0 etime=1 dn="cn=directory manager"
[25/Jan/2015:00:33:33 -0500] conn=18 op=2 ADD dn="cn=import_2015_1_25_0_33_32,cn=import,cn=tasks,cn=config"
[25/Jan/2015:00:34:01 -0500] conn=18 op=2 RESULT err=0 tag=105 nentries=0 etime=28
[25/Jan/2015:00:34:03 -0500] conn=18 op=3 UNBIND
[25/Jan/2015:00:34:03 -0500] conn=18 op=3 fd=64 closed - U1

[root@intel-s3eb1-03 slapd-M1]# ldapsearch -LLL -x -p 1189 -D "cn=Directory Manager" -w Secret123 -h localhost -b "uid=mmrtestctt333,ou=people,dc=passsync,dc=com"
Operations error (1)

[root@intel-s3eb1-03 slapd-M1]# ldapsearch -LLL -x -p 1189 -D "cn=Directory Manager" -w Secret123 -h localhost -b "ou=people,dc=passsync,dc=com"
Operations error (1)
[root@intel-s3eb1-03 MMR_WINSYNC]# ./AddEntry.sh Users 1189 "ou=people,dc=passsync,dc=com" importtests 1 localhost
No of entries added will be 1
Adding 1 Users to 
adding new entry "uid=importtests1,ou=people,dc=passsync,dc=com"
ldap_add: Operations error (1)

[root@intel-s3eb1-03 MMR_WINSYNC]# ./AddEntry.sh Users 1189 "dc=passsync,dc=com" importtests 1 localhost
No of entries added will be 1
Adding 1 Users to 
adding new entry "uid=importtests1,dc=passsync,dc=com"
ldap_add: Operations error (1)

[root@intel-s3eb1-03 MMR_WINSYNC]# 
[root@intel-s3eb1-03 MMR_WINSYNC]# ./AddEntry.sh Users 1189 "dc=example,dc=com" importtests 1 localhost
No of entries added will be 1
Adding 1 Users to 
adding new entry "uid=importtests1,dc=example,dc=com"


It throws ldap operations error when trying to add a new user or search an entry in the same database. However, the server is not crashing.

Comment 9 mreynolds 2015-01-25 15:13:40 UTC
The import is still running, that's why you are getting error 1.  You just need to wait for the import task to complete before verifying the server is up and running.  You can either query the import task, or grep the error log for "Import complete".

Comment 10 Sankar Ramalingam 2015-01-27 09:43:22 UTC
[root@intel-s3eb1-03 slapd-M1]# ldif2db.pl -Z M1 -D "cn=Directory Manager" -w Secret123 -n passsync1189 -i /tmp/testusr1.ldif 
Successfully added task entry "cn=import_2015_1_27_4_32_22, cn=import, cn=tasks, cn=config"

[root@intel-s3eb1-03 slapd-M1]# ldapsearch -LLL -x -p 1189 -D "cn=Directory Manager" -w Secret123 -h localhost -b "cn=import_2015_1_27_4_32_22, cn=import, cn=tasks, cn=config"

[root@intel-s3eb1-03 slapd-M1]# ldapsearch -LLL -x -p 1189 -D "cn=Directory Manager" -w Secret123 -h localhost -b "cn=import, cn=tasks, cn=config"
dn: cn=import,cn=tasks,cn=config
objectClass: top
objectClass: extensibleObject
cn: import

[root@intel-s3eb1-03 slapd-M1]# ldapsearch -LLL -x -p 1189 -D "cn=Directory Manager" -w Secret123 -h localhost -b "uid=mmrtestctt333,ou=people,dc=passsync,dc=com"
Operations error (1)

The above ldapsearches shows that the import task is completed. But, the ldapsearch for the user still fails with Operations error.

Comment 11 mreynolds 2015-01-27 14:09:46 UTC
This works fine for me.  I'm not sure what is going wrong with your setup.  Here is the audit log showing the testcase:

time: 20150127085858
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 1

time: 20150127085909
dn: cn=import1422367148711,cn=import,cn=tasks,cn=config
changetype: add
objectClass: top
objectClass: extensibleObject
cn: import1422367148711
ttl: 4
nsfilename: /tmp/export.ldif
nsinstance: userroot

time: 20150127085923
dn: cn=import1422367148711,cn=import,cn=tasks,cn=config
changetype: delete
modifiersname: cn=server,cn=plugins,cn=config

time: 20150127085941
dn: uid=mreynolds,dc=example,dc=com
changetype: add
uid: mreynolds
givenName: mark
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetorgperson
sn: reynolds
cn: mark reynolds


I tested 1.3.3 and 1.3.4(master) using a fresh instance, and it works fine.

Is this a plain instance, or have you configured other plugins?  Or setup replication?

Technically the bug is verified if the import completes without crashing the server, but I can not explain why you are getting error 1 when trying to add an entry after the import completes.

Comment 12 Sankar Ramalingam 2015-01-27 14:33:49 UTC
My test environment has a 4 way MMR. I will test it with a standalone instance and mark the bug as Verified. I will open a new bug if it sounds reproducible consistently with MMR setup.

Comment 13 mreynolds 2015-01-27 14:46:21 UTC
(In reply to Sankar Ramalingam from comment #12)
> My test environment has a 4 way MMR. I will test it with a standalone
> instance and mark the bug as Verified. I will open a new bug if it sounds
> reproducible consistently with MMR setup.

Sounds good.  It's possible that there is so much logging going on because of all the replication agreements(4), that it might be temporarily causing the error 1 after the import.  You could try turning off verbose logging before attempting the add operation.  

Did you see anything in the error log when you got the error 1 when trying to add an entry?

Comment 14 Sankar Ramalingam 2015-01-28 13:16:06 UTC
I could successfully add few users after setting up the errorlog-level to 1. Hence, marking the bug as Verified. 
This is on a single instance, not on a replication setup. I will further test it with MMR setup and open bugs if needed.

[root@intel-s3eb1-03 MMR_WINSYNC]# ldapmodify -x -p 9589 -h localhost -D "cn=Directory manager" -w Secret123 << EOF
dn: cn=config
changetype: modify
replace: nsslapd-errorlog-level
nsslapd-errorlog-level: 1
EOF

[root@intel-s3eb1-03 MMR_WINSYNC]# ./AddEntry.sh Users 9589 "dc=modrdn2,dc=com" newteasasasst 9 localhost
No of entries added will be 9
Adding 9 Users to 
adding new entry "uid=newteasasasst9,dc=modrdn2,dc=com"

Comment 16 errata-xmlrpc 2015-03-05 09:36:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0416.html