Bug 983522 - migration from 389-ds to ipa is failing
migration from 389-ds to ipa is failing
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa (Show other bugs)
7.0
Unspecified Unspecified
high Severity high
: rc
: ---
Assigned To: Martin Kosek
Namita Soman
: TestBlocker
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-11 07:13 EDT by Kaleem
Modified: 2015-01-09 09:12 EST (History)
5 users (show)

See Also:
Fixed In Version: ipa-3.3.0-0.2.beta2.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:44:58 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
access and errors log file (9.02 MB, application/x-tar)
2013-07-11 07:13 EDT, Kaleem
no flags Details
apache error log (6.14 MB, text/plain)
2013-07-12 05:51 EDT, Kaleem
no flags Details
latest 389-ds logs files (14.12 MB, application/x-tar)
2013-07-15 10:06 EDT, Kaleem
no flags Details
latest apache error log file (6.15 MB, text/plain)
2013-07-15 10:07 EDT, Kaleem
no flags Details

  None (edit)
Description Kaleem 2013-07-11 07:13:07 EDT
Created attachment 772170 [details]
access and errors log file

Description of problem:
While debugging the beaker failures of ds-migration-performance test suite, i found this. Here we are migrating 10000 users and 12 groups from 389-ds to IPA.

Version-Release number of selected component (if applicable):
[root@rhel70-ipa-master ~]# rpm -q ipa-server 389-ds-base
ipa-server-3.2.1-1.el7.x86_64
389-ds-base-1.3.1.2-1.el7.x86_64
[root@rhel70-ipa-master ~]#

How reproducible:
Always

Steps to Reproduce:
1.Setup a 389-ds instance and fill it with 10000 users and 12 groups
2.Now from IPA machine, migrate the user/group data from 389-ds into IPA

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: ipa-migration-performance-001: Migration 10000 users and 12 groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   LOG    ] :: slapd pid : 5538
:: [   LOG    ] :: Before migration free memory : 198266880
:: [   LOG    ] :: Before migration slapd VmRSS : 28012kB
:: [   LOG    ] :: Before migration slapd VmHWM : 28044kB
:: [   LOG    ] :: ======================= Migration started: Thu Jul 11 13:15:18 IST 2013 ========================
:: [   LOG    ] :: EXECUTING: time -p echo Secret123 | ipa migrate-ds --with-compat ldap://rhel70-ipa-client.testrelm.com:389
:: [   FAIL   ] :: Migration did not complete successfully. 
:: [   LOG    ] :: ======================= Migration finished: Thu Jul 11 15:51:52 IST 2013 ========================
:: [   LOG    ] :: slapd pid : 5538
:: [   LOG    ] :: After migration free memory : 79360000
:: [   LOG    ] :: After migration slapd VmRSS : 145848kB
:: [   LOG    ] :: After migration slapd VmHWM : 152916kB
:: [   LOG    ] :: Duration: 2h 36m 35s
:: [   LOG    ] :: Assertions: 0 good, 1 bad
:: [   FAIL   ] :: RESULT: ipa-migration-performance-001: Migration 10000 users and 12 groups

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: CLEANUP FUNCTIONAL TESTING
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Actual results:
Migration failed

Expected results:
Migration should be successful.

Additional info:
1. Please find the attached tar which contains access and errors log files of IPA ds instance.
Comment 2 Rob Crittenden 2013-07-11 09:35:46 EDT
What exactly is failing here? The test must be doing something to evaluate success/failure, what is it?
Comment 3 Kaleem 2013-07-11 10:31:55 EDT
script snippet
=================================
echo $ADMINPW | ipa migrate-ds --with-compat ldap://$CLIENT:389
 if [ $? -ne 0 ] ; then
  rlFail "Migration did not complete successfully."
=================================

It is evaluating the return code from "ipa migrate-ds" command.

Do let me know if any other info i can provide.
Comment 4 Rob Crittenden 2013-07-11 11:36:07 EDT
I need to see the Apache error log from the IPA master.
Comment 5 Kaleem 2013-07-12 05:51:58 EDT
Created attachment 772624 [details]
apache error log

Please find the attached apache's error log.
Comment 6 Martin Kosek 2013-07-15 06:04:27 EDT
The logs are now disconnected so I cannot pair errors in apache errors log and DS logs, but I see following issue in the apache error log:

[Fri Jul 12 15:03:13.099213 2013] [:error] [pid 3275] ipa: INFO: admin@TESTRELM.COM: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389', u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people', groupcontainer=u'ou=groups', userobjectclass=(u'person',), groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'), userignoreobjectclass=None, userignoreattribute=None, groupignoreobjectclass=None, groupignoreattribute=None, groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True, version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError
[Fri Jul 12 15:03:33.920432 2013] [:error] [pid 3276] ipa: INFO: admin@TESTRELM.COM: config_mod(ipamigrationenabled=False, rights=False, all=False, raw=False, version=u'2.58'): SUCCESS

NetworkError would indicate that there is some problem communicating with either local or remote LDAP. In the time of failure, are you still able to run for example ldapsearch with both local IPA LDAP and remote migrated LDAP?
Comment 7 Kaleem 2013-07-15 09:57:25 EDT
I can perform ldapsearch successfully remote/l


IPA Master machine 
==================

[root@rhel70-ipa-master ~]# ldapsearch -x -h localhost -D "cn=Directory Manager" -w xxxxxx -b "ou=sudoers,dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <ou=sudoers,dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# sudoers, testrelm.com
dn: ou=sudoers,dc=testrelm,dc=com
objectClass: extensibleObject
ou: sudoers

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@rhel70-ipa-master ~]#


389-DS machine
==============

[root@rhel70-ipa-client ~]# ldapsearch -x -h localhost -D "cn=Directory Manager" -w xxxxx -b "cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com"
# extended LDIF
#
# LDAPv3
# base <cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# Lorenzo Wilczewski, People, example.com
dn: cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com
carLicense: RGEA6VX
cn: Lorenzo Wilczewski
departmentNumber: 1100
description: This is Lorenzo Wilczewski's description
employeeType: Manager
facsimileTelephoneNumber: +1 818 801-3209
givenName: Lorenzo
homePhone: +1 804 719-9267
initials: L. W.
l: Cambridge
mail: Lorenzo_Wilczewski@example.com
manager: cn=Timmie Pittner,ou=People,dc=example,dc=com
mobile: +1 510 207-4985
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
ou: Accounting
pager: +1 213 158-7237
postalAddress: example.com, Accounting Dept #510, Room#152
roomNumber: 5873
secretary: cn=Lillie Alteen,ou=People,dc=example,dc=com
sn: Wilczewski
telephoneNumber: +1 408 546-7074
title: Junior Accounting Director
uid: Lorenzo_Wilczewski
uidNumber: 19998
gidNumber: 29998
homeDirectory: /home/Lorenzo_Wilczewski
userPassword:: e1NTSEF9WnhiSkVhemhPcVZadVFJNVQ5RkI0MHZ4bi9WV3FIaWIrVmZqb0E9PQ=
 =

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@rhel70-ipa-client ~]#
Comment 8 Kaleem 2013-07-15 10:06:06 EDT
Created attachment 773769 [details]
latest 389-ds logs files
Comment 9 Kaleem 2013-07-15 10:07:34 EDT
Created attachment 773772 [details]
latest apache error log file
Comment 10 Martin Kosek 2013-07-17 04:39:04 EDT
Thanks. I see that the times of the related errors indeed correlate:

DIRSRV:
[15/Jul/2013:18:57:06 +051800] - sasl_io_recv failed to decode packet for connection 5

IPA:
[Mon Jul 15 18:57:06.711945 2013] [:error] [pid 9065] ipa: INFO: admin@TESTRELM.COM: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389', u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people', groupcontainer=u'ou=groups', userobjectclass=(u'person',), groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'), userignoreobjectclass=None, userignoreattribute=None, groupignoreobjectclass=None, groupignoreattribute=None, groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True, version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError

Rich, any idea what could cause this sasl_io_recv decoding error? I know we already investigated similar issue with UNBIND packet and max SASL IO size.
Comment 11 Rich Megginson 2013-07-17 12:00:48 EDT
(In reply to Martin Kosek from comment #10)
> Thanks. I see that the times of the related errors indeed correlate:
> 
> DIRSRV:
> [15/Jul/2013:18:57:06 +051800] - sasl_io_recv failed to decode packet for
> connection 5
> 
> IPA:
> [Mon Jul 15 18:57:06.711945 2013] [:error] [pid 9065] ipa: INFO:
> admin@TESTRELM.COM: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389',
> u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people',
> groupcontainer=u'ou=groups', userobjectclass=(u'person',),
> groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'),
> userignoreobjectclass=None, userignoreattribute=None,
> groupignoreobjectclass=None, groupignoreattribute=None,
> groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True,
> version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError
> 
> Rich, any idea what could cause this sasl_io_recv decoding error? I know we
> already investigated similar issue with UNBIND packet and max SASL IO size.

I don't know.  It could be related.  Unfortunately we don't have the return code of sasl_decode() which is what is failing.
Comment 12 Martin Kosek 2013-07-18 06:52:11 EDT
(In reply to Rich Megginson from comment #11)
> (In reply to Martin Kosek from comment #10)
...
> I don't know.  It could be related.  Unfortunately we don't have the return
> code of sasl_decode() which is what is failing.

Ok, maybe it would help if Kaleem let you check his VM with the failing connection, it may get us closer to solution - whatever works for you.

I checked the access log again, the error seems to be produced by the local VM:

[15/Jul/2013:16:19:35 +051800] conn=5 fd=66 slot=66 connection from 10.65.207.0 to 10.65.207.0
...
[15/Jul/2013:18:57:00 +051800] conn=5 op=60117 RESULT err=0 tag=105 nentries=0 etime=1
[15/Jul/2013:18:57:06 +051800] conn=5 op=-1 fd=66 closed - I/O function error.

Can the error be caused by a connection being open for too long? After all, conn=5 is held open for almost 3 hours.
Comment 13 Rich Megginson 2013-07-18 09:50:27 EDT
(In reply to Martin Kosek from comment #12)
> (In reply to Rich Megginson from comment #11)
> > (In reply to Martin Kosek from comment #10)
> ...
> > I don't know.  It could be related.  Unfortunately we don't have the return
> > code of sasl_decode() which is what is failing.
> 
> Ok, maybe it would help if Kaleem let you check his VM with the failing
> connection, it may get us closer to solution - whatever works for you.

Ok.  Kaleem, please contact me directly - rmeggins@redhat.com

> 
> I checked the access log again, the error seems to be produced by the local
> VM:
> 
> [15/Jul/2013:16:19:35 +051800] conn=5 fd=66 slot=66 connection from
> 10.65.207.0 to 10.65.207.0
> ...
> [15/Jul/2013:18:57:00 +051800] conn=5 op=60117 RESULT err=0 tag=105
> nentries=0 etime=1
> [15/Jul/2013:18:57:06 +051800] conn=5 op=-1 fd=66 closed - I/O function
> error.
> 
> Can the error be caused by a connection being open for too long? After all,
> conn=5 is held open for almost 3 hours.

No, there is no age limit on connections, only an idle timeout, and the connection is not being closed due to being idle.
Comment 15 Martin Kosek 2013-07-18 12:25:52 EDT
Adding needsinfo? on Kaleem to provide a reproduction environment.
Comment 16 Kaleem 2013-07-19 11:47:28 EDT
reproduction environment provided.
Comment 17 Rich Megginson 2013-07-19 16:17:32 EDT
The answer was in /var/log/messages:

Jul 19 23:21:16 dhcp207-172 ns-slapd: encoded packet size too big (599441 > 65536)

Please increase nsslapd-sasl-max-buffer-size and try again:

dn: cn=config
nsslapd-sasl-max-buffer-size: 2097152

use 2MB to be safe
Comment 18 Martin Kosek 2013-07-22 02:52:13 EDT
Rich, thanks for investigation! Would you advise IPA to configure nsslapd-sasl-max-buffer-size to 2M by default? Or is the default in DS sufficient?

Kaleem, did the increase of nsslapd-sasl-max-buffer-size fixed the issue for you? Can we close this bug?
Comment 19 Kaleem 2013-07-30 08:21:15 EDT
Updating bugzilla with observations seen so far.

Tried with increased nsslapd-sasl-max-buffer-size but result was same on VM (1 core, 1 GB RAM and 10 GB disk space) but it went successful on another VM (4 core, 4 GB RAM and 30 GB disk space).

I am trying for the least machine configuration where this works.

I did not found any IPA documentation for least machine configuration required for migration.

Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through IPA or required to change when migration has to take place.
Comment 20 Rich Megginson 2013-07-30 11:13:34 EDT
(In reply to Kaleem from comment #19)
> Updating bugzilla with observations seen so far.
> 
> Tried with increased nsslapd-sasl-max-buffer-size but result was same on VM
> (1 core, 1 GB RAM and 10 GB disk space) but it went successful on another VM
> (4 core, 4 GB RAM and 30 GB disk space).
> 
> I am trying for the least machine configuration where this works.

Please talk to Jenny about how to handle this.
 
> I did not found any IPA documentation for least machine configuration
> required for migration.

There probably isn't any.
 
> Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through
> IPA or required to change when migration has to take place.

In the short term - yes - you will need to set nsslapd-sasl-max-buffer-size to 2M by default through IPA

In the longer term - https://fedorahosted.org/389/ticket/47457
Comment 21 Martin Kosek 2013-07-30 11:36:24 EDT
(In reply to Rich Megginson from comment #20)
> (In reply to Kaleem from comment #19)
...
> > Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through
> > IPA or required to change when migration has to take place.
> 
> In the short term - yes - you will need to set nsslapd-sasl-max-buffer-size
> to 2M by default through IPA
> 
> In the longer term - https://fedorahosted.org/389/ticket/47457

Ok. Thanks both for thorough investigation! I will create upstream ticket to configure nsslapd-sasl-max-buffer-size to 2M by default in IPA.
Comment 22 Martin Kosek 2013-07-30 11:39:56 EDT
Upstream ticket:
https://fedorahosted.org/freeipa/ticket/3826
Comment 23 Martin Kosek 2013-08-07 08:20:51 EDT
Fixed upstream:
master: https://fedorahosted.org/freeipa/changeset/f5ef2fb146247b7fea86eb3b8dce3947c1a37833

cn=config's nsslapd-sasl-max-buffer-size attribute is now set to 2097152.
Comment 25 Namita Soman 2013-11-13 08:25:32 EST
Verified using:
ipa-server-3.3.3-3.el7.x86_64
389-ds-base-1.3.1.6-8.el7.x86_64

Was able to run the ds-migration-performance test suite successfully - enabling and disabling compat

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: ipa-migration-performance-001: Migration 10000 users and 12 groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 15:03:35 ] ::  slapd pid : 22566
:: [ 15:03:35 ] ::  Before migration free memory : 2072907776
:: [ 15:03:35 ] ::  Before migration slapd VmRSS : 33348kB
:: [ 15:03:35 ] ::  Before migration slapd VmHWM : 33900kB
:: [ 15:03:35 ] ::  ======================= Migration started: Tue Nov 12 15:03:35 EST 2013 ========================
:: [ 15:03:36 ] ::  EXECUTING: time -p echo Secret123 | ipa migrate-ds --with-compat ldap://hp-sl4545g7-01.testrelm.com:389

<..snip..>

:: [   PASS   ] :: Migration completed successfully. 
:: [ 15:14:57 ] ::  ======================= Migration finished: Tue Nov 12 15:14:57 EST 2013 ========================
:: [ 15:14:57 ] ::  slapd pid : 22566
:: [ 15:14:57 ] ::  After migration free memory : 1293557760
:: [ 15:14:57 ] ::  After migration slapd VmRSS : 183968kB
:: [ 15:14:57 ] ::  After migration slapd VmHWM : 192596kB
'de279007-494c-42e7-9c55-4eb9fe39e30a'
ipa-migration-performance-001 result: PASS
Comment 26 Ludek Smid 2014-06-13 06:44:58 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
Comment 28 Tru Huynh 2015-01-09 09:03:29 EST
can you pull the 2MB size config back into RHEL6.6 ipa which seems to suffer the same issue?

Note You need to log in before you can comment on or make changes to this bug.