983522 – migration from 389-ds to ipa is failing

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 983522 - migration from 389-ds to ipa is failing

Summary: migration from 389-ds to ipa is failing

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	ipa
Sub Component:
Version:	7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Martin Kosek
QA Contact:	Namita Soman
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2013-07-11 11:13 UTC by Kaleem
Modified:	2015-01-09 14:12 UTC (History)
CC List:	5 users (show)
Fixed In Version:	ipa-3.3.0-0.2.beta2.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-06-13 10:44:58 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
access and errors log file (9.02 MB, application/x-tar) 2013-07-11 11:13 UTC, Kaleem	no flags	Details
apache error log (6.14 MB, text/plain) 2013-07-12 09:51 UTC, Kaleem	no flags	Details
latest 389-ds logs files (14.12 MB, application/x-tar) 2013-07-15 14:06 UTC, Kaleem	no flags	Details
latest apache error log file (6.15 MB, text/plain) 2013-07-15 14:07 UTC, Kaleem	no flags	Details
Show Obsolete (2) View All

Description Kaleem 2013-07-11 11:13:07 UTC

Created attachment 772170 [details]
access and errors log file

Description of problem:
While debugging the beaker failures of ds-migration-performance test suite, i found this. Here we are migrating 10000 users and 12 groups from 389-ds to IPA.

Version-Release number of selected component (if applicable):
[root@rhel70-ipa-master ~]# rpm -q ipa-server 389-ds-base
ipa-server-3.2.1-1.el7.x86_64
389-ds-base-1.3.1.2-1.el7.x86_64
[root@rhel70-ipa-master ~]#

How reproducible:
Always

Steps to Reproduce:
1.Setup a 389-ds instance and fill it with 10000 users and 12 groups
2.Now from IPA machine, migrate the user/group data from 389-ds into IPA

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: ipa-migration-performance-001: Migration 10000 users and 12 groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   LOG    ] :: slapd pid : 5538
:: [   LOG    ] :: Before migration free memory : 198266880
:: [   LOG    ] :: Before migration slapd VmRSS : 28012kB
:: [   LOG    ] :: Before migration slapd VmHWM : 28044kB
:: [   LOG    ] :: ======================= Migration started: Thu Jul 11 13:15:18 IST 2013 ========================
:: [   LOG    ] :: EXECUTING: time -p echo Secret123 | ipa migrate-ds --with-compat ldap://rhel70-ipa-client.testrelm.com:389
:: [   FAIL   ] :: Migration did not complete successfully. 
:: [   LOG    ] :: ======================= Migration finished: Thu Jul 11 15:51:52 IST 2013 ========================
:: [   LOG    ] :: slapd pid : 5538
:: [   LOG    ] :: After migration free memory : 79360000
:: [   LOG    ] :: After migration slapd VmRSS : 145848kB
:: [   LOG    ] :: After migration slapd VmHWM : 152916kB
:: [   LOG    ] :: Duration: 2h 36m 35s
:: [   LOG    ] :: Assertions: 0 good, 1 bad
:: [   FAIL   ] :: RESULT: ipa-migration-performance-001: Migration 10000 users and 12 groups

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: CLEANUP FUNCTIONAL TESTING
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Actual results:
Migration failed

Expected results:
Migration should be successful.

Additional info:
1. Please find the attached tar which contains access and errors log files of IPA ds instance.

Comment 2 Rob Crittenden 2013-07-11 13:35:46 UTC

What exactly is failing here? The test must be doing something to evaluate success/failure, what is it?

Comment 3 Kaleem 2013-07-11 14:31:55 UTC

script snippet
=================================
echo $ADMINPW | ipa migrate-ds --with-compat ldap://$CLIENT:389
 if [ $? -ne 0 ] ; then
  rlFail "Migration did not complete successfully."
=================================

It is evaluating the return code from "ipa migrate-ds" command.

Do let me know if any other info i can provide.

Comment 4 Rob Crittenden 2013-07-11 15:36:07 UTC

I need to see the Apache error log from the IPA master.

Comment 5 Kaleem 2013-07-12 09:51:58 UTC

Created attachment 772624 [details]
apache error log

Please find the attached apache's error log.

Comment 6 Martin Kosek 2013-07-15 10:04:27 UTC

The logs are now disconnected so I cannot pair errors in apache errors log and DS logs, but I see following issue in the apache error log:

[Fri Jul 12 15:03:13.099213 2013] [:error] [pid 3275] ipa: INFO: admin: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389', u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people', groupcontainer=u'ou=groups', userobjectclass=(u'person',), groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'), userignoreobjectclass=None, userignoreattribute=None, groupignoreobjectclass=None, groupignoreattribute=None, groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True, version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError
[Fri Jul 12 15:03:33.920432 2013] [:error] [pid 3276] ipa: INFO: admin: config_mod(ipamigrationenabled=False, rights=False, all=False, raw=False, version=u'2.58'): SUCCESS

NetworkError would indicate that there is some problem communicating with either local or remote LDAP. In the time of failure, are you still able to run for example ldapsearch with both local IPA LDAP and remote migrated LDAP?

Comment 7 Kaleem 2013-07-15 13:57:25 UTC

I can perform ldapsearch successfully remote/l


IPA Master machine 
==================

[root@rhel70-ipa-master ~]# ldapsearch -x -h localhost -D "cn=Directory Manager" -w xxxxxx -b "ou=sudoers,dc=testrelm,dc=com"
# extended LDIF
#
# LDAPv3
# base <ou=sudoers,dc=testrelm,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# sudoers, testrelm.com
dn: ou=sudoers,dc=testrelm,dc=com
objectClass: extensibleObject
ou: sudoers

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@rhel70-ipa-master ~]#


389-DS machine
==============

[root@rhel70-ipa-client ~]# ldapsearch -x -h localhost -D "cn=Directory Manager" -w xxxxx -b "cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com"
# extended LDIF
#
# LDAPv3
# base <cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com> with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# Lorenzo Wilczewski, People, example.com
dn: cn=Lorenzo Wilczewski,ou=People,dc=example,dc=com
carLicense: RGEA6VX
cn: Lorenzo Wilczewski
departmentNumber: 1100
description: This is Lorenzo Wilczewski's description
employeeType: Manager
facsimileTelephoneNumber: +1 818 801-3209
givenName: Lorenzo
homePhone: +1 804 719-9267
initials: L. W.
l: Cambridge
mail: Lorenzo_Wilczewski
manager: cn=Timmie Pittner,ou=People,dc=example,dc=com
mobile: +1 510 207-4985
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
ou: Accounting
pager: +1 213 158-7237
postalAddress: example.com, Accounting Dept #510, Room#152
roomNumber: 5873
secretary: cn=Lillie Alteen,ou=People,dc=example,dc=com
sn: Wilczewski
telephoneNumber: +1 408 546-7074
title: Junior Accounting Director
uid: Lorenzo_Wilczewski
uidNumber: 19998
gidNumber: 29998
homeDirectory: /home/Lorenzo_Wilczewski
userPassword:: e1NTSEF9WnhiSkVhemhPcVZadVFJNVQ5RkI0MHZ4bi9WV3FIaWIrVmZqb0E9PQ=
 =

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
[root@rhel70-ipa-client ~]#

Comment 8 Kaleem 2013-07-15 14:06:06 UTC

Created attachment 773769 [details]
latest 389-ds logs files

Comment 9 Kaleem 2013-07-15 14:07:34 UTC

Created attachment 773772 [details]
latest apache error log file

Comment 10 Martin Kosek 2013-07-17 08:39:04 UTC

Thanks. I see that the times of the related errors indeed correlate:

DIRSRV:
[15/Jul/2013:18:57:06 +051800] - sasl_io_recv failed to decode packet for connection 5

IPA:
[Mon Jul 15 18:57:06.711945 2013] [:error] [pid 9065] ipa: INFO: admin: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389', u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people', groupcontainer=u'ou=groups', userobjectclass=(u'person',), groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'), userignoreobjectclass=None, userignoreattribute=None, groupignoreobjectclass=None, groupignoreattribute=None, groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True, version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError

Rich, any idea what could cause this sasl_io_recv decoding error? I know we already investigated similar issue with UNBIND packet and max SASL IO size.

Comment 11 Rich Megginson 2013-07-17 16:00:48 UTC

(In reply to Martin Kosek from comment #10)
> Thanks. I see that the times of the related errors indeed correlate:
> 
> DIRSRV:
> [15/Jul/2013:18:57:06 +051800] - sasl_io_recv failed to decode packet for
> connection 5
> 
> IPA:
> [Mon Jul 15 18:57:06.711945 2013] [:error] [pid 9065] ipa: INFO:
> admin: migrate_ds(u'ldap://rhel70-ipa-client.testrelm.com:389',
> u'********', binddn=u'cn=directory manager', usercontainer=u'ou=people',
> groupcontainer=u'ou=groups', userobjectclass=(u'person',),
> groupobjectclass=(u'groupOfUniqueNames', u'groupOfNames'),
> userignoreobjectclass=None, userignoreattribute=None,
> groupignoreobjectclass=None, groupignoreattribute=None,
> groupoverwritegid=False, schema=u'RFC2307bis', continue=False, compat=True,
> version=u'2.58', exclude_groups=None, exclude_users=None): NetworkError
> 
> Rich, any idea what could cause this sasl_io_recv decoding error? I know we
> already investigated similar issue with UNBIND packet and max SASL IO size.

I don't know.  It could be related.  Unfortunately we don't have the return code of sasl_decode() which is what is failing.

Comment 12 Martin Kosek 2013-07-18 10:52:11 UTC

(In reply to Rich Megginson from comment #11)
> (In reply to Martin Kosek from comment #10)
...
> I don't know.  It could be related.  Unfortunately we don't have the return
> code of sasl_decode() which is what is failing.

Ok, maybe it would help if Kaleem let you check his VM with the failing connection, it may get us closer to solution - whatever works for you.

I checked the access log again, the error seems to be produced by the local VM:

[15/Jul/2013:16:19:35 +051800] conn=5 fd=66 slot=66 connection from 10.65.207.0 to 10.65.207.0
...
[15/Jul/2013:18:57:00 +051800] conn=5 op=60117 RESULT err=0 tag=105 nentries=0 etime=1
[15/Jul/2013:18:57:06 +051800] conn=5 op=-1 fd=66 closed - I/O function error.

Can the error be caused by a connection being open for too long? After all, conn=5 is held open for almost 3 hours.

Comment 13 Rich Megginson 2013-07-18 13:50:27 UTC

(In reply to Martin Kosek from comment #12)
> (In reply to Rich Megginson from comment #11)
> > (In reply to Martin Kosek from comment #10)
> ...
> > I don't know.  It could be related.  Unfortunately we don't have the return
> > code of sasl_decode() which is what is failing.
> 
> Ok, maybe it would help if Kaleem let you check his VM with the failing
> connection, it may get us closer to solution - whatever works for you.

Ok.  Kaleem, please contact me directly - rmeggins

> 
> I checked the access log again, the error seems to be produced by the local
> VM:
> 
> [15/Jul/2013:16:19:35 +051800] conn=5 fd=66 slot=66 connection from
> 10.65.207.0 to 10.65.207.0
> ...
> [15/Jul/2013:18:57:00 +051800] conn=5 op=60117 RESULT err=0 tag=105
> nentries=0 etime=1
> [15/Jul/2013:18:57:06 +051800] conn=5 op=-1 fd=66 closed - I/O function
> error.
> 
> Can the error be caused by a connection being open for too long? After all,
> conn=5 is held open for almost 3 hours.

No, there is no age limit on connections, only an idle timeout, and the connection is not being closed due to being idle.

Comment 15 Martin Kosek 2013-07-18 16:25:52 UTC

Adding needsinfo? on Kaleem to provide a reproduction environment.

Comment 16 Kaleem 2013-07-19 15:47:28 UTC

reproduction environment provided.

Comment 17 Rich Megginson 2013-07-19 20:17:32 UTC

The answer was in /var/log/messages:

Jul 19 23:21:16 dhcp207-172 ns-slapd: encoded packet size too big (599441 > 65536)

Please increase nsslapd-sasl-max-buffer-size and try again:

dn: cn=config
nsslapd-sasl-max-buffer-size: 2097152

use 2MB to be safe

Comment 18 Martin Kosek 2013-07-22 06:52:13 UTC

Rich, thanks for investigation! Would you advise IPA to configure nsslapd-sasl-max-buffer-size to 2M by default? Or is the default in DS sufficient?

Kaleem, did the increase of nsslapd-sasl-max-buffer-size fixed the issue for you? Can we close this bug?

Comment 19 Kaleem 2013-07-30 12:21:15 UTC

Updating bugzilla with observations seen so far.

Tried with increased nsslapd-sasl-max-buffer-size but result was same on VM (1 core, 1 GB RAM and 10 GB disk space) but it went successful on another VM (4 core, 4 GB RAM and 30 GB disk space).

I am trying for the least machine configuration where this works.

I did not found any IPA documentation for least machine configuration required for migration.

Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through IPA or required to change when migration has to take place.

Comment 20 Rich Megginson 2013-07-30 15:13:34 UTC

(In reply to Kaleem from comment #19)
> Updating bugzilla with observations seen so far.
> 
> Tried with increased nsslapd-sasl-max-buffer-size but result was same on VM
> (1 core, 1 GB RAM and 10 GB disk space) but it went successful on another VM
> (4 core, 4 GB RAM and 30 GB disk space).
> 
> I am trying for the least machine configuration where this works.

Please talk to Jenny about how to handle this.
 
> I did not found any IPA documentation for least machine configuration
> required for migration.

There probably isn't any.
 
> Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through
> IPA or required to change when migration has to take place.

In the short term - yes - you will need to set nsslapd-sasl-max-buffer-size to 2M by default through IPA

In the longer term - https://fedorahosted.org/389/ticket/47457

Comment 21 Martin Kosek 2013-07-30 15:36:24 UTC

(In reply to Rich Megginson from comment #20)
> (In reply to Kaleem from comment #19)
...
> > Also do we need to set nsslapd-sasl-max-buffer-size to 2M by default through
> > IPA or required to change when migration has to take place.
> 
> In the short term - yes - you will need to set nsslapd-sasl-max-buffer-size
> to 2M by default through IPA
> 
> In the longer term - https://fedorahosted.org/389/ticket/47457

Ok. Thanks both for thorough investigation! I will create upstream ticket to configure nsslapd-sasl-max-buffer-size to 2M by default in IPA.

Comment 22 Martin Kosek 2013-07-30 15:39:56 UTC

Upstream ticket:
https://fedorahosted.org/freeipa/ticket/3826

Comment 23 Martin Kosek 2013-08-07 12:20:51 UTC

Fixed upstream:
master: https://fedorahosted.org/freeipa/changeset/f5ef2fb146247b7fea86eb3b8dce3947c1a37833

cn=config's nsslapd-sasl-max-buffer-size attribute is now set to 2097152.

Comment 25 Namita Soman 2013-11-13 13:25:32 UTC

Verified using:
ipa-server-3.3.3-3.el7.x86_64
389-ds-base-1.3.1.6-8.el7.x86_64

Was able to run the ds-migration-performance test suite successfully - enabling and disabling compat

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: ipa-migration-performance-001: Migration 10000 users and 12 groups
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [ 15:03:35 ] ::  slapd pid : 22566
:: [ 15:03:35 ] ::  Before migration free memory : 2072907776
:: [ 15:03:35 ] ::  Before migration slapd VmRSS : 33348kB
:: [ 15:03:35 ] ::  Before migration slapd VmHWM : 33900kB
:: [ 15:03:35 ] ::  ======================= Migration started: Tue Nov 12 15:03:35 EST 2013 ========================
:: [ 15:03:36 ] ::  EXECUTING: time -p echo Secret123 | ipa migrate-ds --with-compat ldap://hp-sl4545g7-01.testrelm.com:389

<..snip..>

:: [   PASS   ] :: Migration completed successfully. 
:: [ 15:14:57 ] ::  ======================= Migration finished: Tue Nov 12 15:14:57 EST 2013 ========================
:: [ 15:14:57 ] ::  slapd pid : 22566
:: [ 15:14:57 ] ::  After migration free memory : 1293557760
:: [ 15:14:57 ] ::  After migration slapd VmRSS : 183968kB
:: [ 15:14:57 ] ::  After migration slapd VmHWM : 192596kB
'de279007-494c-42e7-9c55-4eb9fe39e30a'
ipa-migration-performance-001 result: PASS

Comment 26 Ludek Smid 2014-06-13 10:44:58 UTC

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 28 Tru Huynh 2015-01-09 14:03:29 UTC

can you pull the 2MB size config back into RHEL6.6 ipa which seems to suffer the same issue?

Comment 29 Tru Huynh 2015-01-09 14:12:17 UTC

http://blog.christophersmart.com/articles/freeipa-how-to-fedora/

Note You need to log in before you can comment on or make changes to this bug.