Bug 1045500

Summary: broken IPA update from RHEL6.4 to RHEL6.5
Product: Red Hat Enterprise Linux 6
Reporter: Konstantin Lepikhov <klepikho>
Component: ipa
Assignee: Martin Kosek <mkosek>
Status: CLOSED NOTABUG
QA Contact: Namita Soman <nsoman>
Severity: high
Priority: high
Version: 6.5
CC: dpal, jbuchta, jcholast, klepikho, mkosek, mreynolds, nalin, rcritten
Target Milestone: rc
Flags: mkosek: needinfo? (klepikho)
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2014-06-30 08:09:52 UTC
Type: Bug
Bug Blocks: 1056252, 1061410

Description Konstantin Lepikhov 2013-12-20 15:28:35 UTC
Description of problem:

IPA updater wasn't successful during IPA upgrade from RHEL6.4 to RHEL6.5:

    [11/Dec/2013:10:17:59 +0000] - 389-Directory/1.2.11.15 B2013.238.2155 starting up
    [11/Dec/2013:10:17:59 +0000] - Detected Disorderly Shutdown last time Directory Server was running, recovering database.
    [11/Dec/2013:10:18:01 +0000] schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=ipa,dc=optimus,dc=pt
    [11/Dec/2013:10:18:05 +0000] NSACLPlugin - ACL PARSE ERR(rv=-8): (target = "ldap:///cn=trusts,dc=ipa,dc=XXX,dc=YYY
    [11/Dec/2013:10:18:05 +0000] NSACLPlugin - Error: This  ((target = "ldap:///cn=trusts,dc=ipa,dc=XXX,dc=YYY")(targetattr = "ipaNTTrustType || ipaNTTrustAttributes || ipaNTTrustDirection || ipaNTTrustPartner || ipaNTFlatName || ipaNTTrustAuthOutgoing || ipaNTTrustAuthIncoming || ipaNTSecurityIdentifier || ipaNTTrustForestTrustInfo || ipaNTTrustPosixOffset || ipaNTSupportedEncryptionTypes || krbPrincipalName || krbLastPwdChange || krbTicketFlags || krbLoginFailedCount || krbExtraData || krbPrincipalKey")(version 3.0;acl "Allow trust system user to create and delete trust accounts and cross realm principals"; allow (read,write,add,delete) groupdn="ldap:///cn=adtrust agents,cn=sysaccounts,cn=etc,dc=ipa,dc=XXX,dc=YYY";)) ACL will not be considered for evaluation because of syntax errors.
    [11/Dec/2013:10:18:05 +0000] NSACLPlugin - ACL PARSE ERR(rv=-8): (target = "ldap:///cn=trusts,dc=ipa,dc=XXX,dc=YYY
    [11/Dec/2013:10:18:05 +0000] NSACLPlugin - Error: This  ((target = "ldap:///cn=trusts,dc=ipa,dc=XXX,dc=YYY")(targetattr = "ipaNTTrustType || ipaNTTrustAttributes || ipaNTTrustDirection || ipaNTTrustPartner || ipaNTFlatName || ipaNTTrustAuthOutgoing || ipaNTTrustAuthIncoming || ipaNTSecurityIdentifier || ipaNTTrustForestTrustInfo || ipaNTTrustPosixOffset || ipaNTSupportedEncryptionTypes")(version 3.0;acl "Allow trust admins manage trust accounts"; allow (read,write,add,delete) groupdn="ldap:///cn=trust admins,cn=groups,cn=accounts,dc=XXX,dc=YYY";)) ACL will not be considered for evaluation because of syntax errors.
    [11/Dec/2013:10:18:22 +0000] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 6 ldap://xxxxxxxx:389} 4ef10caa001c00060000 4ef10cb8000300060000] which is present in RUV [database RUV]
    [11/Dec/2013:10:18:22 +0000] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: for replica dc=ipa,dc=XXX,dc=YYY there were some differences between the changelog max RUV and the database RUV.
      If there are obsolete elements in the database RUV, you should remove them using the CLEANALLRUV task.
      If they are not obsolete, you should check their status to see why there are no changes from those servers in the changelog.
    [11/Dec/2013:10:18:22 +0000] slapi_ldap_bind - Error: could not send startTLS request: error -1 (Can't contact LDAP server) errno 0 (Success)
    [11/Dec/2013:10:18:22 +0000] NSMMReplicationPlugin - agmt="cn=XXXXXXXXX" (XXXXXXXXX:389): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) ((null))
    [11/Dec/2013:10:18:22 +0000] set_krb5_creds - Could not get initial credentials for principal [ldap/XXXXXXXX] in keytab [FILE:/etc/dirsrv/ds.keytab]: -1765328324 (Generic error (see e-text))
    [11/Dec/2013:10:18:22 +0000] slapd_ldap_sasl_interactive_bind - Error: could not perform interactive bind for id [] mech [GSSAPI]: LDAP error -1 (Can't contact LDAP server) ((null)) errno 107 (Transport endpoint is not connected)
    [11/Dec/2013:10:18:22 +0000] slapi_ldap_bind - Error: could not perform interactive bind for id [] mech[GSSAPI]: error -1 (Can't contact LDAP server)


Version-Release number of selected component (if applicable):
ipa-admintools-3.0.0-37.el6.x86_64
ipa-client-3.0.0-37.el6.x86_64
ipa-pki-ca-theme-9.0.3-7.el6.noarch
ipa-pki-common-theme-9.0.3-7.el6.noarch
ipa-python-3.0.0-37.el6.x86_64
ipa-server-3.0.0-37.el6.x86_64
ipa-server-selinux-3.0.0-37.el6.x86_64

How reproducible:

Reproduced in the customer's environment.

Steps to Reproduce:
The customer upgraded their IPA server farm from RHEL6.4 to RHEL6.5.

Actual results:

The ACL plugin in dirsrv requires the entry used as an ACI target to exist before the ACI can be created.
It looks like cn=trusts,$SUFFIX is missing, so the ordering of updates during the upgrade was wrong: dirsrv rejects an ACI that targets a non-existent DN.
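
To see at a glance which target DNs the rejected ACIs point at, the NSACLPlugin error lines can be scanned with a small script. This is only a sketch: the regular expression is derived from the message format in the log excerpt above, the helper name `rejected_acis` is made up for illustration, and the sample line has its targetattr list shortened.

```python
import re

# Matches the 'Error: This ((target = "ldap:///<dn>") ... acl "<name>"; ...'
# message that NSACLPlugin logs for an ACI it refuses to evaluate.
# The pattern is inferred from the log excerpt in this report.
ACI_ERROR = re.compile(
    r'NSACLPlugin - Error: This\s+\(\(target = "ldap:///(?P<target>[^"]+)"\)'
    r'.*?acl "(?P<name>[^"]+)"'
)


def rejected_acis(log_lines):
    """Return (target DN, ACI name) pairs for every rejected ACI in the log."""
    found = []
    for line in log_lines:
        match = ACI_ERROR.search(line)
        if match:
            found.append((match.group("target"), match.group("name")))
    return found


# Sample line taken from the report, with the targetattr list shortened.
sample = [
    '[11/Dec/2013:10:18:05 +0000] NSACLPlugin - Error: This  ((target = '
    '"ldap:///cn=trusts,dc=ipa,dc=XXX,dc=YYY")(targetattr = "ipaNTTrustType")'
    '(version 3.0;acl "Allow trust admins manage trust accounts"; allow '
    '(read,write,add,delete) groupdn="ldap:///cn=trust admins,cn=groups,'
    'cn=accounts,dc=XXX,dc=YYY";)) ACL will not be considered for evaluation '
    'because of syntax errors.',
]

for target, name in rejected_acis(sample):
    print(f"{name!r} targets {target}")
```

Each reported target DN can then be looked up in the directory to confirm whether the container really is missing.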

Because of the replication error, the entry is incorrect, so the KDC makes wrong decisions when issuing Kerberos tickets.

Expected results:

Seamless IPA upgrade from previous RHEL6 to RHEL6.5

Additional info:

Comment 1 Alexander Bokovoy 2013-12-20 15:39:41 UTC
One of the changes between RHEL6.4 and RHEL6.5 altered how updates are applied: they are now applied in batches. We need to look deeper at how the updates got ordered such that the ACI entries were applied before the container cn=trusts,$SUFFIX was created.
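
To illustrate the ordering constraint, here is a minimal sketch of arranging a batch so that entry creation precedes any ACI targeting that entry. The update format (dicts with `op`, `dn`, `target`) and the function `order_updates` are hypothetical; this is not how ipa-ldap-updater represents or sorts its updates.

```python
def order_updates(updates, existing_dns=()):
    """Return updates with all entry additions ahead of ACI additions.

    `updates` is a list of dicts in a made-up format:
      {"op": "add_entry", "dn": <DN of the new entry>}
      {"op": "add_aci", "dn": <entry holding the ACI>, "target": <target DN>}
    Raises ValueError if an ACI targets a DN that neither exists already
    nor is created anywhere in the batch.
    """
    known = {dn.lower() for dn in existing_dns}
    entries = [u for u in updates if u["op"] == "add_entry"]
    acis = [u for u in updates if u["op"] == "add_aci"]
    known.update(u["dn"].lower() for u in entries)

    missing = [u["target"] for u in acis if u["target"].lower() not in known]
    if missing:
        raise ValueError("ACI targets never created: " + ", ".join(missing))

    # Stable partition: relative order inside each group is preserved.
    return entries + acis


# Example batch where the ACI is listed before its target container.
updates = [
    {"op": "add_aci", "dn": "dc=example,dc=test",
     "target": "cn=trusts,dc=example,dc=test"},
    {"op": "add_entry", "dn": "cn=trusts,dc=example,dc=test"},
]
print([u["op"] for u in order_updates(updates)])
```

The sketch ignores parent/child ordering among the entries themselves, which a real updater would also need to respect.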

Comment 3 Martin Kosek 2014-01-02 13:12:43 UTC
I am wondering, is the reported error really that severe? The ACI error message "ACL will not be considered for evaluation because of syntax errors." suggests that this ACI would simply be ignored until the cn=trusts object is added.

It seems to me that there are 2 distinct issues:

(In reply to Konstantin Lepikhov from comment #0)
...
> Actual results:
> 
> The ACL plugin in dirsrv requires the entry used as an ACI target to exist
> before the ACI can be created.
> It looks like cn=trusts,$SUFFIX is missing, so the ordering of updates during
> the upgrade was wrong: dirsrv rejects an ACI that targets a non-existent DN.

Potentially benign error message, caused by a wrong order of updates.

> 
> Because of the replication error, the entry is incorrect, so the KDC makes
> wrong decisions when issuing Kerberos tickets.

Another root cause, possibly due to certificate renewal issues mentioned in the case?

Comment 6 Rob Crittenden 2014-02-19 14:43:49 UTC
The renewal appears to be configured to happen on another master. This is the CA_WORKING state. The certs should be in a shared location in the IPA LDAP database, in cn=ca_renewal,cn=ipa,cn=etc,$SUFFIX. Of course if replication is broken then this data will never become available.

There are ways to work around this but the best way is to get the LDAP database back into sync. Adding cc to Mark to evaluate the RUV errors.

This does not appear to be an issue with certmonger.

Comment 7 Rob Crittenden 2014-02-19 15:14:41 UTC
A workaround to at least get the services somewhat working again might be something like:

1. Go to a master with a renewed ipaCert

2. # certutil -L -d /etc/httpd/alias -n ipaCert -a > /tmp/agent.crt

3. copy that file to the non-working master

4. # certutil -A -d /etc/httpd/alias -n ipaCert -t u,u,u -a -i /tmp/agent.crt

5. Set the system clock back to a time when the server certs were still valid.

6. # service httpd restart

7. # service certmonger restart

8. Reset the system clock to the current time.

That should renew the Apache and 389-ds server certificates. The ipaCert will probably still show as CA_WORKING but we manually renewed it for now.
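
For anyone scripting the certificate copy in steps 2 and 4, the two certutil invocations can be assembled as argument lists first and reviewed before anything is run. This is a dry-run sketch only: the helper names are invented, the paths are the ones from the steps above, and nothing here touches the clock or restarts services.

```python
import shlex

NSS_DB = "/etc/httpd/alias"   # NSS database used in the steps above
NICKNAME = "ipaCert"          # certificate nickname used in the steps above


def export_argv():
    """Step 2: list the cert in ASCII form; stdout is redirected to a file."""
    return ["certutil", "-L", "-d", NSS_DB, "-n", NICKNAME, "-a"]


def import_argv(cert_file):
    """Step 4: add the copied cert to the NSS database on the broken master."""
    return ["certutil", "-A", "-d", NSS_DB, "-n", NICKNAME,
            "-t", "u,u,u", "-a", "-i", cert_file]


def render(argv):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Print the commands instead of executing them.
print(render(export_argv()) + " > /tmp/agent.crt")
print(render(import_argv("/tmp/agent.crt")))
```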

Comment 8 mreynolds 2014-02-19 15:50:29 UTC
(In reply to Rob Crittenden from comment #6)
> The renewal appears to be configured to happen on another master. This is
> the CA_WORKING state. The certs should be in a shared location in the IPA
> LDAP database, in cn=ca_renewal,cn=ipa,cn=etc,$SUFFIX. Of course if
> replication is broken then this data will never become available.
> 
> There are ways to work around this but the best way is to get the LDAP
> database back into sync. Adding cc to Mark to evaluate the RUV errors.
> 

Errors:

[11/Dec/2013:10:18:22 +0000] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 6 ldap://xxxxxxxx:389} 4ef10caa001c00060000 4ef10cb8000300060000] which is present in RUV [database RUV]
[11/Dec/2013:10:18:22 +0000] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: for replica dc=ipa,dc=XXX,dc=YYY there were some differences between the changelog max RUV and the database RUV.  If there are obsolete elements in the database RUV, you should remove them using the CLEANALLRUV task.  If they are not obsolete, you should check their status to see why there are no changes from those servers in the changelog.

These errors should not be interfering with replication. They are just informational messages saying that the changelog RUV and the database RUV do not match, and they do not halt replication.

The real question is who is replica 6?  Is this an old replica, or an active one?  If it is active, then why are there no changes in the changelog from this replica?  Maybe the replication agreement/status should be checked from replica 6 to this local server.
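
One quick hint about the age of replica 6 is the timestamp embedded in the CSNs of the RUV element. The sketch below assumes the usual 389-ds CSN layout of 20 hex digits (8 for the Unix timestamp, 4 for the sequence number, 4 for the replica ID, 4 for the sub-sequence number); the regular expression and function names are illustrative only. Under that assumption both CSNs in the logged element decode to December 2011, which would be consistent with an old replica, but that needs to be confirmed against the actual topology.

```python
import re
from datetime import datetime, timezone

# RUV element as printed by ruv_compare_ruv in the log above:
#   {replica <id> <url>} <min CSN> <max CSN>
RUV_ELEMENT = re.compile(
    r"\{replica (?P<rid>\d+) (?P<url>\S+)\} "
    r"(?P<min>[0-9a-f]{20}) (?P<max>[0-9a-f]{20})"
)


def decode_csn(csn):
    """Split a CSN into its fields (layout is an assumption, see lead-in)."""
    return {
        "time": datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


def describe_ruv_element(log_line):
    """Return the replica ID plus decoded min/max CSNs, or None if no match."""
    match = RUV_ELEMENT.search(log_line)
    if match is None:
        return None
    return {
        "rid": int(match.group("rid")),
        "url": match.group("url"),
        "min": decode_csn(match.group("min")),
        "max": decode_csn(match.group("max")),
    }


line = ("NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does "
        "not contain element [{replica 6 ldap://xxxxxxxx:389} "
        "4ef10caa001c00060000 4ef10cb8000300060000] which is present in RUV "
        "[database RUV]")
info = describe_ruv_element(line)
print(info["rid"], info["min"]["time"].isoformat(), info["max"]["time"].isoformat())
```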

Comment 21 Martin Kosek 2014-06-30 08:09:52 UTC
As noted in a (private) comment, the customer case is closed and we have not found a bug in the FreeIPA code. The problems were caused by expired PKI CA certificates and then a variety of upgrade/downgrade/replication issues caused by the expired certificates.

I am closing this Bugzilla as well.