RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 720461 - Need TLS/SSL error messages in repl status and errors log
Summary: Need TLS/SSL error messages in repl status and errors log
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: 389-ds-base
Version: 6.2
Hardware: i386
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Rich Megginson
QA Contact:
URL:
Whiteboard:
Depends On: 691313
Blocks: 690318
TreeView+ depends on / blocked
 
Reported: 2011-07-11 17:03 UTC by Rich Megginson
Modified: 2011-12-06 17:55 UTC (History)
5 users (show)

Fixed In Version: 389-ds-base-1.2.9.11-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 691313
Environment:
Last Closed: 2011-12-06 17:55:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1711 0 normal SHIPPED_LIVE 389-ds-base bug fix and enhancement update 2011-12-06 01:02:20 UTC

Description Rich Megginson 2011-07-11 17:03:22 UTC
+++ This bug was initially created as a clone of Bug #691313 +++

Description of problem:
My directory server have a permanent 'incremental update' failure witch stay throught after a 'send updates now' from the administration console.

When I set the replication log to see whats happening I get this:

[24/Mar/2011:17:31:49 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: stop_fatal_error -> stop_fatal_error
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: stop_fatal_error -> stop_normal_termination
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: stop_normal_termination -> stop_normal_termination
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): No linger to cancel on the connection
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): Disconnected from the consumer
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): No linger to cancel on the connection
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): Disconnected from the consumer
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: start -> ready_to_acquire_replica
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): Trying secure slapi_ldap_init_ext
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): binddn = cn=Replication Manager, cn=config,  passwd = {DES}xxxxxxxxxxxxxxxxxxx==
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): No linger to cancel on the connection
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): Beginning linger on the connection
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: ready_to_acquire_replica -> stop_fatal_error
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): Incremental update failed and requires administrator action
[24/Mar/2011:17:31:50 +0100] NSMMReplicationPlugin - agmt="cn=x32" (server:636): State: stop_fatal_error -> stop_fatal_error

The log is not very clear about the problem, I tried to look for the error in the code but there was many ways for that problem. Finally I try to do a ldapsearch between the two servers in the replication getting this error (from the supplier to the consumer):

'hostname does not match CN in peer certificate'

To solve the problem I have deleted the entry for the consumer server into the /etc/hosts file of the supplier server. (I don't know why but that worked)

Version-Release number of selected component (if applicable):
CentOS 5.5
389DS 1.2.5

How reproducible:
Actually I don't know. I have get this error for first time and all my servers have about the same configuration.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

--- Additional comment from rmeggins on 2011-03-28 09:29:47 EDT ---

I think the problem here is that the replication status and the errors log should contain the 'hostname does not match CN in peer certificate' message, or any other relevant message that could help diagnose connection problems.

--- Additional comment from mbarperoi on 2011-03-29 02:21:47 EDT ---

I think so too, because the 'stop_fatal_error' is very general to get the problem easy.

--- Additional comment from nkinder on 2011-07-08 14:09:47 EDT ---

With replication level logging, I see the following in the errors log on my system the first time the certificate hostname mismatch error is encountered:

---------------------------------------------------------------------------
[08/Jul/2011:11:03:21 -0700] NSMMReplicationPlugin - agmt="cn=meTolocalhost.localdomain1120" (localhost:2636): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) (TLS: hostname does not match CN in peer certificate)
---------------------------------------------------------------------------

After this first error, replication goes into backoff mode.  This message is not logged a second time.  This is a different state that "fatal stop", so the original scenario must be a bit different than my test case.  I'm not sure why my test doesn't exactly match up with the original reported case though.

--- Additional comment from nkinder on 2011-07-08 14:18:45 EDT ---

When we encounter an error during the bind stage in a replication session, we only log the error the first time we encounter it.  We do this to prevent filling the logs with the same error over and over.

From looking at the code, I believe that the certificate hostname error was likely logged on the reporters system when it first tried to start the replication session.  The message is then suppressed until we get a different return code (or the DS instance is restarted).

--- Additional comment from nkinder on 2011-07-08 16:08:01 EDT ---

Created attachment 511991 [details]
Patch

--- Additional comment from nkinder on 2011-07-08 16:18:27 EDT ---

Pushed to master.  Thanks to Noriko for her review!

Counting objects: 15, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (8/8), done.
Writing objects: 100% (8/8), 1.08 KiB, done.
Total 8 (delta 6), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   354249c..a24d9cc  master -> master

Comment 3 Chandrasekar Kannan 2011-09-16 21:33:48 UTC
ds-replication is no longer a component of rhel. folding back to 389-ds-base.

Comment 5 Jenny Severance 2011-09-27 18:46:33 UTC
Please add steps to reproduce / verify this bug.  Thanks

Comment 6 Rich Megginson 2011-09-27 19:24:03 UTC
(In reply to comment #5)
> Please add steps to reproduce / verify this bug.  Thanks

see https://bugzilla.redhat.com/show_bug.cgi?id=691313#c8

Comment 7 Amita Sharma 2011-11-08 12:57:41 UTC
[08/Nov/2011:17:03:40 +051800] NSMMReplicationPlugin - agmt="cn=mmr2" (snmaptest:636): Replication bind with SIMPLE auth failed: LDAP error -1 (Can't contact LDAP server) (TLS: hostname does not match CN in peer certificate)

Hence marking as VERIFIED

Comment 8 errata-xmlrpc 2011-12-06 17:55:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1711.html


Note You need to log in before you can comment on or make changes to this bug.