952533 – Number of retrying attempts should be smaller by one with push replication

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 952533 - Number of retrying attempts should be smaller by one with push replication

Summary: Number of retrying attempts should be smaller by one with push replication

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	openldap
Sub Component:
Version:	6.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Matus Honek
QA Contact:	BaseOS QE Security Team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1022576
TreeView+	depends on / blocked

Reported:	2013-04-16 06:53 UTC by David Spurek
Modified:	2015-07-23 08:56 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1022576 (view as bug list)
Environment:
Last Closed:	2015-06-10 07:11:13 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
reproducer (10.00 KB, application/x-compressed-tar) 2013-04-16 06:54 UTC, David Spurek	no flags	Details
View All

Description David Spurek 2013-04-16 06:53:32 UTC

Description of problem:
Number of retrying attempts should be smaller by one with push replication between servers. Retry count is configured as retry="5 5 300 +".

I think that problem is somewhere in "300 +" part, because if I use for example 
retry="5 5 300 1", number of retrying attempts looks ok.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. run runtest.sh from attachement

  
Actual results:
Retrying entries in slapd log are increased by 1 now
Expected results:
Retrying entries in slapd log should be similiar as retry count in config file

Additional info:
Retry count is configured as retry="5 5 300 +". After 35s I see with `cat slapd_master.log | grep --binary-files=text 'do_syncrepl.*retrying'`:

do_syncrepl: rid=001 rc 52 retrying (4 retries left)
do_syncrepl: rid=001 rc 52 retrying (3 retries left)
do_syncrepl: rid=001 rc 52 retrying (2 retries left)
do_syncrepl: rid=001 rc 52 retrying (1 retries left)
do_syncrepl: rid=001 rc 52 retrying
do_syncrepl: rid=001 rc 52 retrying

I think that output should look like:

do_syncrepl: rid=001 rc 52 retrying (4 retries left)
do_syncrepl: rid=001 rc 52 retrying (3 retries left)
do_syncrepl: rid=001 rc 52 retrying (2 retries left)
do_syncrepl: rid=001 rc 52 retrying (1 retries left)
do_syncrepl: rid=001 rc 52 retrying

Comment 1 David Spurek 2013-04-16 06:54:48 UTC

Created attachment 736228 [details]
reproducer

Comment 3 RHEL Program Management 2013-10-13 23:41:55 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 7 Matus Honek 2015-06-09 13:11:59 UTC

I have investigated this issue using the latest slapd build and I believe this is not a bug, see my reasoning below.

With `olcLogLevel: sync`, I get:
- for `retry="2 5 16 +"`:
> 5576d3df slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3df do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d3e1 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e1 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d3e3 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e3 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d3e5 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e5 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d3e7 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e7 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d3e9 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e9 do_syncrepl: rid=001 rc -1 retrying (fail==-1)
> 5576d3f9 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3f9 do_syncrepl: rid=001 rc -1 retrying (fail==-1)
... and so on,

- for `retry="2 5 16 3"`:
> 5576d8b6 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b6 do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d8b8 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b8 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d8ba slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8ba do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8bc slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8bc do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8be slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8be do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8c0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8c0 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8d0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8d0 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8e0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8e0 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8f0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8f0 do_syncrepl: rid=001 rc -1 quitting (fail==-2)

Note 1: hex number on the far left is an unix timestamp (in seconds).
Note 2: the `fail==<number>` parts were added for debugging purposes by me and they add no significant difference to the code.

My reasoning: As one can see, the two cases above have basically the same pattern. If one counts the gaps of the same delays then one finds out it precisely corresponds to the `retry` settings for both cases. That means the behaviour is correct.

Presumption: Due to a lack of actual connection attempts' log-lines in reporter's log, one might believe the `do_syncrepl: rid=001 rc -1 retrying` statement is printed out on actual retrial, but it is not - it is printed out immediately after the connection attempt. This might lead to a confusion regarding count of retrials when one lacks the log-lines.
Actually, I believe the last line of so-said buggy output in reporters log, is actually a first attempt of the second group of the repeating (the `300 +` one).

Based on the above, I incline to close this bug as NOTABUG.

Adding needinfo from jsynacek to have him validate my reasoning.

Comment 8 Jan Synacek 2015-06-10 06:25:15 UTC

Your reasoning looks valid. Feel free to close this one.

Comment 9 Matus Honek 2015-06-10 07:11:13 UTC

Closing as NOTABUG. See comments #c7 and #c8 for explanation.

Comment 10 Patrik Kis 2015-07-23 08:56:16 UTC

Let me add my notes here. I'm not fully convinced that all is correct here. The main issue IMHO is in the number of sync attempts. They are simply increased by one. For example in the case below there are 9 attempts instead of expected 8:
- for `retry="2 5 16 3"`:
> 5576d8b6 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b6 do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d8b8 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b8 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d8ba slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8ba do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8bc slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8bc do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8be slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8be do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8c0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8c0 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8d0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8d0 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8e0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8e0 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8f0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8f0 do_syncrepl: rid=001 rc -1 quitting (fail==-2)

The configured logic (2 5 16 3) suggest that there should be 5 attempts delayed by 2 seconds then 3 delayed by 16 seconds. Instead of that there are 6 attempts delayed by 2 seconds and then 3 delayed by 16 seconds.
I've tried similar configuration to check if there are real retry attempts (monitoring the traffic by tcpdump) and the logs corresponds with real traffic.

Nevertheless, it seems that this behaviour is there for some time already and people probably get used to it, so this can not be considered as regression. Also, this is kind of cosmetic issue with low priority, so I'm not going to re-open it.
But maybe it would be worth to consult the "problem" with upstream.

Note You need to log in before you can comment on or make changes to this bug.