Bug 952533

Summary: Number of retrying attempts should be smaller by one with push replication
Product: Red Hat Enterprise Linux 6 Reporter: David Spurek <dspurek>
Component: openldapAssignee: Matus Honek <mhonek>
Status: CLOSED NOTABUG QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: low Docs Contact:
Priority: low    
Version: 6.5CC: ebenes, jsynacek, pkis
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1022576 (view as bug list) Environment:
Last Closed: 2015-06-10 07:11:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1022576    
Attachments:
Description Flags
reproducer none

Description David Spurek 2013-04-16 06:53:32 UTC
Description of problem:
Number of retrying attempts should be smaller by one with push replication between servers. Retry count is configured as retry="5 5 300 +".

I think that problem is somewhere in "300 +" part, because if I use for example 
retry="5 5 300 1", number of retrying attempts looks ok.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. run runtest.sh from attachement

  
Actual results:
Retrying entries in slapd log are increased by 1 now
Expected results:
Retrying entries in slapd log should be similiar as retry count in config file

Additional info:
Retry count is configured as retry="5 5 300 +". After 35s I see with `cat slapd_master.log | grep --binary-files=text 'do_syncrepl.*retrying'`:

do_syncrepl: rid=001 rc 52 retrying (4 retries left)
do_syncrepl: rid=001 rc 52 retrying (3 retries left)
do_syncrepl: rid=001 rc 52 retrying (2 retries left)
do_syncrepl: rid=001 rc 52 retrying (1 retries left)
do_syncrepl: rid=001 rc 52 retrying
do_syncrepl: rid=001 rc 52 retrying

I think that output should look like:

do_syncrepl: rid=001 rc 52 retrying (4 retries left)
do_syncrepl: rid=001 rc 52 retrying (3 retries left)
do_syncrepl: rid=001 rc 52 retrying (2 retries left)
do_syncrepl: rid=001 rc 52 retrying (1 retries left)
do_syncrepl: rid=001 rc 52 retrying

Comment 1 David Spurek 2013-04-16 06:54:48 UTC
Created attachment 736228 [details]
reproducer

Comment 3 RHEL Program Management 2013-10-13 23:41:55 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 7 Matus Honek 2015-06-09 13:11:59 UTC
I have investigated this issue using the latest slapd build and I believe this is not a bug, see my reasoning below.

With `olcLogLevel: sync`, I get:
- for `retry="2 5 16 +"`:
> 5576d3df slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3df do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d3e1 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e1 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d3e3 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e3 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d3e5 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e5 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d3e7 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e7 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d3e9 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3e9 do_syncrepl: rid=001 rc -1 retrying (fail==-1)
> 5576d3f9 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d3f9 do_syncrepl: rid=001 rc -1 retrying (fail==-1)
... and so on,

- for `retry="2 5 16 3"`:
> 5576d8b6 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b6 do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d8b8 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b8 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d8ba slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8ba do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8bc slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8bc do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8be slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8be do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8c0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8c0 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8d0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8d0 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8e0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8e0 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8f0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8f0 do_syncrepl: rid=001 rc -1 quitting (fail==-2)

Note 1: hex number on the far left is an unix timestamp (in seconds).
Note 2: the `fail==<number>` parts were added for debugging purposes by me and they add no significant difference to the code.

My reasoning: As one can see, the two cases above have basically the same pattern. If one counts the gaps of the same delays then one finds out it precisely corresponds to the `retry` settings for both cases. That means the behaviour is correct.

Presumption: Due to a lack of actual connection attempts' log-lines in reporter's log, one might believe the `do_syncrepl: rid=001 rc -1 retrying` statement is printed out on actual retrial, but it is not - it is printed out immediately after the connection attempt. This might lead to a confusion regarding count of retrials when one lacks the log-lines.
Actually, I believe the last line of so-said buggy output in reporters log, is actually a first attempt of the second group of the repeating (the `300 +` one).

Based on the above, I incline to close this bug as NOTABUG.

Adding needinfo from jsynacek to have him validate my reasoning.

Comment 8 Jan Synacek 2015-06-10 06:25:15 UTC
Your reasoning looks valid. Feel free to close this one.

Comment 9 Matus Honek 2015-06-10 07:11:13 UTC
Closing as NOTABUG. See comments #c7 and #c8 for explanation.

Comment 10 Patrik Kis 2015-07-23 08:56:16 UTC
Let me add my notes here. I'm not fully convinced that all is correct here. The main issue IMHO is in the number of sync attempts. They are simply increased by one. For example in the case below there are 9 attempts instead of expected 8:
- for `retry="2 5 16 3"`:
> 5576d8b6 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b6 do_syncrepl: rid=001 rc -1 retrying (4 retries left)
> 5576d8b8 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8b8 do_syncrepl: rid=001 rc -1 retrying (3 retries left)
> 5576d8ba slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8ba do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8bc slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8bc do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8be slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8be do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8c0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8c0 do_syncrepl: rid=001 rc -1 retrying (2 retries left)
> 5576d8d0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8d0 do_syncrepl: rid=001 rc -1 retrying (1 retries left)
> 5576d8e0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8e0 do_syncrepl: rid=001 rc -1 retrying (fail==0)
> 5576d8f0 slap_client_connect: URI=ldap://localhost:9011/ DN="cn=config,cn=slave" ldap_sasl_bind_s failed (-1)
> 5576d8f0 do_syncrepl: rid=001 rc -1 quitting (fail==-2)

The configured logic (2 5 16 3) suggest that there should be 5 attempts delayed by 2 seconds then 3 delayed by 16 seconds. Instead of that there are 6 attempts delayed by 2 seconds and then 3 delayed by 16 seconds.
I've tried similar configuration to check if there are real retry attempts (monitoring the traffic by tcpdump) and the logs corresponds with real traffic.

Nevertheless, it seems that this behaviour is there for some time already and people probably get used to it, so this can not be considered as regression. Also, this is kind of cosmetic issue with low priority, so I'm not going to re-open it.
But maybe it would be worth to consult the "problem" with upstream.