Bug 1836428 - Directory Server ds-replcheck RFE to add a timeout command-line arg/value to wait longer when connecting to a replica server
Summary: Directory Server ds-replcheck RFE to add a timeout command-line arg/value to ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: 389-ds-base
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: mreynolds
QA Contact: RHDS QE
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
Reported: 2020-05-15 21:05 UTC by Dave
Modified: 2023-09-14 06:00 UTC (History)

Fixed In Version: 389-ds-1.4-8030020200605214214.618f7055
Doc Type: Enhancement
Doc Text:
Feature: Add a time-out option to the ds-replcheck CLI tool, and set the default time-out to be unlimited. Reason: Over a WAN, the searches the tool issues can time out and cause the tool to fail. The tool previously had a hardcoded time-out that was not configurable. Result: The tool performs its task without timing out.
Clone Of:
Environment:
Last Closed: 2020-11-04 03:07:52 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 4155 0 None closed Directory Server ds-replcheck RFE to add a timeout command-line arg/value to wait longer when connecting to a replica se... 2020-11-10 21:47:38 UTC
Red Hat Product Errata RHEA-2020:4695 0 None None None 2020-11-04 03:08:12 UTC

Comment 4 mreynolds 2020-05-19 16:00:34 UTC
Moving to RHEL 8 since we are no longer doing enhancements to RHEL 7

Comment 5 mreynolds 2020-05-19 16:07:38 UTC
Looks like there is a hardcoded timeout of 5 seconds. Does that match up with what the customer is seeing?

Comment 6 mreynolds 2020-05-19 17:59:01 UTC
https://pagure.io/389-ds-base/issue/51102

Comment 7 mreynolds 2020-05-20 17:52:38 UTC
This is now fixed upstream and will be in RHEL 8.3, but for now the customer can simply edit the script and remove the timeouts:

@@ -968,12 +971,12 @@ 
      replica = SimpleLDAPObject(ruri)

      # Set timeouts
-     master.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
-     master.set_option(ldap.OPT_TIMEOUT,5.0)
-     replica.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
-     replica.set_option(ldap.OPT_TIMEOUT,5.0)
+     master.set_option(ldap.OPT_NETWORK_TIMEOUT, -1)
+     master.set_option(ldap.OPT_TIMEOUT, -1)
+     replica.set_option(ldap.OPT_NETWORK_TIMEOUT, -1)
+     replica.set_option(ldap.OPT_TIMEOUT, -1)

Comment 8 Dave 2020-05-29 16:38:45 UTC
made the change to a copy for testing.. we would like to not get alerted just for timeouts

# diff /usr/bin/ds-replcheck ds-replcheck
870,873c870,873
<     master.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
<     master.set_option(ldap.OPT_TIMEOUT,5.0)
<     replica.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
<     replica.set_option(ldap.OPT_TIMEOUT,5.0)
---
>     master.set_option(ldap.OPT_NETWORK_TIMEOUT,-1)
>     master.set_option(ldap.OPT_TIMEOUT,-1)
>     replica.set_option(ldap.OPT_NETWORK_TIMEOUT,-1)
>     replica.set_option(ldap.OPT_TIMEOUT,-1)

however, this seemed to break something (on 7.7)
# rpm -qf /usr/bin/ds-replcheck
389-ds-base-1.3.9.1-13.el7_7.x86_64

./ds-replcheck -D "cn=directory manager" -y ~/.dsp -m ldap://$MASTER -r ldap://$REPLICA -b $DOMAIN --ignore memberof,idnssoaserial,krblastsuccessfulauth,krblastfailedauth,krbloginfailedcount

Performing online report...
Connecting to servers...
Traceback (most recent call last):
  File "./ds-replcheck", line 1394, in <module>
    main()
  File "./ds-replcheck", line 1386, in main
    do_online_report(opts, OUTPUT_FILE)
  File "./ds-replcheck", line 1107, in do_online_report
    master, replica, opts = connect_to_replicas(opts)
  File "./ds-replcheck", line 903, in connect_to_replicas
    master.simple_bind_s(opts['binddn'], opts['bindpw'])
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 208, in simple_bind_s
    resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all=1,timeout=self.timeout)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 469, in result3
    resp_ctrl_classes=resp_ctrl_classes
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 476, in result4
    ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
  File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 99, in _ldap_call
    result = func(*args,**kwargs)
ValueError: option error


I suppose now that we know where it gets set, we can try increasing the 5.0 to something else

Comment 9 mreynolds 2020-05-29 16:52:07 UTC
Hmm, according to the docs (https://www.python-ldap.org/en/latest/reference/ldap.html#ldap.OPT_NETWORK_TIMEOUT), -1 or None should set it to infinite, but only in python3-ldap-3.x. I think on RHEL 7 it's still using python-ldap-2.x, so the behavior must be different. Can you confirm the python-ldap version in RHEL 7? Anyway, you can probably just remove the set_option() calls altogether, as the default should be "no timeout".
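
For reference, a minimal sketch of the workaround discussed here: set a timeout only when one is actually requested, and otherwise leave the options untouched so that the library default (which should be "no timeout") applies. The helper name apply_timeouts and its parameters are hypothetical and not part of ds-replcheck; only set_option(), ldap.OPT_NETWORK_TIMEOUT and ldap.OPT_TIMEOUT come from the script excerpt in comment 7.

```python
def apply_timeouts(conn, options, timeout=None):
    """Apply `timeout` (in seconds) to each option in `options` on `conn`.

    None or a negative value is treated as "no limit": nothing is set and
    the library default is left in place. This avoids passing -1, which
    raised "ValueError: option error" with python-ldap 2.4.15 in this bug.
    Returns the list of options that were actually set.
    """
    if timeout is None or timeout < 0:
        return []
    applied = []
    for opt in options:
        conn.set_option(opt, float(timeout))
        applied.append(opt)
    return applied


# Hypothetical usage inside the script (requires python-ldap):
#   import ldap
#   from ldap.ldapobject import SimpleLDAPObject
#   master = SimpleLDAPObject("ldap://master.example.com")
#   apply_timeouts(master, (ldap.OPT_NETWORK_TIMEOUT, ldap.OPT_TIMEOUT), 30)
```

Skipping the call instead of passing a sentinel keeps the same code path working on both python-ldap 2.x and 3.x, at the cost of relying on the default being unlimited.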

Comment 10 Dave 2020-06-01 15:55:08 UTC
# rpm -qa | grep python | grep ldap
python-ldap-2.4.15-2.el7.x86_64

will try removing those set_option lines altogether

Comment 13 Dave 2020-06-08 14:38:30 UTC
(In reply to mreynolds from comment #7)
> This is now fixed upstream and will be in RHEL 8.3, but for now the customer
> can simply edit the script and remove the timeouts:

so, we had increased the timeouts (recently removed the set lines entirely), but, still having reporting issues, and now getting these:

Cannot connect to 'ldap://awsw-p-aci-prdipa12:389/'

soo.. not sure, maybe open RFE for retries, or? 

suggestions?

Regards..

Comment 14 mreynolds 2020-06-08 14:52:39 UTC
(In reply to Dave from comment #13)
> (In reply to mreynolds from comment #7)
> > This is now fixed upstream and will be in RHEL 8.3, but for now the customer
> > can simply edit the script and remove the timeouts:
> 
> so, we had increased the timeouts (recently removed the set lines entirely),
> but, still having reporting issues, and now getting these:
> 
> Cannot connect to 'ldap://awsw-p-aci-prdipa12:389/'

Is it already returning paged results and it breaks in the middle of the search?

Anything in DS access log showing these connections?  Why is it being closed/aborted?


> 
> soo.. not sure, maybe open RFE for retries, or? 
> 

I don't think a retry is useful because it would just start over from the beginning.  In that case they should write a script around it to retry if it fails.  The tool itself should not retry on its own.  But the real issue here is why it's not connecting or failing to finish the operation.  Are there network issues?  DS errors?  All the script is doing is essentially calling ldapsearch via python-ldap.  I don't think we can harden the tool any more than it already is.  When the tool reports it cannot connect, can they run an ldapsearch to verify if it's actually accessible?  Also, how many entries are in the database?

But first check the DS access log and find out what is happening with the ds-replcheck connection/search, and we'll go from there...
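
A wrapper of the kind suggested above could look roughly like the sketch below. It is only an illustration: the function name run_with_retries, the attempt count and the delay are made up, and it assumes that ds-replcheck exits with a non-zero status when it cannot connect, which has not been verified here.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=30.0):
    """Run `cmd` (an argv list) up to `attempts` times.

    Returns 0 as soon as one run succeeds; otherwise returns the exit
    status of the last failed run (or 1 if no run was attempted).
    """
    status = 1
    for attempt in range(1, attempts + 1):
        status = subprocess.call(cmd)
        if status == 0:
            return 0
        if attempt < attempts:
            # Wait before starting the whole report over again.
            time.sleep(delay)
    return status


# Hypothetical usage; host names, suffix and password file are placeholders:
#   cmd = ["ds-replcheck", "-D", "cn=directory manager", "-y", "/root/.dsp",
#          "-m", "ldap://master.example.com", "-r", "ldap://replica.example.com",
#          "-b", "dc=example,dc=com"]
#   raise SystemExit(run_with_retries(cmd, attempts=3, delay=60))
```

Each retry starts the comparison from the beginning, so this only hides transient connection failures from alerting; it does not explain why the connection failed in the first place.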

Comment 15 Dave 2020-06-09 16:17:29 UTC
(In reply to mreynolds from comment #14)
> (In reply to Dave from comment #13)
> > (In reply to mreynolds from comment #7)

> But first check the DS access log and find out what is happening with the
> ds-replcheck connection/search, and we'll go from there...

we had a few of these to research, but, due to activity, the access logs are primarily from a subset of the current day's activity (w/debug levels/etc)
I found some things in errors logs (which have a couple weeks), but the ones there (that we could find) were mostly pointing at restart periods
This will actually take a bit more time to dig into.. we'd have to dig into the next future occurrence and attempt to get what's in the access log.
BTW, we are running a command-line ldapsearch around the same time (kinda legacy, from before ds-replcheck was working so awesomely :)

Regards..

Comment 16 Dave 2020-07-02 15:13:05 UTC
still no connection issues showing

Comment 17 bsmejkal 2020-08-05 14:41:14 UTC
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.6.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.18.0-228.el8.x86_64-x86_64-with-redhat-8.3-Ootpa', 'Packages': {'pytest': '6.0.1', 'py': '1.9.0', 'pluggy': '0.13.1'}, 'Plugins': {'metadata': '1.10.0', 'html': '2.1.1', 'libfaketime': '0.1.2'}}
389-ds-base: 1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad
nss: 3.44.0-15.el8
nspr: 4.21.0-2.el8_0
openldap: 2.4.46-15.el8
cyrus-sasl: 2.1.27-5.el8
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests, configfile: pytest.ini
plugins: metadata-1.10.0, html-2.1.1, libfaketime-0.1.2
collected 10 items / 9 deselected / 1 selected                                                                                                                                                                    

dirsrvtests/tests/suites/ds_tools/replcheck_test.py::test_dsreplcheck_timeout_connection_mechanisms PASSED                                                                                                  [100%]

============================================================================= 1 passed, 9 deselected in 179.27s (0:02:59) ========================================================================================

Marking as VERIFIED.

Comment 20 errata-xmlrpc 2020-11-04 03:07:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4695

Comment 21 Red Hat Bugzilla 2023-09-14 06:00:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

