Bug 1836428
| Summary: | Directory Server ds-replcheck RFE to add a timeout command-line arg/value to wait longer when connecting to a replica server | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dave <dsimes> |
| Component: | 389-ds-base | Assignee: | mreynolds |
| Status: | CLOSED ERRATA | QA Contact: | RHDS QE <ds-qe-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 8.2 | CC: | bsmejkal, dsimes, pasik, spichugi, tbordaz, vashirov |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | sync-to-jira | ||
| Fixed In Version: | 389-ds-1.4-8030020200605214214.618f7055 | Doc Type: | Enhancement |
| Doc Text: |
Feature: Add a time-out option to the ds-replcheck CLI tool, and set the default time-out to be unlimited.
Reason: Over a WAN, the searches the tool issues can time out and cause the tool to fail. The tool previously had a hardcoded time-out that was not configurable.
Result: The tool performs its task without timing out.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-11-04 03:07:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Comment 4
mreynolds
2020-05-19 16:00:34 UTC
Looks like there is a hardcoded timeout of 5 seconds. Does that match up with what the customer is seeing? This is now fixed upstream and will be in RHEL 8.3, but for now the customer can simply edit the script and remove the timeouts:
@@ -968,12 +971,12 @@
replica = SimpleLDAPObject(ruri)
# Set timeouts
- master.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
- master.set_option(ldap.OPT_TIMEOUT,5.0)
- replica.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
- replica.set_option(ldap.OPT_TIMEOUT,5.0)
+ master.set_option(ldap.OPT_NETWORK_TIMEOUT, -1)
+ master.set_option(ldap.OPT_TIMEOUT, -1)
+ replica.set_option(ldap.OPT_NETWORK_TIMEOUT, -1)
+ replica.set_option(ldap.OPT_TIMEOUT, -1)
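For illustration, the four patched lines above can be folded into one small helper that applies a single timeout value to a connection. This is a sketch, not the upstream code: `apply_timeouts` and its parameters are hypothetical names, and the `ldap_mod` argument stands in for the imported python-ldap module so the helper can be exercised without a server. Mapping non-positive values to -1 ("no limit") mirrors the patch; as the later comments show, python-ldap 2.x may not accept -1.

```python
def apply_timeouts(conn, ldap_mod, seconds=-1):
    """Set the network and operation timeouts on one connection.

    conn     -- any object with set_option(), e.g. a SimpleLDAPObject
    ldap_mod -- the imported python-ldap module (supplies the OPT_* constants)
    seconds  -- timeout in seconds; values <= 0 are passed as -1 (no limit)

    Hypothetical helper for illustration; not part of ds-replcheck.
    """
    value = float(seconds) if seconds > 0 else -1
    for option in (ldap_mod.OPT_NETWORK_TIMEOUT, ldap_mod.OPT_TIMEOUT):
        conn.set_option(option, value)
    return value

# Possible usage (assumes python-ldap is installed and muri/ruri are set):
#   import ldap
#   from ldap.ldapobject import SimpleLDAPObject
#   master = SimpleLDAPObject(muri)
#   replica = SimpleLDAPObject(ruri)
#   for conn in (master, replica):
#       apply_timeouts(conn, ldap, 30)
```

Keeping the value in one place means a single change covers both the master and the replica connection.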
made the change to a copy for testing.. we would like to not get alerted just for timeouts
# diff /usr/bin/ds-replcheck ds-replcheck
870,873c870,873
< master.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
< master.set_option(ldap.OPT_TIMEOUT,5.0)
< replica.set_option(ldap.OPT_NETWORK_TIMEOUT,5.0)
< replica.set_option(ldap.OPT_TIMEOUT,5.0)
---
> master.set_option(ldap.OPT_NETWORK_TIMEOUT,-1)
> master.set_option(ldap.OPT_TIMEOUT,-1)
> replica.set_option(ldap.OPT_NETWORK_TIMEOUT,-1)
> replica.set_option(ldap.OPT_TIMEOUT,-1)
however, this seemed to break something (on 7.7)
# rpm -qf /usr/bin/ds-replcheck
389-ds-base-1.3.9.1-13.el7_7.x86_64
./ds-replcheck -D "cn=directory manager" -y ~/.dsp -m ldap://$MASTER -r ldap://$REPLICA -b $DOMAIN --ignore memberof,idnssoaserial,krblastsuccessfulauth,krblastfailedauth,krbloginfailedcount
Performing online report...
Connecting to servers...
Traceback (most recent call last):
File "./ds-replcheck", line 1394, in <module>
main()
File "./ds-replcheck", line 1386, in main
do_online_report(opts, OUTPUT_FILE)
File "./ds-replcheck", line 1107, in do_online_report
master, replica, opts = connect_to_replicas(opts)
File "./ds-replcheck", line 903, in connect_to_replicas
master.simple_bind_s(opts['binddn'], opts['bindpw'])
File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 208, in simple_bind_s
resp_type, resp_data, resp_msgid, resp_ctrls = self.result3(msgid,all=1,timeout=self.timeout)
File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 469, in result3
resp_ctrl_classes=resp_ctrl_classes
File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 476, in result4
ldap_result = self._ldap_call(self._l.result4,msgid,all,timeout,add_ctrls,add_intermediates,add_extop)
File "/usr/lib64/python2.7/site-packages/ldap/ldapobject.py", line 99, in _ldap_call
result = func(*args,**kwargs)
ValueError: option error
I suppose now that we know where it gets set, we can try increasing the 5.0 to something else
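Rather than editing the constant in the script, the value could be read from the command line, which is what this RFE asks for. The following is a minimal sketch under stated assumptions: the `--timeout` option name, the `build_parser` and `parse_timeout` helpers, and the integer type are all hypothetical, and the default of -1 (unlimited) is taken from the Doc Text. The option name and parsing in the fixed ds-replcheck may differ.

```python
import argparse

def build_parser():
    """Build a parser with a single, hypothetical --timeout option."""
    parser = argparse.ArgumentParser(description="timeout option sketch")
    parser.add_argument(
        "--timeout", type=int, default=-1,
        help="LDAP timeout in seconds; -1 (the default) means no timeout")
    return parser

def parse_timeout(argv):
    """Return the timeout requested in argv, or -1 if none was given."""
    return build_parser().parse_args(argv).timeout
```

The parsed value would then replace the literal 5.0 in the `set_option()` calls.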
Hmm, according to the docs (https://www.python-ldap.org/en/latest/reference/ldap.html#ldap.OPT_NETWORK_TIMEOUT), -1 or None should set it to infinite, but only in python3-ldap-3.x. I think RHEL 7 is still using python-ldap-2.x, so the behavior must be different. Can you confirm the python-ldap version in RHEL 7? Anyway, you can probably just remove the set_option() calls altogether, as the default should be "no timeout".

# rpm -qa | grep python | grep ldap
python-ldap-2.4.15-2.el7.x86_64

will try removing those set_option lines altogether

(In reply to mreynolds from comment #7)
> This is now fixed upstream and will be in RHEL 8.3, but for now the customer
> can simply edit the script and remove the timeouts:

so, we had increased the timeouts (recently removed the set lines entirely), but, still having reporting issues, and now getting these:

Cannot connect to 'ldap://awsw-p-aci-prdipa12:389/'

soo.. not sure, maybe open RFE for retries, or? suggestions?

Regards..

(In reply to Dave from comment #13)
> (In reply to mreynolds from comment #7)
> > This is now fixed upstream and will be in RHEL 8.3, but for now the customer
> > can simply edit the script and remove the timeouts:
>
> so, we had increased the timeouts (recently removed the set lines entirely),
> but, still having reporting issues, and now getting these:
>
> Cannot connect to 'ldap://awsw-p-aci-prdipa12:389/'

Is it already returning paged results and it breaks in the middle of the search? Anything in DS access log showing these connections? Why is it being closed/aborted?

> soo.. not sure, maybe open RFE for retries, or?

I don't think a retry is useful because it would just start over from the beginning. In that case they should write a script around it to retry if it fails. The tool itself should not retry on its own. But the real issue here is why it's not connecting or failing to finish the operation. Are there network issues? DS errors?
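The suggestion to retry from an outer script, rather than inside the tool, could look roughly like this. It is a sketch only: `run_with_retries`, the attempt count, and the delay are hypothetical choices, and it assumes the wrapped command signals failure with a non-zero exit status. Each retry starts the command over from the beginning.

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, delay=30.0):
    """Run cmd (a list of arguments) until it exits 0 or attempts run out.

    Returns the exit status of the last run. Hypothetical wrapper for
    illustration; not part of ds-replcheck.
    """
    status = 1
    for attempt in range(1, attempts + 1):
        status = subprocess.call(cmd)
        if status == 0:
            break
        if attempt < attempts:
            time.sleep(delay)  # wait before starting the command over
    return status
```

A monitoring job could call this with the same ds-replcheck arguments it uses today and raise an alert only when the returned status is still non-zero.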
All the script is doing is essentially calling ldapsearch via python-ldap. I don't think we can harden the tool any more than it already is. When the tool reports it cannot connect, can they run an ldapsearch to verify if it's actually accessible? Also, how many entries are in the database?

But first check the DS access log and find out what is happening with the ds-replcheck connection/search, and we'll go from there...

(In reply to mreynolds from comment #14)
> (In reply to Dave from comment #13)
> > (In reply to mreynolds from comment #7)
> But first check the DS access log and find out what is happening with the
> ds-replcheck connection/search, and we'll go from there...

we had a few of these to research, but, due to activity, the access logs are primarily from a subset of the current day's activity (w/debug levels/etc)

I found some things in errors logs (which have a couple weeks), but the ones there (that we could find) were mostly pointing at restart periods

This will actually take a bit more time to dig into.. we'd have to dig into the next future occurrence and attempt to get what's in the access log.

BTW, we are running a command-line ldapsearch around the same time (kinda legacy, from before ds-replcheck was working so awesomely :)

Regards..

still no connection issues showing

=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.6.8, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.18.0-228.el8.x86_64-x86_64-with-redhat-8.3-Ootpa', 'Packages': {'pytest': '6.0.1', 'py': '1.9.0', 'pluggy': '0.13.1'}, 'Plugins': {'metadata': '1.10.0', 'html': '2.1.1', 'libfaketime': '0.1.2'}}
389-ds-base: 1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad
nss: 3.44.0-15.el8
nspr: 4.21.0-2.el8_0
openldap: 2.4.46-15.el8
cyrus-sasl: 2.1.27-5.el8
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests, configfile: pytest.ini
plugins: metadata-1.10.0, html-2.1.1, libfaketime-0.1.2
collected 10 items / 9 deselected / 1 selected
dirsrvtests/tests/suites/ds_tools/replcheck_test.py::test_dsreplcheck_timeout_connection_mechanisms PASSED [100%]
============================================================================= 1 passed, 9 deselected in 179.27s (0:02:59) ========================================================================================
Marking as VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:4695

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days |