Bug 1877647
| Summary: | ERR - vlv_build_idl - Can't follow db cursor (err -30988) | ||
|---|---|---|---|
| Product: | Red Hat Directory Server | Reporter: | Marc Sauton <msauton> |
| Component: | 389-ds-base | Assignee: | thierry bordaz <tbordaz> |
| Status: | CLOSED ERRATA | QA Contact: | RHDS QE <ds-qe-bugs> |
| Severity: | high | Docs Contact: | Marc Muehlfeld <mmuehlfe> |
| Priority: | high | ||
| Version: | 11.5 | CC: | afarley, aheverle, bsmejkal, jachapma, mreynolds, spichugi, tbordaz, tmihinto, vanmeeuwen+fedora, vvanhaft |
| Target Milestone: | DS11.5 | Keywords: | Triaged |
| Target Release: | dirsrv-11.6 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | sync-to-jira | ||
| Fixed In Version: | redhat-ds-11-8070020220729181859-54d36466 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-11-14 08:53:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Please have an initial investigation. Expectations - acknowledge if it is a valid bug - impact on customer, frequency, dangerous - complexity
- The RC of this bug is an improper computation of the #record in the vlv index
During a vlv search, the #records is computed in vlv_find_index_by_filter_txn/vlvIndex_get_indexlength
The #records is then used in vlv_build_idl to iterate over the index.
If the index is updated (removal of records) between the time the #record is computed
and the time the index is walked through, then vlv_build_idl can try to access a record
that does no longer exist
- A reproducer testcase
Create a vlv index (like cn=vlvIdx,cn=vlvSrch in suites/vlv/regression_test.py)
Provision 10K users
Then run in a search bench [1] and a deletion bench (to shrink the index) [2]
It logs messages like
[08/Apr/2022:13:44:47.184996442 +0200] - ERR - vlv_build_idl - Can't follow db cursor (err -12797)
[08/Apr/2022:13:44:47.185800492 +0200] - ERR - vlv_find_index_by_filter_txn - vlv find index: err 1
Err return code in vlv_build_idl differs from CU on master
- The error does not reflect an index corruption
The index is valid, just because of the dynamic of the server #record can
fluctuates
- This bug triggers
- failure of the VLV search that hits this mismatch
when computed #records > real #records
- VLV search may return improper result set
when computed #records < real #record
- The error logs are false Error and should not be logged
In Conclusion:
This bug impacts VLV SRCH that fail during decrease of VLV values
or report improper result set during increase of VLV values
The logged message is a false alarming message.
I see no reason to reindex when such messages are logged
Next:
- working on a fix
[1]
#!/bin/bash
while :
do
ldapsearch -LLL -D "cn=directory manager" -w password -b "dc=example,dc=com" -E vlv='0/5/100/0' -z 4 -E sss=cn '(|(objectclass=*)(objectclass=ldapsubentry))' dn cn sn ou &
ldapsearch -LLL -D "cn=directory manager" -w password -b "dc=example,dc=com" -E vlv='0/5/100/0' -z 4 -E sss=cn '(|(objectclass=*)(objectclass=ldapsubentry))' dn cn sn ou &
..
ldapsearch -LLL -D "cn=directory manager" -w password -b "dc=example,dc=com" -E vlv='0/5/100/0' -z 4 -E sss=cn '(|(objectclass=*)(objectclass=ldapsubentry))' dn cn sn ou &
ldapsearch -LLL -D "cn=directory manager" -w password -b "dc=example,dc=com" -E vlv='0/5/100/0' -z 4 -E sss=cn '(|(objectclass=*)(objectclass=ldapsubentry))' dn cn sn ou
done
[2]
For CPT=0..10K
ldapdelete -D "cn=directory manager" -w password "uid=user${CPT},ou=people,dc=example,dc=com"
Fix pushed upstream => POST As per comment #c14, marking as VERIFIED. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (redhat-ds:11 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:7929 |
Description of problem: we have those events: [14/Aug/2020:22:08:06.331297223 -0500] - ERR - vlv_build_idl - Can't follow db cursor (err -30988) [14/Aug/2020:22:08:06.331938667 -0500] - ERR - vlv_find_index_by_filter_txn - vlv find index: err 1 matching some failed ldapsearches stack trace and core file from the attachments called sf..loop.263.2020-09-03-113401.507312375.core.bin.29771 sf..loop.263.2020-09-03-113401.507312375.stacktrace.txt have a thread of interest: thread 27 #2 0x00007fa5b6aebcb8 in vslapd_log_error (fp=0x564239570c90, sev_level=sev_level@entry=22, subsystem=subsystem@entry=0x7fa5a9e0cfad "vlv_build_idl", fmt=fmt@entry=0x7fa5a9e0cb68 "Can't follow db cursor (err %d)\n", ap=ap@entry=0x7fa48b67a1d0, locked=locked@entry=1) at ldap/servers/slapd/log.c:2445 the overall information is c_connid = 3473104 o_opid = 156 c_dn = cn=unixproxy,cn=config c_ipaddr = edited ldapsearch base = ou=people,dc=edited,dc=edited database = userroot / cn=config,cn=userRoot,cn=ldbm database,cn=plugins,cn=config ldapsearch filter = "(&(uid=edited)(objectClass=posixAccount)(&(uidNumber=*)(!(uidNumber=0))))" ldapsearch attributes = objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn memberOf modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowI"... userRoot/vlv#vlvpeopleusridsort.db uidNumber the matching access log looks like a SSSD client: [03/Sep/2020:11:34:01.440136237 -0500] conn=3473104 op=156 SRCH base=\"ou=people,dc=edited,dc=edited\" scope=2 filter=\"(&(uid=edited)(objectClass=posixAccount)(&(uidNumber=*)(!(uidNumber=0))))\" attrs=\"objectClass uid userPassword uidNumber gidNumber gecos homeDirectory loginShell krbprincipalname cn memberOf modifyTimestamp modifyTimestamp shadowLastChange shadowMin shadowMax shadowWarning shadowInactive shadowExpire shadowFlag krblastpwdchange krbpasswordexpiration pwdattribute authorizedService accountexpires useraccountcontrol nsAccountLock host logindisabled loginexpirationtime loginallowedtimemap sshPublicKey\" this is a ldapsearch, but I do not seem to see the LDAP controls requesting a VLV and server side sort of uidnumber to end up in the vlvpeopleusridsort.db , why the search ends up in the frame #14 0x00007fa5a9de3082 in ldbm_back_search (pb=0x564244d5e240) at ldap/servers/slapd/back-ldbm/ldbm_search.c:619 rc = <optimized out> expire_time = {tv_sec = 0, tv_nsec = 0} vlv_response_control = {targetPosition = 964808128, contentCount = 22082, result = 0} the content count seem reasonable at 22082, but the VLV target position seem out of bound to me, this does not look correct. there is some significant search traffic on the system, may be some memory corruption? I need a review of the stack trace Version-Release number of selected component (if applicable): RHEL-7.8 389-ds-base-1.3.10.1-14.el7_8.x86_64 RHDS-10.5 How reproducible: N/A Steps to Reproduce: 1. event hard to catch in customer's environment, needed hundreds of core files 2. 3. Actual results: failing ldapsearch with error 1 Can't follow db cursor (err 1) Expected results: yes Additional info: frame 14 p *operation $32 = {o_ber = 0x564280af4500, o_msgid = 157, o_tag = 99, o_hr_time_rel = {tv_sec = 7136702, tv_nsec = 862461888}, o_hr_time_utc = {tv_sec = 1599150841, tv_nsec = 439942088}, o_isroot = 0, o_sdn = {flag = 10 '\n', udn = 0x56426aa46ec0 "cn=edited,cn=config", dn = 0x5642dc654aa0 "cn=edited,cn=config", ndn = 0x0, ndn_len = 22}, o_authtype = 0x5642dec8cee0 "simple", o_ssf = 256, o_opid = 156, o_connid = 3473104, o_handler_data = 0x0, o_result_handler = 0x0, o_search_entry_handler = 0x0, o_search_referral_handler = 0x0, o_csngen_handler = 0x0, o_replica_attr_handler = 0x0, o_next = 0x564258e56960, o_status = 0, o_searchattrs = 0x56427addde60, o_flags = 960, o_extension = 0x5642bf682660, o_target_spec = 0x56427562a600, o_target_entry = 0x5642c6954ee0, o_target_entry_id = 3, o_abandoned_op = 0, o_params = {operation_type = 4, target_address = {udn = 0x564264b1b740 "ou=people,dc=edited,dc=edited", uniqueid = 0x0, sdn = 0x56425ecb1ad0}, csn = 0x0, request_controls = 0x0, p = {p_add = {target_entry = 0x2, parentuniqueid = 0xe10ffffffff <Address 0xe10ffffffff out of bounds>}, p_bind = {bind_method = 2, bind_creds = 0xe10ffffffff, bind_saslmechanism = 0x5642dd0c7080 "", bind_ret_saslcreds = 0x5642588685b0}, p_compare = {compare_ava = {ava_type = 0x2 <Address 0x2 out of bounds>, ava_value = {bv_len = 15466177232895, bv_val = 0x5642dd0c7080 ""}, ava_private = 0x5642588685b0}}, p_modify = {modify_mods = 0x2}, p_modrdn = {modrdn_newrdn = 0x2 <Address 0x2 out of bounds>, modrdn_deloldrdn = -1, modrdn_newsuperior_address = {udn = 0x5642dd0c7080 "", uniqueid = 0x5642588685b0 "(&(uid=s_pmon)(objectClass=posixAccount)(&(uidNumber=*)(!(uidNumber=0))))", sdn = 0x5642456ccfc0}, modrdn_mods = 0x100000000}, p_search = {search_scope = 2, search_deref = 0, search_sizelimit = -1, search_timelimit = 3600, search_filter = 0x5642dd0c7080, search_strfilter = 0x5642588685b0 "(&(uid=edited)(objectClass=posixAccount)(&(uidNumber=*)(!(uidNumber=0))))", search_attrs = 0x5642456ccfc0, search_attrsonly = 0, search_is_and = 1, search_gerattrs = 0x0}, p_abandon = {abandon_targetmsgid = 2}, p_extended = {exop_oid = 0x2 <Address 0x2 out of bounds>, exop_value = 0xe10ffffffff}}}, o_results = {operation_type = 0, opreturn = 0, result_controls = 0x0, result_code = 0, result_text = 0x0, result_matched = 0x0, r = {r_bind = {bind_ret_saslcreds = 0x564254f2fb80}, r_search = { search_result_set = 0x564254f2fb80, search_result_entry = 0x0, opaque_backend_ptr = 0x0, nentries = 0, search_referrals = 0x0, estimate = 0}, r_extended = {exop_ret_oid = 0x564254f2fb80 "", exop_ret_value = 0x0}}}, o_pagedresults_sizelimit = -1, o_reverse_search_state = 0} (gdb)