Bug 1278567

Summary: SimplePagedResults -- abandon could happen between the abandon check and sending results
Product: Red Hat Enterprise Linux 7 Reporter: Noriko Hosoi <nhosoi>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact: Petr Bokoc <pbokoc>
Priority: urgent    
Version: 7.0 CC: jkurik, msauton, nhosoi, nkinder, pbokoc, rmeggins, spichugi
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.5.2-1.el7 Doc Type: Bug Fix
Doc Text:
Abandon requests for simple paged results searches no longer cause a crash

Prior to this update, Directory Server could receive an abandon request for a simple paged results search after the abandon check was completed but before the results were fully sent. In this case, the abandon request was processed while the results were being sent, which caused Directory Server to crash. This update adds a lock which prevents abandon requests from being processed while the results are already being sent, and the crash no longer occurs.
Story Points: ---
Clone Of:
: 1278730 (view as bug list) Environment:
Last Closed: 2016-11-03 20:37:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1278730    

Description Noriko Hosoi 2015-11-05 19:53:49 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/48338

Bug description:
Problematic code in op_shared_search (opshared.c):
703  pr_search_result = pagedresults_get_search_result(pb->pb_conn, operation, pr_idx);
704  if (pr_search_result) {
705      if (pagedresults_is_abandoned_or_notavailable(pb->pb_conn, pr_idx)) {
706          pagedresults_unlock(pb->pb_conn, pr_idx);
707          /* Previous operation was abandoned and the simplepaged object is not in use. */
708          send_ldap_result(pb, 0, NULL, "Simple Paged Results Search abandoned", 0, NULL);
709          rc = LDAP_SUCCESS;
710          goto free_and_return;
711      } else {
712          slapi_pblock_set( pb, SLAPI_SEARCH_RESULT_SET, pr_search_result );
713          rc = send_results_ext (pb, 1, &pnentries, pagesize, &pr_stat);
714   
715          /* search result could be reset in the backend/dse */
716          slapi_pblock_get(pb, SLAPI_SEARCH_RESULT_SET, &sr);
717          pagedresults_set_search_result(pb->pb_conn, operation, sr, 0, pr_idx);
718      }

The crash occurs in send_results_ext at line 713.
#5  0x00007f5e7711b6a7 in send_results_ext (pb=0x1402660, nentries=0x7f5e4bdef384, pagesize=1000,
    pr_stat=0x7f5e4bdef378, send_result=1) at ldap/servers/slapd/opshared.c:1706
#6  0x00007f5e7711c3c0 in op_shared_search (pb=0x1402660, send_result=1) at ldap/servers/slapd/opshared.c:713

where the pr_search_result_set in the simple paged results handle in the connection is already freed (pr_search_result_set == 0x0) and pr_flags has the CONN_FLAG_PAGEDRESULTS_ABANDONED (== 512) bit set, as follows:
(gdb) p *pb->pb_conn->c_pagedresults->prl_list
$8 = {pr_current_be = 0xe8b460, pr_search_result_set = 0x0, pr_search_result_count = 1000,
  pr_search_result_set_size_estimate = 0, pr_sort_result_code = 0, pr_timelimit = 0, pr_flags = 640,
  pr_msgid = 5, pr_mutex = 0x7f5db4b12aa0}

but pr_search_result retrieved at line 703 has non-NULL value:
(gdb) p pr_search_result
$9 = (void *) 0x7f5db5e222a0

Also, pagedresults_is_abandoned_or_notavailable at line 705 did not report that CONN_FLAG_PAGEDRESULTS_ABANDONED was set.

Therefore, the abandon must have happened between lines 705 and 713.
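
As a small worked check of the flag value in the gdb output: pr_flags = 640 contains the abandoned bit (512), with 128 left over. The macro and function names below are illustrative stand-ins, not the server's own definitions; only the values 640 and 512 come from this report.

```c
/* Value taken from this report: the abandoned bit is 512.
 * PR_FLAG_ABANDONED is an illustrative name, not the server's macro. */
#define PR_FLAG_ABANDONED 512u

/* Returns nonzero when the abandoned bit is set in a pr_flags value. */
static int pr_flags_is_abandoned(unsigned int pr_flags)
{
    return (pr_flags & PR_FLAG_ABANDONED) != 0;
}

/* Returns the bits that remain after the abandoned bit is cleared. */
static unsigned int pr_flags_without_abandoned(unsigned int pr_flags)
{
    return pr_flags & ~PR_FLAG_ABANDONED;
}
```

With pr_flags = 640, pr_flags_is_abandoned returns nonzero and pr_flags_without_abandoned returns 128.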

Fix description:
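The code from line 703 through line 717 has to be protected by a lock, so that an abandon request cannot be processed between the abandon check and the sending of the results.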
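
A minimal sketch of the idea, assuming a simplified stand-in for one paged results slot: the abandon check and the "send" run inside one critical section, and the abandon path takes the same mutex before it frees the result set. The pr_slot type and all pr_slot_* functions are hypothetical and are not the actual 389-ds-base patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one paged results slot. */
typedef struct {
    pthread_mutex_t lock; /* guards every field below */
    int abandoned;        /* set by the abandon path */
    int *result_set;      /* freed by the abandon path */
    int result_count;     /* number of entries in result_set */
} pr_slot;

/* Initializes a slot holding count (> 0) entries; returns 0 on success. */
static int pr_slot_init(pr_slot *s, int count)
{
    if (count <= 0 || pthread_mutex_init(&s->lock, NULL) != 0) {
        return -1;
    }
    s->abandoned = 0;
    s->result_count = count;
    s->result_set = calloc((size_t)count, sizeof *s->result_set);
    if (s->result_set == NULL) {
        pthread_mutex_destroy(&s->lock);
        return -1;
    }
    return 0;
}

/* Abandon path: flag the slot and free the result set under the lock. */
static void pr_slot_abandon(pr_slot *s)
{
    pthread_mutex_lock(&s->lock);
    s->abandoned = 1;
    free(s->result_set);
    s->result_set = NULL;
    pthread_mutex_unlock(&s->lock);
}

/* Search path: check and send inside the same critical section.
 * Returns the number of entries "sent", or -1 if the slot was abandoned. */
static int pr_slot_send_page(pr_slot *s)
{
    int sent = -1;
    pthread_mutex_lock(&s->lock);
    if (!s->abandoned) {
        /* The abandon path cannot run here, so the set is still valid. */
        assert(s->result_set != NULL);
        long checksum = 0;
        for (int i = 0; i < s->result_count; i++) {
            checksum += s->result_set[i]; /* stands in for sending an entry */
        }
        (void)checksum;
        sent = s->result_count;
    }
    pthread_mutex_unlock(&s->lock);
    return sent;
}

/* Releases the slot's resources; call once no other thread uses the slot. */
static void pr_slot_destroy(pr_slot *s)
{
    free(s->result_set);
    s->result_set = NULL;
    pthread_mutex_destroy(&s->lock);
}
```

In this sketch an abandon that arrives while a page is being sent blocks until the send finishes, and a send that starts after an abandon sees the flag and returns -1 without touching the freed result set.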

Comment 1 Marc Sauton 2015-11-05 22:40:35 UTC
Set "Internal Whiteboard" to GSSApproved for rhel-7.2.z.

Comment 3 Mike McCune 2016-03-28 23:13:32 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 5 Simon Pichugin 2016-06-29 14:35:38 UTC
Hi Noriko,

Could you please help me reproduce the bug?

I've checked bz1247792 and tickets 48192 and 48338 but found nothing that helps to reproduce it properly.

I can write a python script (it would be lower level than using the openldap tools), if you can advise me on the proper order of actions to reproduce it.

Comment 6 Noriko Hosoi 2016-06-29 22:10:24 UTC
Hi Simon,

Thanks for trying to verify this bug fix.

Actually, the original of this bug was filed against the DS on rhel-6:
https://bugzilla.redhat.com/show_bug.cgi?id=1247792

As we discussed before, asynchronous Simple Paged Results searches from SSSD were freaky, and the combination crashed the server quite often.  When something goes wrong in the DS, SSSD sends an abandon request on the previous operation, which was one of the causes of the crash.  Ticket #48338 fixed one such case.  The tough part is that it depends heavily on timing...

To verify 6.8 bug 1247792, Sankar ran the basic Simple Paged Results test cases.  As it did not show any regressions, we asked him to set VERIFIED.  We could do the same thing here, too.  I remember you completed porting the Simple Paged Results test cases to 389 python tests with more cases.  So, I think we are in better shape.  If it passes all of the cases, I believe we could say VERIFIED.

Thanks!

Comment 7 Simon Pichugin 2016-06-30 09:48:58 UTC
Thank you, Noriko!

Yes, I've ported all the bash test suites and added some new ones. All of them passed.

Build tested:
389-ds-base-1.3.5.6-1.el7.x86_64

============================= test session starts =============================
platform linux2 -- Python 2.7.5, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /usr/bin/python
cachedir: suites/paged_results/.cache
rootdir: /export/tests/suites/paged_results, inifile:
plugins: html-1.8.1, cov-2.2.1
collected 24 items

suites/paged_results/paged_results_test.py::test_search_success[6-5] PASSED
suites/paged_results/paged_results_test.py::test_search_success[5-5] PASSED
suites/paged_results/paged_results_test.py::test_search_success[5-25] PASSED
suites/paged_results/paged_results_test.py::test_search_limits_fail[50-200-cn=config,cn=ldbm database,cn=plugins,cn=config-nsslapd-idlistscanlimit-100-UNWILLING_TO_PERFORM] PASSED
suites/paged_results/paged_results_test.py::test_search_limits_fail[5-15-cn=config-nsslapd-timelimit-20-UNAVAILABLE_CRITICAL_EXTENSION] PASSED
suites/paged_results/paged_results_test.py::test_search_limits_fail[21-50-cn=config-nsslapd-sizelimit-20-SIZELIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_limits_fail[21-50-cn=config-nsslapd-pagedsizelimit-5-SIZELIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_limits_fail[5-50-cn=config,cn=ldbm database,cn=plugins,cn=config-nsslapd-lookthroughlimit-20-ADMINLIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_sort_success PASSED
suites/paged_results/paged_results_test.py::test_search_abandon PASSED
suites/paged_results/paged_results_test.py::test_search_with_timelimit PASSED
suites/paged_results/paged_results_test.py::test_search_dns_ip_aci[dns = "localhost.localdomain"] PASSED
suites/paged_results/paged_results_test.py::test_search_dns_ip_aci[ip = "::1"] PASSED
suites/paged_results/paged_results_test.py::test_search_multiple_paging PASSED
suites/paged_results/paged_results_test.py::test_search_invalid_cookie[1000] PASSED
suites/paged_results/paged_results_test.py::test_search_invalid_cookie[-1] PASSED
suites/paged_results/paged_results_test.py::test_search_abandon_with_zero_size PASSED
suites/paged_results/paged_results_test.py::test_search_pagedsizelimit_success PASSED
suites/paged_results/paged_results_test.py::test_search_nspagedsizelimit[5-15-PASS] PASSED
suites/paged_results/paged_results_test.py::test_search_nspagedsizelimit[15-5-SIZELIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_paged_limits[conf_attr_values0-ADMINLIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_paged_limits[conf_attr_values1-PASS] PASSED
suites/paged_results/paged_results_test.py::test_search_paged_user_limits[conf_attr_values0-ADMINLIMIT_EXCEEDED] PASSED
suites/paged_results/paged_results_test.py::test_search_paged_user_limits[conf_attr_values1-PASS] PASSED

========================= 24 passed in 96.85 seconds =========================

Marking as verified.

Comment 9 errata-xmlrpc 2016-11-03 20:37:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2594.html