Bug 1925559

Summary: [RFE] Implement time logging for the LDAP queries and warning of high queries time
Product: Red Hat Enterprise Linux 9 Reporter: Antonio Romito <aromito>
Component: sssdAssignee: Sumit Bose <sbose>
Status: CLOSED ERRATA QA Contact: Steeve Goveas <sgoveas>
Severity: unspecified Docs Contact:
Priority: low    
Version: 9.0CC: aboscatt, atikhono, daniele, grajaiya, jhrozek, lslebodn, mzidek, pbrezina, pkettman, sbose, sgadekar, sgoveas, tscherf
Target Milestone: rcKeywords: FutureFeature, Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: sync-to-jira
Fixed In Version: sssd-2.7.0-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2136786 (view as bug list) Environment:
Last Closed: 2022-11-15 11:17:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2136786    

Description Antonio Romito 2021-02-05 14:46:14 UTC
Description of problem:

A customer had an issue for which the sssd queries were taking 2.5 seconds to be fulfilled, while the usual time in their environment was 0.1 seconds.
In the sssd logs they were not able to spot any problem on the LDAP backend.

The customer would like to introduce a timer for any LDAP query, which should be written in the debug logs (correct debug level to be identified).
They would like also to have these queries logged into the sssd with a warning level (lower than debug) once the time taken is really high (eg 5 seconds).
The time for which the logs should be written in warning level in place of debug level should be configurable.

This would help the debug of the LDAP backend during incidents (eg network or LDAP backend overload/service disruption) and to identify deviation from the regular query time from a given server.
bonus: this metric could be useful also for the PCP project.


Version-Release number of selected component (if applicable):


How reproducible:

Introduce a network latency or LDAP backend latency between sssd and LDAP backends.

Steps to Reproduce:

1. introduce a network latency or LDAP backend latency between sssd and LDAP backends
2. increment the logs to debug=9
3. check the sssd logs

Actual results:
No information over the time taken for a given query

Expected results:
Query time is written in the debug logs,
Long time queries written in the regular logs.

Additional info:

Comment 5 Pavel Březina 2021-02-09 12:08:07 UTC
I think this RFE makes sense, it would help to quickly spot performance issues that could be solved by configuration tuning (e.g. provide more specific search filter). But at this point I'm not sure how can we implement it given asynchronous nature of SSSD (e.g. storing large number of groups may block fetching the LDAP result for non-trivial amount of time).

Comment 9 Tomas Halman 2021-03-11 10:35:13 UTC
Having such informaton in the log might be interesting. But for now customer can probably use system-tap sssd probes to measure ldap queries.

Tomas Halman

Comment 10 Alexey Tikhonov 2021-08-31 18:36:01 UTC
At this stage this RFE should target RHEL9. Once/if implemented, backport to RHEL8 might be considered.

Comment 11 Alexey Tikhonov 2022-01-24 19:48:42 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/5969

Comment 12 Alexey Tikhonov 2022-02-15 18:22:47 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5969

* `master`
    * 5494f7ffea268fd21aa81885fccb1f46729b33ee - sdap: add sdap_get_server_ip_str_safe()
    * 15f66efcc8052ea2a41c16f15668b6e757b0099b - sdap: make struct sdap_op private
    * b3646c6637169dbab04593bfe16b840278ff4fe1 - sdap: warn if request needs 80% of timeout
    * e2082c03abc6c019b39d4f068d3787649f087cd1 - ipa: add info string for statistics
    * a9b4ae62af77d3b52e623c272dd42021ea78d051 - ldap: add info string for statistics
    * 5dc34b7531e7bed387b10741807f67088f5bb273 - sdap_op: add strings member for extra statistical information
    * 23e64beee112075e963ec76273364625433e189a - sdap: split out function to get the server IP as string
    * 3063a73c0f28cdce45f5fa787051a7097d97937a - dp: adding log message with spend time
    * 3b7955306d7c8ff4b0ca23e1988f35620b0f3135 - sdap: record time needed for a sdap operation
    * 2fb5cbfa609eb09e0a915ea74b53e61e8bb4f299 - util: add time measurement helpers
    * 775150b5800611191da85b01c5113b446f997d1b - debug: add new dubug level SSSDBG_PERF_STAT

Comment 22 errata-xmlrpc 2022-11-15 11:17:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8325