Bug 1995608

Summary: sssd logging verbose by default results in huge log files
Product: [Fedora] Fedora Reporter: Brian J. Murrell <brian>
Component: sssdAssignee: Alexey Tikhonov <atikhono>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 34CC: abokovoy, atikhono, jhrozek, lslebodn, luk.claes, mzidek, pbrezina, sbose, ssorce, sssd-maintainers, tomek
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: sync-to-jira review
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-14 16:51:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Example log spew none

Description Brian J. Murrell 2021-08-19 13:32:20 UTC
Created attachment 1815664 [details]
Example log spew

Description of problem:
My /var/log/sssd/sssd_$domain.log grows huge.

Version-Release number of selected component (if applicable):
sssd-2.5.2-2.fc34.x86_64

How reproducible:
100%

Steps to Reproduce:
Should pretty obvious.

Actual results:
Log grows large with useless debug info.

Expected results:
Log should not grow as quick and large as it does.

Additional info:
Every minute a spew similar to the attachment is added to the log.

I can squelch this log spam with:

# sssctl debug-level 0

Neither the words debug or log even appear in my configuration:

# grep -ri -e debug -e log /etc/sssd/
[nothing]

so this is appearing to be a rather verbose default setting.

Comment 1 Alexey Tikhonov 2021-08-19 14:03:06 UTC
Hi,

thanks for reporting this.

Did you cut off the lines like:
```
        "********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:\n";
        "********************** BACKTRACE DUMP ENDS HERE *********************************\n\n";
```
from the log or are those missing?

Comment 2 Brian J. Murrell 2021-08-19 14:09:10 UTC
Ah, yes.  Those lines evaded the grep used to try to pull out just one example of the frequently repeated pattern.  So yes, those lines do appear in the log, just not the attached example.  Apologies.

Comment 3 Alexey Tikhonov 2021-08-19 14:20:28 UTC
(In reply to Brian J. Murrell from comment #2)
> Ah, yes.  Those lines evaded the grep used to try to pull out just one
> example of the frequently repeated pattern.  So yes, those lines do appear
> in the log, just not the attached example.

Ok, thanks for the confirmation.

As an immediate remediation you can reduce log level in `sssd.conf::[$domain]` to 0

As a mid term solution I will check if it makes sense to change debug level of this message.

A proper solution (already discussed internally) is to avoid printing duplicating backtraces.

Comment 4 Brian J. Murrell 2021-08-19 14:25:46 UTC
So this confirms that the default log/debug level is not 0?  What is it if I may ask?

Is the situation that is causing the messages to be added to my log at the default log level something I should address?  I.e. is the problem actually something other than an inappropriate log level and I could make the messages stop by fixing something?

Comment 5 Alexey Tikhonov 2021-08-19 14:44:51 UTC
(In reply to Brian J. Murrell from comment #4)
> So this confirms that the default log/debug level is not 0?  What is it if I
> may ask?

It's 2: https://github.com/SSSD/sssd/blob/26654d3e5f5882dd1681116cb49228d108351d48/src/util/debug.h#L124

But every error message with level up to 2 triggers a dump of a full log (all levels) to a certain depth: https://github.com/SSSD/sssd/pull/5585
This backtrace can be disabled via `debug_backtrace_enabled = false` (i.e. you can keep default general log level, disable backtrace and then you will get only single error line "(0x0020): ... Cannot open" without backtraces)


> Is the situation that is causing the messages to be added to my log at the
> default log level something I should address?  I.e. is the problem actually
> something other than an inappropriate log level and I could make the
> messages stop by fixing something?

That's what I mean in "As a mid term solution I will check if it makes sense to change debug level of this message."

Comment 6 Brian J. Murrell 2021-08-19 14:51:28 UTC
Is the entire spew of messages in the attachment all a result of the:

   *  (2021-08-19  9:29:01): [be[example.com]] [remove_tree_with_ctx] (0x0020): [RID#609] Cannot open /var/lib/sss/deskprofile/example.com/brian: [2]: No such file or directory

message?  That missing directory seems to be related to Fleet Commander.  Is that correct?  If so, given that that is a completely optional management tool, it seems that this missing directory really ought not cause such a spew at the default logging level, which I suppose refers to your mid-term solution.

Comment 7 Alexey Tikhonov 2021-08-19 15:06:49 UTC
(In reply to Brian J. Murrell from comment #6)
> Is the entire spew of messages in the attachment all a result of the:
> 
>    *  (2021-08-19  9:29:01): [be[example.com]] [remove_tree_with_ctx]
> (0x0020): [RID#609] Cannot open /var/lib/sss/deskprofile/example.com/brian:
> [2]: No such file or directory
> 
> message?

Right, 0x0020 message itself is logged because default log levels allows for it and then everything between "PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE" and "BACKTRACE DUMP ENDS HERE" is logged because `debug_backtrace_enabled` allows for it.

This is very new feature, there are some flaws. Sorry for inconvenience.


> That missing directory seems to be related to Fleet Commander.  Is
> that correct?

Correct.

Comment 8 Brian J. Murrell 2021-08-19 16:00:36 UTC
Yeah.  I have a full understanding now.  I appreciate your patience with my questions.

So indeed, the root issue here is if the debug level of that:

   *  (2021-08-19  9:29:01): [be[example.com]] [remove_tree_with_ctx] (0x0020): [RID#609] Cannot open /var/lib/sss/deskprofile/example.com/brian: [2]: No such file or directory

being at the default debug level of 0x20 is appropriate.  Given that it's benign and only a result of not using an optional feature of sssd, my suggestion would be that it's not appropriate at 0x20 and the description of 0x20:

0x0020: Critical failures. An error that doesn't kill the SSSD, but one that indicates that at least one major feature is not going to work properly.

seems to bear that out.  I'm not sure anything less than 0x0100 is even appropriate for such a benign message.  But I will leave figuring out the right level to you folks, just so long as it's not at the default level any more.

Comment 9 Alexey Tikhonov 2021-08-23 13:40:24 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/5758

Comment 10 Pavel Březina 2021-08-25 09:42:14 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5758

* `master`
    * bd2ccbf69af8e5836f6f6e09a893d54428d903c5 - file utils: reduce log level in remove_tree_with_ctx() Users of this function are responsible to decide if fail is critical.

Comment 11 Alexey Tikhonov 2022-03-14 16:51:46 UTC
Fixed since sssd-2.6.0