Bug 1416150

Summary: RFE: Log to syslog when sssd cannot contact servers, goes offline
Product: Red Hat Enterprise Linux 7 Reporter: Orion Poplawski <orion>
Component: sssdAssignee: SSSD Maintainers <sssd-maint>
Status: CLOSED ERRATA QA Contact: Madhuri <mupadhye>
Severity: medium Docs Contact: Lucie Vařáková <lmanasko>
Priority: unspecified    
Version: 7.3CC: enewland, fidencio, grajaiya, jhrozek, lmanasko, lslebodn, mkosek, mzidek, orion, pasik, pbrezina, sbose
Target Milestone: rcKeywords: FutureFeature
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: sssd-1.16.0-1.el7 Doc Type: Enhancement
Doc Text:
SSSD logs changes in its status to syslog Previously, the System Security Services Daemon (SSSD) logged information about changing its online or offline status to the SSSD logs only. With this update, the changes in SSSD status are logged also to the syslog service, which improves the availability of the information to system administrators.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 17:09:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1472344    

Description Orion Poplawski 2017-01-24 17:14:05 UTC
Description of problem:

For a long time, this situation went undetected:

(Mon Jan 16 03:45:49 2017) [sssd[be[default]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Mon Jan 16 03:45:49 2017) [sssd[be[default]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])

because nothing was logged to syslog about it.  This is a failure situation and should be logged as such.

Version-Release number of selected component (if applicable):
sssd-1.14.0-43.el7_3.11.x86_64

Comment 1 Lukas Slebodnik 2017-01-24 17:17:28 UTC
You can do it even nowadays:
* enable logging to journald @see /etc/systemd/system/sssd.service.d/journal.conf
* set "debug_level = 1" in domain section of sssd.conf

Comment 2 Orion Poplawski 2017-01-24 17:24:53 UTC
I don't think this works because the syslog messages are marked as "debug" and don't go to /var/log/messages, but I'll test to make sure.

Comment 3 Orion Poplawski 2017-01-24 17:41:02 UTC
Confirmed, messages went to the journal, but with log level of "debug" and so didn't get logged to /var/log/messages.

Comment 4 Lukas Slebodnik 2017-01-24 18:29:04 UTC
(In reply to Orion Poplawski from comment #3)
> Confirmed, messages went to the journal, but with log level of "debug"
That's correct. I checked the source code and we always log with priority debug
but we also set custom journald attribute SSSD_DEBUG_LEVEL. Maybe rsyslog can somehow consume additional attributes from journal

Comment 5 Orion Poplawski 2017-01-24 20:50:42 UTC
This appears to do the trick after enabling journal logging for sssd:

# cat /etc/rsyslog.d/sssd.conf
if ($!SSSD_DEBUG_LEVEL == "10" or $!SSSD_DEBUG_LEVEL == "20") then {
  action(type="omfile" file="/var/log/messages" template="RSYSLOG_TraditionalFileFormat")
  stop
}

Comment 7 Jakub Hrozek 2017-02-01 10:11:30 UTC
(In reply to Orion Poplawski from comment #5)
> This appears to do the trick after enabling journal logging for sssd:
> 
> # cat /etc/rsyslog.d/sssd.conf
> if ($!SSSD_DEBUG_LEVEL == "10" or $!SSSD_DEBUG_LEVEL == "20") then {
>   action(type="omfile" file="/var/log/messages"
> template="RSYSLOG_TraditionalFileFormat")
>   stop
> }

I don't think we should log debug messages to syslog, but only send very selectively chosen messages like we already do. I think what you described is an acceptable workaround and would prefer to close this bug. Do you agree?

Comment 8 Orion Poplawski 2017-02-01 16:13:22 UTC
The original intent of this bug was to request that sssd log to syslog that sssd has gone offline because it could not connect to any of the providers.  It does not currently do so, and I think it should.  If you disagree, I can continue to use my workaround.

Comment 9 Jakub Hrozek 2017-02-01 20:51:54 UTC
Well, I'm not categorically against. We have other places in sssd where we log some action to syslog.

I'm just wondering how to strike the balance for systems that go offline and online quite often (laptops) to avoid flooding the logs, but at the same time make the message usable in general. Do you think it would to log a LOG_NOTICE-level message?

Comment 10 Orion Poplawski 2017-02-01 21:17:59 UTC
Perhaps it could be paired with a configuration option like "sssd_warn_if_offline"?

Comment 11 Jakub Hrozek 2017-02-08 21:08:09 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/3307

Comment 12 Jakub Hrozek 2017-08-10 14:33:58 UTC
QE: To reproduce, make SSSD go offline, e.g. by switching off the server SSSD is configured with. Check the syslog or journal for messages from SSSD, there should be one telling you that SSSD switched to offline mode. The details will be tbd, perhaps we'll implement two kind of notifications, one for disabling a particular domain and one for the global state (and mid-to-long term we'll be getting rid of the global state and only keep track of the online state per-domain)

Comment 18 Fabiano Fidêncio 2017-11-22 12:58:41 UTC
master:
* 1185cbce8d5dd04e539ca74d8f9564e5715a78aa
* 60ec0db015c354349af445e0ec63f8b8421343fe

Comment 22 Madhuri 2017-12-08 11:05:33 UTC
Verified with
sssd-1.16.0-9.el7.x86_64

Verification Steps:
1. Configured ldap server.

2. Configure sssd client.

3. Make ldap server offline, by doing typo in ldap server url or stop running ldap server.

4. Check server status in domain log
From domain logs,
# cat /var/log/sssd/sssd_LDAP.log | grep 'offline'

(Fri Dec  8 04:48:57 2017) [sssd[be[LDAP]]] [dp_get_options] (0x0400): Option ldap_offline_timeout has value 60
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_mark_offline] (0x2000): Going offline!
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_ptask_offline_cb] (0x0400): Back end is offline
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_ptask_offline_cb] (0x0400): Back end is offline

5. Check the server status in syslogs

From syslogs,
# journalctl -xeu sssd

Dec 08 04:49:59 ibm-x3650m4-01-vm-06.lab.eng.bos.redhat.com sssd[be[LDAP]][27985]: Backend is offline

Comment 27 errata-xmlrpc 2018-04-10 17:09:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0929