Bug 1416150

Summary:	RFE: Log to syslog when sssd cannot contact servers, goes offline
Product:	Red Hat Enterprise Linux 7	Reporter:	Orion Poplawski <orion>
Component:	sssd	Assignee:	SSSD Maintainers <sssd-maint>
Status:	CLOSED ERRATA	QA Contact:	Madhuri <mupadhye>
Severity:	medium	Docs Contact:	Lucie Vařáková <lmanasko>
Priority:	unspecified
Version:	7.3	CC:	enewland, fidencio, grajaiya, jhrozek, lmanasko, lslebodn, mkosek, mzidek, orion, pasik, pbrezina, sbose
Target Milestone:	rc	Keywords:	FutureFeature
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	sssd-1.16.0-1.el7	Doc Type:	Enhancement
Doc Text:	SSSD logs changes in its status to syslog Previously, the System Security Services Daemon (SSSD) logged information about changing its online or offline status to the SSSD logs only. With this update, the changes in SSSD status are logged also to the syslog service, which improves the availability of the information to system administrators.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-10 17:09:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1472344

Description Orion Poplawski 2017-01-24 17:14:05 UTC

Description of problem:

For a long time, this situation went undetected:

(Mon Jan 16 03:45:49 2017) [sssd[be[default]]] [fo_resolve_service_send] (0x0020): No available servers for service 'LDAP'
(Mon Jan 16 03:45:49 2017) [sssd[be[default]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])

because nothing was logged to syslog about it.  This is a failure situation and should be logged as such.

Version-Release number of selected component (if applicable):
sssd-1.14.0-43.el7_3.11.x86_64

Comment 1 Lukas Slebodnik 2017-01-24 17:17:28 UTC

You can do it even nowadays:
* enable logging to journald @see /etc/systemd/system/sssd.service.d/journal.conf
* set "debug_level = 1" in domain section of sssd.conf

Comment 2 Orion Poplawski 2017-01-24 17:24:53 UTC

I don't think this works because the syslog messages are marked as "debug" and don't go to /var/log/messages, but I'll test to make sure.

Comment 3 Orion Poplawski 2017-01-24 17:41:02 UTC

Confirmed, messages went to the journal, but with log level of "debug" and so didn't get logged to /var/log/messages.

Comment 4 Lukas Slebodnik 2017-01-24 18:29:04 UTC

(In reply to Orion Poplawski from comment #3)
> Confirmed, messages went to the journal, but with log level of "debug"
That's correct. I checked the source code and we always log with priority debug
but we also set custom journald attribute SSSD_DEBUG_LEVEL. Maybe rsyslog can somehow consume additional attributes from journal

Comment 5 Orion Poplawski 2017-01-24 20:50:42 UTC

This appears to do the trick after enabling journal logging for sssd:

# cat /etc/rsyslog.d/sssd.conf
if ($!SSSD_DEBUG_LEVEL == "10" or $!SSSD_DEBUG_LEVEL == "20") then {
  action(type="omfile" file="/var/log/messages" template="RSYSLOG_TraditionalFileFormat")
  stop
}

Comment 7 Jakub Hrozek 2017-02-01 10:11:30 UTC

(In reply to Orion Poplawski from comment #5)
> This appears to do the trick after enabling journal logging for sssd:
> 
> # cat /etc/rsyslog.d/sssd.conf
> if ($!SSSD_DEBUG_LEVEL == "10" or $!SSSD_DEBUG_LEVEL == "20") then {
>   action(type="omfile" file="/var/log/messages"
> template="RSYSLOG_TraditionalFileFormat")
>   stop
> }

I don't think we should log debug messages to syslog, but only send very selectively chosen messages like we already do. I think what you described is an acceptable workaround and would prefer to close this bug. Do you agree?

Comment 8 Orion Poplawski 2017-02-01 16:13:22 UTC

The original intent of this bug was to request that sssd log to syslog that sssd has gone offline because it could not connect to any of the providers.  It does not currently do so, and I think it should.  If you disagree, I can continue to use my workaround.

Comment 9 Jakub Hrozek 2017-02-01 20:51:54 UTC

Well, I'm not categorically against. We have other places in sssd where we log some action to syslog.

I'm just wondering how to strike the balance for systems that go offline and online quite often (laptops) to avoid flooding the logs, but at the same time make the message usable in general. Do you think it would to log a LOG_NOTICE-level message?

Comment 10 Orion Poplawski 2017-02-01 21:17:59 UTC

Perhaps it could be paired with a configuration option like "sssd_warn_if_offline"?

Comment 11 Jakub Hrozek 2017-02-08 21:08:09 UTC

Upstream ticket:
https://fedorahosted.org/sssd/ticket/3307

Comment 12 Jakub Hrozek 2017-08-10 14:33:58 UTC

QE: To reproduce, make SSSD go offline, e.g. by switching off the server SSSD is configured with. Check the syslog or journal for messages from SSSD, there should be one telling you that SSSD switched to offline mode. The details will be tbd, perhaps we'll implement two kind of notifications, one for disabling a particular domain and one for the global state (and mid-to-long term we'll be getting rid of the global state and only keep track of the online state per-domain)

Comment 18 Fabiano Fidêncio 2017-11-22 12:58:41 UTC

master:
* 1185cbce8d5dd04e539ca74d8f9564e5715a78aa
* 60ec0db015c354349af445e0ec63f8b8421343fe

Comment 22 Madhuri 2017-12-08 11:05:33 UTC

Verified with
sssd-1.16.0-9.el7.x86_64

Verification Steps:
1. Configured ldap server.

2. Configure sssd client.

3. Make ldap server offline, by doing typo in ldap server url or stop running ldap server.

4. Check server status in domain log
From domain logs,
# cat /var/log/sssd/sssd_LDAP.log | grep 'offline'

(Fri Dec  8 04:48:57 2017) [sssd[be[LDAP]]] [dp_get_options] (0x0400): Option ldap_offline_timeout has value 60
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_mark_offline] (0x2000): Going offline!
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_ptask_offline_cb] (0x0400): Back end is offline
(Fri Dec  8 04:48:58 2017) [sssd[be[LDAP]]] [be_ptask_offline_cb] (0x0400): Back end is offline

5. Check the server status in syslogs

From syslogs,
# journalctl -xeu sssd

Dec 08 04:49:59 ibm-x3650m4-01-vm-06.lab.eng.bos.redhat.com sssd[be[LDAP]][27985]: Backend is offline

Comment 27 errata-xmlrpc 2018-04-10 17:09:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0929