Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
a) sssd does not boot anything, and b) it's very common to provide log files with a report like this.
Sorry, I don't know how I managed to submit this before updating the details.

Description of problem:
When I start F24 with sssd (1.14.1-2), it boots in offline mode, and has done since the beginning. The logs suggest it can't find the DNS servers. Now I may be missing an item in the setup (an NM dispatcher script or something); my test system here has been updated between versions.

Version-Release number of selected component (if applicable):
sssd (1.14.1-2)

How reproducible:
Every time

Steps to Reproduce:
1. Boot the system
2. sssctl domain-status iongeo.lan -o
3. See that it shows "Online status: Offline"
4. Restarting sssd brings it back to "Online status: Online"

Additional info:
My reading of the log suggests it can't find the DNS servers that NM has put into resolv.conf; maybe the file doesn't get reread. I have tried with nscd running (configured just for name service caching) and not running. I'll post the full log.

(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'neutral'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [resolve_srv_send] (0x0200): The status of SRV lookup is neutral
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [ad_srv_plugin_send] (0x0400): About to find domain controllers
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [ad_get_dc_servers_send] (0x0400): Looking up domain controllers in domain ukion.iongeo.com
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'ukion.iongeo.com'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.ukion.iongeo.com'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [request_watch_destructor] (0x0400): Deleting request watch
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158236]: SRV lookup error
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'AD' as 'not resolved'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158236]: SRV lookup error
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [be_mark_offline] (0x2000): Going offline!
(Mon Sep 26 15:21:41 2016) [sssd[be[ukion.iongeo.com]]] [be_mark_offline] (0x2000): Enable check_if_online_ptask.
Created attachment 1204889 [details] Full log file for this domain
What happens if you manually signal sssd to try to connect, e.g.:

# kill -s SIGUSR2 $(pidof sssd)
# id not-cache-user

Does SSSD start after NM?
Looks like sssd is starting before NetworkManager:

Sep 27 11:42:33 eden sssd[721]: Starting up
Sep 27 11:42:50 eden NetworkManager[742]: <info>  [1474972970.7890] NetworkManager (version 1.2.4-2.fc24) is starting...

Running the kill command and the id command, it still stays offline.
When forced out of offline mode it still seems unable to find the DNS servers:

(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.ukion.iongeo.com'
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [request_watch_destructor] (0x0400): Deleting request watch
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158236]: SRV lookup error
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'AD' as 'not resolved'
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158236]: SRV lookup error
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [be_resolve_server_process] (0x1000): Trying with the next one!
(Tue Sep 27 14:33:56 2016) [sssd[be[ukion.iongeo.com]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'

As I say, I'm happy to acknowledge something kooky with this setup.
Do you get the DNS name servers via DHCP? The glibc resolver code reads /etc/resolv.conf only at the start of the process and will not be aware of any later changes. It looks like the c-ares resolver SSSD uses might have the same limitation. So you should make sure SSSD starts after NM, so that /etc/resolv.conf is up to date when SSSD starts.
Yes, we do get DNS from DHCP on desktops. We have seen this issue with glibc only ever using resolv.conf as it was when the program started. I was hoping sssd might not have this limitation, as it's usually pretty dynamic.

But starting NM first won't necessarily help; it just depends how quickly DHCP responds. For example, if switch ports aren't portfast, a machine may not get an IP for many seconds after NM comes up. This will still break SSSD, assuming the above. Shouldn't it watch resolv.conf, or maybe there could be an NM dispatcher script to force a resolv.conf reload?

I could maybe work around this by starting nscd (doing only host caching) first; then we can have a dispatcher script (I think we already have this) that tells nscd to reload its config (and resolv.conf). Though, as I say, unless nscd is already running when other programs start, their glibc resolvers won't use it and will resolve directly instead. So starting nscd first (pre-SSSD) may "fix" this. Not sure if this is RFE territory?
Making sssd start after nscd seems to work around this issue, but it also seems to force NetworkManager to start before sssd (not a big issue, just an observation).
(In reply to Sumit Bose from comment #7)
> Do you get the DNS name servers via DHCP? The glibc resolver code will read
> /etc/resolv.conf only during the start of the process and will not be aware
> of any changes. It looks like the c-ares resolver SSSD uses might have the
> same limitation.

In general yes, c-ares doesn't re-read resolv.conf, but we do (try to) work around this limitation in sssd. See for example this code in monitor.c:

2137     /* Watch for changes to the DNS resolv.conf */
2138     ret = monitor_config_file(ctx, ctx, RESOLV_CONF_PATH,
2139                               monitor_update_resolv, false);

which sets inotify watches on resolv.conf; if resolv.conf is modified, the offline state is reset. Please note I'm not claiming there is no bug in that code, I'm just stating that resolv.conf changes should reset the offline status.

Could we please also see a log file from the [sssd] process itself?
(In reply to Colin Simpson from comment #9)
> Making sssd start after nscd

Is there a reason to use nscd together with sssd? Is nscd used for the maps that sssd serves (like passwd and group) or also for others?

By the way, in general, starting sssd before NM (or before networking is up at all) shouldn't cause an issue, since either the libnl integration or changes to resolv.conf should reset the offline status. But perhaps we have a bug...
(In reply to Jakub Hrozek from comment #11)
> Is there a reason to use nscd together with sssd? Is nscd used for the maps
> that sssd serves (like passwd and group) or also others?

As I said above, "I could maybe workaround by starting nscd (only doing host caching, NO passwd, group, services or netgroup)". We need to use nscd for host caching, as we have some badly behaved third-party apps that, when certain processes fail, can get into a constant restart loop that hammers the nameservers.

I will generate the log for you.
Log file attached. Now I guess the interesting lines might be:

(Wed Sep 28 15:11:54 2016) [sssd] [monitor_config_file] (0x0080): Could not stat file [/etc/resolv.conf]. Error [2:No such file or directory]

Now, we do have a line in a dispatcher and startup script that we use to clobber resolv.conf if we detect no valid IPs. This was due to issues in earlier Fedora/RHEL versions where boot would become interminably slow if there were stale entries still in resolv.conf. This seemed to happen if the machine was shut down ON the network and booted OFF the network. Maybe no such issue exists these days. E.g.:

cat >/etc/resolv.conf <<EOF
EOF

But I can't see this, or anything else we do, causing the file to not exist at any point. I guess it may depend on how NetworkManager updates resolv.conf, or more correctly what it's symlinked to:

/etc/resolv.conf -> /var/run/NetworkManager/resolv.conf

I guess monitor_config_file is upset that the file doesn't exist (for unknown reasons) and never bothers to recheck it again?
Created attachment 1205499 [details] sssd.log
Actually it does say later:

(Wed Sep 28 15:25:15 2016) [sssd] [monitor_update_resolv] (0x0040): Resolv.conf has been updated. Reloading.
(Wed Sep 28 15:25:38 2016) [sssd] [monitor_update_resolv] (0x0040): Resolv.conf has been updated. Reloading.

So I guess something notices...
Sorry about not noticing your nscd use case; it's certainly valid to use it as long as you don't cache the sssd maps.

The logs are interesting, mainly this part:

(Wed Sep 28 15:25:32 2016) [sssd] [link_msg_handler] (0x1000): netlink link message: iface idx 3 (eno1) flags 0x11043 (broadcast,multicast,up,running,lowerup)
(Wed Sep 28 15:25:32 2016) [sssd] [has_phy_80211_subdir] (0x1000): No phy80211 directory in sysfs, probably not a wireless interface
(Wed Sep 28 15:25:32 2016) [sssd] [network_status_change_cb] (0x2000): A networking status change detected signaling providers to reset offline status
(Wed Sep 28 15:25:32 2016) [sssd] [service_signal] (0x0020): Could not signal service [ukion.iongeo.com].

The failure comes from this code:

600     if (!svc->conn) {
601         /* Avoid a race condition where we are trying to
602          * order a service to reload that hasn't started
603          * yet.
604          */
605         DEBUG(SSSDBG_CRIT_FAILURE,
606               "Could not signal service [%s].\n", svc->name);
607         return EIO;
608     }

So the sbus connection is not up yet, which is really strange after this long. Do you also have logs from the domain section that correspond to the same timeframe? I wonder if there is some issue bringing the service up. I also don't see the service registering itself with the monitor.
I do see it being queued for startup:

(Wed Sep 28 15:11:55 2016) [sssd] [start_service] (0x0100): Queueing service ukion.iongeo.com for startup

And responders do register themselves, like NSS here:

(Wed Sep 28 15:12:03 2016) [sssd] [sbus_message_handler] (0x2000): Received SBUS method org.freedesktop.sssd.monitor.RegisterService on path /org/freedesktop/sssd/monitor
(Wed Sep 28 15:12:03 2016) [sssd] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit
(Wed Sep 28 15:12:03 2016) [sssd] [client_registration] (0x0100): Received ID registration: (nss,1)

I suspect something like a sudo renewal task or similar triggers a GSSAPI bind which blocks sssd_be for a long time, because the network might not be up at that point, and the service registration times out. But that's just a hunch.
I didn't have the domain log file to match the main daemon one, so I've attached two new ones from a new run of this issue.
Created attachment 1205970 [details] sssd.log Version 2
Created attachment 1205971 [details] sssd_ukion.iongeo.com.log Version 2
1) If resolv.conf is missing, sssd tries to stat it every 10 seconds until it is found, so this error is not fatal.

2) The provider does not register with the monitor because it hangs while loading /usr/lib64/sssd/libsss_ad.so. Notice the timestamps: it takes 6 seconds, but the timeout for registration is 5 seconds.

(Thu Sep 29 14:14:45 2016) [sssd[be[ukion.iongeo.com]]] [dp_module_open_lib] (0x1000): Loading module [ad] with path [/usr/lib64/sssd/libsss_ad.so]
(Thu Sep 29 14:14:51 2016) [sssd[be[ukion.iongeo.com]]] [dp_module_run_constructor] (0x0400): Executing module [ad] constructor.

Why does it take so long? Can you run sssd with strace as described at [1]?

[1] https://fedorahosted.org/sssd/wiki/DevelTips
(In reply to Pavel Březina from comment #20) > 1) If resolv.conf is missing, sssd is trying to stat it every 10 seconds > until it is found. So this error is not fatal. > > 2) Provider does not register to the monitor because it hangs while loading > /usr/lib64/sssd/libsss_ad.so. Notice the timestamps, it takes it 6 seconds > but timeout for registration is 5 seconds. > > Thu Sep 29 14:14:45 2016) [sssd[be[ukion.iongeo.com]]] [dp_module_open_lib] > (0x1000): Loading module [ad] with path [/usr/lib64/sssd/libsss_ad.so] > (Thu Sep 29 14:14:51 2016) [sssd[be[ukion.iongeo.com]]] > [dp_module_run_constructor] (0x0400): Executing module [ad] constructor. > I suspect this has to do with some initial (enumeration? subdomains) task that might try to bind to LDAP. IIRC the default libldap bind timeout is 6 seconds. If the strace confirms that, I wonder if we should raise the default timeout? > Why does it take so long? Can you run sssd with strace as described at [1]? > > [1] https://fedorahosted.org/sssd/wiki/DevelTips
No, the 6 seconds is spent dlopening the module.

The associated code is:

ret = dp_module_open_lib(module);
if (ret != EOK) {
    goto done;
}

ret = dp_module_run_constructor(module, be_ctx, provider);
if (ret != EOK) {
    goto done;
}

At 14:14:45 we are in dp_module_open_lib, dlopening the library. At 14:14:51 we are in dp_module_run_constructor. No code from the module itself runs in between.
It's not clear from the wiki page exactly what you'd like strace'd for this issue?
(In reply to Colin Simpson from comment #23)
> It's not clear from the wiki page exactly what you'd like strace'd for this
> issue?

This section:
https://fedorahosted.org/sssd/wiki/DevelTips#UsingstracetotracktheSSSDprocesses
describes how to put a "command=" directive into sssd.conf that would enable us to strace the sssd_be process. Please note that SELinux must be Permissive for that to work.
(In reply to Pavel Březina from comment #22) > No, 6 seconds takes to dlopen the module. > > The associated code is: > > ret = dp_module_open_lib(module); > if (ret != EOK) { > goto done; > } > > ret = dp_module_run_constructor(module, be_ctx, provider); > if (ret != EOK) { > goto done; > } > > At 14:14:45 we are in dp_module_open_lib, dlopening the library. At 14:14:51 > we are in dp_module_run_constructor. There is no code from the module itself > run in between. Then we really need the strace. I wonder if SELInux might be the culprit..
Sorry if I'm being dim here; the wiki says to put the strace line in the domain section, as below:

[sssd]
domains = ukion.iongeo.com
config_file_version = 2
services = nss, pam, ifp
debug_level = 10
##command = strace -ff -o /tmp/sssd_be_strace /usr/libexec/sssd/sssd_be --debug-level=10 --domain ukion.iongeo.com

[domain/ukion.iongeo.com]
ad_domain = ukion.iongeo.com
krb5_realm = UKION.IONGEO.COM
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
# Change Local
ldap_id_mapping = False
# Change Local
use_fully_qualified_names = False
fallback_homedir = /home/%u
access_provider = ad
# Change Local - For local users
#override_homedir = /home/%u
debug_level = 10
command = strace -ff -o /tmp/sssd_be_strace /usr/libexec/sssd/sssd_be --debug-level=10 --domain ukion.iongeo.com

I have tried both sections (swapping the "command=" line that is commented out in the above). It starts in the setup above but doesn't work, i.e. I get:

# sssctl domain-status ukion.iongeo.com -o
Unable to get online status [3]: Communication error
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Check that SSSD is running and the InfoPipe responder is enabled. Make sure 'ifp' is listed in the 'services' option in sssd.conf.
Unable to get online status
Hi, given that the provider is not registered with the monitor, I believe this is expected behaviour. Or does it work without the "command" option?

Please remove the existing logs in /var/log/sssd/* and then reboot your machine. SSSD should be offline after the reboot; there is no need to check. We would like to see the generated logs (so we can match timestamps better) together with /tmp/sssd_be_strace. The configuration above looks good.
"sssctl domain-status ukion.iongeo.com -o" works when the "command" option is commented out. That's my point; I'm concerned my "command=" argument is incorrect.
Does running with strace generate any logs, though? Could we see them if it does, please?
(In reply to Colin Simpson from comment #26)
> command = strace -ff -o /tmp/sssd_be_strace /usr/libexec/sssd/sssd_be
> --debug-level=10 --domain ukion.iongeo.com

I think you are missing the options "--uid 0 --gid 0". The best approach is to copy & paste the parameters from the output of "pgrep -af sssd_be" when the "command" option is not used in sssd.conf.
This is still present on F25. I did email you directly about this on the 14th of November, but you must have missed it.

The issue (of course) only happens at boot time. So I changed the line in /etc/systemd/system/multi-user.target.wants/sssd.service to:

ExecStart=/usr/bin/strace -ff -tt -s256 -o /tmp/sssd_strace /usr/sbin/sssd -D -f

and

Type=simple

And it does come up and strace this, but argh, it comes up online when you do this! If I switch it back to the original way, it again comes up offline...

Any idea how to trap this?
11:19:18.011419 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 16
11:19:18.011442 stat("/var/lib/sss/pipes/private/sbus-dp_$domain", 0x7fff2be58770) = -1 ENOENT (No such file or directory)
11:19:18.011475 bind(16, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/private/sbus-dp_$domain"}, 56) = 0
11:19:28.595978 listen(16, 128) = 0

This is what is causing the initial delay, so we should start from here. Why does it take ten seconds to bind? Does anyone have any idea? I thought it might be because dbus is not yet started, but at this point we have only a simple unix socket, so that won't be the case.
Since this is a local UNIX socket, the bind actually creates the mapping between the filesystem name and the socket fd. But 10 seconds really seems excessive... I wonder if journald showed some errors on the system in general around that time?

By the way, the internal "tech-list" mailing list might be a place to ask why a bind on an af_unix socket can take 10 seconds.
Colin, is something suspicious in journald between 11:19:18 .. 11:19:28 ?
(In reply to Lukas Slebodnik from comment #34) > Colin, > is something suspicious in journald between 11:19:18 .. 11:19:28 ? btw I can forward you Colin's sssd logs he sent me privately. But I was never able to figure out anything suspicious from them..
(In reply to Jakub Hrozek from comment #35)
> (In reply to Lukas Slebodnik from comment #34)
> > Colin,
> > is something suspicious in journald between 11:19:18 .. 11:19:28 ?
>
> btw I can forward you Colin's sssd logs he sent me privately. But I was
> never able to figure out anything suspicious from them..

I doubt I will find anything there; I trust you.

BTW, there is something suspicious on the system, because one time there is a 6-second delay to dlopen a file:

(Thu Sep 29 14:14:45 2016) [sssd[be[ukion.iongeo.com]]] [dp_module_open_lib] (0x1000): Loading module [ad] with path [/usr/lib64/sssd/libsss_ad.so]
(Thu Sep 29 14:14:51 2016) [sssd[be[ukion.iongeo.com]]] [dp_module_run_constructor] (0x0400): Executing module [ad] constructor.

and another time there is a long delay to bind a unix socket:

11:19:18.011475 bind(16, {sa_family=AF_UNIX, sun_path="/var/lib/sss/pipes/private/sbus-dp_$domain"}, 56) = 0
11:19:28.595978 listen(16, 128) = 0

These are two different functions in sssd.

Colin, do you have NFS or some other network filesystem on the machine? If you boot the machine more times, is the 10-second delay in the same place every time?
This is now with RH developers Jakub Hrozek and Lukas Slebodnik, who have some new log files.
This message is a reminder that Fedora 24 is nearing its end of life. Approximately 2 (two) weeks from now Fedora will stop maintaining and issuing updates for Fedora 24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '24'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 24 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Not sure if RH developers saw anything from my F24 logs. But this still occurs for me on F26.
Is this still happening with current Fedora?
Still happens on Fedora 27 on my test system. I will probably flatten this machine when F28 lands, to see whether some hangover from the many update cycles this machine has been through is causing this.
After a clean install of Fedora 28 this no longer occurs for me.
I don't think this bug should be closed, we have similar (same?) reports from RHEL.
(In reply to Jakub Hrozek from comment #43)
> I don't think this bug should be closed, we have similar (same?) reports
> from RHEL.

Colin cannot reproduce it anymore. If it is still reproducible elsewhere, it will be better to track it upstream.
(Ubuntu user here; importantly, they don't use nss_resolve, they just make /etc/resolv.conf a symlink to ../run/systemd/resolve/stub-resolv.conf.)

If anyone still sees this bug, here are the suggested additional steps:

1. Verify that, in your setup, /etc/resolv.conf is a symlink to somewhere in /run or /var/run.
2. If so, when sssd is in offline mode and you think it should be online, run this:

tcpdump -pni lo port 53

For me, it returns some packets for 127.0.0.1:53 at the points in time when sssd tries to go online. In other words, sssd sends packets to 127.0.0.1:53 even though resolv.conf points to some other DNS server.

What seems important is that the file the /etc/resolv.conf symlink points to does not exist when sssd starts up. So perhaps the resolv.conf reloading code needs to be audited for this specific case.