Bug 1835813

Summary: sssd boots offline if symlink for /etc/resolv.conf is broken/missing
Product: Red Hat Enterprise Linux 7
Reporter: aheverle
Component: sssd
Assignee: Alexey Tikhonov <atikhono>
Status: CLOSED ERRATA
QA Contact: sssd-qe <sssd-qe>
Severity: high
Docs Contact:
Priority: high
Version: 7.7
CC: atikhono, grajaiya, jhrozek, lslebodn, mniranja, mzidek, pbrezina, sgoveas, swachira, thalman, tscherf
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: sync-to-jira review
Fixed In Version: sssd-1.16.5-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1842861
Environment:
Last Closed: 2020-09-29 19:50:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1842861

Description aheverle 2020-05-14 14:21:55 UTC
Description of problem:
sssd goes offline if the symlink for /etc/resolv.conf is missing/broken

Version-Release number of selected component (if applicable):
sssd-1.16.4-37.el7_8.1.x86_64

How reproducible:
when symlink is broken

Steps to Reproduce:
1. The symlink for /etc/resolv.conf breaks
2. sssd goes offline and is not able to recover

Actual results:
(Sun May 10 03:52:25 2020) [sssd[be[EXAMPLE.LOCAL]]] [fo_resolve_service_done] (0x0020): Failed to resolve server 'server.example.local': Could not contact DNS servers
(Sun May 10 03:52:25 2020) [sssd[be[EXAMPLE.LOCAL]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (server.example.local), resolver returned [5]: Input/output error
(Sun May 10 03:52:25 2020) [sssd[be[EXAMPLE.LOCAL]]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
(Sun May 10 04:00:03 2020) [sssd[be[EXAMPLE.LOCAL]]] [dp_req_reply_gen_error] (0x0080): DP Request [Initgroups #265]: Finished. Backend is currently offline.
(Sun May 10 04:00:03 2020) [sssd[be[EXAMPLE.LOCAL]]] [dp_req_reply_gen_error] (0x0080): DP Request [Initgroups #266]: Finished. Backend is currently offline.
(Sun May 10 04:00:03 2020) [sssd[be[EXAMPLE.LOCAL]]] [dp_req_reply_gen_error] (0x0080): DP Request [Initgroups #267]: Finished. Backend is currently offline.
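
When triaging a log like the one above, it can help to pull out just the entries that point at the resolver failure and the offline state. The following is a minimal Python sketch, assuming the line layout seen in this excerpt (timestamp, `[sssd[be[DOMAIN]]]`, function name, debug level, message). The names `parse_line`, `offline_events` and `OFFLINE_MARKERS` are made up for illustration and are not part of SSSD or its tooling.

```python
import re

# Layout assumed from the excerpt in this report:
# (timestamp) [sssd[be[DOMAIN]]] [function] (0xLEVEL): message
LOG_LINE = re.compile(
    r"^\((?P<time>[^)]+)\) "
    r"\[sssd\[be\[(?P<domain>[^\]]+)\]\]\] "
    r"\[(?P<func>[^\]]+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): "
    r"(?P<msg>.*)$"
)

# Message fragments taken verbatim from the log lines in this report.
OFFLINE_MARKERS = (
    "Could not contact DNS servers",
    "Backend is currently offline",
)


def parse_line(line):
    """Split one backend log line into its fields, or return None."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def offline_events(lines):
    """Return the parsed entries whose message contains an offline marker."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record and any(marker in record["msg"] for marker in OFFLINE_MARKERS):
            events.append(record)
    return events


sample = [
    "(Sun May 10 03:52:25 2020) [sssd[be[EXAMPLE.LOCAL]]] [fo_resolve_service_done] (0x0020): "
    "Failed to resolve server 'server.example.local': Could not contact DNS servers",
    "(Sun May 10 03:52:25 2020) [sssd[be[EXAMPLE.LOCAL]]] [fo_resolve_service_send] (0x0020): "
    "No available servers for service 'AD'",
    "(Sun May 10 04:00:03 2020) [sssd[be[EXAMPLE.LOCAL]]] [dp_req_reply_gen_error] (0x0080): "
    "DP Request [Initgroups #265]: Finished. Backend is currently offline.",
]

events = offline_events(sample)
for event in events:
    print(event["time"], event["domain"], event["func"], event["msg"])
```

The marker list is deliberately short; other SSSD versions or providers may word these messages differently.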

Expected results:
sssd should boot up online once the symlink is resolved

Additional info:

Comment 9 Alexey Tikhonov 2020-05-18 11:02:17 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/5165

Comment 10 Pavel Březina 2020-05-21 08:51:38 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5165

* `sssd-1-16`
    * 9fe64023e32ab9e3fbbfeefc2168a49b748a1846 - MONITOR: Resolve symlinks setting the inotify watchers
    * f952a5de24ba7c40310bbf63fa83d772a9cbaec9 - MONITOR: Add a new option to control resolv.conf monitoring
    * 6e82ba82e4f2ce1440588437ca9e23a1b159df09 - MONITOR: Propagate error when resolv.conf does not exists in polling mode
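
The commits are listed here by title only. As a rough illustration of the idea behind the first one, the Python sketch below shows that resolving the symlink yields a second directory (the one holding the link target) that a watcher would also need to cover. A watcher tied only to the link's own directory would not see the target being removed or recreated elsewhere. This is not SSSD's C implementation and it does not call inotify; `watch_dirs` is a hypothetical helper, and the temporary `etc` and `opt` directories stand in for /etc and /opt.

```python
import os
import tempfile


def watch_dirs(link_path):
    """Return the directories to watch so that changes to both the
    symlink itself and its target are noticed.

    Hypothetical helper for illustration only.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # realpath() follows the symlink even when the final target is missing,
    # so a dangling link still yields the directory where the target
    # is expected to reappear.
    target_dir = os.path.dirname(os.path.realpath(link_path))
    return sorted({link_dir, target_dir})


# Stand-ins for /etc and /opt, created under a temporary directory.
root = os.path.realpath(tempfile.mkdtemp())
etc_dir = os.path.join(root, "etc")
opt_dir = os.path.join(root, "opt")
os.mkdir(etc_dir)
os.mkdir(opt_dir)

target = os.path.join(opt_dir, "resolv.conf")
link = os.path.join(etc_dir, "resolv.conf")
with open(target, "w") as f:
    f.write("nameserver 192.168.122.1\n")
os.symlink(target, link)

dirs_before = watch_dirs(link)
os.remove(target)              # break the symlink, as in this report
dirs_after = watch_dirs(link)  # the target's directory is still covered

print(dirs_before)
print(dirs_after)
```

Watching directories rather than the file itself is a design choice of this sketch: a directory watch survives the file being deleted and recreated, which is the scenario described in this bug.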

Comment 14 Niranjan Mallapadi Raghavender 2020-05-27 09:35:38 UTC
Versions:
========
python-sssdconfig-1.16.4-21.el7_7.1.noarch
sssd-client-1.16.4-21.el7_7.1.x86_64
sssd-krb5-common-1.16.4-21.el7_7.1.x86_64
sssd-ldap-1.16.4-21.el7_7.1.x86_64
sssd-dbus-1.16.4-21.el7_7.1.x86_64
sssd-1.16.4-21.el7_7.1.x86_64
sssd-common-1.16.4-21.el7_7.1.x86_64
sssd-common-pac-1.16.4-21.el7_7.1.x86_64
sssd-ipa-1.16.4-21.el7_7.1.x86_64
sssd-krb5-1.16.4-21.el7_7.1.x86_64
sssd-tools-1.16.4-21.el7_7.1.x86_64
sssd-kcm-1.16.4-21.el7_7.1.x86_64
sssd-winbind-idmap-1.16.4-21.el7_7.1.x86_64
sssd-ad-1.16.4-21.el7_7.1.x86_64
sssd-proxy-1.16.4-21.el7_7.1.x86_64


sssd.conf
[sssd]
config_file_version = 2
services = nss, pam
domains = files, example1

[domain/files]
id_provider = files
full_name_format = %1$s
debug_level = 3

[nss]
default_shell = /bin/bash
memcache_timeout = 600
enum_cache_timeout = 5400
entry_cache_nowait_percentage = 75
entry_negative_timeout = 5400
# Default debug_level of nss shall be max 5 because it is logging every uid lookup and can generate 5+GB/day
debug_level = 3


[domain/example1]
ldap_search_base = dc=example,dc=test
id_provider = ldap
auth_provider = ldap
ldap_user_home_directory = /home/%u
ldap_uri = ldaps://auto-hv-02-guest10.idmqe.lab.eng.bos.redhat.com
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
use_fully_qualified_names = True
debug_level = 9


1. create a symlink for resolv.conf

[root@auto-hv-02-guest10 etc]# ls -l resolv.conf
lrwxrwxrwx. 1 root root 19 May 27 02:51 resolv.conf -> /opt/resolv.conf

2. Restart sssd 
systemctl restart sssd

3. Remove the actual path, so that symlink is broken
cp /opt/resolv.conf /opt/resolv.conf.backup
rm -f /opt/resolv.conf

4. cat /etc/resolv.conf
[root@qe-blade-01 sssd]# cat /etc/resolv.conf 
cat: /etc/resolv.conf: No such file or directory

5. Restart sssd; sssd goes offline:
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [sdap_sudo_refresh_connect_done] (0x0020): SUDO LDAP connection failed [11]: Resource temporarily unavailable
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_done] (0x0040): Task [SUDO Full Refresh]: failed with [11]: Resource temporarily unavailable
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21600 seconds from now [1590592168]
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [sdap_id_release_conn_data] (0x4000): releasing unused connection
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_offline_cb] (0x0400): Back end is offline
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_offline_cb] (0x0400): Back end is offline
(Wed May 27 05:09:28 2020) [sssd[be[example1]]] [be_ptask_disable] (0x0400): Task [SUDO Full Refresh]: disabling task


6. Update the path to fix the symlink

[root@qe-blade-01 sssd]# cat /etc/resolv.conf
# Generated by NetworkManager
search idmqe.lab.eng.bos.redhat.com
nameserver 192.168.122.1


7. Try to query the users:

[root@qe-blade-01 sssd]# getent passwd foo5@example1
[root@qe-blade-01 sssd]# getent passwd foo4@example1
[root@qe-blade-01 sssd]# getent passwd foo3@example1
[root@qe-blade-01 sssd]# getent passwd foo3@example1


8. Update sssd to sssd-1.16.5-7.el7.x86_64

9. check the symlink

[root@auto-hv-02-guest10 opt]# ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 16 May 27 05:24 /etc/resolv.conf -> /opt/resolv.conf


10. Query the users to check if user lookup is successful
[root@auto-hv-02-guest10 opt]# id foo1@example1
uid=14583101(foo1@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1)
[root@auto-hv-02-guest10 opt]# id foo2@example1
uid=14583102(foo2@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1)


11. Remove the path to break the symlink
[root@auto-hv-02-guest10 opt]# mv resolv.conf resolv.conf.orig
[root@auto-hv-02-guest10 opt]# cat /etc/resolv.conf
cat: /etc/resolv.conf: No such file or directory
[root@auto-hv-02-guest10 opt]# 

12. Query the users:

[root@auto-hv-02-guest10 opt]# id foo3@example1
uid=14583103(foo3@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1)

[root@auto-hv-02-guest10 opt]# id foo4@example1
uid=14583104(foo4@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1)
[root@auto-hv-02-guest10 opt]# id foo8@example1
uid=14583108(foo8@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1)


13. Verify whether sssd is online or offline

[root@auto-hv-02-guest10 opt]# sssctl  domain-status example1
Online status: Online

Active servers:
LDAP: auto-hv-02-guest10.idmqe.lab.eng.bos.redhat.com

Discovered LDAP servers:
- auto-hv-02-guest10.idmqe.lab.eng.bos.redhat.com

[root@auto-hv-02-guest10 opt]# sssctl  domain-status example1
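
The verification steps above use `cat /etc/resolv.conf` to tell whether the symlink currently resolves. For scripted checks, the same distinction can be made explicitly. The following is a small Python sketch; `link_state` is a hypothetical helper, and the demo uses a temporary directory rather than the real /etc/resolv.conf.

```python
import os
import tempfile


def link_state(path):
    """Classify a path as 'ok', 'dangling' or 'missing'."""
    if os.path.exists(path):   # follows symlinks; True only if the target exists
        return "ok"
    if os.path.islink(path):   # the link itself exists, but its target does not
        return "dangling"
    return "missing"


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "resolv.conf.real")
link = os.path.join(workdir, "resolv.conf")

state_missing = link_state(link)    # nothing created yet

with open(target, "w") as f:
    f.write("nameserver 192.168.122.1\n")
os.symlink(target, link)
state_ok = link_state(link)         # link resolves to the target

os.remove(target)
state_dangling = link_state(link)   # link remains, target is gone

print(state_missing, state_ok, state_dangling)
```

On a real system the call would be `link_state("/etc/resolv.conf")`.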

Comment 31 errata-xmlrpc 2020-09-29 19:50:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3904