Bug 2116207
| Summary: | SSSD starting offline after reboot [rhel-7.9.z] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Chino Soliard <csoliard> |
| Component: | sssd | Assignee: | Alejandro López <allopez> |
| Status: | CLOSED ERRATA | QA Contact: | Steeve Goveas <sgoveas> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.9 | CC: | aboscatt, atikhono, Ken.Fowler, kpfleming, pbrezina |
| Target Milestone: | rc | Keywords: | Triaged, ZStream |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | sync-to-jira | ||
| Fixed In Version: | sssd-1.16.5-10.el7_9.14 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-12-13 11:19:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Upstream PR: https://github.com/SSSD/sssd/pull/6413

Pushed PR: https://github.com/SSSD/sssd/pull/6413

* `sssd-1-16`
    * c52a5a640f0574f28281dd62238ffc7303eb4391 - BACKEND: Reload resolv.conf after initialization

This issue is hard to reproduce: it requires /etc/resolv.conf to be invalid when SSSD starts and to become valid after the SSSD backend has started, but before the backend reaches a state in which it can receive and handle sbus messages from the monitor. It was therefore verified by checking for the presence of log messages written by the fixed version.
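To tell whether a given host already carries this fix, one can compare the installed sssd version-release with the Fixed In Version from the header (sssd-1.16.5-10.el7_9.14). The sketch below is only a convenience under an explicit assumption: that `sort -V` orders these el7_9 release strings the same way rpm does. That holds for the plain numeric suffixes involved here, but it is not a general replacement for rpm's own version comparison. `FIXED_VR`, `has_fix` and `installed_sssd_vr` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: does a given sssd VERSION-RELEASE include the fix?
# Assumption: sort -V matches rpm ordering for these el7_9 release strings;
# it is not a full reimplementation of rpm's version comparison.

FIXED_VR="1.16.5-10.el7_9.14"   # "Fixed In Version" from this bug

# has_fix VR: exit status 0 if VR is equal to or newer than FIXED_VR.
has_fix() {
    local oldest
    # Sort both strings in version order; the first line is the older one.
    oldest=$(printf '%s\n%s\n' "$1" "$FIXED_VR" | sort -V | head -n 1)
    [ "$oldest" = "$FIXED_VR" ]
}

# installed_sssd_vr: print VERSION-RELEASE of the installed sssd package.
installed_sssd_vr() {
    rpm -q --qf '%{VERSION}-%{RELEASE}\n' sssd
}
```

Usage: `has_fix "$(installed_sssd_vr)" && echo "fix present"`. With the packages listed in the description (1.16.5-10.el7_9.13) the check fails; after the update shown in the verification below it succeeds.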
Verified with
[root@ip-10-0-199-233 ~]# yum update sssd
.
.
.
Updated:
sssd.x86_64 0:1.16.5-10.el7_9.14
Dependency Updated:
libipa_hbac.x86_64 0:1.16.5-10.el7_9.14 libsss_autofs.x86_64 0:1.16.5-10.el7_9.14 libsss_idmap.x86_64 0:1.16.5-10.el7_9.14 libsss_nss_idmap.x86_64 0:1.16.5-10.el7_9.14
libsss_simpleifp.x86_64 0:1.16.5-10.el7_9.14 libsss_sudo.x86_64 0:1.16.5-10.el7_9.14 python-sss.x86_64 0:1.16.5-10.el7_9.14 python-sssdconfig.noarch 0:1.16.5-10.el7_9.14
sssd-ad.x86_64 0:1.16.5-10.el7_9.14 sssd-client.x86_64 0:1.16.5-10.el7_9.14 sssd-common.x86_64 0:1.16.5-10.el7_9.14 sssd-common-pac.x86_64 0:1.16.5-10.el7_9.14
sssd-dbus.x86_64 0:1.16.5-10.el7_9.14 sssd-ipa.x86_64 0:1.16.5-10.el7_9.14 sssd-kcm.x86_64 0:1.16.5-10.el7_9.14 sssd-krb5.x86_64 0:1.16.5-10.el7_9.14
sssd-krb5-common.x86_64 0:1.16.5-10.el7_9.14 sssd-ldap.x86_64 0:1.16.5-10.el7_9.14 sssd-proxy.x86_64 0:1.16.5-10.el7_9.14 sssd-tools.x86_64 0:1.16.5-10.el7_9.14
[root@ip-10-0-199-233 ~]# systemctl stop sssd
[root@ip-10-0-199-233 ~]# rm -f /var/log/sssd/*
[root@ip-10-0-199-233 ~]# rm -f /var/lib/sss/{mc,db}/*
[root@ip-10-0-199-233 ~]# systemctl reboot
Connection to 10.0.199.233 closed by remote host.
Connection to 10.0.199.233 closed.
[root@899c33979ec2 reboot]# sleep 50; ssh 10.0.199.233
Warning: Permanently added '10.0.199.233' (ED25519) to the list of known hosts.
Last login: Thu Nov 17 05:52:17 2022 from 10.74.18.199
[root@ip-10-0-199-233 ~]# grep -i 'Destroying the old c-ares channel' /var/log/sssd/sssd_domain-4q9b.com.log
(2022-11-17 6:55:05): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel
[root@ip-10-0-199-233 ~]# grep -i '[recreate_ares_channel] (0x0100): Initializing new c-ares channel' /var/log/sssd/sssd_domain-4q9b.com.log
(2022-11-17 6:55:04): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
(2022-11-17 6:55:05): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
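The two greps above can be wrapped into a small reusable check, for example when verifying several hosts. This is a sketch under two assumptions: that the backend log lives at /var/log/sssd/sssd_<domain>.log, as in the transcript, and that the domain's debug_level is high enough for these 0x0100 messages to be written. It uses `grep -F`, so the brackets in `[recreate_ares_channel]` are matched literally instead of being parsed as a regular-expression bracket expression. The pass condition (at least one "Destroying" line and at least two "Initializing" lines) simply mirrors the verified output above; `count_msg` and `check_reload` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers for checking a SSSD backend log for the c-ares
# channel reload performed by the fixed build.

INIT_MSG='[recreate_ares_channel] (0x0100): Initializing new c-ares channel'
DESTROY_MSG='[recreate_ares_channel] (0x0100): Destroying the old c-ares channel'

# count_msg FILE TEXT: print the number of lines in FILE containing TEXT.
# -F makes grep treat TEXT as a fixed string, so "[...]" is literal.
count_msg() {
    grep -cF -- "$2" "$1"
}

# check_reload FILE: exit status 0 if FILE shows the pattern seen in the
# verified log: an initial channel, then destroy + re-initialize.
check_reload() {
    local log="$1" inits destroys
    [ -r "$log" ] || return 2          # missing or unreadable log
    inits=$(count_msg "$log" "$INIT_MSG")
    destroys=$(count_msg "$log" "$DESTROY_MSG")
    [ "$inits" -ge 2 ] && [ "$destroys" -ge 1 ]
}
```

Usage: `check_reload /var/log/sssd/sssd_domain-4q9b.com.log && echo "resolv.conf reload observed"`. Since the log directory was emptied before the reboot in the transcript, the counts only reflect the boot under test.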
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8950
Description of problem:
-----------------------
SSSD is starting offline after reboot. Restarting the service fixes the issue.

After adding the following lines under the [nss] section, the issue was fixed:

offline_timeout = 5
offline_timeout_random_offset = 10
offline_timeout_max = 0

Current sssd.conf:
_______________________________________________________
[sssd]
domains = example.com
config_file_version = 2
services = nss, pam, sudo
default_domain_suffix = example.com

[domain/example.com]
ad_domain = example.com
krb5_realm = EXAMPLE.COM
realmd_tags = manages-system joined-with-adcli
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad
case_sensitive = Preserving
dyndns_update = false
dyndns_update_ptr = false
override_homedir = /home/%d/%u
override_shell = /bin/bash
ad_gpo_ignore_unreadable = True
ad_enable_gc = False
sudo_provider = ldap

[pam]

[nss]
_______________________________________________________

In sssd_example.com:
_______________________________________________________
(2022-08-01 12:00:15): [be[example.com]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.example.com'
(2022-08-01 12:00:15): [be[example.com]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
(2022-08-01 12:00:15): [be[example.com]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(2022-08-01 12:00:15): [be[example.com]] [request_watch_destructor] (0x0400): Deleting request watch
(2022-08-01 12:00:15): [be[example.com]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
(2022-08-01 12:00:15): [be[example.com]] [ad_cldap_ping_done] (0x0040): Unable to get site and forest information [1432158237]: SRV lookup error
(2022-08-01 12:00:15): [be[example.com]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
(2022-08-01 12:00:15): [be[example.com]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158237]: SRV lookup error
(2022-08-01 12:00:15): [be[example.com]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'AD' as 'not resolved'
(2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158237]: SRV lookup error
(2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_process] (0x1000): Trying with the next one!
(2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
(2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
(2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
(2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Going offline!
(2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 88 seconds from now [1659373303]
(2022-08-01 12:00:15): [be[example.com]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(2022-08-01 12:00:15): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
(2022-08-01 12:00:15): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0080): No AD server is available, cannot get the subdomain list while offline
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Subdomains Refresh]: scheduling task 14400 seconds from now [1659387615]
(2022-08-01 12:00:15): [be[example.com]] [sdap_id_release_conn_data] (0x4000): releasing unused connection
...
(2022-08-01 12:00:15): [be[example.com]] [sbus_dispatch] (0x4000): Dispatching.
(2022-08-01 12:00:15): [be[example.com]] [id_callback] (0x0100): Got id ack and version (1) from Monitor
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_offline_cb] (0x0400): Back end is offline
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_disable] (0x0400): Task [Subdomains Refresh]: disabling task
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_offline_cb] (0x0400): Back end is offline
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
(2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Entering.
(2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Adding connection 0x55861d7f80d0.
(2022-08-01 12:00:15): [be[example.com]] [sbus_init_connection] (0x0400): Adding connection 0x55861d7f80d0
(2022-08-01 12:00:15): [be[example.com]] [sbus_add_watch] (0x2000): 0x55861d7f5b20/0x55861d78d910 (21), -/W (disabled)
(2022-08-01 12:00:15): [be[example.com]] [sbus_toggle_watch] (0x4000): 0x55861d7f5b20/0x55861d7f3340 (21), R/- (enabled)
(2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Got a connection
(2022-08-01 12:00:15): [be[example.com]] [dp_client_init] (0x0100): Set-up Backend ID timeout [0x55861d7fe8a0]
...
(2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_step] (0x4000): beginning to connect
(2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
(2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
(2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
(2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
(2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
(2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Going offline!
(2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Enable check_if_online_ptask.
(2022-08-01 12:00:15): [be[example.com]] [be_ptask_enable] (0x0080): Task [Check if online (periodic)]: already enabled
(2022-08-01 12:00:15): [be[example.com]] [be_run_offline_cb] (0x4000): Flag indicates that offline callback were already called.
_______________________________________________________

-----------------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
sssd-1.16.5-10.el7_9.13.x86_64
sssd-ad-1.16.5-10.el7_9.13.x86_64
sssd-client-1.16.5-10.el7_9.13.x86_64
sssd-common-1.16.5-10.el7_9.13.x86_64
sssd-common-pac-1.16.5-10.el7_9.13.x86_64
sssd-ipa-1.16.5-10.el7_9.13.x86_64
sssd-krb5-1.16.5-10.el7_9.13.x86_64
sssd-krb5-common-1.16.5-10.el7_9.13.x86_64
sssd-ldap-1.16.5-10.el7_9.13.x86_64
sssd-proxy-1.16.5-10.el7_9.13.x86_64

-----------------------------------------------------------------------------------------

How reproducible:
-----------------
This happens every time the system is rebooted.

-----------------------------------------------------------------------------------------

Steps to Reproduce:
-------------------
1. Join the AD domain.
2. Configure SSSD as shown in the description.
3. Reboot the system.

-----------------------------------------------------------------------------------------

Actual results:
---------------
SSSD starts offline; after the service is restarted, it works fine.

-----------------------------------------------------------------------------------------

Expected results:
-----------------
SSSD starts online and works.

-----------------------------------------------------------------------------------------

Additional info:
----------------
- This is not related to SSSD starting before the network.
- This looks similar to Bug 1379415 - SSSD always boots in Offline mode [*] https://bugzilla.redhat.com/show_bug.cgi?id=1379415
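To confirm the actual result after step 3 without waiting for a failed login, one option is to search the domain log for the message the backend writes when it gives up connecting. The message text below is copied from the log excerpt in the description; it is logged at level 0x0020, and whether a given host's debug_level records it is an assumption worth checking. `went_offline` and `first_offline` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers for spotting the "starts offline" symptom in a
# SSSD backend log such as /var/log/sssd/sssd_example.com.log.

OFFLINE_MSG='Failed to connect, going offline'

# went_offline FILE: exit status 0 if FILE contains OFFLINE_MSG.
went_offline() {
    grep -qF -- "$OFFLINE_MSG" "$1"
}

# first_offline FILE: print the timestamp of the first matching line,
# e.g. "2022-08-01 12:00:15"; print nothing if there is no match.
first_offline() {
    grep -F -m 1 -- "$OFFLINE_MSG" "$1" |
        sed -n 's/^(\([^)]*\)).*/\1/p'   # keep the text inside the leading "(...)"
}
```

Comparing the printed timestamp with the boot time (for example from `uptime -s`) shows whether the backend went offline right after boot rather than later in the day.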