Bug 2116207 - SSSD starting offline after reboot [rhel-7.9.z]
Summary: SSSD starting offline after reboot [rhel-7.9.z]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Alejandro López
QA Contact: Steeve Goveas
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
 
Reported: 2022-08-07 23:54 UTC by Chino Soliard
Modified: 2022-12-28 17:49 UTC
CC List: 5 users

Fixed In Version: sssd-1.16.5-10.el7_9.14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-13 11:19:42 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 6383 0 None open sssd is not waiting for network-online.target 2022-10-10 12:37:39 UTC
Github SSSD sssd pull 6413 0 None Draft BACKEND: Reload resolv.conf after initialization 2022-11-07 21:03:54 UTC
Red Hat Issue Tracker RHELPLAN-130449 0 None None None 2022-08-08 00:04:12 UTC
Red Hat Issue Tracker SSSD-4936 0 None None None 2022-08-11 15:30:25 UTC
Red Hat Product Errata RHBA-2022:8950 0 None None None 2022-12-13 11:19:46 UTC

Description Chino Soliard 2022-08-07 23:54:28 UTC
Description of problem:
-----------------------

SSSD starts offline after a reboot. Restarting the service fixes the issue.

After adding the following lines under the [nss] section...

    offline_timeout = 5
    offline_timeout_random_offset = 10
    offline_timeout_max = 0

... the issue was fixed.

Current sssd.conf
    _______________________________________________________

    [sssd]
    domains = example.com
    config_file_version = 2
    services = nss, pam, sudo
    default_domain_suffix = example.com

    [domain/example.com]
    ad_domain = example.com
    krb5_realm = EXAMPLE.COM
    realmd_tags = manages-system joined-with-adcli
    cache_credentials = True
    id_provider = ad
    krb5_store_password_if_offline = True
    default_shell = /bin/bash
    ldap_id_mapping = True
    use_fully_qualified_names = True
    fallback_homedir = /home/%u@%d
    access_provider = ad
    case_sensitive = Preserving
    dyndns_update = false
    dyndns_update_ptr = false
    override_homedir = /home/%d/%u
    override_shell = /bin/bash
    ad_gpo_ignore_unreadable = True
    ad_enable_gc = False
    sudo_provider = ldap

    [pam]


    [nss]
    _______________________________________________________
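
The [nss] section above is empty; with the reporter's workaround applied it would look roughly like this (same values as quoted earlier; this is the reporter's mitigation, not the eventual fix):

    [nss]
    offline_timeout = 5
    offline_timeout_random_offset = 10
    offline_timeout_max = 0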

In sssd_example.com:
    _______________________________________________________

    (2022-08-01 12:00:15): [be[example.com]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.example.com'
    (2022-08-01 12:00:15): [be[example.com]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 6 seconds
    (2022-08-01 12:00:15): [be[example.com]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
    (2022-08-01 12:00:15): [be[example.com]] [request_watch_destructor] (0x0400): Deleting request watch
    (2022-08-01 12:00:15): [be[example.com]] [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers
    (2022-08-01 12:00:15): [be[example.com]] [ad_cldap_ping_done] (0x0040): Unable to get site and forest information [1432158237]: SRV lookup error
    (2022-08-01 12:00:15): [be[example.com]] [fo_set_port_status] (0x0100): Marking port 0 of server '(no name)' as 'not working'
    (2022-08-01 12:00:15): [be[example.com]] [resolve_srv_done] (0x0040): Unable to resolve SRV [1432158237]: SRV lookup error
    (2022-08-01 12:00:15): [be[example.com]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'AD' as 'not resolved'
    (2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned [1432158237]: SRV lookup error
    (2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_process] (0x1000): Trying with the next one!
    (2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
    (2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
    (2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
    (2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
    (2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
    (2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
    (2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Going offline!
    (2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 88 seconds from now [1659373303]
    (2022-08-01 12:00:15): [be[example.com]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
    (2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
    (2022-08-01 12:00:15): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
    (2022-08-01 12:00:15): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0080): No AD server is available, cannot get the subdomain list while offline
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Subdomains Refresh]: scheduling task 14400 seconds from now [1659387615]
    (2022-08-01 12:00:15): [be[example.com]] [sdap_id_release_conn_data] (0x4000): releasing unused connection
    ...
    (2022-08-01 12:00:15): [be[example.com]] [sbus_dispatch] (0x4000): Dispatching.
    (2022-08-01 12:00:15): [be[example.com]] [id_callback] (0x0100): Got id ack and version (1) from Monitor
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_offline_cb] (0x0400): Back end is offline
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_disable] (0x0400): Task [Subdomains Refresh]: disabling task
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_offline_cb] (0x0400): Back end is offline
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_disable] (0x0400): Task [SUDO Smart Refresh]: disabling task
    (2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Entering.
    (2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Adding connection 0x55861d7f80d0.
    (2022-08-01 12:00:15): [be[example.com]] [sbus_init_connection] (0x0400): Adding connection 0x55861d7f80d0
    (2022-08-01 12:00:15): [be[example.com]] [sbus_add_watch] (0x2000): 0x55861d7f5b20/0x55861d78d910 (21), -/W (disabled)
    (2022-08-01 12:00:15): [be[example.com]] [sbus_toggle_watch] (0x4000): 0x55861d7f5b20/0x55861d7f3340 (21), R/- (enabled)
    (2022-08-01 12:00:15): [be[example.com]] [sbus_server_init_new_connection] (0x0200): Got a connection
    (2022-08-01 12:00:15): [be[example.com]] [dp_client_init] (0x0100): Set-up Backend ID timeout [0x55861d7fe8a0]
    ...
    (2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_step] (0x4000): beginning to connect
    (2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
    (2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working'
    (2022-08-01 12:00:15): [be[example.com]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
    (2022-08-01 12:00:15): [be[example.com]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
    (2022-08-01 12:00:15): [be[example.com]] [be_resolve_server_done] (0x1000): Server resolution failed: [5]: Input/output error
    (2022-08-01 12:00:15): [be[example.com]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
    (2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Going offline!
    (2022-08-01 12:00:15): [be[example.com]] [be_mark_offline] (0x2000): Enable check_if_online_ptask.
    (2022-08-01 12:00:15): [be[example.com]] [be_ptask_enable] (0x0080): Task [Check if online (periodic)]: already enabled
    (2022-08-01 12:00:15): [be[example.com]] [be_run_offline_cb] (0x4000): Flag indicates that offline callback were already called.
    _______________________________________________________


-----------------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

sssd-1.16.5-10.el7_9.13.x86_64
sssd-ad-1.16.5-10.el7_9.13.x86_64
sssd-client-1.16.5-10.el7_9.13.x86_64
sssd-common-1.16.5-10.el7_9.13.x86_64
sssd-common-pac-1.16.5-10.el7_9.13.x86_64
sssd-ipa-1.16.5-10.el7_9.13.x86_64
sssd-krb5-1.16.5-10.el7_9.13.x86_64
sssd-krb5-common-1.16.5-10.el7_9.13.x86_64
sssd-ldap-1.16.5-10.el7_9.13.x86_64
sssd-proxy-1.16.5-10.el7_9.13.x86_64

-----------------------------------------------------------------------------------------

How reproducible:
-----------------

This happens every time the system is rebooted.

-----------------------------------------------------------------------------------------

Steps to Reproduce:
-------------------

1. Join AD domain
2. Configure SSSD as in the description
3. Reboot the system
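
A rough sketch of these steps as shell commands (the domain name and admin account are placeholders; the exact realm invocation may differ per environment):

    # 1. Join the AD domain (placeholder domain and admin user)
    realm join -U Administrator example.com

    # 2. Edit /etc/sssd/sssd.conf to match the configuration in the description
    vi /etc/sssd/sssd.conf
    systemctl enable sssd

    # 3. Reboot; once the host is back, the backend log shows it going offline
    systemctl reboot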

-----------------------------------------------------------------------------------------

Actual results:
---------------

SSSD starts offline; after restarting the service it works fine.
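
One way to confirm the offline state on an affected host, assuming debug logging is enabled as in the excerpt above:

    # "Going offline" / "Back end is offline" entries appear right after boot;
    # after 'systemctl restart sssd' lookups work again
    grep -E 'Going offline|Back end is offline' /var/log/sssd/sssd_example.com.log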

-----------------------------------------------------------------------------------------

Expected results:
-----------------

SSSD starts online and works.

-----------------------------------------------------------------------------------------

Additional info:
----------------

- This is not related to SSSD starting before the network is up.

- This looks similar to this bug:

      Bug 1379415 - SSSD always boots in Offline mode
      [*] https://bugzilla.redhat.com/show_bug.cgi?id=1379415

Comment 6 Alexey Tikhonov 2022-11-07 21:03:55 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/6413

Comment 7 Alexey Tikhonov 2022-11-15 09:48:12 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6413

* `sssd-1-16`
    * c52a5a640f0574f28281dd62238ffc7303eb4391 - BACKEND: Reload resolv.conf after initialization

Comment 8 Steeve Goveas 2022-11-18 06:02:18 UTC
This issue is hard to reproduce because it requires /etc/resolv.conf to be invalid when SSSD starts and to become valid after the SSSD backend has started, but before the backend reaches a state where it can receive and handle sbus messages from the monitor. Verification is therefore based on the presence of the new log messages in the fixed version.
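
A rough sketch of one way to approximate that timing window on a test host (not the exact reproducer used here; the unreachable 192.0.2.1 nameserver is a placeholder and the sleep is only a guess at the window):

    cp /etc/resolv.conf /etc/resolv.conf.bak        # keep the working DNS configuration
    echo "nameserver 192.0.2.1" > /etc/resolv.conf  # point at an unreachable test address
    systemctl restart sssd                          # backend starts while DNS is broken
    sleep 2                                         # small window before the backend settles
    cp /etc/resolv.conf.bak /etc/resolv.conf        # DNS becomes valid again
    # with the fix, sssd_<domain>.log should show the c-ares channel being recreated
    # ("Destroying the old c-ares channel" / "Initializing new c-ares channel")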

Verified with

[root@ip-10-0-199-233 ~]# yum update sssd
.
.
.
Updated:
  sssd.x86_64 0:1.16.5-10.el7_9.14                                                                                                                                                                                 

Dependency Updated:
  libipa_hbac.x86_64 0:1.16.5-10.el7_9.14                libsss_autofs.x86_64 0:1.16.5-10.el7_9.14         libsss_idmap.x86_64 0:1.16.5-10.el7_9.14         libsss_nss_idmap.x86_64 0:1.16.5-10.el7_9.14         
  libsss_simpleifp.x86_64 0:1.16.5-10.el7_9.14           libsss_sudo.x86_64 0:1.16.5-10.el7_9.14           python-sss.x86_64 0:1.16.5-10.el7_9.14           python-sssdconfig.noarch 0:1.16.5-10.el7_9.14        
  sssd-ad.x86_64 0:1.16.5-10.el7_9.14                    sssd-client.x86_64 0:1.16.5-10.el7_9.14           sssd-common.x86_64 0:1.16.5-10.el7_9.14          sssd-common-pac.x86_64 0:1.16.5-10.el7_9.14          
  sssd-dbus.x86_64 0:1.16.5-10.el7_9.14                  sssd-ipa.x86_64 0:1.16.5-10.el7_9.14              sssd-kcm.x86_64 0:1.16.5-10.el7_9.14             sssd-krb5.x86_64 0:1.16.5-10.el7_9.14                
  sssd-krb5-common.x86_64 0:1.16.5-10.el7_9.14           sssd-ldap.x86_64 0:1.16.5-10.el7_9.14             sssd-proxy.x86_64 0:1.16.5-10.el7_9.14           sssd-tools.x86_64 0:1.16.5-10.el7_9.14

[root@ip-10-0-199-233 ~]# systemctl stop sssd
[root@ip-10-0-199-233 ~]# rm -f /var/log/sssd/*
[root@ip-10-0-199-233 ~]# rm -f /var/lib/sss/{mc,db}/*

[root@ip-10-0-199-233 ~]# systemctl reboot
Connection to 10.0.199.233 closed by remote host.
Connection to 10.0.199.233 closed.

[root@899c33979ec2 reboot]# sleep 50; ssh 10.0.199.233
Warning: Permanently added '10.0.199.233' (ED25519) to the list of known hosts.
Last login: Thu Nov 17 05:52:17 2022 from 10.74.18.199

[root@ip-10-0-199-233 ~]# grep -i 'Destroying the old c-ares channel' /var/log/sssd/sssd_domain-4q9b.com.log 
(2022-11-17  6:55:05): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Destroying the old c-ares channel

[root@ip-10-0-199-233 ~]# grep -i '[recreate_ares_channel] (0x0100): Initializing new c-ares channel' /var/log/sssd/sssd_domain-4q9b.com.log
(2022-11-17  6:55:04): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel
(2022-11-17  6:55:05): [be[domain-4q9b.com]] [recreate_ares_channel] (0x0100): Initializing new c-ares channel

Comment 16 errata-xmlrpc 2022-12-13 11:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8950

