Bug 2333179

Summary: /etc/hosts containing an entry with a name and alias but no IPv4 address results in sssd_be crash
Product: [Fedora] Fedora
Reporter: Gregory Lee Bartholomew <gregory.lee.bartholomew>
Component: sssd
Assignee: sssd-maintainers <sssd-maintainers>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 41
CC: abokovoy, atikhono, lslebodn, pbrezina, sbose, ssorce, sssd-maintainers
Target Milestone: ---
Keywords: Upgrades
Target Release: ---
Hardware: x86_64
OS: Linux
Attachments (description / flags):
Wed 2024-12-18 12:54:36 CST SIGSEGV sssd_be / none
sssd.log / none
sssd_ldap.log / none
sssd_ldap.log with debug_level = 9 / none

Description Gregory Lee Bartholomew 2024-12-19 02:14:31 UTC
sssd.service fails to start when the /var/lib/sss/gpo_cache directory is empty.

Reproducible: Always

Steps to Reproduce:
# This happened on two servers that I just updated.
0. Have sssd configured as an LDAP client
1. dnf update --releasever=41 && reboot

Actual Results:  
[root@redacted gpo_cache]# systemctl status sssd.service
× sssd.service - System Security Services Daemon
     Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: failed (Result: exit-code) since Wed 2024-12-18 19:57:28 CST; 1min 56s ago
   Duration: 34.808s
 Invocation: b86216780f7849ec9a6c69a271a88fe9
    Process: 672112 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd (code=exited, status=0/SUCCESS)
    Process: 672114 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd/sssd.conf (code=exited, status=0/SUCCESS)
    Process: 672116 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/conf.d (code=exited, status=0/SUCCESS)
    Process: 672118 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/pki (code=exited, status=0/SUCCESS)
    Process: 672120 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/db/*.ldb (code=exited, status=0/SUCCESS)
    Process: 672123 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/gpo_cache/* (code=exited, status=1/FAILURE)
    Process: 672125 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/log/sssd/*.log (code=exited, status=0/SUCCESS)
    Process: 672127 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=exited, status=1/FAILURE)
   Main PID: 672127 (code=exited, status=1/FAILURE)
         IO: 0B read, 0B written
   Mem peak: 43.5M
        CPU: 505ms

Dec 18 19:56:53 redacted.com sssd_pam[672133]: Starting up
Dec 18 19:57:00 redacted.com sssd_be[672182]: Starting up
Dec 18 19:57:03 redacted.com sssd_be[672262]: Starting up
Dec 18 19:57:17 redacted.com sssd_be[672374]: Starting up
Dec 18 19:57:28 redacted.com sssd[672127]: Exiting the SSSD. Could not restart criti…dap].
Dec 18 19:57:28 redacted.com sssd_ssh[672134]: Shutting down (status = 0)
Dec 18 19:57:28 redacted.com sssd_pam[672133]: Shutting down (status = 0)
Dec 18 19:57:28 redacted.com sssd_nss[672132]: Shutting down (status = 0)
Dec 18 19:57:28 redacted.com systemd[1]: sssd.service: Main process exited, code=exi…ILURE
Dec 18 19:57:28 redacted.com systemd[1]: sssd.service: Failed with result 'exit-code'.

Expected Results:  
[root@redacted gpo_cache]# systemctl status sssd.service
● sssd.service - System Security Services Daemon
     Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: active (running) since Wed 2024-12-18 20:00:07 CST; 4s ago
 Invocation: 724c16248ba44bd681be6e4baeef858e
    Process: 673775 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd (code=exited, status=0/SUCCESS)
    Process: 673777 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd/sssd.conf (code=exited, status=0/SUCCESS)
    Process: 673779 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/conf.d (code=exited, status=0/SUCCESS)
    Process: 673782 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/pki (code=exited, status=0/SUCCESS)
    Process: 673784 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/db/*.ldb (code=exited, status=0/SUCCESS)
    Process: 673786 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/gpo_cache/* (code=exited, status=0/SUCCESS)
    Process: 673788 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/log/sssd/*.log (code=exited, status=0/SUCCESS)
   Main PID: 673790 (sssd)
         IO: 0B read, 0B written
      Tasks: 5 (limit: 153881)
     Memory: 42M (peak: 42.7M)
        CPU: 274ms
     CGroup: /system.slice/sssd.service
             ├─673790 /usr/sbin/sssd -i --logger=files
             ├─673799 /usr/libexec/sssd/sssd_be --domain ldap --logger=files
             ├─673800 /usr/libexec/sssd/sssd_nss --logger=files
             ├─673801 /usr/libexec/sssd/sssd_pam --logger=files
             └─673802 /usr/libexec/sssd/sssd_ssh --logger=files

Dec 18 20:00:07 redacted.com sssd[673790]: Starting up
Dec 18 20:00:07 redacted.com sssd_be[673799]: Starting up
Dec 18 20:00:07 redacted.com sssd_nss[673800]: Starting up
Dec 18 20:00:07 redacted.com sssd_ssh[673802]: Starting up
Dec 18 20:00:07 redacted.com sssd_pam[673801]: Starting up

I was able to work around the problem by running the following command:

touch /var/lib/sss/gpo_cache/dummy

Comment 1 Gregory Lee Bartholomew 2024-12-19 02:19:58 UTC
Quick follow-up -- I was able to start the sssd.service. I was even able to query users briefly. However, the sssd.service has crashed again. The workaround I found seems to be insufficient to completely resolve whatever the problem is.

Comment 2 Gregory Lee Bartholomew 2024-12-19 02:38:07 UTC
Dec 18 20:35:04 redacted.com sssd_be[182660]: Starting up
Dec 18 20:35:14 redacted.com audit[182660]: ANOM_ABEND auid=4294967295 uid=986 gid=982 ses=4294967295 subj=system_u:system_r:sssd_t:s0 pid=182660 comm="sssd_be" exe="/usr/libexec/sssd/sssd_be" sig=11 res=1
Dec 18 20:35:14 redacted.com kernel: show_signal_msg: 12 callbacks suppressed
Dec 18 20:35:14 redacted.com kernel: sssd_be[182660]: segfault at 0 ip 00005648914144f5 sp 00007ffdba26b500 error 4 in sssd_be[1b4f5,5648913f9000+21000] likely on CPU 26 (core 5, socket 0)
Dec 18 20:35:14 redacted.com kernel: Code: 00 66 45 89 6c 24 02 41 0f 11 44 24 08 e9 66 ff ff ff 0f 1f 00 b9 02 00 00 00 66 41 c1 c5 08 66 89 08 48 8b 43 18 4a 8b 04 f0 <48> 8b 00 8b 00 66 45 89 6c 24 02 41 89 44 24 04 e9 39 ff ff ff 66
Dec 18 20:35:14 redacted.com kernel: audit: type=1701 audit(1734575714.471:31667): auid=4294967295 uid=986 gid=982 ses=4294967295 subj=system_u:system_r:sssd_t:s0 pid=182660 comm="sssd_be" exe="/usr/libexec/sssd/sssd_be" sig=11 res=1
Dec 18 20:35:14 redacted.com systemd-coredump[182700]: Process 182660 (sssd_be) of user 986 terminated abnormally with signal 11/SEGV, processing...
Dec 18 20:35:14 redacted.com audit: BPF prog-id=2391 op=LOAD
Dec 18 20:35:14 redacted.com kernel: audit: type=1334 audit(1734575714.494:31668): prog-id=2391 op=LOAD
Dec 18 20:35:14 redacted.com audit: BPF prog-id=2392 op=LOAD
Dec 18 20:35:14 redacted.com audit: BPF prog-id=2393 op=LOAD
Dec 18 20:35:14 redacted.com kernel: audit: type=1334 audit(1734575714.495:31669): prog-id=2392 op=LOAD
Dec 18 20:35:14 redacted.com kernel: audit: type=1334 audit(1734575714.495:31670): prog-id=2393 op=LOAD
Dec 18 20:35:14 redacted.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@85-182700-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 18 20:35:14 redacted.com kernel: audit: type=1130 audit(1734575714.500:31671): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@85-182700-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 18 20:35:14 redacted.com systemd-coredump[182701]: [🡕] Process 182660 (sssd_be) of user 986 terminated abnormally without generating a coredump.
Dec 18 20:35:14 redacted.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@85-182700-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 18 20:35:14 redacted.com kernel: audit: type=1131 audit(1734575714.697:31672): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@85-182700-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 18 20:35:14 redacted.com sssd[182480]: Exiting the SSSD. Could not restart critical service [ldap].

Comment 3 Gregory Lee Bartholomew 2024-12-19 02:41:40 UTC
Forgot to mention in the previous comment. I had run the following command before I saw the above errors in the journal:

[root@redacted ~]# dnf downgrade $(rpm -qa --qf="%{name}\n" | grep '^sssd')
Updating and loading repositories:
 Fedora 41 - x86_64                                    100% |  16.3 MiB/s |  35.4 MiB |  00m02s
 Fedora 41 openh264 (From Cisco) - x86_64              100% |  12.7 KiB/s |   6.0 KiB |  00m00s
Repositories loaded.
Package                          Arch    Version                       Repository          Size
Downgrading:
 libsss_autofs                   x86_64  2.10.0-1.fc41                 fedora          69.5 KiB
   replacing libsss_autofs       x86_64  2.10.0-2.fc41                 updates         69.5 KiB
 libsss_certmap                  x86_64  2.10.0-1.fc41                 fedora         144.3 KiB
   replacing libsss_certmap      x86_64  2.10.0-2.fc41                 updates        144.3 KiB
 libsss_idmap                    x86_64  2.10.0-1.fc41                 fedora          77.8 KiB
   replacing libsss_idmap        x86_64  2.10.0-2.fc41                 updates         77.8 KiB
 libsss_nss_idmap                x86_64  2.10.0-1.fc41                 fedora          86.3 KiB
   replacing libsss_nss_idmap    x86_64  2.10.0-2.fc41                 updates         86.3 KiB
 libsss_sudo                     x86_64  2.10.0-1.fc41                 fedora          57.9 KiB
   replacing libsss_sudo         x86_64  2.10.0-2.fc41                 updates         57.9 KiB
 python3-sss                     x86_64  2.10.0-1.fc41                 fedora          36.4 KiB
   replacing python3-sss         x86_64  2.10.0-2.fc41                 updates         36.4 KiB
 python3-sssdconfig              noarch  2.10.0-1.fc41                 fedora         284.4 KiB
   replacing python3-sssdconfig  noarch  2.10.0-2.fc41                 updates        284.4 KiB
 sssd-client                     x86_64  2.10.0-1.fc41                 fedora         375.1 KiB
   replacing sssd-client         x86_64  2.10.0-2.fc41                 updates        375.1 KiB
 sssd-common                     x86_64  2.10.0-1.fc41                 fedora           5.3 MiB
   replacing sssd-common         x86_64  2.10.0-2.fc41                 updates          5.3 MiB
 sssd-dbus                       x86_64  2.10.0-1.fc41                 fedora         325.1 KiB
   replacing sssd-dbus           x86_64  2.10.0-2.fc41                 updates        325.1 KiB
 sssd-krb5                       x86_64  2.10.0-1.fc41                 fedora          91.5 KiB
   replacing sssd-krb5           x86_64  2.10.0-2.fc41                 updates         91.5 KiB
 sssd-krb5-common                x86_64  2.10.0-1.fc41                 fedora         227.1 KiB
   replacing sssd-krb5-common    x86_64  2.10.0-2.fc41                 updates        227.1 KiB
 sssd-ldap                       x86_64  2.10.0-1.fc41                 fedora         182.4 KiB
   replacing sssd-ldap           x86_64  2.10.0-2.fc41                 updates        182.3 KiB
 sssd-nfs-idmap                  x86_64  2.10.0-1.fc41                 fedora          47.3 KiB
   replacing sssd-nfs-idmap      x86_64  2.10.0-2.fc41                 updates         47.3 KiB
 sssd-tools                      x86_64  2.10.0-1.fc41                 fedora         337.3 KiB
   replacing sssd-tools          x86_64  2.10.0-2.fc41                 updates        337.3 KiB

Transaction Summary:
 Replacing:         15 package
 Downgrading:       15 packages

Total size of inbound packages is 3 MiB. Need to download 3 MiB.
After this operation, 296 B extra will be used (install 8 MiB, remove 8 MiB).
Is this ok [y/N]: y
[ 1/15] sssd-nfs-idmap-0:2.10.0-1.fc41.x86_64          100% |  11.2 MiB/s |  34.4 KiB |  00m00s
[ 2/15] sssd-client-0:2.10.0-1.fc41.x86_64             100% |  26.3 MiB/s | 161.5 KiB |  00m00s
[ 3/15] sssd-krb5-common-0:2.10.0-1.fc41.x86_64        100% |  22.2 MiB/s |  90.9 KiB |  00m00s
[ 4/15] sssd-dbus-0:2.10.0-1.fc41.x86_64               100% |  30.5 MiB/s | 124.9 KiB |  00m00s
[ 5/15] sssd-tools-0:2.10.0-1.fc41.x86_64              100% |  34.5 MiB/s | 176.8 KiB |  00m00s
[ 6/15] sssd-krb5-0:2.10.0-1.fc41.x86_64               100% |  16.6 MiB/s |  68.0 KiB |  00m00s
[ 7/15] sssd-ldap-0:2.10.0-1.fc41.x86_64               100% |  38.0 MiB/s | 155.6 KiB |  00m00s
[ 8/15] python3-sss-0:2.10.0-1.fc41.x86_64             100% |   7.9 MiB/s |  24.1 KiB |  00m00s
[ 9/15] sssd-common-0:2.10.0-1.fc41.x86_64             100% |  40.4 MiB/s |   1.5 MiB |  00m00s
[10/15] python3-sssdconfig-0:2.10.0-1.fc41.noarch      100% |   3.9 MiB/s |  71.9 KiB |  00m00s
[11/15] libsss_sudo-0:2.10.0-1.fc41.x86_64             100% |   1.6 MiB/s |  30.3 KiB |  00m00s
[12/15] libsss_certmap-0:2.10.0-1.fc41.x86_64          100% |  27.9 MiB/s |  85.8 KiB |  00m00s
[13/15] libsss_autofs-0:2.10.0-1.fc41.x86_64           100% |   8.1 MiB/s |  33.0 KiB |  00m00s
[14/15] libsss_idmap-0:2.10.0-1.fc41.x86_64            100% |   8.8 MiB/s |  36.1 KiB |  00m00s
[15/15] libsss_nss_idmap-0:2.10.0-1.fc41.x86_64        100% |  20.0 MiB/s |  40.9 KiB |  00m00s
-----------------------------------------------------------------------------------------------
[15/15] Total                                          100% |  51.8 MiB/s |   2.6 MiB |  00m00s
Running transaction
[ 1/32] Verify package files                           100% | 468.0   B/s |  15.0   B |  00m00s
[ 2/32] Prepare transaction                            100% |  36.0   B/s |  30.0   B |  00m01s
[ 3/32] Downgrading libsss_idmap-0:2.10.0-1.fc41.x86_6 100% |   2.3 MiB/s |  79.0 KiB |  00m00s
[ 4/32] Downgrading libsss_certmap-0:2.10.0-1.fc41.x86 100% |   6.5 MiB/s | 146.4 KiB |  00m00s
[ 5/32] Downgrading libsss_nss_idmap-0:2.10.0-1.fc41.x 100% |   8.6 MiB/s |  87.6 KiB |  00m00s
[ 6/32] Downgrading sssd-client-0:2.10.0-1.fc41.x86_64 100% |   5.0 MiB/s | 382.3 KiB |  00m00s
[ 7/32] Downgrading libsss_autofs-0:2.10.0-1.fc41.x86_ 100% |   4.1 MiB/s |  70.7 KiB |  00m00s
[ 8/32] Downgrading libsss_sudo-0:2.10.0-1.fc41.x86_64 100% |   8.2 MiB/s |  58.7 KiB |  00m00s
[ 9/32] Downgrading python3-sssdconfig-0:2.10.0-1.fc41 100% |   9.1 MiB/s | 288.8 KiB |  00m00s
[10/32] Downgrading sssd-nfs-idmap-0:2.10.0-1.fc41.x86 100% | 451.8 KiB/s |  48.8 KiB |  00m00s
[11/32] Downgrading sssd-common-0:2.10.0-1.fc41.x86_64 100% |  15.4 MiB/s |   5.3 MiB |  00m00s
[12/32] Downgrading sssd-krb5-common-0:2.10.0-1.fc41.x 100% |   8.9 MiB/s | 228.6 KiB |  00m00s
[13/32] Downgrading sssd-dbus-0:2.10.0-1.fc41.x86_64   100% |   7.4 MiB/s | 327.2 KiB |  00m00s
[14/32] Downgrading python3-sss-0:2.10.0-1.fc41.x86_64 100% |   2.1 MiB/s |  37.1 KiB |  00m00s
[15/32] Downgrading sssd-tools-0:2.10.0-1.fc41.x86_64  100% |   3.4 MiB/s | 350.0 KiB |  00m00s
[16/32] Downgrading sssd-krb5-0:2.10.0-1.fc41.x86_64   100% |   4.6 MiB/s |  93.7 KiB |  00m00s
[17/32] Downgrading sssd-ldap-0:2.10.0-1.fc41.x86_64   100% |   8.2 MiB/s | 184.9 KiB |  00m00s
[18/32] Removing sssd-tools-0:2.10.0-2.fc41.x86_64     100% |   6.3 KiB/s |  78.0   B |  00m00s
[19/32] Removing sssd-ldap-0:2.10.0-2.fc41.x86_64      100% | 652.0   B/s |  15.0   B |  00m00s
[20/32] Removing sssd-dbus-0:2.10.0-2.fc41.x86_64      100% | 224.0   B/s |  13.0   B |  00m00s
[21/32] Removing sssd-krb5-0:2.10.0-2.fc41.x86_64      100% |   1.7 KiB/s |  14.0   B |  00m00s
[22/32] Removing python3-sssdconfig-0:2.10.0-2.fc41.no 100% |   6.1 KiB/s |  25.0   B |  00m00s
[23/32] Removing python3-sss-0:2.10.0-2.fc41.x86_64    100% | 500.0   B/s |   4.0   B |  00m00s
[24/32] Removing sssd-krb5-common-0:2.10.0-2.fc41.x86_ 100% | 476.0   B/s |  10.0   B |  00m00s
[25/32] Removing sssd-common-0:2.10.0-2.fc41.x86_64    100% | 923.0   B/s | 216.0   B |  00m00s
[26/32] Removing sssd-client-0:2.10.0-2.fc41.x86_64    100% |   3.2 KiB/s |  49.0   B |  00m00s
[27/32] Removing libsss_idmap-0:2.10.0-2.fc41.x86_64   100% |   1.6 KiB/s |   8.0   B |  00m00s
[28/32] Removing libsss_nss_idmap-0:2.10.0-2.fc41.x86_ 100% |   1.3 KiB/s |   8.0   B |  00m00s
[29/32] Removing libsss_autofs-0:2.10.0-2.fc41.x86_64  100% |   1.6 KiB/s |   8.0   B |  00m00s
[30/32] Removing libsss_sudo-0:2.10.0-2.fc41.x86_64    100% |   1.2 KiB/s |   6.0   B |  00m00s
[31/32] Removing sssd-nfs-idmap-0:2.10.0-2.fc41.x86_64 100% |   1.1 KiB/s |   9.0   B |  00m00s
[32/32] Removing libsss_certmap-0:2.10.0-2.fc41.x86_64 100% |   1.0   B/s |  12.0   B |  00m08s
>>> Running trigger-install scriptlet: systemd-0:256.9-2.fc41.x86_64
>>> Finished trigger-install scriptlet: systemd-0:256.9-2.fc41.x86_64
>>> Scriptlet output:
>>> Detected autofs mount point /home during canonicalization of home.
>>> Skipping /home
>>> 
Complete!
[root@redacted ~]#

Comment 4 Alexey Tikhonov 2024-12-19 07:56:54 UTC
Hi.

> ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/gpo_cache/* (code=exited, status=1/FAILURE)
isn't a problem because sssd.service contains
```
ExecStartPre=+-/bin/sh -c "/bin/chown -f sssd:sssd /var/lib/sss/gpo_cache/*"
```
 - the leading `-` makes systemd ignore a failing exit status, so this error is harmless.

This
> sssd[672127]: Exiting the SSSD. Could not restart criti…dap].
is a real problem.

Can you share /var/log/sssd/* ?

Comment 5 Sumit Bose 2024-12-19 10:41:20 UTC
Hi,

can you also check with `coredumpctl` if there are any coredumps from the `sssd_be` process stored and attach them as well?

bye,
Sumit

Comment 6 Alexey Tikhonov 2024-12-19 11:24:13 UTC
(In reply to Sumit Bose from comment #5)
> 
> can you also check with `coredumpctl` if there are any coredumps from the
> `sssd_be` process stored and attach them as well?

You probably need to enable core dump generation on the system.

Comment 7 Gregory Lee Bartholomew 2024-12-19 15:46:09 UTC
Created attachment 2063216 [details]
Wed 2024-12-18 12:54:36 CST SIGSEGV sssd_be

This coredump of sssd_be is from a third, older server with a similar configuration that is crashing in the same way. I'm going to roll back the root filesystems of the other two servers since I need them for production.

Comment 8 Gregory Lee Bartholomew 2024-12-19 16:06:05 UTC
Created attachment 2063217 [details]
sssd.log

Comment 9 Gregory Lee Bartholomew 2024-12-19 16:06:41 UTC
Created attachment 2063218 [details]
sssd_ldap.log

Comment 10 Gregory Lee Bartholomew 2024-12-19 16:17:14 UTC
Just FYI -- I've rolled back the other two servers to Fedora 40 (sssd-ldap-2.9.5-1.fc40.x86_64) and they are working again. No configuration changes were made and all these servers are pointing at the same LDAP server.

Comment 11 Sumit Bose 2024-12-19 17:21:09 UTC
Hi,

thank you for the coredump and the logs. It looks like there is something odd with the DNS lookup of the LDAP server. Is it expected that DNS will only return an IPv6 address for the LDAP server?

bye,
Sumit

Comment 12 Gregory Lee Bartholomew 2024-12-19 17:28:46 UTC
Yes, there is a back channel network between these servers which only has IPv6 addresses configured. I could try to route that differently, but I didn't expect that to be a source of problems. I am using IPv6 local addresses because I have the typical IPv4 local networks (10.0.0.0/8 and 192.168.0.0/16) routed to other places (and I don't want to use 172.16.0.0/12 because some external connections come in on another interface with those addresses).

Comment 13 Gregory Lee Bartholomew 2024-12-19 17:32:50 UTC
Correction: the LDAP connection should go via the BCN (the back channel network). The DNS servers are on IPv4, but I don't think that should matter because ldap.cs.siue.edu is defined in /etc/hosts:

# grep ldap /etc/hosts
fd63:736e:6574:0:766d::1 vm-01 ldap.cs.siue.edu
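
The entry above maps both names to a single IPv6 literal and to no IPv4 address. As a rough illustration, the sketch below shows one way to check which address family a hosts-file address belongs to. The helper name `address_family_of` is hypothetical and is not taken from SSSD or c-ares; only the standard `inet_pton()` call is used.

```c
#include <arpa/inet.h>   /* inet_pton */
#include <netinet/in.h>  /* struct in6_addr */
#include <sys/socket.h>  /* AF_INET, AF_INET6, AF_UNSPEC */

/*
 * Hypothetical helper: classify the address column of an /etc/hosts line.
 * Returns AF_INET or AF_INET6 for a valid literal, AF_UNSPEC otherwise.
 */
static int address_family_of(const char *text)
{
    /* Large enough for either an IPv4 or an IPv6 binary address. */
    unsigned char buf[sizeof(struct in6_addr)];

    if (inet_pton(AF_INET, text, buf) == 1) {
        return AF_INET;
    }
    if (inet_pton(AF_INET6, text, buf) == 1) {
        return AF_INET6;
    }
    return AF_UNSPEC;
}
```

Passing the address from the entry above, `fd63:736e:6574:0:766d::1`, yields `AF_INET6`, so a lookup restricted to `AF_INET` can match the names in this line without finding an address of the requested family.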

Comment 14 Sumit Bose 2024-12-19 18:06:16 UTC
Hi,

thanks for the clarification. Yes, I can see from the logs as well that the address is taken from /etc/hosts. Maybe this will help to reproduce the issue.

bye,
Sumit

Comment 15 Sumit Bose 2024-12-20 11:27:44 UTC
Hi,

so far I wasn't able to reproduce the issue with an IPv6 address from /etc/hosts. Do you have any permission restrictions on /etc/hosts, or is it readable by everyone? Would it be possible to run SSSD on a test system with `debug_level = 9` in the [domain/...] section to get more details about what happened before the crash?

bye,
Sumit

Comment 16 Gregory Lee Bartholomew 2024-12-20 14:52:09 UTC
Created attachment 2063364 [details]
sssd_ldap.log with debug_level = 9

Sure, no problem.

# ls -al /etc/hosts
-rw-r--r--. 1 root root 1107 Dec 17 15:39 /etc/hosts

# grep -A 1 domain sssd.conf
domains = ldap

--
[domain/ldap]
debug_level = 9

# systemctl restart sssd.service

... <wait a few minutes> ...

# systemctl status sssd.service
× sssd.service - System Security Services Daemon
     Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf, 50-keep-warm.conf
     Active: failed (Result: exit-code) since Fri 2024-12-20 08:48:02 CST; 28s ago
   Duration: 47.360s
 Invocation: c19fc7d6f2da44d3a3e8acb8f7401645
    Process: 52286 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd (code=exited, status=0/SUCCESS)
    Process: 52288 ExecStartPre=/bin/chown -f sssd:sssd /etc/sssd/sssd.conf (code=exited, status=0/SUCCESS)
    Process: 52290 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/conf.d (code=exited, status=0/SUCCESS)
    Process: 52292 ExecStartPre=/bin/chown -f -R sssd:sssd /etc/sssd/pki (code=exited, status=0/SUCCESS)
    Process: 52294 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/db/*.ldb (code=exited, status=0/SUCCESS)
    Process: 52296 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/lib/sss/gpo_cache/* (code=exited, status=1/FAILURE)
    Process: 52299 ExecStartPre=/bin/sh -c /bin/chown -f sssd:sssd /var/log/sssd/*.log (code=exited, status=0/SUCCESS)
    Process: 52301 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=exited, status=1/FAILURE)
   Main PID: 52301 (code=exited, status=1/FAILURE)
         IO: 0B read, 0B written
   Mem peak: 42.5M
        CPU: 606ms

Dec 20 08:47:14 redacted.com sssd_nss[52306]: Starting up
Dec 20 08:47:14 redacted.com sssd_ssh[52308]: Starting up
Dec 20 08:47:14 redacted.com sssd_pam[52307]: Starting up
Dec 20 08:47:14 redacted.com systemd[1]: Started sssd.service - System Security Services Daemon.
Dec 20 08:47:24 redacted.com sssd_be[52319]: Starting up
Dec 20 08:47:37 redacted.com sssd_be[52331]: Starting up
Dec 20 08:47:51 redacted.com sssd_be[52341]: Starting up
Dec 20 08:48:02 redacted.com sssd[52301]: Exiting the SSSD. Could not restart critical service [ldap].
Dec 20 08:48:02 redacted.com systemd[1]: sssd.service: Main process exited, code=exited, status=1/FAILURE
Dec 20 08:48:02 redacted.com systemd[1]: sssd.service: Failed with result 'exit-code'.

The resulting sssd_ldap.log is attached.

Comment 17 Gregory Lee Bartholomew 2024-12-20 15:01:03 UTC
I don't know if it is significant, but concerning permissions, I noticed that there are some files under /var/lib/sss that are not owned by sssd. For example:

# find /var/lib/sss/secrets ! -user sssd -exec ls -al {} \;
-rw-------. 1 root root 32 Aug  5  2020 /var/lib/sss/secrets/.secrets.mkey

There are more under pubconf/krb5.include.d and pipes/private.

Comment 18 Sumit Bose 2024-12-20 18:59:26 UTC
Hi,

thank you for the logs, now I can reproduce the issue. It is related to the alias entry in /etc/hosts. If you remove the `vm-01` alias from /etc/hosts it should work (at least it worked for me).

The reason is that `ares_gethostbyname_file()`, from the c-ares library that SSSD uses to resolve hostnames, returns an entry even when only IPv4 addresses are explicitly requested. The entry contains the name and the alias but no address. SSSD currently treats a missing address as an indicator that the name component contains the path to an LDAPI socket. This case should therefore be checked explicitly after calling `ares_gethostbyname_file()` and treated as an unresolved name, which it is.
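
A minimal sketch of the kind of check described above, not the actual SSSD patch. It assumes the lookup result arrives as a standard `struct hostent`; the helper name `hostent_has_address` is hypothetical, and the sketch does not call c-ares itself.

```c
#include <netdb.h>       /* struct hostent */
#include <stdbool.h>
#include <stddef.h>      /* NULL */
#include <sys/socket.h>  /* AF_INET */

/*
 * Hypothetical helper (not part of SSSD or c-ares): report whether a
 * lookup result carries at least one address. An entry that has a name
 * and aliases but an empty h_addr_list should be treated as an
 * unresolved name instead of dereferencing h_addr_list[0].
 */
static bool hostent_has_address(const struct hostent *he)
{
    return he != NULL
        && he->h_addr_list != NULL
        && he->h_addr_list[0] != NULL;
}
```

Under these assumptions, a caller would run the check on the entry returned by `ares_gethostbyname_file()` and take the unresolved-name path whenever it returns false, before any code reads the first address.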

bye,
Sumit

Comment 19 Gregory Lee Bartholomew 2024-12-20 20:09:40 UTC
Well, I need the vm-01 hostname to resolve for other things, so I added another IPv6 address to the LDAP server and updated the /etc/hosts files on the clients accordingly.

And ... Everything seems to be working now. Thanks!

Comment 20 Sumit Bose 2024-12-21 09:06:51 UTC
Hi,

thank you for the confirmation, glad to hear it is working for you now. Nevertheless, since having aliases in /etc/hosts is a valid configuration, we have to fix the SSSD side as well.

bye,
Sumit

Comment 21 Gregory Lee Bartholomew 2025-01-02 14:40:41 UTC
@atikhono did you mean "but no IPv4 address"? My /etc/hosts entry had an IP address.

Comment 22 Alexey Tikhonov 2025-01-02 14:44:09 UTC
(In reply to Gregory Lee Bartholomew from comment #21)
> @atikhono did you mean "but no IPv4 address"? My /etc/hosts entry had an IP
> address.

Thanks. Hopefully I updated the summary correctly this time.