Bug 1494843
Summary:           KCM does not work
Product:           Fedora
Component:         sssd
Status:            CLOSED ERRATA
Severity:          unspecified
Priority:          unspecified
Version:           27
Hardware:          Unspecified
OS:                Unspecified
Reporter:          Dennis Gilmore <dennis>
Assignee:          Jakub Hrozek <jhrozek>
QA Contact:        Fedora Extras Quality Assurance <extras-qa>
CC:                abokovoy, andrew, asn, bberg, bojan, chuck.lever, debarshir, dennis, fidencio, jhrozek, j, kparal, lars, lslebodn, nalin, pbrezina, pschindl, rharwood, sbose, sgallagh, ssorce, tomek, tpopela, wattersm, zing
Keywords:          CommonBugs
Whiteboard:        https://fedoraproject.org/wiki/Common_F27_bugs#kerberos-broken
Fixed In Version:  sssd-1.16.1-2.fc27 sssd-1.16.1-2.fc28 sssd-1.16.1-2.fc26
Doc Type:          If docs needed, set a value
Last Closed:       2018-04-02 13:01:48 UTC
Type:              Bug
Description (Dennis Gilmore, 2017-09-23 13:13:46 UTC)
(In reply to Dennis Gilmore from comment #0)
> Description of problem:
> On a new Fedora 27 install kerberos does not work. I get the following
> message
> kinit: Internal credentials cache error while getting default ccache
> I was able to get kerberos working by commenting out
> I use IPA for authentication

If you have the file /etc/krb5.conf.d/kcm_default_ccache then the package sssd-kcm is installed. See http://fedoraproject.org/wiki/Releases/27/ChangeSet#Kerberos_KCM_credential_cache_by_default

We will need to see log files from the daemons sssd-secrets and sssd-kcm. Add the following lines to sssd.conf (or /etc/sssd/conf.d/11_kcm.conf):

[kcm]
debug_level = 10
debug_microseconds = true

[secrets]
debug_level = 10
debug_microseconds = true

* restart the services sssd, sssd-secrets and sssd-kcm
* reproduce the issue with "KRB5_TRACE=/dev/stderr kinit .."
* provide the output of the previous command plus the log files /var/log/sssd/sssd_secrets.log and /var/log/sssd/sssd_kcm.log

sssd_kcm.log gave me:

(Mon Sep 25 16:34:52 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client

I did not get any extra output from anything else; kinit gave the exact same output. I had to reboot the system for the changes to take effect. I also had issues logging in as my user when KCM was configured and sssd debugging was enabled.

(In reply to Dennis Gilmore from comment #2)
> sssd_kcm.log gave me
> (Mon Sep 25 16:34:52 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to
> receive data (104, Connection reset by peer), aborting client

Do you have /var/log/sssd/sssd_secrets.log with the increased debug level? Was sssd-secrets running at that time? (It is a socket-activated service, the same as sssd-kcm.)

> I had to reboot the system. I also had issues logging in as my user when
> KCM was configured and the sssd debugging enabled

Which changes do you have in mind? Do you mean the default krb5 ccache set in /etc/krb5.conf.d/kcm_default_ccache?
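The debug-logging setup described above can be sketched as one script. This is an illustration, not part of the bug report: the SSSD_CONF_D variable and its scratch-directory default are assumptions made so the sketch is safe to run anywhere; on a real host you would point it at /etc/sssd/conf.d and run the commented systemctl/kinit steps as root.

```shell
# Sketch of the debug-logging steps from comment 1. SSSD_CONF_D defaults to a
# scratch directory; set it to /etc/sssd/conf.d on a real system.
SSSD_CONF_D="${SSSD_CONF_D:-$(mktemp -d)}"
cat > "$SSSD_CONF_D/11_kcm.conf" <<'EOF'
[kcm]
debug_level = 10
debug_microseconds = true

[secrets]
debug_level = 10
debug_microseconds = true
EOF
# sssd refuses world-readable config files, so tighten permissions.
chmod 600 "$SSSD_CONF_D/11_kcm.conf"
echo "wrote $SSSD_CONF_D/11_kcm.conf"
# On the real host (as root):
#   systemctl restart sssd sssd-secrets sssd-kcm
#   KRB5_TRACE=/dev/stderr kinit <principal>
#   # then collect /var/log/sssd/sssd_secrets.log and /var/log/sssd/sssd_kcm.log
```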
Because it was already written in the description that it was enough to modify that file.

I have exactly the same problem. Kerberos was working great for redhat.com and fedoraproject.org on Fedora 26. After upgrading to Fedora 27, it's completely broken.

$ klist -a
klist: Internal credentials cache error while getting default ccache
$ KRB5_TRACE=/dev/stderr kinit kparal
kinit: Internal credentials cache error while searching for ccache for kparal

In sssd_kcm.log I see:

(Thu Sep 28 14:16:40 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Thu Sep 28 14:16:40 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Thu Sep 28 14:16:40 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Thu Sep 28 14:16:40 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Thu Sep 28 14:16:42 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
(Thu Sep 28 14:16:42 2017) [sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client

over and over again, every few seconds. In sssd_secrets.log I see:

(Thu Sep 28 14:03:48 2017) [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Thu Sep 28 14:05:29 2017) [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Thu Sep 28 14:13:03 2017) [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Thu Sep 28 14:16:03 2017) [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children

over and over again, every few minutes. While testing this, I performed the changes stated in comment 1 (created /etc/sssd/conf.d/11_kcm.conf). Both sssd-secrets.service and sssd-kcm.service were running during that time.
I can confirm that after commenting out the contents of the /etc/krb5.conf.d/kcm_default_ccache file kerberos starts working perfectly.

(In reply to Kamil Páral from comment #5)
> I can confirm that after commenting out contents of
> /etc/krb5.conf.d/kcm_default_ccache file kerberos starts working perfectly.

Please provide debug logs as per comment #1.

By the way, if you have issues generating the log files even after adding the debug_level directives to the kcm and secrets sections of sssd.conf and restarting sssd, chances are the old sssd_kcm and/or sssd_secrets processes are still around. You can kill them so that the next socket-activated instances read the verbose debug level (pkill sssd_kcm; pkill sssd_secrets). Alternatively, you can always run the daemons in the foreground:

/usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --debug-level=10 --debug-to-files
/usr/libexec/sssd/sssd_secrets --uid 0 --gid 0 --debug-level=10 --debug-to-files

That should generate /var/log/sssd/sssd_kcm.log and /var/log/sssd/sssd_secrets.log.

Created attachment 1331984 [details]
sssd_secrets.log
Created attachment 1331985 [details]
sssd_kcm.log.xz
I rebooted before generating the log files.

These logs are not verbose enough, because sssd logs only critical errors by default. Please use a higher debug level; see comment #7 for how to run the daemons from the command line if you have problems generating the logs. And please truncate the existing logs so the logs you attach ideally only include a single test.

(In reply to Jakub Hrozek from comment #11)
> These logs are not verbose enough, because sssd logs only critical errors by
> default. Please use a higher debug level, see comment #7 to run the daemons
> from the command line if you have problems generating the logs.
>
> And please truncate the existing logs so the logs you attach ideally only
> include a single test

Running the daemons from the command line is not ideal because they then run with a different SELinux context. The instructions in comment 1 were written for Dennis because he uses sssd with IPA, but in the case of a clean/minimal F27 install they won't work due to upstream bug https://pagure.io/SSSD/sssd/issue/3439. It will work even with a missing sssd.conf after the following change:

cp /etc/sssd/conf.d/11_kcm.conf /etc/sssd/sssd.conf

The main reason I wrote the instructions in comment #1 is to enable debug logs with microseconds, which might help a lot (and will not work with the instructions in comment #7). BTW, I suspect a bug related to responder_idle_timeout (which I disabled on my laptop due to running the daemons under valgrind).

I placed the content from comment 1 into /etc/sssd/sssd.conf and still the logs are not verbose. I'm available in the Red Hat office; perhaps you can stop by and try to debug it locally if you can't reproduce it?

So, to summarize. For some reason the logs started being verbose only some time after the reboot. The issue might be affected by other issues, some of them related to gnome-online-accounts. For testing purposes I disabled them. I tried to reproduce the issue on fresh F27 installs, and even on an F26->F27 upgrade, but it worked.
On my local machine, though, it only started working after deleting everything in /var/lib/sss/secrets/ (including hidden files) and rebooting. Since then, KCM seems to work. Dennis, can you confirm the same workaround works for you as well? Thanks.

So, I just ran into my issue again, and after a short chat with Patrick Uiterwijk it looks like the race condition is simply that systemd has not opened the socket by the time the sssd service becomes active. I.e. what happens is:

* sssd-*.service is started
* sssd-*.socket is also triggered (but the socket is not bound yet)
* the daemon comes up
* the daemon binds to the socket, as systemd has not done so yet
* systemd fails to bind the socket and the sssd-*.socket units fail to start up

There are different possible fixes for this:

* add proper Before=/After= lines
* prevent the service from ever trying to bind to the socket if running under systemd

Clearing the needinfo flag as the above analysis might explain the seen issue (at least the issue that I was seeing).

Benjamin, your issue was very likely different.

(In reply to Benjamin Berg from comment #15)
> So, I just ran into my issue again, and after a short chat with Patrick
> Uiterwijk it looks the race condition is simply that systemd has not opened
> the socket when the sssd service is being active. [...]

(In reply to Lukas Slebodnik from comment #17)
> Benjamin, your issue was very likely different.

Isn't that bug 1475999 ?
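The first suggested fix (explicit Before=/After= ordering) could take the shape of a unit drop-in like the one below. This is a hypothetical sketch of the idea only, not the fix that was actually shipped: the unit names come from the thread, but the specific directives and the DROPIN_DIR variable are assumptions. It writes to a scratch directory so it is safe to try; on a real host you would use `systemctl edit sssd-kcm.service` instead.

```shell
# Hypothetical drop-in illustrating "add proper Before=/After= lines": order
# sssd-kcm.service after its socket unit so the daemon never races systemd for
# the bind. The real location would be /etc/systemd/system/sssd-kcm.service.d/.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
cat > "$DROPIN_DIR/order.conf" <<'EOF'
[Unit]
# Assumption: make sure the socket unit is fully set up before the service.
Requires=sssd-kcm.socket
After=sssd-kcm.socket
EOF
echo "wrote $DROPIN_DIR/order.conf"
# On a real host: systemctl daemon-reload && systemctl restart sssd-kcm.socket
```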
(In reply to Debarshi Ray from comment #18)
> (In reply to Benjamin Berg from comment #15)
> > So, I just ran into my issue again, and after a short chat with Patrick
> > Uiterwijk it looks the race condition is simply that systemd has not
> > opened the socket when the sssd service is being active. [...]
>
> (In reply to Lukas Slebodnik from comment #17)
> > Benjamin, your issue was very likely different.
>
> Isn't that bug 1475999 ?

No, BZ1475999 is already fixed even though the state of the BZ does not say so.

sh4 grep -E "kcm|secrets" /usr/lib/systemd/system-preset/*
/usr/lib/systemd/system-preset/90-default.preset:enable sssd-kcm.socket
/usr/lib/systemd/system-preset/90-default.preset:enable sssd-secrets.socket

Still present in sssd-kcm-1.16.0-5.fc27.x86_64

(In reply to Andrew Meredith from comment #20)
> Still present in sssd-kcm-1.16.0-5.fc27.x86_64

May I ask what kind of error you see after calling klist/kinit? The same as in comment 4?

For me, adding a debug level did not create more logs. Only the message "[sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client" is ever generated.
But stracing KCM, I see that directly before generating this message, KCM does:

155677 writev(12, [{iov_base="\0\0\0\0", iov_len=4}], 1) = 4
155677 writev(12, [{iov_base="\226\307:\215", iov_len=4}], 1) = 4
155677 writev(12, [{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0… ", iov_len=2048}], 1) = -1 EPIPE (Broken pipe)

And FD 12 seems to be the socket passed from socket activation:

155677 accept(3, {sa_family=AF_UNIX}, [110->2]) = 12

So I guess whatever connects to /var/run/.heim_org.h5l.kcm-socket just closes the connection in the middle of the conversation.

(In reply to Tomasz Torcz from comment #22)
> For me, adding debug level to logs did not create more logs. Only the
> message "[sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104,
> Connection reset by peer), aborting client" is ever generated.

I hope the following steps will help to provide more verbose log files. Add the following lines to /etc/sssd/sssd.conf (if the file does not exist, create a new one):

"""
[kcm]
debug_level = 10
debug_microseconds = true

[secrets]
debug_level = 10
debug_microseconds = true
"""

And execute the following commands:

* chmod 600 /etc/sssd/sssd.conf
* /usr/sbin/sssd --genconf
* systemctl stop sssd-secrets and sssd-kcm
* reproduce the issue with "KRB5_TRACE=/dev/stderr kinit .."
* provide the output of the previous command plus the log files /var/log/sssd/sssd_secrets.log and /var/log/sssd/sssd_kcm.log

# /usr/sbin/sssd --genconf
(Sun Feb 4 18:33:34:570150 2018) [sssd] [main] (0x0010): pidfile exists at /var/run/sssd.pid
SSSD is already running
#

% KRB5_TRACE=/dev/stderr kinit zdzichu
kinit: Matching credential not found while getting default ccache
%

Files attached.

Created attachment 1391035 [details]
sssd_kcm.log.xz
Created attachment 1391036 [details]
sssd_secrets.log.xz
(In reply to Tomasz Torcz from comment #24)
> # /usr/sbin/sssd --genconf
> (Sun Feb 4 18:33:34:570150 2018) [sssd] [main] (0x0010): pidfile exists at
> /var/run/sssd.pid
> SSSD is already running

Ahh, I forgot about this case. "/usr/sbin/sssd --genconf" will not regenerate the internal conf.db if sssd is already running. Here are updated steps (I'm sorry for the inconvenience). Add the following lines to /etc/sssd/sssd.conf (if the file does not exist, create a new one):

"""
[kcm]
debug_level = 10
debug_microseconds = true

[secrets]
debug_level = 10
debug_microseconds = true
"""

And execute the following commands:

* chmod 600 /etc/sssd/sssd.conf
* /usr/sbin/sssd --genconf || systemctl restart sssd
* rm -f /var/log/sssd/*
* systemctl stop sssd-secrets and sssd-kcm
* reproduce the issue with "KRB5_TRACE=/dev/stderr kinit .."
* provide the output of the previous command plus the log files /var/log/sssd/sssd_secrets.log and /var/log/sssd/sssd_kcm.log

# chmod 600 /etc/sssd/sssd.conf
# /usr/sbin/sssd --genconf || systemctl restart sssd
(Fri Feb 9 07:57:33:884277 2018) [sssd] [main] (0x0010): pidfile exists at /var/run/sssd.pid
SSSD is already running
# rm -f /var/log/sssd/*
# systemctl stop sssd-secrets and sssd-kcm
Failed to stop and.service: Unit and.service not loaded.
Warning: Stopping sssd-secrets.service, but it can still be activated by: sssd-secrets.socket
Warning: Stopping sssd-kcm.service, but it can still be activated by: sssd-kcm.socket
% KRB5_TRACE=/dev/stderr kinit zdzichu
kinit: Matching credential not found while getting default ccache
%

At this point I tried to log in remotely via SSH (it, too, doesn't work). KCM logs for the SSH connection start at Fri Feb 9 07:58:57:260232 2018. The UID/GID 1001 you see in the logs belongs to my user:

# id zdzichu
uid=1001(zdzichu) gid=1001(zdzichu) groups=1001(zdzichu),190(systemd-journal),10(wheel)
# getent passwd zdzichu
zdzichu:*:1001:1001:Tomek Torcz:/home/zdzichu:/bin/zsh

Created attachment 1393551 [details]
sssd_kcm.log
Created attachment 1393552 [details]
sssd_secrets.log
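The updated steps above can be consolidated into one script. This is a sketch with two caveats: the CONF variable and its scratch-directory default are mine so the file-writing part is safe to run anywhere, and the stop command lists the unit names explicitly (in the transcript, "systemctl stop sssd-secrets and sssd-kcm" made systemctl treat the word "and" as a unit name).

```shell
# Consolidated sketch of the updated debug-log procedure. CONF defaults to a
# scratch path; on a real system it would be /etc/sssd/sssd.conf.
CONF="${CONF:-$(mktemp -d)/sssd.conf}"
cat > "$CONF" <<'EOF'
[kcm]
debug_level = 10
debug_microseconds = true

[secrets]
debug_level = 10
debug_microseconds = true
EOF
chmod 600 "$CONF"
echo "wrote $CONF"
# As root on the real host, with the file at /etc/sssd/sssd.conf:
#   /usr/sbin/sssd --genconf || systemctl restart sssd
#   rm -f /var/log/sssd/*
#   systemctl stop sssd-secrets.service sssd-kcm.service   # units listed explicitly
#   KRB5_TRACE=/dev/stderr kinit <principal>
#   # collect /var/log/sssd/sssd_secrets.log and /var/log/sssd/sssd_kcm.log
```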
Having the same issue on my workstation which is running CentOS 7. The kcm log actually filled up the / partition with over 20 GB of data. The same message is logged repeatedly in the log file.

[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children

(In reply to Michael Watters from comment #31)
> Having the same issue on my workstation which is running CentOS 7. The kcm
> log actually filled up the / partition with over 20 GB of data. The same
> message is logged repeatedly in the log file.
>
> [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children

Michael, your problem is not the same as described by Tomasz, but it does sound like an issue. Logging the responder activity when it's socket-activated (it's your case, right?) may be done in a more careful way in order to avoid:
- flooding the logs;
- filling up the disk of the users;

I'm opening an upstream bug about this one.

Here's a link for the upstream issue: https://pagure.io/SSSD/sssd/issue/3651

(In reply to Fabiano Fidêncio from comment #33)
> (In reply to Fabiano Fidêncio from comment #32)
> > (In reply to Michael Watters from comment #31)
> > > Having the same issue on my workstation which is running CentOS 7.
The kcm
> > > log actually filled up the / partition with over 20 GB of data. The same
> > > message is logged repeatedly in the log file.
> > >
> > > [sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children
> >
> > Michael, your problem is not the same as described by Tomasz, but it does
> > sound like an issue. [...]
> >
> > I'm opening an upstream bug about this one.
>
> Here's a link for the upstream issue: https://pagure.io/SSSD/sssd/issue/3651

And the bug report was considered not useful, so I've closed it.

I have updated the bug report with additional details. The actual issue is that KCM lookups appear to be failing. I am seeing this in CentOS 7 and Fedora 27, which both have the sssd-kcm.socket service enabled. Here are the log messages which are being repeated every few seconds.
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(Mon Feb 26 07:36:50 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(repeated many times)

Since it's EBADMSG, there are only two places in the code where this can happen:

152     if (msglen == 0) {
153         DEBUG(SSSDBG_CRIT_FAILURE, "Illegal zero-length message\n");
154         return EBADMSG;
155     }
156
157     if (msglen != reqbuf->v_msg.nprocessed) {
158         DEBUG(SSSDBG_CRIT_FAILURE,
159               "Sender claims the message is %"PRIu32" bytes, "
160               "but received %zu\n",
161               msglen, reqbuf->v_msg.nprocessed);
162         return EBADMSG;
163     }

So it's either a zero-length message, or the sender sent a message that is not as long as it claims to be, or, of course, we have a bug in the sssd processing logic. I think the first step should be to bump the debug logs to a more verbose level to see which of the two it is. The logs should also include the PID of the caller, which might reveal who's sending the (potentially bogus?) data. I wonder if we should also enhance upstream to dump the whole raw message (or at least the headers etc, perhaps not the payload..)
with a high debug level. When I was debugging KCM, IIRC I was using strace's "-e read=$num" to see the dump of the KCM socket data.

Jakub, we should implement some log throttling, probably by using a private timestamped cache or something. It makes no sense to spit out the same exact message multiple times a second. Perhaps we should also think of throttling clients, to prevent a badly behaved client from monopolizing the KCM.

(In reply to Simo Sorce from comment #37)
> Jakub,
> we should implement some log throttling, probably by using a private
> timestamped cache or something. It makes no sense to spit out the same exact
> message multiple times a second... perahps we should also think of
> throttling clients, to prevent a bad behaved client from monopolizing the
> KCM.

I would say we should first focus on the bug and then on the unnecessary log messages. It is very likely that fixing the bug will reduce the size of the log file as well.

I'm hitting this problem with sssd-kcm-1.16.0-6.fc27.x86_64. Stopping sssd-kcm{.service,.socket}, removing /var/lib/sss/secrets/*, and restarting sssd-kcm.socket seems to have cleared things up. I am not able to reproduce the problem on demand. If it matters, I am running with multiple credential caches (one internal, one for fedoraproject.org):

$ klist -l
Principal name    Cache name
--------------    ----------
username          KCM:1000:7064
username          KCM:1000

Even restarting the sssd-kcm service only temporarily resolves the problem.
Here you can see that I had a functional credentials cache for approximately 15 seconds:

[lars@madhatter ~]$ date; klist
Thu Mar 15 08:24:27 EDT 2018
klist: Internal credentials cache error while getting default ccache
[lars@madhatter ~]$ sudo systemctl stop sssd-kcm.{service,socket}; sudo rm -f /var/lib/sss/secrets/.secrets.mkey /var/lib/sss/secrets/secrets.ldb; sudo systemctl start sssd-kcm.socket
[lars@madhatter ~]$ date; klist
Thu Mar 15 08:24:29 EDT 2018
Ticket cache: KCM:1000:66640
Default principal: username
[lars@madhatter ~]$ date; klist
Thu Mar 15 08:24:34 EDT 2018
klist: Internal credentials cache error while getting default ccache

Okay, this bugzilla has several different issues being treated as the same, although they're not. So, let's try to organize it and make it more useful to everyone involved.

Michael Watters, are you running SSSD against AD? I have the feeling I've fixed your issue, and I may even have a testing build that you can install, test and give me some feedback on. Just do a `dnf copr enable fidencio/sssd-warpil` and update SSSD.

Lars Kellogg-Stedman, please, I'd really like to ask you for logs. As this bug is a little bit messier than usual, I'd like to be sure what kind of issue we're dealing with here, and then I'll open more tickets, each one specific to each issue we're facing.

Tomasz, I'll get to your logs. You're still facing the issue, right?

Andrew, Kamil, Dennis: what is the status of this issue for you? Would you be able to provide us some debug logs as well, and maybe some steps to reproduce the issue?

Sorry for the really long time it's taking us to track this issue down.

I think that the problems reported by Dennis, Kamil and Lars are the same issue - the one which you debugged with me yesterday. I had the same symptoms as the ones described in comment 2, comment 4 (I checked the logs I had saved in the past), comment 39 and comment 40.
Especially note how restarting the systemd sockets (comment 39 and comment 40) seems to fix the problem. That's also how we accidentally (?) fixed my problem yesterday.

I don't know the exact problem that Benjamin (comment 15) is referring to. However, given that he is talking about systemd sockets, maybe that's also the same problem?

(In reply to Debarshi Ray from comment #42)
> I think that the problems reported by Dennis, Kamil and Lars are the same
> issue - the one which you debugged with me yesterday.

Yep, but we didn't find out what was causing the issue you've faced. And in the logs provided I can't find the very same JSON error we could find in yours. So, I would take the safest path here and not treat the issues as the same (at least till the logs show us the opposite).

> Especially note how restarting the systemd sockets (comment 39 and comment
> 40) seems to fix the problem. That's also how we accidentally (?) fixed my
> problem yesterday.

Yep, but the problem came back for Lars. So, I'm talking to him privately in order to get more data and try to find out what may actually cause the issue and then come up with a proper solution.

> I don't know the exact problem that Benjamin (comment 15) is referring to.
> However, given that he is talking about systemd sockets, maybe that's also
> the same problem?

I don't think so, Rishi. Benjamin's problem was https://github.com/SSSD/sssd/commit/53d1459e9b87196b4f6e327f0f5db4d9229bf541 ... which makes me want to re-review the Units parameters used for sssd-kcm and sssd-secrets, as those still seem to be problematic.

(In reply to Fabiano Fidêncio from comment #43)
> (In reply to Debarshi Ray from comment #42)
> > I think that the problems reported by Dennis, Kamil and Lars are the same
> > issue - the one which you debugged with me yesterday.
I had the same
> > symptoms as the ones described in comment 2, comment 4 (I checked the logs
> > I had saved in the past), comment 39 and comment 40.
>
> Yep, but we didn't find out what was causing the issue you've faced. And in
> the logs provided I can't find the very same JSON error we could find in
> yours.

The thing is that I had dozens of these in my logs from last November before the JSON error was logged:

[sssd[kcm]] [kcm_recv] (0x0010): Failed to receive data (104, Connection reset by peer), aborting client
[sssd[secrets]] [orderly_shutdown] (0x0010): SIGTERM: killing children

Dennis, Kamil, do you happen to have this in your sssd_kcm.log:

[sssd[kcm]] [sec_list_parse] (0x0020): Failed to parse JSON payload on line 21: unable to decode byte 0x88

> > Especially note how restarting the systemd sockets (comment 39 and comment
> > 40) seems to fix the problem. That's also how we accidentally (?) fixed my
> > problem yesterday.
>
> Yep, but the problem came back for Lars. So, I'm talking to him privately in
> order to get more data and try to find out what may actually cause the issue
> and then come up with a proper solution.

Ok, great.

> > I don't know the exact problem that Benjamin (comment 15) is referring to.
> > However, given that he is talking about systemd sockets, maybe that's also
> > the same problem?
>
> I don't think so, Rishi.
> Benjamin's problem was
> https://github.com/SSSD/sssd/commit/53d1459e9b87196b4f6e327f0f5db4d9229bf541
> ... which makes me want to re-review the Units parameters used for sssd-kcm
> and sssd-secrets, as those still seem to be problematic.

I see, ok.

People, I'd like to provide you guys a test build that may fix the issues you've been facing. To install it, please just do `dnf copr enable fidencio/sssd-kcm-issues && dnf update sssd`. Please let me know if you face more issues after this build (logs are always needed to help us debug the issue for you).
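Several reporters earlier in this thread (comments 39 and 40, and Lars's transcript) converged on the same workaround: stop the KCM units, wipe the secrets database, and let socket activation restart the responder. A sketch of that sequence follows; the SECRETS_DIR variable and its scratch-directory default are mine so the destructive part is safe to try here, and on a real host you would point it at /var/lib/sss/secrets and run the commented systemctl lines as root.

```shell
# Cache-reset workaround reported in this thread. SECRETS_DIR defaults to a
# scratch directory seeded with dummy files, so the rm below is harmless here.
SECRETS_DIR="${SECRETS_DIR:-$(mktemp -d)}"
touch "$SECRETS_DIR/secrets.ldb" "$SECRETS_DIR/.secrets.mkey"
# systemctl stop sssd-kcm.service sssd-kcm.socket
rm -f "$SECRETS_DIR/secrets.ldb" "$SECRETS_DIR/.secrets.mkey"
# systemctl start sssd-kcm.socket
# klist   # after a fresh kinit this should show a KCM: ticket cache again
echo "remaining entries: $(ls -A "$SECRETS_DIR" | wc -l)"
```

Note that per the thread this only cleared the problem temporarily for some reporters, so it is a workaround, not a fix.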
Thanks, Fabiano: I've been running with the test build since last night, and I haven't had any problems with Kerberos since then.

Fabiano: I am using sssd with AD authentication enabled. Unfortunately I'm still having issues with the sssd_kcm.log file filling up my root partition. There are thousands of lines repeating the same error as follows.

(Fri Mar 23 08:57:11 2018) [sssd[kcm]] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client

I am running CentOS 7 so I will not be able to install the copr packages. Do you have a yum repo with packages for CentOS 7 available?

(In reply to Michael Watters from comment #47)
> I am using sssd with AD authentication enabled. Unfortunately I'm still
> having issues with the sssd_kcm.log file filling up my root partition. [...]
>
> I am running CentOS 7 so I will not be able to install the copr packages.
> Do you have a yum repo with packages for CentOS 7 available?

Gimme a few minutes and I'll provide you an EL7 based build.

(In reply to Fabiano Fidêncio from comment #48)
> Gimme a few minutes and I'll provide you an EL7 based build.

If you're patching the 7.4 package, please note that there were several patches for KCM that never made it to RHEL..
(In reply to Jakub Hrozek from comment #49)
> If you're patching the 7.4 package, please note that there were several
> patches for KCM that never made it to RHEL..

And it seems that the f27 src.rpm doesn't build properly for EL7, thus it'll take more time than a few minutes. Till the end of the day I'll provide the EL7 copr.

(In reply to Fabiano Fidêncio from comment #50)
> Till the end of the day I'll provide the EL7 copr.

Not possible till the end of the day. COPR is just kicking me quite hard, as the build is breaking due to the following error:

Traceback (most recent call last):
  File "/usr/libexec/mock/mock", line 481, in rebuild_generic
    ret = cmd(item)
  File "/usr/libexec/mock/mock", line 552, in cmd
    follow_links=options.symlink_dereference)
  File "/usr/lib/python3.6/site-packages/mockbuild/trace_decorator.py", line 96, in trace
    result = func(*args, **kw)
  File "/usr/lib/python3.6/site-packages/mockbuild/backend.py", line 391, in buildsrpm
    rebuilt_srpm = self.rebuild_installed_srpm(chrootspec, timeout)
  File "/usr/lib/python3.6/site-packages/mockbuild/trace_decorator.py", line 96, in trace
    result = func(*args, **kw)
  File "/usr/lib/python3.6/site-packages/mockbuild/backend.py", line 456, in rebuild_installed_srpm
    printOutput=self.config['print_main_output']
  File "/usr/lib/python3.6/site-packages/mockbuild/buildroot.py", line 204, in doChroot
    env=env, shell=shell, *args, **kargs)
  File "/usr/lib/python3.6/site-packages/mockbuild/trace_decorator.py", line 96, in trace
    result = func(*args, **kw)
  File "/usr/lib/python3.6/site-packages/mockbuild/util.py", line 626, in do
    raise exception.Error("Command failed: \n # %s\n%s" % (command, output), child.returncode)
mockbuild.exception.Error: Command failed:
 # /usr/bin/systemd-nspawn -q -M f90cb55e9fae49b0bc683c1c2281821b -D /var/lib/mock/731963-epel-7-x86_64-1521841038.897379/root -a --setenv=TERM=vt100 --setenv=SHELL=/bin/bash --setenv=HOME=/builddir --setenv=HOSTNAME=mock --setenv=PATH=/usr/bin:/bin:/usr/sbin:/sbin --setenv=PROMPT_COMMAND=printf "\033]0;<mock-chroot>\007" --setenv=PS1=<mock-chroot> \s-\v\$ --setenv=LANG=en_US.UTF-8 -u mockbuild bash --login -c /usr/bin/rpmbuild -bs --target x86_64 --nodeps /builddir/build/SPECS/sssd.spec

I will get back to it on Monday and then provide Michael's build.

I'm running sssd-1.16.1-1 on F27, with the KDC being AD (Windows). If I attempt kinit using certain principals (to my knowledge, these are not "Unix" IDs in AD), the sssd_kcm.log file grows by GBs in a very short time (in fact, it filled my /var). The messages are all along the lines of the "Failed to parse data" messages mentioned in comments to this bug. If nothing else, that is a DoS, so this excessive logging should be changed. I don't have any debugging enabled. Just FYI.

(In reply to Bojan Smojver from comment #52)
> I'm running sssd-1.16.1-1 on F27, with KDC being AD (Windows). If I attempt
> kinit using certain principals (to my knowledge, these are not "Unix" IDs in
> AD), the sssd_kcm.log file grows by GBs in a very short time (in fact, it
> filled my /var). The messages are all along the lines of "Failed to parse
> data" mentioned in comments to this bug.

Bojan, please, would you mind doing: `dnf copr enable fidencio/sssd-kcm-issues && dnf update sssd`.

By doing that you'll be testing a build that includes two fixes that I've provided for sssd KCM but that are still under review. It should solve your problem.

Thanks, I may try this when I log in next.

Re #41: I'm still facing the problems.
I tried with today's copr build, no improvement.

(In reply to Tomasz Torcz from comment #55)
> Re #41:
>
> I'm still facing the problems. I tried with today's copr build, no
> improvement.

I'm sorry you are still facing issues. I'll leave the final say to Fabiano because he spends the most hours maintaining the Fedora SSSD packages, but since we had several reporters confirm that the fixes under review did help them, I would suggest using this BZ to track that issue, and I would prefer that people who are still facing issues with KCM file a separate ticket. Otherwise I'm afraid we might lose track of several problems tracked in a single bugzilla.

(In reply to Fabiano Fidêncio from comment #53)
> Please, would you mind to: `dnf copr enable fidencio/sssd-kcm-issues && dnf
> update sssd`.
>
> By doing that you'll be testing a build that includes two fixes that I've
> provided for sssd KCM but that are still under review. It should solve your
> problem.

That worked for me. I can now get the TGT from AD for the principal that I could not before. The spamming of the log stopped, and I can now actually log in over ssh with GSSAPI to some machines that I could not log in to before. So, in my case, your packages work. Thanks.

Michael Watters,

Sorry for the long delay it took me to provide you a build, but here it is:
https://copr.fedorainfracloud.org/coprs/fidencio/sssd-kcm-issues/build/733586/

In order to easily install it, please ensure that you have `yum-plugin-copr` installed and then do:

yum copr enable fidencio/sssd-kcm-issue; yum update sssd;

Please mind that in case you downgrade from this version you'd have to remove the cache, as in 1.16.1 we bumped our cache version.

Let us know about the results.

Fabiano,

Thanks for building those packages. I'm getting a 404 error using the yum copr enable fidencio/sssd-kcm-issue command; however, I was able to install the packages from the repo using the following procedure.
cd /etc/yum.repos.d
curl -O https://copr.fedorainfracloud.org/coprs/fidencio/sssd-kcm-issues/repo/epel-7/fidencio-sssd-kcm-issues-epel-7.repo
yum update sssd

After updating the packages I restarted sssd-kcm.service and sssd-kcm.socket, which appears to have quieted the log files down. I'm still seeing errors, but not nearly as often. For example, these messages appear about every 8 minutes:

(Wed Mar 28 15:56:12 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Wed Mar 28 16:03:47 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM: killing children
(Wed Mar 28 16:11:22 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM: killing children

(In reply to Michael Watters from comment #59)
> (Wed Mar 28 15:56:12 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM:
> killing children
> (Wed Mar 28 16:03:47 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM:
> killing children
> (Wed Mar 28 16:11:22 2018) [sssd[kcm]] [orderly_shutdown] (0x0010): SIGTERM:
> killing children

Yep, I'm seeing these too with the copr packages. Just FYI.

Michael, Bojan,

Those messages are not an error. KCM shuts itself down after 300 seconds of inactivity (well, currently even when activity is happening ... I've already submitted a PR fixing this issue). So, IOW, those messages are "normal" and *will* be toned down soon (I'll open a PR for this soon).

The important question ... does KCM work for you, Michael? Did the build solve the issue you had before?

(In reply to Fabiano Fidêncio from comment #61)
> So, IOW, those messages are "normal" and *will* be toned down soon (I'll
> open a PR for this soon).

Cool, thanks.

(In reply to Fabiano Fidêncio from comment #61)
> The important question ... KCM works for you Michael? Did the build solve
> the issue you had before?

Yes, the new packages are working great. My hard drive hasn't filled up since I installed them.
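As an aside on the 300-second shutdown mentioned above: the interval before a socket-activated SSSD responder such as KCM stops itself is configurable. The following sssd.conf fragment is only a sketch; it assumes the `responder_idle_timeout` option documented in the sssd.conf man page for SSSD 1.16.1 and later, and the value of 600 is an arbitrary example, not a recommendation:

```
# /etc/sssd/sssd.conf fragment (sketch; check `man sssd.conf` to confirm
# responder_idle_timeout is supported by your sssd version)
[kcm]
# Seconds the socket-activated KCM responder stays up with no client
# activity before it shuts down again (default: 300).
responder_idle_timeout = 600
```

After editing, the change takes effect the next time the responder is (re)started, e.g. on the next socket activation.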
:D

Tomasz,

Please, I'd like to ask you to cc yourself to https://bugzilla.redhat.com/show_bug.cgi?id=1553056

That bug has logs very similar to yours, and I'd like to treat it as out of the scope of this one. I'm going to close this one soon, as most of the fixes were included in a Fedora build that I've done today. Deal? :-)

Sure, thanks for looking into the logs.

sssd-1.16.1-2.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-94e3753451

sssd-1.16.1-2.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-5de5bfcbe2

sssd-1.16.1-2.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-495d60f679

sssd-1.16.1-2.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-5de5bfcbe2

sssd-1.16.1-2.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-495d60f679

sssd-1.16.1-2.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-94e3753451

sssd-1.16.1-2.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.
Since this issue also affects RHEL (CentOS) 7, will there be updates released for EL7 as well?

(In reply to Michael Watters from comment #73)
> Since this issue also affects RHEL (CentOS) 7 will there be updates released
> for EL7 as well?

At some point, yes. Unfortunately I can't tell you exactly when it'll officially land there. I'll add the patches to our sssd-1-16 copr repo (I still have to discuss this with our team) and propose backporting the issues fixed so far (and hopefully also the ones to come).

sssd-1.16.1-2.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

sssd-1.16.1-2.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Just as an aside here, this version of sssd_kcm from the latest RPM often appears near the top of top output on an otherwise idle machine. Both my VM (AD back end) and my laptop (krb5/ldap back end) often list this process with just under 2% of CPU utilisation. Seems unusual...

I just discovered that these packages cause sssd to segfault on CentOS 7.5.

journalctl shows the following info:

mdct-dev27b sssd[7722]: Starting up
mdct-dev27b kernel: sssd[7722]: segfault at 7fbd4d3ead58 ip 00007fbd4d3ead58 sp 00007ffd3eb93a38 error 15 in memberof.so[7fbd4d3ea00
mdct-dev27b abrt-hook-ccpp[7723]: Process 7722 (sssd) of user 0 killed by SIGSEGV - dumping core
mdct-dev27b abrt-hook-ccpp[7723]: Failed to create core_backtrace: dwfl_getthread_frames failed: No DWARF information found
mdct-dev27b systemd[1]: sssd.service: main process exited, code=dumped, status=11/SEGV
mdct-dev27b systemd[1]: Failed to start System Security Services Daemon.

Would it be possible to update these packages to work on 7.5?

(In reply to Michael Watters from comment #78)
> I just discovered that these packages cause sssd to segfault on CentOS 7.5.
>
> journalctl shows the following info.
>
> mdct-dev27b sssd[7722]: Starting up
> mdct-dev27b kernel: sssd[7722]: segfault at 7fbd4d3ead58 ip 00007fbd4d3ead58
> sp 00007ffd3eb93a38 error 15 in memberof.so[7fbd4d3ea00
> mdct-dev27b abrt-hook-ccpp[7723]: Process 7722 (sssd) of user 0 killed by
> SIGSEGV - dumping core
> mdct-dev27b abrt-hook-ccpp[7723]: Failed to create core_backtrace:
> dwfl_getthread_frames failed: No DWARF information found
> mdct-dev27b systemd[1]: sssd.service: main process exited, code=dumped,
> status=11/SEGV
> mdct-dev27b systemd[1]: Failed to start System Security Services Daemon.
>
> Would it be possible to update these packages to work on 7.5?

Michael,

The patches are now part of https://copr.fedorainfracloud.org/coprs/g/sssd/sssd-1-16/ (which is already built against 7.5).

Please, just use that repo, and mine can be retired :-)

Also, let us know if that repo works for you.

Fabiano,

Thanks. The packages provided at https://copr.fedorainfracloud.org/coprs/g/sssd/sssd-1-16 do appear to work with CentOS 7.5. SSH logins using password authentication with Kerberos are now working again.

Still seeing sssd_kcm in an infinite loop, filling up /var/log with the following:

(2020-08-10 15:03:04:680265): [kcm] [kcm_input_parse] (0x1000): Received message with length 0
(2020-08-10 15:03:04:680284): [kcm] [kcm_input_parse] (0x0020): Illegal zero-length message
(2020-08-10 15:03:04:680302): [kcm] [kcm_recv] (0x0010): Failed to parse data (74, Bad message), aborting client
(2020-08-10 15:03:04:680319): [kcm] [kcm_reply_error] (0x0040): KCM operation returs failure [74]: Bad message
(2020-08-10 15:03:04:680353): [kcm] [kcm_failbuf_construct] (0x1000): Sent reply with error -1765328188

This is Fedora 32, sssd-kcm-2.3.1-2.fc32.x86_64.
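While the underlying parse-loop bug is being chased, one stopgap for the "log fills /var" symptom reported in this thread is to rotate the KCM log aggressively. The drop-in below is only a sketch, not part of sssd itself; the file name, size, and rotation counts are arbitrary examples, and it assumes the standard logrotate package and the /var/log/sssd/sssd_kcm.log path mentioned earlier in this bug:

```
# /etc/logrotate.d/sssd-kcm-stopgap  (hypothetical drop-in; values are examples)
/var/log/sssd/sssd_kcm.log {
    # Rotate as soon as the file exceeds 50 MB, keeping two old copies.
    size 50M
    rotate 2
    compress
    # Don't error if the log doesn't exist yet, and skip empty logs.
    missingok
    notifempty
    # Truncate in place so sssd_kcm keeps writing to its open descriptor.
    copytruncate
}
```

This caps disk usage but does not address the excessive logging itself; a separate bug for the 2.3.1 regression is the right place for that, as noted below in comment #82.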
(In reply to Chuck Lever from comment #81)
> Still seeing sssd_kcm in an infinite loop, filling up /var/log with the
> following:
>
> (2020-08-10 15:03:04:680265): [kcm] [kcm_input_parse] (0x1000): Received
> message with length 0
> (2020-08-10 15:03:04:680284): [kcm] [kcm_input_parse] (0x0020): Illegal
> zero-length message
> (2020-08-10 15:03:04:680302): [kcm] [kcm_recv] (0x0010): Failed to parse
> data (74, Bad message), aborting client
> (2020-08-10 15:03:04:680319): [kcm] [kcm_reply_error] (0x0040): KCM
> operation returs failure [74]: Bad message
> (2020-08-10 15:03:04:680353): [kcm] [kcm_failbuf_construct] (0x1000): Sent
> reply with error -1765328188
>
> This is Fedora 32, sssd-kcm-2.3.1-2.fc32.x86_64.

I haven't been working on SSSD in some time, so I can't comment on the bug itself. At any rate, I would recommend opening a new bug; a comment on a closed bug is just going to get lost in the bugzilla mail storm.

(In reply to Jakub Hrozek from comment #82)
> I haven't been working on SSSD in some time so I can't comment on the bug
> itself. At any rate, I would recommend to open a new bug, a comment on a
> closed bug is just going to get lost in the bugzilla mail storm.

Thank you for responding, Jakub. I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1868696