Bug 1014834

Summary: Logind not working in current F20 (with offline FreeIPA auth)
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: sssd
Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 20
CC: abokovoy, awilliam, jhrozek, johannbg, lnykryn, lslebodn, mkosek, msekleta, nalin, nathaniel, pbrezina, plautrba, rcritten, rmainz, sbose, sgallagh, ssorce, systemd-maint, vpavlin, zbyszek
Hardware: x86_64   
OS: All   
Doc Type: Bug Fix
Last Closed: 2015-06-01 13:23:58 UTC
Type: Bug
Bug Depends On: 1015089    
Attachments (flags: none on all):
  full journals from boot+login with the bug
  list of packages updated since the last successful boot. note: version only accurate for the first few, for the others, the version is the *old* one.
  patch to allow logind to continue if dir exists

Description Adam Williamson 2013-10-02 21:41:45 UTC
Created attachment 806741
full journals from boot+login with the bug

I configured my domain for FreeIPA auth last week, and this week I'm in the UK with my laptop outside of my domain, so I've been on offline auth all week. Not sure if that's relevant to the bug, but it obviously may be.

Up until today it's been working fine (with sssd credential caching) as expected. Today I can log in fine, but loginctl is showing me with no user session, which has various consequences: sound doesn't work, 3D acceleration doesn't work, can't shut down / reboot from the desktop, etc.

I see these messages in the journal:

Oct 02 22:01:56 vaioz.happyassassin.net systemd-logind[848]: Failed to create runtime directory /run/user/1001: File exists
Oct 02 22:01:56 vaioz.happyassassin.net lightdm[1417]: pam_systemd(lightdm:session): Failed to create session: File exists

and nothing else really obviously related. I see these errors in every boot / login I've tried today. The last time I booted the system fresh was on Sep 29. In that session's logs, I see instead:

Sep 29 17:26:09 vaioz.happyassassin.net systemd[1]: Starting Session 1 of user adamw.
Sep 29 17:26:09 vaioz.happyassassin.net systemd-logind[815]: New session 1 of user adamw.
Sep 29 17:26:09 vaioz.happyassassin.net systemd[1]: Started Session 1 of user adamw.
Sep 29 17:26:09 vaioz.happyassassin.net systemd-logind[815]: Linked /tmp/.X11-unix/X0 to /run/user/1001/X11-display.

which is, you know, what I'd expect.

I've tried wiping /run/user/1001 and rebooting/re-logging in, so it's not just some transient weirdness with that directory. The date on the directory is within the same second as the error appears in the journal:

drwxrwxrwt. 2 root    root     40 2013-10-02 22:01:56.637509059 +0100 1001
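The permission string in that listing can be decoded into numeric mode bits. GNU ls appends a '.' when a file carries an SELinux context, so the part to decode is drwxrwxrwt: rwx for user, group and other plus the sticky bit (the 't'), owned by root:root. The helper below is a hypothetical illustration written for this note, not code from any of the projects involved.

```c
#include <string.h>

/* Hypothetical helper: convert an ls-style permission string such as
 * "drwxrwxrwt" into numeric mode bits (rwx permissions plus the
 * setuid, setgid and sticky bits). The leading file-type character is
 * ignored. Returns -1 if the string is malformed. */
int perm_string_to_mode(const char *s)
{
    if (s == NULL || strlen(s) != 10)
        return -1;

    int mode = 0;
    for (int i = 0; i < 9; i++) {
        char c = s[1 + i];
        int bit = 1 << (8 - i);           /* 0400, 0200, 0100, 0040, ... 0001 */

        switch (i % 3) {
        case 0:                           /* read slot */
            if (c == 'r') mode |= bit;
            else if (c != '-') return -1;
            break;
        case 1:                           /* write slot */
            if (c == 'w') mode |= bit;
            else if (c != '-') return -1;
            break;
        default: {                        /* execute slot, may carry s/S or t/T */
            int special = (i == 2) ? 04000 : (i == 5) ? 02000 : 01000;
            char with_x = (i == 8) ? 't' : 's';    /* special bit set, x set */
            char without_x = (i == 8) ? 'T' : 'S'; /* special bit set, x clear */
            if (c == 'x') mode |= bit;
            else if (c == with_x) mode |= bit | special;
            else if (c == without_x) mode |= special;
            else if (c != '-') return -1;
            break;
        }
        }
    }
    return mode;
}
```

Decoding "drwxrwxrwt" gives 01777. Assuming logind expects the per-user runtime directory to be 0700 and owned by that user, a root-owned 01777 directory is exactly the kind of pre-existing directory it would refuse.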

Not really sure what's going on, to be honest. I'll attach a full log of a bad boot+login to desktop (I think this is using lightdm+xfce, but it behaves exactly the same with GDM+GNOME; that's where I first saw the bug, and switched to lightdm+xfce to see if it would fix it and also because Shell with software rendering is horrible). I can attach the 'successful' Sep 29 boot log too if it'd help.

I'll also attach a list of all the packages that were updated between the last successful and first broken boots.

Comment 1 Adam Williamson 2013-10-02 22:34:03 UTC
Created attachment 806773
list of packages updated since the last successful boot. note: version only accurate for the first few, for the others, the version is the *old* one.

Comment 2 Adam Williamson 2013-10-04 09:50:32 UTC
Hah! I tracked this one down myself. It was subtle: a setting written to /etc/krb5.conf by ipa-client-install:

 default_ccache_name = DIR:/run/user/%{uid}/krb5cc

I think this either causes a race between logind and Kerberos or simply always causes Kerberos to create /run/user/%{uid} before logind gets around to it, and logind barfs on that. Curious that /run/user/1001/krb5cc didn't exist (only the directory did), but I guess there was nothing to cache?
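To make the collision concrete: libkrb5 expands the %{uid} token in default_ccache_name to the numeric uid, so for uid 1001 the cache lives at /run/user/1001/krb5cc, whose parent is the same directory logind tries to create in the journal messages above. The helper below is a hypothetical sketch of that substitution, written for illustration; it is not libkrb5's implementation.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not libkrb5's actual code): replace every
 * "%{uid}" token in a ccache name template with the numeric uid.
 * Returns 0 on success, -1 if the result would not fit in `out`. */
int expand_uid_token(const char *tmpl, uid_t uid, char *out, size_t outlen)
{
    static const char token[] = "%{uid}";
    const size_t toklen = sizeof(token) - 1;
    size_t used = 0;

    while (*tmpl != '\0') {
        if (strncmp(tmpl, token, toklen) == 0) {
            /* Write the uid in place of the token. */
            int n = snprintf(out + used, outlen - used, "%lu",
                             (unsigned long)uid);
            if (n < 0 || (size_t)n >= outlen - used)
                return -1;
            used += (size_t)n;
            tmpl += toklen;
        } else {
            /* Copy ordinary characters, leaving room for the NUL. */
            if (used + 1 >= outlen)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    if (used >= outlen)
        return -1;
    out[used] = '\0';
    return 0;
}
```

For example, expanding "DIR:/run/user/%{uid}/krb5cc" with uid 1001 yields "DIR:/run/user/1001/krb5cc".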

Anyway, changing that line to the default from (very recent) stock krb5.conf:

default_ccache_name = KEYRING:persistent:%{uid}

seems to fix the bug, and I have not noticed any problems caused by the change yet. I should obviously pull the FreeIPA folks in on this too, but is it still a bug in logind that it fails completely if something else has created /run/user/<uid>?
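Why the KEYRING setting sidesteps the problem: a Kerberos credential cache name has the form TYPE:residual. With DIR: the residual is a filesystem path under /run/user/<uid>, so that directory has to exist; with KEYRING:persistent:%{uid} the cache is held in the kernel keyring and libkrb5 has no reason to touch /run/user. A minimal sketch of the type check, assuming only the TYPE:residual convention (the function name is hypothetical):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: return true if a ccache name uses the DIR type,
 * i.e. the cache is stored in a directory on the filesystem.
 * Simplification: a name with no colon is treated as non-DIR (libkrb5
 * treats such names as FILE paths). */
bool ccache_uses_directory(const char *name)
{
    const char *colon = strchr(name, ':');
    if (colon == NULL)
        return false;

    /* Compare only the TYPE prefix before the first colon. */
    size_t typelen = (size_t)(colon - name);
    return typelen == 3 && strncmp(name, "DIR", 3) == 0;
}
```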

Comment 3 Adam Williamson 2013-10-04 09:52:30 UTC
CCing mkosek and pviktori for ipa-client-install.

Comment 4 Jóhann B. Guðmundsson 2013-10-04 10:05:50 UTC
Sounds like the same issue as in #1015089

Comment 5 Alexander Bokovoy 2013-10-04 10:06:54 UTC
We need to solve bug #1015089 first, adding dependency.

Comment 6 Alexander Bokovoy 2013-10-04 10:08:47 UTC
(In reply to Jóhann B. Guðmundsson from comment #4)
> Sounds like the same issue as in #1015089
Note that this is not a duplicate, as FreeIPA will need to change the default ccache type for new installs and fix existing ones on upgrade.

Comment 7 Adam Williamson 2013-10-04 10:10:21 UTC
sounds like a job for a fedup hook!

Comment 8 Jóhann B. Guðmundsson 2013-10-04 10:26:33 UTC
Moving to the correct component

Comment 9 Zbigniew Jędrzejewski-Szmek 2013-10-04 12:27:50 UTC
Created attachment 807605
patch to allow logind to continue if dir exists

Jóhann, what do you think about a patch like this?

Comment 10 Adam Williamson 2013-10-04 13:05:15 UTC
zbigniew: from the other bug, it looks like logind would actually be OK if the other process had created the dir with appropriate perms, so changes to systemd may not be needed.

Comment 11 Jóhann B. Guðmundsson 2013-10-04 13:09:10 UTC
Comment on attachment 807605
patch to allow logind to continue if dir exists

The patch looks nice, and we most definitely should handle this gracefully on our part, but it won't solve any race conditions with pam_systemd creating /run/user/UID in the pam_open_session() phase.

Comment 12 Zbigniew Jędrzejewski-Szmek 2013-10-04 18:11:04 UTC
Comment on attachment 807605
patch to allow logind to continue if dir exists

Hm, if FreeIPA creates a directory owned by root.root with mode 0777, that really is not suitable for /run/user/<uid>. Please disregard this patch.

Comment 13 Martin Kosek 2013-10-25 13:37:09 UTC
I found no reference to default_ccache_name in either the FreeIPA source code or an installed FreeIPA server/client. This option must have been set manually in krb5.conf after FreeIPA was installed.

Moving to the krb5 component to decide what should be done with this bug: documenting it or adding a fedup hook.

Comment 14 Nalin Dahyabhai 2013-10-25 15:49:02 UTC
A fedup hook won't affect anything here: the default is new in F20, and updates from the F19 package should add it during %triggerun on krb5-libs < 1.11.3-16.
What happened on this system is the result of churn during the F20 cycle as the keyring changes went in, then had to be backed out while they were incomplete, and then reinstated.

If that directory's being created with the wrong permissions, in this configuration it looks like either SSSD or logind would be doing it, so we may need to reassign the bug to get it to be created with the right permissions.  If you can retry with "debug_level" set to 9 in the [domain/...] section of /etc/sssd/sssd.conf, the krb5_child.log should indicate whether or not SSSD is creating the directory.

If the problem isn't specific to the permissions being wrong, then logind's failure when SSSD or some other process has already created /run/user/1001 before logind gets to it (they create it so that storing credentials doesn't fail when the directory doesn't exist yet) is going to have to be fixed.

Comment 15 Zbigniew Jędrzejewski-Szmek 2013-10-26 16:49:47 UTC
> If it's not specific to the permissions being wrong...
It is. Systemd will not complain if the directory exists and has the expected mode and ownership:

http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/mkdir.c?id=HEAD#n34
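The check described in that comment can be sketched as follows. This is a hypothetical reimplementation of the described behaviour for illustration, not systemd's actual mkdir_safe() code: create the directory, and if it already exists, accept it only if it is a directory with exactly the expected mode and ownership. Assuming the expected values are 0700 and the user's uid/gid, the root-owned 01777 directory from the original report would be rejected with EEXIST, matching the "File exists" errors in the journal.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch (not systemd's mkdir_safe()): create `path` with
 * `mode`, owned by uid:gid. If it already exists, accept it only when
 * it is a directory with exactly that mode and ownership; otherwise
 * fail with -EEXIST. Returns 0 on success or a negative errno value. */
int mkdir_safe_sketch(const char *path, mode_t mode, uid_t uid, gid_t gid)
{
    struct stat st;

    if (mkdir(path, mode) == 0) {
        /* mkdir() is subject to the umask, so set the mode explicitly,
         * then hand the directory to its owner. */
        if (chmod(path, mode) < 0 || chown(path, uid, gid) < 0)
            return -errno;
        return 0;
    }
    if (errno != EEXIST)
        return -errno;

    /* Something else created the path first: verify what it made. */
    if (lstat(path, &st) < 0)
        return -errno;
    if (!S_ISDIR(st.st_mode) ||
        (st.st_mode & 07777) != mode ||
        st.st_uid != uid || st.st_gid != gid)
        return -EEXIST;
    return 0;
}
```

Refusing a mismatched directory rather than silently adopting it is the conservative choice: a world-writable, root-owned /run/user/<uid> would let other users place files in someone else's runtime directory.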

Comment 16 Nalin Dahyabhai 2013-10-28 17:49:47 UTC
Thanks for chasing that down.  I'm seeing log messages from pam_sss but not pam_krb5.  As libkrb5 only has logic to create the final component in the path (/run/user/$UID/krb5cc) if it doesn't exist, it's not likely that incorrect permissions on an intermediate directory (/run/user/$UID) are being created by libkrb5, and that all suggests that SSSD is what's creating the intermediate directories.  Moving this to sssd.
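The reasoning step here is that creating only the final path component cannot produce the intermediate directory: mkdir() on /run/user/$UID/krb5cc fails with ENOENT when /run/user/$UID is missing. A minimal sketch of that "final component only" behaviour, written for illustration (the function is hypothetical, not libkrb5 code):

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch: create only the last component of `path`.
 * If the parent directory does not exist, mkdir() fails with ENOENT
 * and nothing is created. An existing entry is treated as success.
 * Returns 0 on success or a negative errno value. */
int create_final_component(const char *path)
{
    if (mkdir(path, 0700) == 0 || errno == EEXIST)
        return 0;
    return -errno;
}
```

So if /run/user/$UID exists with the wrong owner and mode, some other component must have created it; with pam_sss logging and pam_krb5 silent, SSSD is the remaining candidate.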

Comment 17 Jakub Hrozek 2013-10-28 20:25:56 UTC
(In reply to Nalin Dahyabhai from comment #16)
> Thanks for chasing that down.  I'm seeing log messages from pam_sss but not
> pam_krb5.  As libkrb5 only has logic to create the final component in the
> path (/run/user/$UID/krb5cc) if it doesn't exist, it's not likely that
> incorrect permissions on an intermediate directory (/run/user/$UID) are
> being created by libkrb5, and that all suggests that SSSD is what's creating
> the intermediate directories.  Moving this to sssd.

What SSSD version is this?

Comment 18 Lukas Slebodnik 2013-10-29 15:09:57 UTC
There were a lot of bugs with the DIR ccache (DIR:/run/user/%{uid}/krb5cc)
in Fedora 19 (BZ965133, BZ986610, ...).
You can read a very long discussion on the Fedora devel mailing list: https://lists.fedoraproject.org/pipermail/devel/2013-July/186930.html

This was the reason the new keyring cache was created (BZ991169),
and "KEYRING:persistent:%{uid}" is the default value in Fedora 20.

Martin wrote in comment 1014834#c13 that ipa-client-install does not touch
default_ccache_name in krb5.conf.

Did you change it yourself?
Do you want to have DIR ccache?

Comment 19 Jakub Hrozek 2013-10-31 17:00:36 UTC
The latest SSSD I built today defaults to using the kernel keyring cache, can you try that?

Comment 20 Jakub Hrozek 2014-03-21 11:08:13 UTC
Ping, do you still see the bug?

Comment 21 Adam Williamson 2014-03-21 15:01:10 UTC
well, no, but I already 'fixed' it for myself in c#2, months ago. the bug remained open, I believe, because we wanted to ensure other people would not have to make that modification manually on upgrade.

Comment 22 Fedora End Of Life 2015-05-29 09:30:11 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 23 Alexander Bokovoy 2015-06-01 13:23:58 UTC
Closing it now that we have three releases with the KEYRING ccache type out there.