Bug 1357418

Summary: SSSD fast cache for local users
Product: [Fedora] Fedora Reporter: Jan Kurik <jkurik>
Component: Changes TrackingAssignee: Stephen Gallagher <sgallagh>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact: Simon Clark <simon.richard.clark>
Priority: unspecified    
Version: 26CC: jhrozek, sgallagh, simon.richard.clark, wibrown
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: ChangeAcceptedF26, SystemWideChange
Fixed In Version: glibc-2.25-3.fc26, sssd-1.15.0-4.fc26 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-25 17:04:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1427646    
Bug Blocks:    

Description Jan Kurik 2016-07-18 07:42:01 UTC
This is a tracking bug for Change: SSSD fast cache for local users
For more details, see: https://fedoraproject.org//wiki/Changes/SSSDCacheForLocalUsers

Enable resolving all users through the sss NSS modules for better performance.

Comment 1 Jan Kurik 2016-07-26 05:04:07 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 2 Jan Kurik 2016-07-26 05:22:46 UTC
On 2016-July-26, we have reached Fedora 25 Change Checkpoint: Completion deadline (testable).

At this point, all accepted changes should be substantially complete, and testable. Additionally, if a change is to be enabled by default, it must be enabled at Change Completion deadline as well.

Change tracking bug should be set to the MODIFIED state to indicate it achieved completeness.

Incomplete and non testable Changes will be reported to FESCo on 2016-July-29 meeting.

Comment 3 Jakub Hrozek 2016-07-26 07:57:58 UTC
(In reply to Jan Kurik from comment #2)
> On 2016-July-26, we have reached Fedora 25 Change Checkpoint: Completion
> deadline (testable).
> 
> At this point, all accepted changes should be substantially complete, and
> testable. Additionally, if a change is to be enabled by default, it must be
> enabled at Change Completion deadline as well.
> 
> Change tracking bug should be set to the MODIFIED state to indicate it
> achieved completeness.
> 
> Incomplete and non testable Changes will be reported to FESCo on
> 2016-July-29 meeting.

I'm afraid this feature still needs work and I propose to move it to Fedora-26.

Comment 4 Jan Kurik 2016-08-01 20:26:37 UTC
Based on the FESCo decision deferring this Change to F26: https://fedorahosted.org/fesco/ticket/1606#comment:3

Comment 5 Fedora End Of Life 2017-02-28 10:00:23 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 6 Jan Kurik 2017-02-28 10:08:33 UTC
On 2017-Feb-28, we have reached the Fedora 26 Change Checkpoint: Completion deadline (testable).

At this point, all accepted changes should be substantially complete, and testable. Additionally, if a change is to be enabled by default, it must be enabled at Change Completion deadline as well.

Change tracking bug should be set to the MODIFIED state to indicate it achieved completeness.

Incomplete and non testable Changes will be reported to FESCo for 2017-Mar-03 meeting.

Comment 7 Jan Kurik 2017-03-01 09:44:23 UTC
May I ask for status update on this Change, as requested in Comment #6 ?

Comment 8 Jakub Hrozek 2017-03-01 13:34:13 UTC
(In reply to Jan Kurik from comment #7)
> May I ask for status update on this Change, as requested in Comment #6 ?

As of the glibc and sssd versions in the FixedIn fields, we should be ready as far as code changes are concerned.

What I'm not sure about is if all the Fedora variants want to pull in sssd-common by default..

Comment 9 Jakub Hrozek 2017-03-02 16:50:07 UTC
since, all the packages that were touched are built and we are not planning to include any more changes (except the decision to include sssd in the default installation tracked by a separate FESCO ticket), I'm moving this tracker to MODIFIED.

Comment 10 wibrown@redhat.com 2017-04-26 23:19:17 UTC
Hi

I appreciate the work you have done for this change. I have some concerns about this process.

The first is that you say this is "not a system wide change". This is indeed a system wide change, which affects every user account. I think that this needs to be more clearly stated. Additionally, there are many deployments which will have altered their nsswitch / pam files for their own reasons. As a result, it is important that on an upgrade, this change does not affect those setups.

The next is the testing section. I am disappointed at how light this section is.

This is not just some little daemon, this is user id resolution that underpins every service on a system. I would like to insist that the following tests be performed:

* Single user mode must work (both init=bin/bash, and systemd.unit=rescue.target)
* User resolution must work with sssd disabled and be *as fast* as without sss in nsswitch. Pam must also work.
* The system be able to boot and operate with an *invalid* and sssd (or sssd disabled)
* The system must operate with an invalid passwd file (ie duplicate names, uids, etc.) Yes, this happens in the wild.
* ids should still be resolvable even under high "churn"  of id's in passwd. 

There may be more than this. I hope that you develop a more comprehensive testing plan. This is a large change, and I think that it can not be approached so lightly. 

Thanks,

Comment 11 wibrown@redhat.com 2017-04-26 23:45:04 UTC
As an additional assertion, all sss_cache flushing commands must work with this local user cache out of the box, and sss_cache -E must flush this cache also.

Comment 12 Stephen Gallagher 2017-04-26 23:51:17 UTC
(In reply to wibrown from comment #10)
> Hi
> 
> I appreciate the work you have done for this change. I have some concerns
> about this process.
> 
> The first is that you say this is "not a system wide change". This is indeed
> a system wide change, which affects every user account. I think that this
> needs to be more clearly stated. Additionally, there are many deployments
> which will have altered their nsswitch / pam files for their own reasons. As
> a result, it is important that on an upgrade, this change does not affect
> those setups.

FESCo agreed with you; you'll notice that the change was accepted as a System Wide Change.


> 
> The next is the testing section. I am disappointed at how light this section
> is.
> 
> This is not just some little daemon, this is user id resolution that
> underpins every service on a system. I would like to insist that the
> following tests be performed:
> 
> * Single user mode must work (both init=bin/bash, and
> systemd.unit=rescue.target)

Reasonable.

> * User resolution must work with sssd disabled and be *as fast* as without
> sss in nsswitch. Pam must also work.

The "as fast" statement is unrealistic. No matter what, it's always going to be at least some tiny bit slower if only because glibc will have to process down that codepath. That being said, it's been measured (Jakub has the details) and it's a negligible increase.

However, when SSSD is *enabled*, it's an order of magnitude faster than with 'sss' not in nsswitch.

> * The system be able to boot and operate with an *invalid* and sssd (or sssd
> disabled)

I agree, and that's actually been an upstream requirement since before SSSD 1.0.

> * The system must operate with an invalid passwd file (ie duplicate names,
> uids, etc.) Yes, this happens in the wild.

The way SSSD does this should be identical to the behavior of glibc in this regard, since what it's doing is just calling the nss_files.so routines directly under the hood and storing their results in the SSSD cache for faster replies.

That said, we should make sure that SSSD upstream is testing that case explicitly.

> * ids should still be resolvable even under high "churn"  of id's in passwd. 
> 

I don't know what you mean by "churn" here. If you mean that someone is constantly updating the /etc/passwd file, then the design actually *is* built to handle this (it just may fall back to doing so without the enhanced speed of the cache).


> There may be more than this. I hope that you develop a more comprehensive
> testing plan. This is a large change, and I think that it can not be
> approached so lightly. 
> 

I think you make good points and you're right, they probably should be enumerated in the Change page. That being said, upstream has tests to do nearly everything you just described (and its harness runs Fedora), so I think we're in better shape than you might expect.

Comment 13 Stephen Gallagher 2017-04-26 23:52:56 UTC
(In reply to wibrown from comment #11)
> As an additional assertion, all sss_cache flushing commands must work with
> this local user cache out of the box, and sss_cache -E must flush this cache
> also.

That's kind of a fuzzy statement, actually. SSSD monitors all of the files with inotify and will automatically flush its own cache any time that the contents change, so the use of the `sss_cache` tool should be entirely unnecessary.

Comment 14 wibrown@redhat.com 2017-04-26 23:58:43 UTC
(In reply to Stephen Gallagher from comment #12)
> (In reply to wibrown from comment #10)
> > Hi
> > 
> > I appreciate the work you have done for this change. I have some concerns
> > about this process.
> > 
> > The first is that you say this is "not a system wide change". This is indeed
> > a system wide change, which affects every user account. I think that this
> > needs to be more clearly stated. Additionally, there are many deployments
> > which will have altered their nsswitch / pam files for their own reasons. As
> > a result, it is important that on an upgrade, this change does not affect
> > those setups.
> 
> FESCo agreed with you; you'll notice that the change was accepted as a
> System Wide Change.
> 

I'm glad FESco agreed - but this is not reflected on the document listed https://fedoraproject.org/wiki/Changes/SSSDCacheForLocalUsers 


> 
> I think you make good points and you're right, they probably should be
> enumerated in the Change page. That being said, upstream has tests to do
> nearly everything you just described (and its harness runs Fedora), so I
> think we're in better shape than you might expect.

Even if these are "upstream", to not display them and place them on a feature page, is not very engineer-ing like. Additionally, there are uses cases, that can't be tested with CI etc, so we need to make sure that the integration tests are just as thorough at the upstream tests. All I'm asking is that we take a meticulous eye of quality on a change like this because else it will hurt our users. Given FESCo sees this as a system wide change, we must treat it with the respect and impact it deserves, and we must be diligent. 

> That's kind of a fuzzy statement, actually. SSSD monitors all of the files with inotify and will automatically flush its own cache any time that the contents change, so the use of the `sss_cache` tool should be entirely unnecessary.

That doesn't negate the need for sss_cache -E to flush the cache. More than once SSSD have released "caches" to features, but with no method to remove or flush them. It's important for issue resolution and diagnosis that this tool exists *even* if you think it "doesn't have to". It does have to. As a sysadmin, it's vital we have access to this capability to resolve issues. 

Thanks,

Comment 15 Jan Kurik 2017-05-17 09:13:26 UTC
On 2017-May-16 we reached the "Change Checkpoint: 100% Code Complete Deadline" milestone for Fedora 26 release. At this point all the Changes not at least in "ON_QA" state should be brought to FESCo for review. Please update the state of this bug to "ON_QA" if it is already 100% completed. Please let me know in case you have any trouble with the implementation and the Change needs any help or review.

Thanks, Jan

Comment 16 Jakub Hrozek 2017-05-17 15:09:22 UTC
My understanding is that this feature is complete. So far I haven't seen any bug reports either.

Comment 17 wibrown@redhat.com 2017-05-17 21:48:35 UTC
(In reply to Jakub Hrozek from comment #16)
> My understanding is that this feature is complete. So far I haven't seen any
> bug reports either.


That was not the point that I was making.

Comment 18 Simon Clark 2017-07-08 11:46:43 UTC
I have added release notes text for this change to the draft F26 Release Notes here: https://pagure.io/release-notes/c/a1df5faa44aa532f7746909505ad882a02e5c405?branch=f26

Please would you take a look at the draft and let me know if anything should be added or changed?

Comment 19 Jakub Hrozek 2017-07-10 13:29:33 UTC
(In reply to Simon Clark from comment #18)
> I have added release notes text for this change to the draft F26 Release
> Notes here:
> https://pagure.io/release-notes/c/
> a1df5faa44aa532f7746909505ad882a02e5c405?branch=f26
> 
> Please would you take a look at the draft and let me know if anything should
> be added or changed?

I'm sorry about the late reply; I was on vacation for the last week.

Yes, the release note sounds good to me.