Bug 959980

Summary: getgrouplist returns a different sort order via nscd
Product: Red Hat Enterprise Linux 6
Component: glibc
Version: 6.4
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: low
Priority: unspecified
Target Milestone: rc
Reporter: Deepak Das <ddas>
Assignee: Carlos O'Donell <codonell>
QA Contact: qe-baseos-tools-bugs
CC: fweimer, law, mfranc, pfrankli, phoned, spoyarek
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-05-09 13:52:42 UTC

Description Deepak Das 2013-05-06 10:54:12 UTC
Created attachment 744073: id-withnscd

Description of problem:

Using the "id" command (from the coreutils rpm package), I get a different supplementary group sequence depending on whether the "nscd" service is running. An example based on my id is given below.

Without nscd
========
uid=501(testuser) gid=501(testuser) groups=501(testuser),3(sys),100(users),491(fuse),513(support)


With nscd
=======
uid=501(testuser) gid=501(testuser) groups=3(sys),100(users),491(fuse),501(testuser),513(support)

The id command uses the "getgrouplist" function under compat-glibc to get the supplementary group list.

I have the ltrace output for both cases, with and without nscd, and have attached it to this bugzilla. Please let me know whether this is a bug.

How reproducible:

You can replicate the issue on your laptop/computer by using the following steps

a) Without nscd
=========
i) service nscd stop
ii) id <username>

b) With nscd
=========
i) service nscd start
ii) id <username>
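
To look at the raw order without going through "id", a small helper along these lines can be used. This is only a sketch: the function name print_group_order and the initial buffer size of 16 are illustrative choices, not taken from coreutils or glibc. It calls getgrouplist() with the primary gid from getpwnam() and prints the gids in exactly the order returned.

```c
#define _DEFAULT_SOURCE   /* expose getgrouplist() on current glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: print the groups of `user` in the order that
   getgrouplist() returns them. Returns the group count, or -1 on error. */
int print_group_order(const char *user)
{
    assert(user != NULL);

    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;                      /* unknown user */
    gid_t primary = pw->pw_gid;         /* copy out of the static buffer */

    int ngroups = 16;                   /* initial guess, grown on demand */
    gid_t *groups = malloc(ngroups * sizeof *groups);
    if (groups == NULL)
        return -1;

    if (getgrouplist(user, primary, groups, &ngroups) == -1) {
        /* Buffer too small: ngroups now holds the required count. */
        gid_t *bigger = realloc(groups, ngroups * sizeof *groups);
        if (bigger == NULL) {
            free(groups);
            return -1;
        }
        groups = bigger;
        if (getgrouplist(user, primary, groups, &ngroups) == -1) {
            free(groups);
            return -1;
        }
    }

    for (int i = 0; i < ngroups; i++)
        printf("%s%u", i ? "," : "", (unsigned) groups[i]);
    printf("\n");

    free(groups);
    return ngroups;
}
```

Calling print_group_order(argv[1]) from a main() and running the program once with nscd stopped and once with nscd started should show the same difference that "id" reports.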
  

Comment 2 Siddhesh Poyarekar 2013-05-06 13:40:59 UTC
The default getgrouplist() call without nscd first adds the default gid to the group list and then appends the list of groups in the order that they were returned from the data source (/etc/group, LDAP, etc.).  That's not how nscd behaves.  nscd gets the group list for the user in the order returned from the data source.  Then, if the input gid is not present in the returned list, the group is appended to the list.
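
The two behaviours can be modelled roughly as follows. This is a simplified sketch and not the glibc or nscd source: the function names are made up for illustration, and it assumes the data source returns the gids as 3,100,491,501,513, which is what the nscd output in the description suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of the non-nscd path: default gid first, then the
   groups in data-source order, skipping any gid already in the list.
   `out` must have room for nsrc + 1 entries. Returns the entry count. */
size_t order_without_nscd(gid_t default_gid, const gid_t *src, size_t nsrc,
                          gid_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = default_gid;
    for (size_t i = 0; i < nsrc; i++) {
        int seen = 0;
        for (size_t j = 0; j < n; j++) {
            if (out[j] == src[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            out[n++] = src[i];
    }
    return n;
}

/* Hypothetical model of the nscd path: groups in data-source order, with
   the default gid appended only if the source did not return it.
   `out` must have room for nsrc + 1 entries. Returns the entry count. */
size_t order_with_nscd(gid_t default_gid, const gid_t *src, size_t nsrc,
                       gid_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    int found = 0;
    for (size_t i = 0; i < nsrc; i++) {
        out[n++] = src[i];
        if (src[i] == default_gid)
            found = 1;
    }
    if (!found)
        out[n++] = default_gid;
    return n;
}
```

With default gid 501 and the assumed source order above, the first function yields 501,3,100,491,513 and the second yields 3,100,491,501,513, matching the two "id" outputs in the description.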

We don't guarantee any kind of order in the returned groups list, so I am not sure if this is a bug.  However, I do think that the outputs with and without nscd should be as consistent as we can get them to be.

To make the nscd result consistent with the default, we could add the default group first and then add the returned groups one by one, skipping over the already added group.  There will be a performance hit to this though, since I'll have to choose between calling read() multiple times vs duplicating at least part of the buffer.

The other way around is simpler.  We simply add the default group at the end like in nscd, thus allowing the nss provider to add the default group in the order that it finds it (if it does).  It would also be faster I guess, since we skip the searching through the list at every group append.  The downside however is that the code change magnitude is probably higher - it'll require removal of the search code from all nss modules.  It's not harmful to leave it in though; it's merely a performance hit.

Carlos, what do you think?  I think I prefer the second approach.

Comment 4 David 2013-05-08 13:23:42 UTC
I was the trigger for this bugzilla. 

The application we had an issue with was using the secondary group list (in which all Unixes put the primary group first) to set proper permissions from a setuid root binary. With the order nscd returned, it was using whatever the first secondary group was as the primary, which is wrong.
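
For code that can be changed, one possible workaround is to normalise the list after the lookup instead of relying on the returned order. The sketch below is illustrative only: move_primary_first is a made-up helper, not part of glibc, and it assumes the caller already knows the primary gid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: move `primary` to groups[0], keeping the relative
   order of all other entries. Returns 0 on success, or -1 if `primary`
   is not in the list (the list is then left unchanged). */
int move_primary_first(gid_t *groups, size_t n, gid_t primary)
{
    assert(groups != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (groups[i] == primary) {
            /* Shift groups[0..i-1] one slot to the right, then put the
               primary gid in the freed first slot. */
            memmove(groups + 1, groups, i * sizeof *groups);
            groups[0] = primary;
            return 0;
        }
    }
    return -1;
}
```

Applied to the nscd order 3,100,491,501,513 with primary gid 501, this gives 501,3,100,491,513, the same order as without nscd.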

If you make it "consistent" using the second method (changing non-nscd to put the group at the end) you will completely break our setup and we'll need to deal with local patches and nonsense just to make an enterprise application work.

Thanks,

Comment 5 Siddhesh Poyarekar 2013-05-08 13:33:11 UTC
Thanks for your feedback David - this occurred to me earlier today and I had intended to update the bugzilla to mention that applications will usually expect the default sort order (despite it not being explicitly documented as such) and that we'll end up breaking them by changing it to the nscd order.

Comment 6 Carlos O'Donell 2013-05-08 14:28:28 UTC
(1) Other applications that break if we change anything.

There can equally well be a number of applications that rely on the existing behaviour of nscd returning a numerically sorted list from the group database.

Do we risk breaking other applications?

(2) No guarantee on order.

The order of the returned entries does not have to be consistent and no guarantee about it is provided. No other OS I know provides this guarantee, even if the implementation appears to keep a consistent order.

Users expect implementers to do everything possible to be as *fast* as possible. Within the requirements of the API that can include the list being in a different order.

In this case the list order is exactly the way it is because we are trying to be as fast as possible and use as little memory as possible.

Do we penalize all other applications because one application expects the default gid to be at the start of the list?

Summary
=======

My answer to (1) is "Yes if it makes the lookup faster", and my answer to (2) is "No."

If you want the default gid you must specifically ask for it using getpw*; that's just the way the API works.

I'm not opposed to changing the ordering to make it consistent, but if any change is going to be made it will be to increase the speed of the lookup, not to make it match what getgrouplist returns without nscd.

The lookup order is consistent given the same configuration and in my opinion that's all that we will guarantee. The lookup is consistently the same order without nscd, and consistently the same order with nscd. Once you change the configuration of your system, expect some changes (including performance gains with a correctly configured nscd).

In summary this is "CLOSED/NOTABUG" or an RFE which we will mark "CLOSED/UPSTREAM" and file an upstream enhancement.

I'd like a comment from the user before doing either of those.

Comment 7 David 2013-05-09 13:46:24 UTC
I agree it should not be expected to be in any order. I'm not sure why this application works the way it does, but we have to support and live with it either way.

If you believe it's OK for it to be consistent with no-nscd vs nscd then I am fine with keeping things status quo. We will keep nscd disabled or at the very least disable group caching.
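
For anyone taking the same route, group caching alone can be switched off in the nscd configuration, leaving the other caches enabled. The fragment below is a sketch that assumes the usual configuration file location, /etc/nscd.conf; nscd needs a restart afterwards.

```
enable-cache group no
```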

It was just quite unexpected to us and we wanted to verify if this is the way it should work or not. 

Hopefully if anyone else runs into this they will at least be able to find this bugzilla for reference.

Thanks,

Comment 8 Carlos O'Donell 2013-05-09 13:52:42 UTC
(In reply to comment #7)
> I agree it should not be expected to be in any order. I'm not sure why this
> application works the way it does, but we have to support and live with it
> either way.
> 
> If you believe it's OK for it to be consistent with no-nscd vs nscd then I
> am fine with keeping things status quo. We will keep nscd disabled or at the
> very least disable group caching.
> 
> It was just quite unexpected to us and we wanted to verify if this is the
> way it should work or not. 
> 
> Hopefully if anyone else runs into this they will at least be able to find
> this bugzilla for reference.

David,

Thanks for your comments.

Yes, I do believe it's OK for two different system configurations to return the list in a different order; it's part of the optimization the system is doing to be as fast as possible.

We've been working in the background on a "runtime tunables" project that might be able to put a knob at this location to allow you to tweak the runtime behaviour at the cost of performance. I've added your use case to the list of customer use cases. That way we'll remember this case.

Marking as CLOSED / NOTABUG.

Cheers,
Carlos.