Bug 181906 - group.db using nss_db _sometimes_ works.
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nss_db
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Nalin Dahyabhai
QA Contact: Brian Brock
Reported: 2006-02-17 13:02 EST by Seth Vidal
Modified: 2012-06-20 12:16 EDT (History)
Doc Type: Bug Fix
Last Closed: 2012-06-20 12:16:00 EDT
Description Seth Vidal 2006-02-17 13:02:59 EST
Description of problem:
When using group.db files in /var/db, users show up as being in the group when
you run 'id', but when the user tries to chgrp a file they are not allowed to.
We've made sure that:

- the user has logged out and back in
- the system was rebooted
- nscd was off and its cache was cleared
- the .db files appear to work for other users.

We've had this happen on 3 different systems affecting 3 different users for 3
different groups. 

We've tested the same thing on FC4, RHEL4, and CentOS4 and can get it to happen
on all of them, but not consistently. It appears that the presence of group.db
confuses the nss lookup.


Version-Release number of selected component (if applicable):

nss_db-2.2-29

How reproducible:
Hard to reproduce, but try adding a group.db file, adding some entries to it,
and seeing if you can replicate it.


I know this is not the most helpful bug report but unfortunately we can only get
it to occur part of the time. I was hoping someone more aware of the code might
know where the problem is coming from.

Thank you.
Comment 1 Nalin Dahyabhai 2006-02-17 13:57:25 EST
Seth, when the user can't chgrp the file, do we have a list of supplemental
groups of which the user is a member?

There's one fix outstanding for an older release, but as far as we've been able
to tell, the internal semantics of glibc's nsswitch subsystem made it
unnecessary to make the same change for RHEL 4.  I'll attach the patch.  I
understand that it's not an easily-reproducible thing, but if you can try a
modified nss_db with the change long enough to rule it out as a fix, that'd be
useful.
Comment 2 Nalin Dahyabhai 2006-02-17 13:58:56 EST
Actually, it's easier to test than that, because the fix for that (bug #152467)
is already in the Raw Hide package.  Can I get you to rebuild it for RHEL 4 and
use that as a test?
Comment 3 Seth Vidal 2006-02-17 14:09:42 EST
Nalin,
Yes, I have a list of all the groups the user is a member of, and the group
in question is listed. A couple of odd points:

if I'm logged in as the user the following commands return different things,
some of the time:

id

id myusername

it seems like they should return the same thing, shouldn't they?

is the fix in rawhide glibc or rawhide nss_db?

I can rebuild either for rhel4 - but I'm wondering - is rawhide 2.3.90 glibc
compatible with rhel4 or will it play hell?

thanks,
-sv
Comment 4 Nalin Dahyabhai 2006-02-17 14:24:32 EST
IIRC id with no arguments prints the group membership list for the current
process, so if you've changed your primary group with 'newgrp', its output will
change.  With a user name (even your own), it just looks it up in the system
databases.  At an API level, I guess it's the difference between getgroups() and
getgrouplist().
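
The getgroups() versus getgrouplist() distinction can be checked directly. The following is a minimal sketch in Python, assuming a Unix system where os.getgroups() and os.getgrouplist() wrap the C calls of the same names; the helper names missing_groups and compare_current_user are made up for illustration and are not part of any tool mentioned in this bug.

```python
import os
import pwd


def missing_groups(process_gids, database_gids):
    """Return GIDs the databases list for the user but the process lacks."""
    return sorted(set(database_gids) - set(process_gids))


def compare_current_user():
    """Compare the calling process's groups with a fresh database lookup."""
    user = pwd.getpwuid(os.getuid())
    # Roughly what plain `id` reports: the groups attached to this process.
    process_gids = set(os.getgroups())
    process_gids.add(os.getegid())
    # Roughly what `id username` reports: a new lookup through nsswitch.
    database_gids = os.getgrouplist(user.pw_name, user.pw_gid)
    return missing_groups(process_gids, database_gids)


if __name__ == "__main__":
    print("in databases but not in process:", compare_current_user())
```

A non-empty result means the login session holds fewer groups than the system databases list for the user, which would be consistent with the two forms of 'id' disagreeing.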

The change in question is in nss_db (2.2-33 and later, see
nss_db-2.2-enoent.patch).  As far as nss_db is concerned, you can probably even
use the binary package from Raw Hide -- I don't see any versioned deps on newer
glibc than we had in FC4, and the nsswitch ABI hasn't changed in years.

I couldn't say if Raw Hide's glibc needs anything that isn't in RHEL4 without
trying it...
Comment 5 Seth Vidal 2006-02-17 14:33:27 EST
okay - rebuilding it now and I'll let you know - I believe I still have two
boxes actively displaying the behavior.

thanks
Comment 6 Seth Vidal 2006-02-17 15:46:56 EST
I got it to rebuild okay outside of a chroot build environment - but if I build
it inside a rhel4 mock chroot then I get errors about selinux not being
available even though the selinux-devel package is in the chroot.

Right now I'm trying to get it working on rhel4 so I can test the most acute
problem we're seeing.

any suggestions?
Comment 7 Nalin Dahyabhai 2006-02-17 17:43:02 EST
Try adding a buildrequires: on "ed" and building using the fedora-3-i386-core
configuration.  That should be sufficiently similar, and works on a Raw Hide system.
Comment 8 Nalin Dahyabhai 2006-02-17 17:44:41 EST
Aargh. Never mind.
Comment 9 Nalin Dahyabhai 2006-02-17 18:00:21 EST
The 2.2-35 package will rebuild cleanly.
Comment 10 Seth Vidal 2006-04-19 17:06:51 EDT
Tested 2.2-35 - no change.

Just to complicate matters:

It appears that, among the affected users, it only happens when the user logs in
using the Kerberos/AFS password and gets an AFS token.

Comment 11 Nalin Dahyabhai 2006-04-19 18:51:07 EDT
Three theories, then.

One, the supplemental group membership list is losing entries when the two
entries in the list which represent the PAG get added (compare the output of 'id
-G' with what you're expecting).  I'll gladly stick my fingers in my ears and
say "can't hear you, AFS, la la la la".

Two, your users are trying to do this *in* AFS, and something's wrong with the
version of AFS you're running, because AFAIK that's always going to work (maybe
they have to own the root directory of the volume, or have admin privs on the
directory, but I'd have to dig into the reference docs to find the rule).

Three, the database is hosed up somehow.  Unlikely, but what the heck.  Dump its
contents with 'db_dump -p /var/db/group.db' and check the entries which have
keys of the form '0'+(decimal number) for corruption.  (The initgroups() call
eventually iterates over these entries -- the other keys are used for
lookup-by-name and lookup-by-gid.)
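
The check in theory three can be partly automated. This is a rough sketch, assuming the usual 'db_dump -p' layout (a header ending in HEADER=END, then alternating key and value lines indented by one space, then DATA=END) and assuming values are group(5) lines that may carry a trailing escaped NUL. The sample dump and the function names are hypothetical, not taken from a real system.

```python
import re

# Hypothetical dump for illustration only; a real group.db may differ in detail.
SAMPLE_DUMP = r"""VERSION=3
format=print
type=btree
HEADER=END
 .mock
 mock:x:101:alice\00
 00
 mock:x:101:alice\00
 01
 broken-entry\00
 =101
 mock:x:101:alice\00
DATA=END
"""


def strip_nul(text):
    """Drop a trailing escaped NUL (backslash, 0, 0) if one is present."""
    return text[:-3] if text.endswith("\\00") else text


def parse_db_dump(text):
    """Return (key, value) pairs from `db_dump -p` style output."""
    lines = text.splitlines()
    start = lines.index("HEADER=END") + 1  # skip the header block
    data = []
    for line in lines[start:]:
        if line == "DATA=END":
            break
        # Data lines are indented by a single space.
        data.append(line[1:] if line.startswith(" ") else line)
    # Keys and values alternate, one per line.
    return list(zip(data[0::2], data[1::2]))


def suspicious_entries(pairs):
    """Return enumeration entries ('0' + decimal keys) that do not parse."""
    bad = []
    for key, value in pairs:
        if not re.fullmatch(r"0\d+", strip_nul(key)):
            continue  # lookup-by-name or lookup-by-gid key, not checked here
        fields = strip_nul(value).split(":")
        # A group line has four fields: name, password, gid, member list.
        if len(fields) != 4 or not fields[2].isdigit():
            bad.append((key, value))
    return bad


if __name__ == "__main__":
    for key, value in suspicious_entries(parse_db_dump(SAMPLE_DUMP)):
        print("suspicious entry:", key, value)
```

To run it against a real database, the text produced by 'db_dump -p /var/db/group.db' would be passed to parse_db_dump() in place of SAMPLE_DUMP.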
Comment 12 Seth Vidal 2006-04-26 11:37:53 EDT
Found it.

openafs is the crack.

It's doing something odd with groups as PAGs and eating our groups.

When we move our groups out of the < 500 GID range, the groups suddenly start
working.

Comment 13 Nalin Dahyabhai 2006-04-26 11:56:45 EDT
As promised, "can't hear you, AFS, la la la la".

Seriously though, if this is something that happens inside of the setpag pioctl,
I don't think that there's much that can be done outside of OpenAFS to address it.  

I can do some spot checking (slap together a test program that calls
initgroups() for one of these users, then dumps the value returned by
getgroups(), calls setpag(), and repeats the getgroups()).  I just need some
real-world sample data to try to chase it down further.  (Feel free to remove
any identifying parts and change user names -- the combination of UIDs and GIDs
is what I'm after.)
Comment 14 Seth Vidal 2006-04-26 12:02:17 EDT
actually it's pretty easy to duplicate:

1. install rhel4
2. install mock on rhel4
3. install openafs
4. set up to get an afs token when you login
5. add yourself to the mock group
6. login to the machine, making sure to get an afs token
7. type 'id' and check the two groups at the front of your group list
8. see if you can read the file: /usr/bin/mock-helper
Comment 15 Nalin Dahyabhai 2006-04-26 18:32:52 EDT
Given that the group list appears to be sorted, it stands to reason that if the
kernel module is overwriting the first two entries in the group list with the
PAG information instead of prepending it, groups with low GIDs would be lost.
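
The reasoning above can be illustrated with a small simulation. This is only a sketch of the hypothesis, not of what the OpenAFS kernel module actually does; the GIDs and the two PAG values are invented for the example.

```python
def overwrite_first_two(groups, pag_pair):
    """Hypothesised buggy behaviour: PAG entries replace the first two slots."""
    return list(pag_pair) + sorted(groups)[2:]


def prepend(groups, pag_pair):
    """Expected behaviour: PAG entries are added in front, nothing is dropped."""
    return list(pag_pair) + sorted(groups)


def lost_groups(original, result):
    """Return the original GIDs that no longer appear in the result list."""
    return sorted(set(original) - set(result))


if __name__ == "__main__":
    groups = [1000, 101, 500, 480]   # hypothetical supplemental groups
    pag_pair = (33536, 32512)        # hypothetical PAG group pair
    print("overwrite loses:", lost_groups(groups, overwrite_first_two(groups, pag_pair)))
    print("prepend loses:", lost_groups(groups, prepend(groups, pag_pair)))
```

With a sorted list, the two lowest GIDs are the ones an overwrite would drop, which would match the observation that moving groups above GID 500 makes the problem go away.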

Are you running OpenAFS 1.4.1?  I can't reproduce this with that version (via
sshd using pam_krb5 2.1.15 with the "external=sshd" option on the PAM session
line and attachment #713 from bug #918 at bugzilla.mindrot.org, or via console
login).
Comment 16 Jiri Pallich 2012-06-20 12:16:00 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
