Red Hat Bugzilla – Bug 181906
group.db using nss_db _sometimes_ works.
Last modified: 2012-06-20 12:16:00 EDT
Description of problem:
When using group.db files in /var/db, users show up as being in the group when
you run 'id', but when the user tries to chgrp a file they are not allowed to.
We've made sure that:
- the user has logged out and back in
- the system was rebooted
- nscd was off and its cache was cleared
- the .db files appear to work for other users.
We've had this happen on 3 different systems affecting 3 different users for 3
We've tested the same thing on FC4, RHEL4, and CentOS4, and we can get it to
happen on all of them, but not consistently. It appears that the presence of
group.db confuses the nss lookup.
Version-Release number of selected component (if applicable):
Hard to reproduce - but try adding a group.db file - adding some entries into it
and seeing if you can replicate it.
I know this is not the most helpful bug report but unfortunately we can only get
it to occur part of the time. I was hoping someone more aware of the code might
know where the problem is coming from.
Seth, when the user can't chgrp the file, do we have a list of supplemental
groups of which the user is a member?
There's one fix outstanding for an older release, but as far as we've been able
to tell, the internal semantics of glibc's nsswitch subsystem made it
unnecessary to make the same change for RHEL 4. I'll attach the patch. I
understand that it's not an easily-reproducible thing, but if you can try a
modified nss_db with the change long enough to rule it out as a fix, that'd be
Actually, it's easier to test than that, because the fix for that (bug #152467)
is already in the Raw Hide package. Can I get you to rebuild it for RHEL 4 and
use that as a test?
Yes, I have a list of all the groups the user is a member of, and the group
in question is listed. A couple of odd points:
If I'm logged in as the user, the following commands return different things
some of the time:
It seems like they should return the same thing, shouldn't they?
Is the fix in the Raw Hide glibc or the Raw Hide nss_db?
I can rebuild either for RHEL 4, but I'm wondering: is the Raw Hide glibc
(2.3.90) compatible with RHEL 4, or will it play hell?
IIRC id with no arguments prints the group membership list for the current
process, so if you've changed your primary group with 'newgrp', its output will
change. With a user name (even your own), it just looks it up in the system
databases. At an API level, I guess it's the difference between getgroups() and
The change in question is in nss_db (2.2-33 and later, see
nss_db-2.2-enoent.patch). As far as nss_db is concerned, you can probably even
use the binary package from Raw Hide -- I don't see any versioned deps on newer
glibc than we had in FC4, and the nsswitch ABI hasn't changed in years.
I couldn't say if Raw Hide's glibc needs anything that isn't in RHEL4 without
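The difference described above between 'id' with and without a user name can be checked from a script. This is only a rough sketch: it assumes the database-side lookup corresponds to getgrouplist() (my choice for illustration, not necessarily the call the comment meant to name), and the helper names process_groups, database_groups, and compare are hypothetical.

```python
import os
import pwd


def process_groups():
    """Group IDs attached to the running process (roughly what bare `id` shows)."""
    return sorted(set(os.getgroups()) | {os.getegid()})


def database_groups(username):
    """Group IDs the system databases list for `username` (roughly `id <username>`).

    Assumption: os.getgrouplist() stands in for the database-side lookup.
    """
    entry = pwd.getpwnam(username)
    return sorted(set(os.getgrouplist(entry.pw_name, entry.pw_gid)))


def compare(process, database):
    """Return (missing_from_process, extra_in_process) as sorted lists."""
    process, database = set(process), set(database)
    return sorted(database - process), sorted(process - database)
```

Calling compare(process_groups(), database_groups(name)) for an affected login would list the GIDs that the databases report but the running session lacks, which is the symptom being chased here.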
okay - rebuilding it now and I'll let you know - I believe I still have two
boxes actively displaying the behavior.
I got it to rebuild okay outside of a chroot build environment - but if I build
it inside a rhel4 mock chroot then I get errors about selinux not being
available even though the selinux-devel package is in the chroot.
Right now I'm trying to get it working on rhel4 so I can test the most acute
problem we're seeing.
Try adding a buildrequires: on "ed" and building using the fedora-3-i386-core
configuration. That should be sufficiently similar, and works on a Raw Hide system.
Aargh. Never mind.
The 2.2-35 package will rebuild cleanly.
Tested 2.2-35 - no change.
Just to complicate matters:
It appears that, among the affected users, it only happens when the user logs in
using the Kerberos/AFS password and gets an AFS token.
Three theories, then.
One, the supplemental group membership list is losing entries when the two
entries in the list which represent the PAG get added (compare the output of 'id
-G' with what you're expecting). I'll gladly stick my fingers in my ears and
say "can't hear you, AFS, la la la la".
Two, your users are trying to do this *in* AFS, and something's wrong with the
version of AFS you're running, because AFAIK that's always going to work (maybe
they have to own the root directory of the volume, or have admin privs on the
directory, but I'd have to dig into the reference docs to find the rule).
Three, the database is hosed up somehow. Unlikely, but what the heck. Dump its
contents with 'db_dump -p /var/db/group.db' and check the entries which have
keys of the form '0'+(decimal number) for corruption. (The initgroups() call
eventually iterates over these entries -- the other keys are used for
lookup-by-name and lookup-by-gid.)
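The check in theory three could be scripted along these lines. Treat it as a sketch under assumptions: the layout parsed here (a header ending in HEADER=END, then alternating key and value lines each indented by one space, closed by DATA=END, with a trailing NUL printed as \00) is my understanding of the 'db_dump -p' print format and of how nss_db stores entries, so verify it against a real dump. The function names and the sample data are hypothetical.

```python
def parse_db_dump(text):
    """Parse `db_dump -p` style output into a list of (key, value) pairs."""
    lines = text.splitlines()
    start = lines.index("HEADER=END") + 1
    end = lines.index("DATA=END")
    # Assumed layout: each key/value line carries one leading space.
    body = [line[1:] for line in lines[start:end]]
    return list(zip(body[0::2], body[1::2]))


def suspicious_sequence_entries(pairs):
    """Return '0'+(decimal number) entries that do not look like group lines."""
    bad = []
    for key, value in pairs:
        clean_key = key.replace("\\00", "")
        if not (clean_key.startswith("0") and clean_key[1:].isdigit()):
            continue  # lookup-by-name / lookup-by-gid keys are skipped
        fields = value.replace("\\00", "").split(":")
        # A group line is expected to be name:passwd:gid:members.
        if len(fields) != 4 or not fields[0] or not fields[2].isdigit():
            bad.append((key, value))
    return bad


# Hypothetical sample, not taken from any real system.
SAMPLE = "\n".join([
    "VERSION=3",
    "format=print",
    "type=btree",
    "HEADER=END",
    r" .mock",
    r" mock:x:101:alice\00",
    r" =101",
    r" mock:x:101:alice\00",
    r" 00",
    r" mock:x:101:alice\00",
    r" 01",
    r" staff:x:abc\00",
    "DATA=END",
])
```

Feeding the text of a real dump to parse_db_dump() and then to suspicious_sequence_entries() would flag any numbered entry that fails to split into four colon-separated fields with a numeric GID.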
openafs is the crack.
It's doing something odd with groups as PAGs and eating our groups.
When we move our groups out of the < 500 gid range the group suddenly starts
As promised, "can't hear you, AFS, la la la la".
Seriously though, if this is something that happens inside of the setpag pioctl,
I don't think that there's much that can be done outside of OpenAFS to address it.
I can do some spot checking (slap together a test program that calls
initgroups() for one of these users, then dumps the value returned by
getgroups(), calls setpag(), and repeats the getgroups()). I just need some
real-world sample data to try to chase it down further. (Feel free to remove
any identifying parts and change user names -- the combination of UIDs and GIDs
is what I'm after.)
actually it's pretty easy to duplicate:
1. install rhel4
2. install mock on rhel4
3. install openafs
4. set up to get an afs token when you log in
5. add yourself to the mock group
6. log in to the machine, making sure to get an afs token
7. type 'id' and check the two groups at the front of your group list
8. see if you can read the file: /usr/bin/mock-helper
Given that the group list appears to be sorted, it stands to reason that if the
kernel module is overwriting the first two entries in the group list with the
PAG information instead of prepending it, then groups with low GIDs would be lost.
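The reasoning above can be illustrated with a small simulation. It is not OpenAFS code and says nothing about what the kernel module really does; the GIDs and the two PAG values are made up, and the function names are hypothetical.

```python
def prepend_pag(groups, pag_pair):
    """Expected behaviour: the two PAG entries go in front, nothing is dropped."""
    return list(pag_pair) + list(groups)


def overwrite_with_pag(groups, pag_pair):
    """Suspected behaviour: the two PAG entries replace the first two slots."""
    return list(pag_pair) + list(groups)[2:]


def lost_groups(before, after):
    """GIDs present before the PAG was set but missing afterwards."""
    return sorted(set(before) - set(after))


# Hypothetical sorted group list: two GIDs below 500, two above.
groups = sorted([101, 480, 500, 1200])
# Hypothetical pair of group-list entries representing a PAG.
pag = (33536, 32512)

print(lost_groups(groups, prepend_pag(groups, pag)))
print(lost_groups(groups, overwrite_with_pag(groups, pag)))
```

With a sorted list, the overwrite variant always drops the two lowest GIDs, which would match groups below 500 disappearing while higher-numbered ones survive.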
Are you running OpenAFS 1.4.1? I can't reproduce this with that version (via
sshd using pam_krb5 2.1.15 with the "external=sshd" option on the PAM session
line and attachment #713 from bug #918 at bugzilla.mindrot.org, or via console
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.