Description of problem:
Using group.db files in /var/db, users show up as being in the group when you run 'id', but when the user tries to chgrp a file they are not allowed to.

We've made sure that:
- the user has logged out and back in
- the system was rebooted
- nscd was off and its cache was cleared
- the .db files appear to work for other users

We've had this happen on 3 different systems affecting 3 different users for 3 different groups. We've tested the same thing on FC4 and RHEL4 and CentOS4 and we can get it to happen on all of them, but it's not the most consistent thing. It appears like the presence of group.db confuses the nss lookup.

Version-Release number of selected component (if applicable):
nss_db-2.2-29

How reproducible:
Hard to reproduce, but try adding a group.db file, adding some entries into it, and seeing if you can replicate it.

I know this is not the most helpful bug report but unfortunately we can only get it to occur part of the time. I was hoping someone more aware of the code might know where the problem is coming from. Thank you.
Seth, when the user can't chgrp the file, do we have a list of supplemental groups of which the user is a member? There's one fix outstanding for an older release, but as far as we've been able to tell, the internal semantics of glibc's nsswitch subsystem made it unnecessary to make the same change for RHEL 4. I'll attach the patch. I understand that it's not an easily-reproducible thing, but if you can try a modified nss_db with the change long enough to rule it out as a fix, that'd be useful.
Actually, it's easier to test than that, because the fix for that (bug #152467) is already in the Raw Hide package. Can I get you to rebuild it for RHEL 4 and use that as a test?
Nalin, yes, I have a list of all the groups the user is a member of, and the group in question is listed.

A couple of odd points: if I'm logged in as the user, the following commands return different things some of the time:

  id
  id myusername

It seems like they should return the same thing, shouldn't they?

Is the fix in rawhide glibc or rawhide nss_db? I can rebuild either for rhel4, but I'm wondering: is rawhide's glibc 2.3.90 compatible with rhel4, or will it play hell?

thanks,
-sv
IIRC id with no arguments prints the group membership list for the current process, so if you've changed your primary group with 'newgrp', its output will change. With a user name (even your own), it just looks it up in the system databases. At an API level, I guess it's the difference between getgroups() and getgrouplist(). The change in question is in nss_db (2.2-33 and later, see nss_db-2.2-enoent.patch). As far as nss_db is concerned, you can probably even use the binary package from Raw Hide -- I don't see any versioned deps on newer glibc than we had in FC4, and the nsswitch ABI hasn't changed in years. I couldn't say if Raw Hide's glibc needs anything that isn't in RHEL4 without trying it...
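For anyone who wants to see that difference directly, here is a minimal Python sketch (it is not taken from id or nss_db). It uses os.getgroups() for the list attached to the running process and os.getgrouplist() for the lookup through the system databases. The helper names diff_groups and report are made up for illustration.

```python
import os
import pwd


def diff_groups(process_gids, database_gids):
    """Return (missing_from_process, extra_in_process) as sorted lists."""
    proc, db = set(process_gids), set(database_gids)
    return sorted(db - proc), sorted(proc - db)


def report(username):
    """Compare this process's groups with the databases' view of username."""
    entry = pwd.getpwnam(username)
    # Roughly what plain `id` shows: the groups attached to this process
    # (getgroups(2)), plus the real and effective GIDs, which getgroups()
    # is allowed to omit.
    process_gids = set(os.getgroups()) | {os.getgid(), os.getegid()}
    # Roughly what `id username` shows: a fresh lookup through nsswitch
    # (getgrouplist(3)), seeded with the primary GID from passwd.
    database_gids = os.getgrouplist(username, entry.pw_gid)
    missing, extra = diff_groups(process_gids, database_gids)
    print("in databases but not in this process:", missing)
    print("in this process but not in databases:", extra)


if __name__ == "__main__":
    try:
        report(pwd.getpwuid(os.getuid()).pw_name)
    except KeyError:
        print("no passwd entry for the current uid")
```

Run it from a login session of an affected user. A group that shows up only on the database side is one the process never received or has since lost.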
okay - rebuilding it now and I'll let you know - I believe I still have two boxes actively displaying the behavior. thanks
I got it to rebuild okay outside of a chroot build environment - but if I build it inside a rhel4 mock chroot then I get errors about selinux not being available even though the selinux-devel package is in the chroot. Right now I'm trying to get it working on rhel4 so I can test the most acute problem we're seeing. any suggestions?
Try adding a buildrequires: on "ed" and building using the fedora-3-i386-core configuration. That should be sufficiently similar, and works on a Raw Hide system.
Aargh. Never mind.
The 2.2-35 package will rebuild cleanly.
Tested 2.2-35 - no change. Just to complicate matters: It appears that of the affected users it only happens when the user logs in using the kerberos/afs password and gets an afs token.
Three theories, then.

One, the supplemental group membership list is losing entries when the two entries in the list which represent the PAG get added (compare the output of 'id -G' with what you're expecting). I'll gladly stick my fingers in my ears and say "can't hear you, AFS, la la la la".

Two, your users are trying to do this *in* AFS, and something's wrong with the version of AFS you're running, because AFAIK that's always going to work (maybe they have to own the root directory of the volume, or have admin privs on the directory, but I'd have to dig into the reference docs to find the rule).

Three, the database is hosed up somehow. Unlikely, but what the heck. Dump its contents with 'db_dump -p /var/db/group.db' and check the entries which have keys of the form '0'+(decimal number) for corruption. (The initgroups() call eventually iterates over these entries -- the other keys are used for lookup-by-name and lookup-by-gid.)
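The check in theory three can be scripted. The Python sketch below rests on assumptions rather than anything verified against nss_db's source: that 'db_dump -p' prints a header ending in HEADER=END, then alternating key and data lines each indented by one space, then DATA=END; that a trailing NUL appears as the escape \00; and that each data item is a group(5)-style line. The function names are hypothetical.

```python
import re

# Keys of the form '0' + decimal number are the ones used for enumeration.
ENUM_KEY = re.compile(r"^0\d+$")


def _strip_nul(field):
    """Drop a trailing escaped NUL (the three characters \\00), if present."""
    return field[:-3] if field.endswith("\\00") else field


def parse_db_dump(text):
    """Turn `db_dump -p` output into a list of (key, value) pairs."""
    lines = text.splitlines()
    start = lines.index("HEADER=END") + 1 if "HEADER=END" in lines else 0
    fields = []
    for line in lines[start:]:
        if line == "DATA=END":
            break
        # Assumed layout: each key/data line is indented by a single space.
        fields.append(_strip_nul(line[1:] if line.startswith(" ") else line))
    # Keys and values alternate, so pair them up.
    return list(zip(fields[0::2], fields[1::2]))


def suspicious_enum_entries(pairs):
    """Return (key, value, reason) for enumeration entries that look damaged."""
    problems = []
    for key, value in pairs:
        if not ENUM_KEY.match(key):
            continue  # lookup-by-name and lookup-by-gid keys are skipped
        parts = value.split(":")
        if len(parts) != 4:
            problems.append((key, value, "expected 4 colon-separated fields"))
        elif not parts[2].isdigit():
            problems.append((key, value, "GID is not a decimal number"))
    return problems
```

To use it, save the dump with 'db_dump -p /var/db/group.db > dump.txt', read the file into a string, and pass the result of parse_db_dump() to suspicious_enum_entries(). An empty list only means nothing obviously malformed was found, not that the database is known to be good.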
Found it. openafs is the crack. It's doing something odd with groups as PAGs and eating our groups. When we move our groups out of the < 500 GID range, the group suddenly starts working.
As promised, "can't hear you, AFS, la la la la". Seriously though, if this is something that happens inside of the setpag pioctl, I don't think that there's much that can be done outside of OpenAFS to address it. I can do some spot checking (slap together a test program that calls initgroups() for one of these users, then dumps the value returned by getgroups(), calls setpag(), and repeats the getgroups()). I just need some real-world sample data to try to chase it down further. (Feel free to remove any identifying parts and change user names -- the combination of UIDs and GIDs is what I'm after.)
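A rough outline of that spot check in Python, as a sketch only. Python has no built-in binding for setpag, so the PAG-creating step is passed in as a callable that you would have to supply yourself (hypothetical here). The names snapshot and groups_lost_across are invented for this example.

```python
import os


def snapshot():
    """Supplementary GIDs currently attached to this process (getgroups(2))."""
    return os.getgroups()


def groups_lost_across(step):
    """Run step() and report how the process's group list changed around it."""
    before = snapshot()
    step()  # e.g. whatever obtains a PAG on the system under test
    after = snapshot()
    lost = sorted(set(before) - set(after))
    gained = sorted(set(after) - set(before))
    return before, after, lost, gained
```

The intended use is to run as root, call os.initgroups(username, gid) for one of the affected users first, and then call groups_lost_across() with the PAG step. If lost comes back non-empty, the group list is being damaged at that point rather than in the nss_db lookup.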
actually it's pretty easy to duplicate:
1. install rhel4
2. install mock on rhel4
3. install openafs
4. setup to get an afs token when you login
5. add yourself to the mock group
6. login to the machine, making sure to get an afs token
7. type 'id' and check the two groups at the front of your group list
8. see if you can read the file: /usr/bin/mock-helper
Given that the group list appears to be sorted, it stands to reason that if the kernel module is overwriting the first two entries in the group list with the PAG information instead of prepending it, that groups with low GIDs would be lost. Are you running OpenAFS 1.4.1? I can't reproduce this with that version (via sshd using pam_krb5 2.1.15 with the "external=sshd" option on the PAM session line and attachment #713 from bug #918 at bugzilla.mindrot.org, or via console login).
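To make the overwrite-versus-prepend reasoning concrete, here is a small Python simulation. It only models the hypothesis; it does not reflect what any OpenAFS version is known to do, and the GIDs and PAG values in it are made-up numbers.

```python
def prepend_pag(groups, pag_pair):
    """Hypothesized correct behaviour: put the PAG entries in front of the list."""
    return list(pag_pair) + list(groups)


def overwrite_pag(groups, pag_pair):
    """Hypothesized bug: the PAG entries replace the first entries of the list."""
    return list(pag_pair) + list(groups)[len(pag_pair):]


def lost(original, result):
    """GIDs present in the original list but missing afterwards."""
    return sorted(set(original) - set(result))


if __name__ == "__main__":
    groups = sorted([1000, 48, 500, 101])  # sorted, so low GIDs come first
    pag = (33000, 33001)                   # made-up stand-ins for the PAG pair
    print("prepend loses:", lost(groups, prepend_pag(groups, pag)))
    print("overwrite loses:", lost(groups, overwrite_pag(groups, pag)))
```

With a sorted list, the overwrite case always drops the two lowest GIDs, which would match the observation that groups below 500 were the ones affected.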
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ for details. If you would like Red Hat to reconsider your request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.