1261093 – nss-altfiles does not fill out group member array

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1261093 - nss-altfiles does not fill out group member array

Summary: nss-altfiles does not fill out group member array

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	nss-altfiles
Sub Component:
Version:	7.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	rc
Target Release:	7.3
Assignee:	Colin Walters
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2015-09-08 14:46 UTC by Fabian Deutsch
Modified:	2020-12-15 07:36 UTC (History)
CC List:	2 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-12-15 07:36:37 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1171291	0	unspecified	CLOSED	Add nss-altfiles to rpcbind user lookup path	2021-02-22 00:41:40 UTC

Internal Links: 1171291

Description Fabian Deutsch 2015-09-08 14:46:53 UTC

Description of problem:
Python's grp package does not use NSS but queries the grp file directly (through grp.h whatever component is owning it).
This is a problem on systems where alternative locations are used for storing groups i.e. RHEL Atomic.

To fix this grp either needs to use NSS, or there needs to be a new way to get a unified view of the groups (and users) on a RHEL Atomic host.

This is similar to bug 1171291.

Version-Release number of selected component (if applicable):
RHEL Atomic 7

How reproducible:
always

Steps to Reproduce:
1. Ensure that a user is only in /usr/lib/passwd and that this user is a member of a group only defined in /usr/lib/group
2. python -c "import grp ; assert any([g.gr_name for g in grp.getgrall() if THE_USER in g.gr_mem])"
3.

Actual results:
This should raise an assertion error, because grp does not see that THE_USER is part of a group

Expected results:


Additional info:

Comment 2 Colin Walters 2015-09-08 15:52:57 UTC

Hmm...I think it's common for non-files based NSS modules to leave the `gr_mem` variable incomplete, because it can be *very* expensive to compute for large setups.

This isn't really a Python bug - it's just wrapping what glibc is doing.

That said, nss-altfiles is introducing this for *system* users, whereas things like SSSD typically would just do a truncated list for *human* users.

The `getgrouplist(3)` system call is an explicit way to retrieve all members if one wants it.  http://bugs.python.org/issue9344 might be relevant here, though of course you could use ctypes or a C extension module too.

Why do you need to do this?

Comment 3 Colin Walters 2015-09-08 15:54:42 UTC

I'm going to use my Phone-A-Friend lifeline.

Comment 4 Stephen Gallagher 2015-09-08 16:32:14 UTC

(In reply to Colin Walters from comment #2)
> Hmm...I think it's common for non-files based NSS modules to leave the
> `gr_mem` variable incomplete, because it can be *very* expensive to compute
> for large setups.
> 
> This isn't really a Python bug - it's just wrapping what glibc is doing.
> 
> That said, nss-altfiles is introducing this for *system* users, whereas
> things like SSSD typically would just do a truncated list for *human* users.
> 

Actually, SSSD has only two settings. The default behavior is for getgr[nam|gid]() to return every member of the group (which can be painfully large, yes). This should always be the complete list, barring bugs in SSSD or odd LDAP configurations.

We have an option to turn that off and always return an EMPTY gr_mem list, which speeds up operation significantly.

> The `getgrouplist(3)` system call is an explicit way to retrieve all members
> if one wants it.  http://bugs.python.org/issue9344 might be relevant here,
> though of course you could use ctypes or a C extension module too.
> 

Well, this is not the same thing. getgrouplist() takes a user as an argument and returns all the groups that the user belongs to. However, this is the inverse of what this bug is addressing, which is wanting to get a list of all users that belong to a group.


> Why do you need to do this?

The most common valid reason would be if you want to display a filtered selection dialog. If you're attempting to do access-control based on group membership, then you absolutely should be using getgrouplist() rather than getgrnam() because it will be MUCH faster and more reliable.


All that said, if python's 'grp' module isn't actually using the full name-service interface, that's DEFINITELY a bug.

Comment 5 Fabian Deutsch 2015-09-08 18:32:07 UTC

(In reply to Colin Walters from comment #2)

[…]

> Why do you need to do this?

We use it to check whether a (system) user is part of a group:

>  60     groups = [g.gr_name for g in grp.getgrall()
>  61               if constants.SANLOCK_USER in g.gr_mem]
>  62     gid = pwd.getpwnam(constants.SANLOCK_USER).pw_gid
>  63     groups.append(grp.getgrgid(gid).gr_name)
>  64     if all(group in groups for group in SANLOCK_GROUPS):
>  65         configured = MAYBE

Source: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/tool/configurators/sanlock.py;hb=HEAD#l59

And the assumption is that we get the complete list of valid users on that system.

The python docs actually say: "This module provides access to the Unix group database." (https://docs.python.org/2/library/grp.html)
This sounds very much like it's basically just accessing the real unix group db (/etc/group), and not doing "fancy" nsswitch stuff.
So it could be that python is doing the right thing, we just made wrong assumptions on it.

But, I'm no expert on those matters.

(In reply to Stephen Gallagher from comment #4)
[…]
> All that said, if python's 'grp' module isn't actually using the full
> name-service interface, that's DEFINITELY a bug.

As said above, I'm not sure if the grp package is intended to just query the groups file, instead of using nss.

From the Atomic - or nss - POV grp should be fixed.

Comment 6 Colin Walters 2015-09-08 19:03:46 UTC

(In reply to Fabian Deutsch from comment #5)

> We use it to check whether a (system) user is part of a group:

Yes but the question was more why are you doing *that* =)  *clicks on the link*

Okay that's kind of strange code, what is...*google search for sanlock*...wait...holy cow!  PAXOS based on shared storage!   I'm not sure I'd trust this until it'd been through a round of https://aphyr.com/tags/jepsen =)

(FWIW in Project Atomic we're currently using etcd for cluster locking beneath Kubernetes)

Anyways...this function looks like it'd be a lot simpler if it just *always* looked in /proc/<pid>/status for the groups it expected to find, and restarted the service if not, or something like that.  Then you wouldn't need to involve the NSS database at all.

> From the Atomic - or nss - POV grp should be fixed.

If anything this would be a nss-altfiles bug.  I am not aware of a way to fix it though short of merging `files` and `altfiles` together.  Which wouldn't be *too* hard...but changing the configurator code to not need it would also likely improve it.

Comment 7 Fabian Deutsch 2015-09-09 09:26:32 UTC

(In reply to Colin Walters from comment #6)
> (In reply to Fabian Deutsch from comment #5)
> 
> > We use it to check whether a (system) user is part of a group:
> 
> Yes but the question was more why are you doing *that* =)  *clicks on the
> link*
> 
> Okay that's kind of strange code, what is...*google search for
> sanlock*...wait...holy cow!  PAXOS based on shared storage!   I'm not sure
> I'd trust this until it'd been through a round of
> https://aphyr.com/tags/jepsen =)

Nice link!

> (FWIW in Project Atomic we're currently using etcd for cluster locking
> beneath Kubernetes)
> 
> Anyways...this function looks like it'd be a lot simpler if it just *always*
> looked in /proc/<pid>/status for the groups it expected to find, and
> restarted the service if not, or something like that.  Then you wouldn't
> need to involve the NSS database at all.

Once we know the gid of a group, we can use procfs - But to resolve the group name into a gid, is where NSS or grp comes into the game.

> > From the Atomic - or nss - POV grp should be fixed.
> 
> If anything this would be a nss-altfiles bug.  I am not aware of a way to
> fix it though short of merging `files` and `altfiles` together.  Which
> wouldn't be *too* hard...but changing the configurator code to not need it
> would also likely improve it.

Yes, I'll also look if we can improve the configurator, but the group name resolution will still be an issue.

Comment 8 Colin Walters 2015-09-10 14:00:48 UTC

(In reply to Fabian Deutsch from comment #7)

> Once we know the gid of a group, we can use procfs - But to resolve the
> group name into a gid, is where NSS or grp comes into the game.

That part definitely works, if it didn't nss-altfiles wouldn't be useful.  Just try it:

# getent group kube
kube:x:994:
#

Comment 9 Fabian Deutsch 2015-09-10 14:03:18 UTC

Let me reprhase: The group name resolution in python using the grp module does not work, because of the mentioned issues.
And thus, we can not resolve the group name to an id (natively in python) and can thus not leverage procfs.

The name group name resolution in general (using nss aware tools) does work.

Comment 10 Colin Walters 2015-09-14 16:35:28 UTC

$ atomic host status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME               REFSPEC                                                        
* 2015-07-30 15:41:21     7.1.4     1dbbbc0bc3     rhel-atomic-host     rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard  
$ python
Python 2.7.5 (default, Apr  9 2015, 11:03:32) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import grp
>>> grp.getgrnam('kube')
grp.struct_group(gr_name='kube', gr_passwd='x', gr_gid=994, gr_mem=[])
>>> import pwd
>>> pwd.getpwnam('kube')
pwd.struct_passwd(pw_name='kube', pw_passwd='x', pw_uid=996, pw_gid=994, pw_gecos='Kubernetes user', pw_dir='/', pw_shell='/sbin/nologin')
>>> 

What doesn't work is the gr_mem variable, but since this code just needs to go name -> gid -> scrape /proc, it doesn't need that, right?

Comment 11 Fabian Deutsch 2015-09-15 06:54:44 UTC

I agree that this will fix our problem. But is it expected that every consumer works around this by adjusting their code?

Comment 12 Colin Walters 2015-09-15 14:01:31 UTC

(In reply to Fabian Deutsch from comment #11)
> I agree that this will fix our problem. But is it expected that every
> consumer works around this by adjusting their code?

This is the first program I've seen enumerating *system* group members.  I agree it's a bug though.

Comment 14 RHEL Program Management 2020-12-15 07:36:37 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Note You need to log in before you can comment on or make changes to this bug.