Bug 1261093

Summary:	nss-altfiles does not fill out group member array
Product:	Red Hat Enterprise Linux 7	Reporter:	Fabian Deutsch <fdeutsch>
Component:	nss-altfiles	Assignee:	Colin Walters <walters>
Status:	CLOSED WONTFIX	QA Contact:	Martin Jenner <mjenner>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	7.2	CC:	sgallagh, walters
Target Milestone:	rc	Keywords:	Extras
Target Release:	7.3
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-12-15 07:36:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Fabian Deutsch 2015-09-08 14:46:53 UTC

Description of problem:
Python's grp package does not use NSS but queries the grp file directly (through grp.h whatever component is owning it).
This is a problem on systems where alternative locations are used for storing groups i.e. RHEL Atomic.

To fix this grp either needs to use NSS, or there needs to be a new way to get a unified view of the groups (and users) on a RHEL Atomic host.

This is similar to bug 1171291.

Version-Release number of selected component (if applicable):
RHEL Atomic 7

How reproducible:
always

Steps to Reproduce:
1. Ensure that a user is only in /usr/lib/passwd and that this user is a member of a group only defined in /usr/lib/group
2. python -c "import grp ; assert any([g.gr_name for g in grp.getgrall() if THE_USER in g.gr_mem])"
3.

Actual results:
This should raise an assertion error, because grp does not see that THE_USER is part of a group

Expected results:


Additional info:

Comment 2 Colin Walters 2015-09-08 15:52:57 UTC

Hmm...I think it's common for non-files based NSS modules to leave the `gr_mem` variable incomplete, because it can be *very* expensive to compute for large setups.

This isn't really a Python bug - it's just wrapping what glibc is doing.

That said, nss-altfiles is introducing this for *system* users, whereas things like SSSD typically would just do a truncated list for *human* users.

The `getgrouplist(3)` system call is an explicit way to retrieve all members if one wants it.  http://bugs.python.org/issue9344 might be relevant here, though of course you could use ctypes or a C extension module too.

Why do you need to do this?

Comment 3 Colin Walters 2015-09-08 15:54:42 UTC

I'm going to use my Phone-A-Friend lifeline.

Comment 4 Stephen Gallagher 2015-09-08 16:32:14 UTC

(In reply to Colin Walters from comment #2)
> Hmm...I think it's common for non-files based NSS modules to leave the
> `gr_mem` variable incomplete, because it can be *very* expensive to compute
> for large setups.
> 
> This isn't really a Python bug - it's just wrapping what glibc is doing.
> 
> That said, nss-altfiles is introducing this for *system* users, whereas
> things like SSSD typically would just do a truncated list for *human* users.
> 

Actually, SSSD has only two settings. The default behavior is for getgr[nam|gid]() to return every member of the group (which can be painfully large, yes). This should always be the complete list, barring bugs in SSSD or odd LDAP configurations.

We have an option to turn that off and always return an EMPTY gr_mem list, which speeds up operation significantly.

> The `getgrouplist(3)` system call is an explicit way to retrieve all members
> if one wants it.  http://bugs.python.org/issue9344 might be relevant here,
> though of course you could use ctypes or a C extension module too.
> 

Well, this is not the same thing. getgrouplist() takes a user as an argument and returns all the groups that the user belongs to. However, this is the inverse of what this bug is addressing, which is wanting to get a list of all users that belong to a group.


> Why do you need to do this?

The most common valid reason would be if you want to display a filtered selection dialog. If you're attempting to do access-control based on group membership, then you absolutely should be using getgrouplist() rather than getgrnam() because it will be MUCH faster and more reliable.


All that said, if python's 'grp' module isn't actually using the full name-service interface, that's DEFINITELY a bug.

Comment 5 Fabian Deutsch 2015-09-08 18:32:07 UTC

(In reply to Colin Walters from comment #2)

[…]

> Why do you need to do this?

We use it to check whether a (system) user is part of a group:

>  60     groups = [g.gr_name for g in grp.getgrall()
>  61               if constants.SANLOCK_USER in g.gr_mem]
>  62     gid = pwd.getpwnam(constants.SANLOCK_USER).pw_gid
>  63     groups.append(grp.getgrgid(gid).gr_name)
>  64     if all(group in groups for group in SANLOCK_GROUPS):
>  65         configured = MAYBE

Source: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=lib/vdsm/tool/configurators/sanlock.py;hb=HEAD#l59

And the assumption is that we get the complete list of valid users on that system.

The python docs actually say: "This module provides access to the Unix group database." (https://docs.python.org/2/library/grp.html)
This sounds very much like it's basically just accessing the real unix group db (/etc/group), and not doing "fancy" nsswitch stuff.
So it could be that python is doing the right thing, we just made wrong assumptions on it.

But, I'm no expert on those matters.

(In reply to Stephen Gallagher from comment #4)
[…]
> All that said, if python's 'grp' module isn't actually using the full
> name-service interface, that's DEFINITELY a bug.

As said above, I'm not sure if the grp package is intended to just query the groups file, instead of using nss.

From the Atomic - or nss - POV grp should be fixed.

Comment 6 Colin Walters 2015-09-08 19:03:46 UTC

(In reply to Fabian Deutsch from comment #5)

> We use it to check whether a (system) user is part of a group:

Yes but the question was more why are you doing *that* =)  *clicks on the link*

Okay that's kind of strange code, what is...*google search for sanlock*...wait...holy cow!  PAXOS based on shared storage!   I'm not sure I'd trust this until it'd been through a round of https://aphyr.com/tags/jepsen =)

(FWIW in Project Atomic we're currently using etcd for cluster locking beneath Kubernetes)

Anyways...this function looks like it'd be a lot simpler if it just *always* looked in /proc/<pid>/status for the groups it expected to find, and restarted the service if not, or something like that.  Then you wouldn't need to involve the NSS database at all.

> From the Atomic - or nss - POV grp should be fixed.

If anything this would be a nss-altfiles bug.  I am not aware of a way to fix it though short of merging `files` and `altfiles` together.  Which wouldn't be *too* hard...but changing the configurator code to not need it would also likely improve it.

Comment 7 Fabian Deutsch 2015-09-09 09:26:32 UTC

(In reply to Colin Walters from comment #6)
> (In reply to Fabian Deutsch from comment #5)
> 
> > We use it to check whether a (system) user is part of a group:
> 
> Yes but the question was more why are you doing *that* =)  *clicks on the
> link*
> 
> Okay that's kind of strange code, what is...*google search for
> sanlock*...wait...holy cow!  PAXOS based on shared storage!   I'm not sure
> I'd trust this until it'd been through a round of
> https://aphyr.com/tags/jepsen =)

Nice link!

> (FWIW in Project Atomic we're currently using etcd for cluster locking
> beneath Kubernetes)
> 
> Anyways...this function looks like it'd be a lot simpler if it just *always*
> looked in /proc/<pid>/status for the groups it expected to find, and
> restarted the service if not, or something like that.  Then you wouldn't
> need to involve the NSS database at all.

Once we know the gid of a group, we can use procfs - But to resolve the group name into a gid, is where NSS or grp comes into the game.

> > From the Atomic - or nss - POV grp should be fixed.
> 
> If anything this would be a nss-altfiles bug.  I am not aware of a way to
> fix it though short of merging `files` and `altfiles` together.  Which
> wouldn't be *too* hard...but changing the configurator code to not need it
> would also likely improve it.

Yes, I'll also look if we can improve the configurator, but the group name resolution will still be an issue.

Comment 8 Colin Walters 2015-09-10 14:00:48 UTC

(In reply to Fabian Deutsch from comment #7)

> Once we know the gid of a group, we can use procfs - But to resolve the
> group name into a gid, is where NSS or grp comes into the game.

That part definitely works, if it didn't nss-altfiles wouldn't be useful.  Just try it:

# getent group kube
kube:x:994:
#

Comment 9 Fabian Deutsch 2015-09-10 14:03:18 UTC

Let me reprhase: The group name resolution in python using the grp module does not work, because of the mentioned issues.
And thus, we can not resolve the group name to an id (natively in python) and can thus not leverage procfs.

The name group name resolution in general (using nss aware tools) does work.

Comment 10 Colin Walters 2015-09-14 16:35:28 UTC

$ atomic host status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME               REFSPEC                                                        
* 2015-07-30 15:41:21     7.1.4     1dbbbc0bc3     rhel-atomic-host     rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard  
$ python
Python 2.7.5 (default, Apr  9 2015, 11:03:32) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import grp
>>> grp.getgrnam('kube')
grp.struct_group(gr_name='kube', gr_passwd='x', gr_gid=994, gr_mem=[])
>>> import pwd
>>> pwd.getpwnam('kube')
pwd.struct_passwd(pw_name='kube', pw_passwd='x', pw_uid=996, pw_gid=994, pw_gecos='Kubernetes user', pw_dir='/', pw_shell='/sbin/nologin')
>>> 

What doesn't work is the gr_mem variable, but since this code just needs to go name -> gid -> scrape /proc, it doesn't need that, right?

Comment 11 Fabian Deutsch 2015-09-15 06:54:44 UTC

I agree that this will fix our problem. But is it expected that every consumer works around this by adjusting their code?

Comment 12 Colin Walters 2015-09-15 14:01:31 UTC

(In reply to Fabian Deutsch from comment #11)
> I agree that this will fix our problem. But is it expected that every
> consumer works around this by adjusting their code?

This is the first program I've seen enumerating *system* group members.  I agree it's a bug though.

Comment 14 RHEL Program Management 2020-12-15 07:36:37 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.