Bug 1964390

Summary: [GSS] Performance issue with winbindd active in RHGS 3.5.4
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: amansan <amanzane>
Component: md-cache
Assignee: Nobody <nobody>
Status: ASSIGNED
QA Contact: Aditya Ramteke <aramteke>
Severity: high
Priority: high
Version: rhgs-3.5
CC: gdeschner, jahernan, mduasope, mhackett, sarora, sheggodu, tochan, vdas
Target Milestone: ---
Flags: amanzane: needinfo? (csaba)
amanzane: needinfo? (csaba)
mduasope: needinfo? (mhackett)
sheggodu: needinfo? (mduasope)
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Type: Bug

Description amansan 2021-05-25 11:55:57 UTC
After upgrading to the latest Gluster version, the customer is experiencing a performance problem.

glusterfs-6.0-56.el7rhgs.x86_64                             Mon May  3 09:42:16 2021
glusterfs-api-6.0-56.el7rhgs.x86_64                         Mon May  3 09:42:18 2021
glusterfs-client-xlators-6.0-56.el7rhgs.x86_64              Mon May  3 09:42:18 2021
glusterfs-cli-6.0-56.el7rhgs.x86_64                         Mon May  3 09:42:19 2021
glusterfs-events-6.0-56.el7rhgs.x86_64                      Mon May  3 09:42:25 2021
glusterfs-fuse-6.0-56.el7rhgs.x86_64                        Mon May  3 09:42:19 2021
glusterfs-geo-replication-6.0-56.el7rhgs.x86_64             Mon May  3 09:42:25 2021
glusterfs-libs-6.0-56.el7rhgs.x86_64                        Mon May  3 09:42:16 2021
glusterfs-rdma-6.0-56.el7rhgs.x86_64                        Mon May  3 09:42:25 2021
glusterfs-server-6.0-56.el7rhgs.x86_64                      Mon May  3 09:42:20 2021
python2-gluster-6.0-56.el7rhgs.x86_64                       Mon May  3 09:42:09 2021
samba-vfs-glusterfs-4.11.6-111.el7rhgs.x86_64               Mon May  3 09:42:23 2021

When the problem starts, we see messages like:

[2021/05/07 10:36:50.552965,  3] ../../source3/winbindd/winbindd_getpwuid.c:52(winbindd_getpwuid_send)
  winbindd_getpwuid_send: [nss_winbind (1924)] getpwuid 27960
[2021/05/07 10:36:50.553448,  3] ../../source3/winbindd/winbindd_getgroups.c:66(winbindd_getgroups_send)
  winbindd_getgroups_send: [nss_winbind (1924)] getgroups INECO\roberto.gil
[2021/05/07 10:36:50.553866,  3] ../../source3/winbindd/winbindd_util.c:1788(lookup_usergroups_cached)
  : lookup_usergroups_cached

Comment 9 Sunil Kumar Acharya 2021-06-17 14:21:12 UTC
*** Bug 1964389 has been marked as a duplicate of this bug. ***

Comment 59 amansan 2022-03-23 08:20:12 UTC
Hi Csaba,

Thanks. Did you manage to find a way to correct the problem?

Regards,

Alicia

Comment 97 Csaba Henk 2022-08-02 16:45:05 UTC
Fasten yer belts, here cometh the great 'features.acl' debunking!

Let's go down to Glusterfs history first.

ACL support in Glusterfs, in its current form, was carved out in the context of Bug 764547 (GLUSTER-2815), "Server-enforced ACLs", via the following sequence of changes:

 1. https://github.com/gluster/glusterfs/commit/148217634c [v3.2.2~34] byte-order: htole*/letoh* and htobe*/betoh* for forced endian conversions
 2. https://github.com/gluster/glusterfs/commit/6d877a2f8f [v3.2.2~33] dht: set linkto xattr with linkfile create (mknod)
 3. https://github.com/gluster/glusterfs/commit/4722d0000a [v3.2.2~32] fuse: fill frame->root->groups with aux gids of the process
 4. https://github.com/gluster/glusterfs/commit/d8c7cdc734 [v3.2.2~31] fuse: introduce "noacl" option to disable ACL checks
 5. https://github.com/gluster/glusterfs/commit/9f7c50da00 [v3.2.2~30] storage/posix: set ACL keys during new entry/inode creations
 6. https://github.com/gluster/glusterfs/commit/3911634c7f [v3.2.2~29] posix-acl: implementation of POSIX ACL as a translator
 7. https://github.com/gluster/glusterfs/commit/a55c81deb1 [v3.2.2~28] access-control: superseded by posix-acl translator
 8. https://github.com/gluster/glusterfs/commit/c2dc337ea3 [v3.2.2~27] glusterfs: add --acl command line option to load ACLs on the client side
 9. https://github.com/gluster/glusterfs/commit/6ca8604204 [v3.2.2~26] mount.glusterfs: support -o acl parameter
10. https://github.com/gluster/glusterfs/commit/1b01b64894 [v3.2.2~13] posix-acl: perform access checks on read/write/truncate for NFS calls
11. https://github.com/gluster/glusterfs/commit/4d2afaae2f [v3.3.0~855] posix-acl: configurable super user ID

I'm not sure how ACLs were handled prior to this, but here is a summary of what was accomplished:

- A new ACL-capable access control xlator was introduced, system/posix-acl.
- The earlier access control xlator, features/access-control, which was loaded on the server side, was replaced drop-in by system/posix-acl. Volfile generation was not changed; the old features/access-control was simply erased, and a compatibility symlink from access-control.so to posix-acl.so was created at install time (see change 7. in the above list and the 'access-control-compat' target in xlators/system/posix-acl/src/Makefile.am). So from this point on, features/access-control can be regarded as an alias for system/posix-acl.
- The client side '--acl' command-line option / '-oacl' mount option was introduced; if they are used, the FUSE client loads posix-acl and takes over permission handling from the kernel (which implements access control based on POSIX permission bits). ¹

Concerning 'features.acl', it's a relatively recent upstream addition, via the following commit:

- https://github.com/gluster/glusterfs/commit/c8c1829f68 [v9.0~224] volgen: add an option to disable acl

It introduces the 'features.acl' volume option, which is enabled by default. If it's disabled, features/access-control is removed from the brick graphs (taking effect on volume restart).

Some remarks about it:
- It was not introduced for either semantic or performance purposes; it's a diagnostic option that can be used in nailing down ACL related issues. As such, it's not documented, and its description includes a warning about its diagnostic nature (not to use it in production).
- It is actually a misnomer. As far as I can reverse engineer the name, three things were conflated in it:
  1. the issues to be diagnosed through it were ACL related;
  2. the feature (xlator) en/disabled through it is called features/access-control, which is tempting to abbreviate to 'features.acl' (which is incorrect, as ACL stands for "Access Control *Lists*", and is a specific technique within the general domain of access control);
  3. accidentally features/access-control is an alias for system/posix-acl.

However, the option does not disable ACL support in general (as the name would suggest); what it does is disable access control (of any kind) on the server side. Calling it e.g. 'features/brick-access-control' would have been somewhat more of a mouthful, but immensely helpful.

My current insight is that this option could indeed be used in production, as suggested in Comment #86, to eliminate the hog of server side access control when it's not needed (the server has only fuse clients). The semantic correctness of this configuration is demonstrated in my recent commit, already referred to in Comment 90,

- https://github.com/gluster/glusterfs/commit/48b44ea52b test 'features.acl off' (#3643)

Now let's see how 'features.acl' was used in the context of this bug.

Basically, the issues with it:
- it was not backported to downstream so far, so referring to it in downstream context is moot;
- it was used hand-wavily, as a general term referring to ACL enablement, or mistaken for --acl / -oacl.

Let's revisit the particular mentions of features.acl.

Comment #4, by Xavi:

> However I see that they don't have features.acl enabled. This seems to indicate that all acl checks are done by the brick's underlying filesystem. Since this is integrated with AD, I guess the system already gets the full list of groups for the current user to correctly check the ACLs at XFS layer.

Of course 'features.acl' is not enabled, as the product does not know of features.acl :-)

> @Csaba can you confirm that the behavior without features.acl enabled is correct ?

Not correct. Disabling features.acl puts the bricks out of the access control game, so the brick filesystem is not involved in permission checking.

Comment #11, by Csaba:

> - ACL-s are not enforced by kernel at all with FUSE version 7.24 that Glusterfs supports as of now. With features.acl, Glusterfs itself steps in and manages/enforces ACLs. Without this option, ACLs are ignored.

Here 'features.acl' is confused with '--acl'/'-oacl'.


Comment #12, by Xavi:

> I understand that without enabling features.acl Gluster doesn't check any ACL, but is it correct to assume that ACLs will work correctly when the underlying brick filesystem has ACL support enabled and the user/group database is integrated into the same AD domain as clients ?

No, features.acl disables access control (including ACL checking) only on the brick side. This also seems to be a case of mistaking it for '--acl'/'-oacl'. Furthermore, involving the brick filesystem directly, as the comment suggests, does not happen, as the brick process runs as root; bricks perform access control in userspace, using the features/access-control xlator. (Comment #15 and Comment #77 bring up "setfs[ug]id" as a possible mechanism for getting the brick kernel to perform access control, but that is a limited mechanism -- does not apply to supplemental groups -- and was ditched in

- https://github.com/gluster/glusterfs/commit/3176ddf99 (commit message irrelevant, being a list of independent features)

which is an ancient commit that is not even part of downstream commit history; indeed, downstream history was initiated with a source snapshot that has already included this commit.²)
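To make the supplemental groups limitation concrete, here is a toy model of a permission-bit check. It is not Glusterfs or kernel code: it ignores ACLs and the root bypass, and the file mode, the uids/gids and the function name are made up for illustration. It only shows what goes wrong when just a single uid/gid pair is conveyed to whoever performs the check.

```python
def mode_allows(mode, owner_uid, owner_gid, uid, gid, groups, want):
    """Toy POSIX permission-bit check; `want` is a mask of 4 (r), 2 (w), 1 (x).

    Deliberately ignores ACLs and the root/capability bypass.
    """
    if uid == owner_uid:
        granted = (mode >> 6) & 7   # owner class
    elif gid == owner_gid or owner_gid in groups:
        granted = (mode >> 3) & 7   # group class
    else:
        granted = mode & 7          # other class
    return (granted & want) == want


# Hypothetical file: mode 0640, owned by uid 1000 and group 2000.
FILE = dict(mode=0o640, owner_uid=1000, owner_gid=2000)
# Hypothetical caller: primary gid 3000, member of group 2000 only
# through the supplementary group list.
CALLER = dict(uid=27960, gid=3000)

# Full credentials available: read is granted via the group class.
with_groups = mode_allows(**FILE, **CALLER, groups={2000}, want=4)
# Only uid/gid conveyed (what a setfs[ug]id-style scheme amounts to in
# this model): the caller falls into the "other" class and is denied.
without_groups = mode_allows(**FILE, **CALLER, groups=set(), want=4)

print(with_groups, without_groups)
```

In this model the same user gets a different answer depending solely on whether the supplementary group list reaches the checker, which is the gap the userspace check in features/access-control does not have.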


Comment #15, by Xavi:

> I've just checked it again, and I see that the volume is mounted with --acl option, which is enough to enable acl checks (features.acl is not explicitly set, but it's enabled by default, so I was wrong and ACL support is really enabled in this volume).

A proper account of '--acl'; however, features.acl is not available downstream.

Comment #86, by Csaba:

> With respect to server: as of the default configuration, access control is performed on bricks, but this can be turned off via features.acl vol option.

> Summary: if the only kind of clients that are used in the setup are fuse clients, the following volume settings provide a safe optimization: 'features.acl off', 'server.manage-gids off'. The volume has to be restarted for these settings to take effect.

Would be a valuable observation in upstream context; however, features.acl is not available downstream.
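For reference, in an upstream setup that does have the option, the settings quoted above could be applied roughly as sketched here. This is a dry-run helper that only builds and prints the gluster CLI invocations; the function name and the volume name 'myvol' are placeholders of mine, and '--mode=script' is used on the stop command only to skip the interactive confirmation.

```python
import shlex


def fuse_only_tuning_commands(volume):
    """Build the gluster CLI calls for the fuse-only optimization (upstream only)."""
    settings = [("features.acl", "off"), ("server.manage-gids", "off")]
    cmds = [["gluster", "volume", "set", volume, key, value]
            for key, value in settings]
    # The volume has to be restarted for the settings to take effect.
    cmds.append(["gluster", "--mode=script", "volume", "stop", volume])
    cmds.append(["gluster", "volume", "start", volume])
    return cmds


if __name__ == "__main__":
    # 'myvol' is a placeholder volume name; nothing is executed here.
    for cmd in fuse_only_tuning_commands("myvol"):
        print(shlex.join(cmd))
```

Note that stopping the volume interrupts service for all clients, so this is something for a maintenance window, and it only makes sense when all clients are fuse clients.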


---

¹ Since then the kernel fuse VFS has also implemented handling of ACLs; but that feature can be enabled separately from standard permission based access control, and Glusterfs doesn't make use of it for the time being. (The RHEL product line supports this from RHEL8 on.)
² However, it's part of current upstream history, which has been constructed to include ancient history too.