2024347 – glibc: Optional sched_getcpu acceleration using rseq

Summary: glibc: Optional sched_getcpu acceleration using rseq

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 9
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	9.0
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Florian Weimer
QA Contact:	Sergey Kolosov
Docs Contact:	mtimar
URL:
Whiteboard:
Depends On:	2030872
Blocks:	1877135

Reported:	2021-11-17 21:57 UTC by Jeremy Linton (ARM)
Modified:	2023-07-18 14:29 UTC
CC List:	11 users
Fixed In Version:	glibc-2.34-19.el9
Doc Type:	Enhancement
Doc Text:	.`sched_getcpu` implementation can now, optionally, use `rseq` (restartable sequences) to improve performance on the 64-bit ARM architectures and other architectures

The previous implementation of `sched_getcpu` on the 64-bit ARM architectures uses the `getcpu` system call, which is too slow for efficient use in most parallel algorithms. Other architectures use vDSO (virtual dynamic shared object) acceleration to work around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on the 64-bit ARM architectures. Other architectures see a slight improvement.

To configure `sched_getcpu` to use `rseq`, set the `GLIBC_TUNABLES=glibc.pthread.rseq=1` environment variable:

----
# GLIBC_TUNABLES=glibc.pthread.rseq=1
# export GLIBC_TUNABLES
----
Clone Of:
Environment:
Last Closed:	2022-05-17 15:48:51 UTC
Type:	Enhancement
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-103091	0	None	None	None	2021-11-17 22:01:23 UTC
Red Hat Product Errata	RHBA-2022:3917	0	None	None	None	2022-05-17 15:49:17 UTC

Internal Links: 2085529

Description Jeremy Linton (ARM) 2021-11-17 21:57:51 UTC

Description of problem: Some applications (MySQL, for example) rely heavily on sched_getcpu(), which on Arm traps to a full-blown syscall rather than using the vDSO or some other fairly fast mechanism. This causes performance problems not present on other architectures.

Following conversations between Arm and Red Hat glibc developers, the possibility was raised of fixing rseq() for use in this case and backporting the fix.


Version-Release number of selected component (if applicable): glibc-2.34


Expected results:

Lower overhead in the class of applications that depend on sched_getcpu()

Comment 1 Florian Weimer 2021-11-18 10:18:56 UTC

I've started an upstream discussion:

Bringing rseq back into glibc
https://sourceware.org/pipermail/libc-alpha/2021-November/133221.html

Comment 2 Florian Weimer 2021-12-06 20:41:00 UTC

I've posted patches:

[PATCH 0/5] Extensible rseq support for glibc
https://sourceware.org/pipermail/libc-alpha/2021-December/133656.html

Comment 3 Florian Weimer 2021-12-09 23:14:28 UTC

The upstream patches have been integrated, and I believe we have fixed the regression.

Jeremy, would you be able to arrange for performance tests once we have a test build?

We are not quite ready to backport this because we need to teach valgrind about rseq first (bug 2030872).

Comment 8 Florian Weimer 2022-01-14 19:53:56 UTC

valgrind is fixed, but criu is not, so this has to be opt-in for now, using the glibc.pthread.rseq tunable.

Comment 16 Jeremy Linton 2022-04-25 15:17:31 UTC

Yes, I will test it again. I had a testing setup over Christmas.

Comment 17 mtimar 2022-05-11 11:08:13 UTC

Hi Jeremy,
sorry to bother you, any luck with the testing?
Thanks, Matej

Comment 18 Jeremy Linton 2022-05-11 17:00:25 UTC

Yah, I'm about to post a small benchmark set here. I've been running various sysbench/etc things over the past couple of days on an Ampere Altra; the Graviton plan hasn't managed to pan out yet (still in progress). Right now the general OLTP results show a small uplift, but I'm now running the exact tests that the AST team used last year, so the results should be more noticeable. I'm planning on calling this done in the next ~day.

Comment 19 Jeremy Linton 2022-05-12 00:30:46 UTC

Well, I guess I continue to fail to identify that peak 20%+ uplift in memory/mysql/OLTP style workloads with this patch applied. I think the general OLTP uplift with the mysql-specific patch was something like 3% in some tests, and I can see that. Or at least something along those lines, since I'm not sure my tests are repeatable enough that 2-3% can be attributed to this patch vs lucky scheduling/memory placement/whatever. Part of the problem may be some difference between mariadb as shipped with RHEL, which I'm using, and the actual mysql originally used in the test environment. For sure there are system configuration/innodb tuning parameter differences that I can't reconcile, simply because the code bases have diverged around NUMA tuning/etc.

I can keep banging on it, but really the core request was to fix sched_getcpu(), for which I can report that, with a hand-rolled "am I still on the same CPU" loop, the uplift is a pedestrian ~47X on an Altra. LoL.

AKA, that much uplift can't help but show up in all sorts of places.

So, for purposes of closing this, I think the answer is overwhelmingly that it's been fixed. OTOH, I'm not sure that I can say with confidence that there is a double-digit % mariadb/OLTP uplift because of it, at this point.

Comment 20 Florian Weimer 2022-05-13 09:26:57 UTC

Thanks. Was the mysql-specific patch inlining the rseq access, by chance? Avoiding the sched_getcpu function call overhead?

We may have to export the GLIBC_2.35 symbols for interoperability purposes once we turn on rseq by default, and if we do that, mysql could switch back to inlining rseq access.

Comment 23 errata-xmlrpc 2022-05-17 15:48:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: glibc), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3917

Comment 24 Jeremy Linton 2022-05-23 15:10:10 UTC

Yah, just as a FYI, the original testing was with a custom mysql patch, but it was also using a very custom test setup, using stored procedures and backing the DB with ram/etc. So lots of variables that individually could be affecting it.
