Back to bug 2024347

Who When What Removed Added
Red Hat Bugzilla 2021-11-17 21:57:51 UTC Pool ID sst_pt_gcc_glibc_rhel_9
Red Hat One Jira (issues.redhat.com) 2021-11-17 22:01:23 UTC Link ID Red Hat Issue Tracker RHELPLAN-103091
Florian Weimer 2021-11-18 10:18:56 UTC Doc Type --- If docs needed, set a value
Florian Weimer 2021-12-06 20:14:20 UTC Type Bug Enhancement
Keywords FutureFeature, Triaged
Jeremy Linton (ARM) 2021-12-09 21:33:23 UTC Blocks 1877135
Florian Weimer 2021-12-09 23:14:28 UTC Depends On 2030872
Keywords Patch
Doc Type If docs needed, set a value Enhancement
Hardware aarch64 All
Doc Text Feature: sched_getcpu implementation using rseq (restartable sequences)

Reason: sched_getcpu is implemented in terms of the getcpu system call on AArch64, which is very slow for typical usage of sched_getcpu in parallel algorithms. On other architectures, vDSO acceleration is already used, but the vDSO uses a special register which might stall execution.

Result: sched_getcpu performance on AArch64 is significantly improved. Other architectures see a slight improvement.
Summary Please improve sched_getcpu performance to avoid syscalls glibc: Implement sched_getcpu using rseq
Flags needinfo?(jlinton)
Martin Cermak 2021-12-09 23:30:06 UTC QA Contact qe-baseos-tools-bugs skolosov
Florian Weimer 2021-12-16 20:12:30 UTC Depends On 2033446
Florian Weimer 2022-01-14 19:52:13 UTC Summary glibc: Implement sched_getcpu using rseq glibc: Implement optional sched_getcpu using rseq
Assignee glibc-bugzilla fweimer
Status NEW ASSIGNED
Florian Weimer 2022-01-19 11:29:53 UTC Summary glibc: Implement optional sched_getcpu using rseq glibc: Optional sched_getcpu acceleration using rseq
Florian Weimer 2022-01-20 20:35:15 UTC Fixed In Version glibc-2.34-19.el9
Status ASSIGNED MODIFIED
errata-xmlrpc 2022-01-31 21:03:34 UTC Status MODIFIED ON_QA
Sergey Kolosov 2022-02-03 20:43:02 UTC Status ON_QA VERIFIED
Florian Weimer 2022-04-11 04:55:37 UTC Docs Contact mtimar
CC mtimar
Doc Text Feature: sched_getcpu implementation using rseq (restartable sequences)

Reason: sched_getcpu is implemented in terms of the getcpu system call on AArch64, which is very slow for typical usage of sched_getcpu in parallel algorithms. On other architectures, vDSO acceleration is already used, but the vDSO uses a special register which might stall execution.

Result: sched_getcpu performance on AArch64 is significantly improved. Other architectures see a slight improvement.
.`sched_getcpu` implementation now uses `rseq` (restartable sequences) to achieve improved performance on AArch64 and other architectures

Standard implementation of `sched_getcpu` on AArch64 uses `getcpu` system call, which is very slow when called in parallel algorithms. Other architectures use `vDSO` acceleration to get around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on AArch64 architectures. Other architectures see a slight improvement.
Flags needinfo?(fweimer)
Flags needinfo?(fweimer)
Jeremy Linton 2022-04-25 15:17:31 UTC Flags needinfo?(fweimer)
CC jeremy.linton
Jeremy Linton 2022-05-11 17:00:25 UTC Flags needinfo?(jeremy.linton)
Flags needinfo?(jeremy.linton)
Florian Weimer 2022-05-13 09:28:16 UTC Flags needinfo?(fweimer)
Florian Weimer 2022-05-13 09:28:32 UTC Flags needinfo?(jlinton)
Florian Weimer 2022-05-13 15:20:24 UTC Depends On 2033446
errata-xmlrpc 2022-05-17 00:33:35 UTC Status VERIFIED RELEASE_PENDING
errata-xmlrpc 2022-05-17 15:48:51 UTC Resolution --- ERRATA
Status RELEASE_PENDING CLOSED
Last Closed 2022-05-17 15:48:51 UTC
errata-xmlrpc 2022-05-17 15:49:17 UTC Link ID Red Hat Product Errata RHBA-2022:3917
Gabi Fialová 2022-06-09 08:00:31 UTC CC gfialova
Flags needinfo?(mtimar)
Gabi Fialová 2022-06-20 13:01:26 UTC Doc Text .`sched_getcpu` implementation now uses `rseq` (restartable sequences) to achieve improved performance on AArch64 and other architectures

Standard implementation of `sched_getcpu` on AArch64 uses `getcpu` system call, which is very slow when called in parallel algorithms. Other architectures use `vDSO` acceleration to get around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on AArch64 architectures. Other architectures see a slight improvement.
.`sched_getcpu` implementation now uses `rseq` (restartable sequences) to achieve improved performance on the 64-bit ARM architectures and other architectures

Standard implementation of `sched_getcpu` on the 64-bit ARM architectures uses `getcpu` system call, which is very slow when called in parallel algorithms. Other architectures use `vDSO` acceleration to get around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on the 64-bit ARM architectures. Other architectures see a slight improvement.
Flags needinfo?(fweimer)
Florian Weimer 2022-06-20 14:22:43 UTC Flags needinfo?(fweimer)
Jacob Taylor Valdez 2022-06-21 07:48:05 UTC CC jvaldez
Flags needinfo?(fweimer)
Florian Weimer 2022-06-21 08:00:08 UTC Flags needinfo?(fweimer)
Jacob Taylor Valdez 2022-06-21 08:39:48 UTC Doc Text .`sched_getcpu` implementation now uses `rseq` (restartable sequences) to achieve improved performance on the 64-bit ARM architectures and other architectures

Standard implementation of `sched_getcpu` on the 64-bit ARM architectures uses `getcpu` system call, which is very slow when called in parallel algorithms. Other architectures use `vDSO` acceleration to get around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on the 64-bit ARM architectures. Other architectures see a slight improvement.
.`sched_getcpu` implementation can now, optionally, use `rseq` (restartable sequences) to improve performance on the 64-bit ARM architectures and other architectures

The previous implementation of `sched_getcpu` on the 64-bit ARM architectures uses the `getcpu` system call, which is too slow for efficient use in most parallel algorithms. Other architectures use vDSO (virtual dynamic shared object) acceleration to work around this. Implementing `sched_getcpu` using `rseq` greatly improves performance on the 64-bit ARM architectures. Other architectures see a slight improvement.

To configure `sched_getcpu` to use `rseq`, set the `GLIBC_TUNABLES=glibc.pthread.rseq=1` environment variable:

----
# GLIBC_TUNABLES=glibc.pthread.rseq=1
# export GLIBC_TUNABLES
----
Florian Weimer 2022-06-22 08:14:02 UTC Flags needinfo?(mtimar)
Mark O'Brien 2023-07-18 14:29:19 UTC Pool ID sst_pt_glibc_rhel_9 sst_pt_libraries_rhel_9
