Bug 1880069 - Only half of the NEON registers are preserved when using LD_AUDIT libs
Summary: Only half of the NEON registers are preserved when using LD_AUDIT libs
Status: CLOSED DUPLICATE of bug 1879780
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: glibc
Version: 8.3
Hardware: aarch64
OS: Linux
Target Milestone: rc
: 8.4
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
Depends On:
TreeView+ depends on / blocked
Reported: 2020-09-17 15:43 UTC by Ben Woodard
Modified: 2021-09-17 12:34 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-09-18 14:15:23 UTC
Type: Bug
Target Upstream Version:

Attachments (Terms of Use)

Description Ben Woodard 2020-09-17 15:43:06 UTC
Description of problem:
When inspecting the code for bz1879780 we notice that in _dl_runtime_resolve it saves the Q registers q0-q7 but in _dl_runtime_profile it saves the D registers. 

According to the ARM Cortex-A Series Programmer's Guide for ARMv8-A section 9.3.1 Parameters in NEON and floating-point registers

V0-V7 are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).

Therefore saving the d0-d7 registers would only save half of the 128b registers. This could lead to data corruption.

Version-Release number of selected component (if applicable):
ALL this problem has apparently existed since the the aarch64 port was originally made.

We have not yet constructed a reproducer which demonstrates the a problem with this but since it looks like an obvious mismatch, we expect that there would be a problem.

Comment 1 Ben Woodard 2020-09-17 15:45:09 UTC
Note that this affects not only RHEL8, but RHEL7-alt, Fedora and upstream glibc.

Comment 2 Ben Woodard 2020-09-17 15:59:37 UTC
It is explicitly requested that you open this BZ to ARM as we believe that this will help resolve the issue in a timely manner.

Comment 4 Ben Woodard 2020-09-17 20:22:19 UTC
According to the Procedure Call Standard for the ARM 64-bit Architecture (with SVE support) Section 5.1.3 on page 17 and in Section 5.1.4 on page 18 the registers z0-z7 and p0-p3 should also be preserved. 

This would seem to add additional complexity to the dl-trampoline.S file because not only do the Z registers extend the V registers with the same name, the Z registers only exist on processors which have SVE enabled. I am not sure how to correctly identify processors that have SVE in such a low level piece of code and before selecting the Z and P registers over the Q registers.

Furthermore there those paragraphs about preserving z8-z23 and p4-p15 if parameters are passed using those registers. 

This bit of code needs to be reviewed by someone more familiar with ARM assembly to ensure that it meets those requriements otherwise ABI compliant code could have an invariant violated.

Comment 5 Ben Woodard 2020-09-18 14:15:23 UTC
Closing this as a duplicate of 1879780. My first analysis of this bug was that it was a second independent problem. However, as I dug into the code more carefully, I realized that they were in fact two manifestations of the same thing. 

What seems to have happened is that during the initial port of glibc to aarch64 some code was copied over from aarch32 and it was fixed in one place _dl_runtime_resolve but not the companion code in _dl_runtime_profile. The kind of changes needed to fix the x8 problem are so closely related to the problem with the NEON registers that it is just better to fix both at the same time rather than thinking of them as independent. Both problems were likely introduced at the same time and both are cases where the ABI definition wasn’t being followed properly.

Handling the SVE registers properly though I think is a separate problem. Unlike this problem, it was not introduced during the initial port to the architecture, the problem with the SVE registers cropped up due to a change in the ISA and it requires conditional logic to make sure that it handles both the case where there are SVE registers and the case where there are not SVE registers. So I think it should be handled separately. That being said however, for ABI and compatibility sake it would be best to apply the fixes to both the normal case and the SVE case at the same time so that we do not end up with two different versions of the La_aarch64_regs structure.

*** This bug has been marked as a duplicate of bug 1879780 ***

Note You need to log in before you can comment on or make changes to this bug.