RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2003291 - [LLNL 9.X BUG] register x8 and quad sized NEON registers are not properly saved when using ld_audit
Summary: [LLNL 9.X BUG] register x8 and quad sized NEON registers are not properly sav...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: glibc
Version: 9.0
Hardware: aarch64
OS: Linux
unspecified
high
Target Milestone: rc
: ---
Assignee: Carlos O'Donell
QA Contact: Sergey Kolosov
Dominik
URL:
Whiteboard:
Depends On: 1991591 2075713
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-09-10 22:45 UTC by Ben Woodard
Modified: 2023-11-16 04:25 UTC (History)
18 users (show)

Fixed In Version: glibc-2.34-32.el9
Doc Type: Bug Fix
Doc Text:
.The auditing interface now saves and restores the x8 register and the full width of the NEON registers for AArch64 Previously, a bug in the implementation of the dynamic loader’s audit interface caused the `AArch64` saved register state to be incomplete compared to the procedure call standard. This bug has been fixed and the auditing interface now saves and restores the x8 register and the full width of the NEON registers for `AArch64`. Applications using the dynamic loader auditing interface can now inspect and influence the x8 register for `AArch64`. To use this new x8 register and have access to the full width of the NEON registers on `AArch64`, the audit modules must be recompiled to use the new version of the interface (LAV_CURRENT is 2).
Clone Of: 1879780
Environment:
Last Closed: 2022-11-15 11:11:43 UTC
Type: Bug
Target Upstream Version: 2.35
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-96896 0 None None None 2021-09-10 22:46:51 UTC
Red Hat Product Errata RHBA-2022:8272 0 None None None 2022-11-15 11:12:06 UTC

Description Ben Woodard 2021-09-10 22:45:28 UTC
+++ This bug was initially created as a clone of Bug #1879780 +++

Description of problem:
According to the Arm Cortex-A Series Programmers Guide for ARMv8-A It is part of the ABI for the architecture.

See Section 9.1.1 Parameters in general-purpose registers on page 9-3 where it states:

   Registers with a special purpose (X8, X16-X18, X29, X30)
   • X8 is the indirect result register. This is used to pass the address location of an indirect result, for example, where a function returns a large structure.gister should be preserved. 

You can see that this register is saved in the normal code path which is what is normally run. See sysdeps/aarch64/dl-trampoline.S:48 :
	/* Save arguments.  */
	stp	x8, x9, [sp, #-(80+8*16)]!
	cfi_adjust_cfa_offset (80+8*16)
	cfi_rel_offset (x8, 0)
	cfi_rel_offset (x9, 8)

Side note:
I do not understand why x9 is saved there. According to the ABI spec. X9-X15 are Caller-saved temporary registers and so I would expect X9 to need to be saved if it was going to be used. However, I do not see any use of x9 in the code between the point at which it was saved in line 48 and when it is restored in line 113.

However, when you use LD_AUDIT the rtld shifts into what it calls profiling mode. In that profiling mode x8 is not properly saved. See there is no x8 being saved around line 185 in _dl_runtime_profile:

	stp	x0, x1, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*0]
	cfi_rel_offset (x0, OFFSET_RG + DL_OFFSET_RG_X0 + 16*0 + 0)
	cfi_rel_offset (x1, OFFSET_RG + DL_OFFSET_RG_X0 + 16*0 + 8)
	stp	x2, x3, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*1]
	cfi_rel_offset (x2, OFFSET_RG + DL_OFFSET_RG_X0 + 16*1 + 0)
	cfi_rel_offset (x3, OFFSET_RG + DL_OFFSET_RG_X0 + 16*1 + 8)
	stp	x4, x5, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*2]
	cfi_rel_offset (x4, OFFSET_RG + DL_OFFSET_RG_X0 + 16*2 + 0)
	cfi_rel_offset (x5, OFFSET_RG + DL_OFFSET_RG_X0 + 16*2 + 8)
	stp	x6, x7, [x29, #OFFSET_RG + DL_OFFSET_RG_X0 + 16*3]
	cfi_rel_offset (x6, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 0)
	cfi_rel_offset (x7, OFFSET_RG + DL_OFFSET_RG_X0 + 16*3 + 8)

Yeah you got the Parameter and result registers but where is the Indirect result location register? After that block of code, you immediately start saving the d registers.

Version-Release number of selected component (if applicable):
All - this bug has been there since the beginning of the aarch64 port

How reproducible:
Always

Steps to Reproduce:
There are two reproducers in the uploaded tar.gz which I got for the original reporter. One is very simple and should work but variations in the compiler can change how the structure is returned and cause that reproducer not to work. The other reproducer is a much bigger bit of code which always forces the use of x8 but it is much more code.


Additional info:
This bug report was provided by developers at Rice University Jonathon Anderson, John Mellor-Crummey, Xiaozhu Meng, Mark Krentel working on behalf of LLNL

--- Additional comment from Ben Woodard on 2020-09-17 00:54:53 UTC ---

Note that this bug also affects RHEL7-alt, and Fedora

--- Additional comment from Carlos O'Donell on 2020-09-17 02:28:02 UTC ---

I've reached out to Arm to discuss this issue.

--- Additional comment from Ben Woodard on 2020-09-17 16:01:32 UTC ---

It is explicitly requested that you open this BZ to ARM as we believe that this will help resolve the issue in a timely manner.

Also note that this affects upstream as well.

--- Additional comment from Travis Gummels on 2020-09-17 18:55:11 UTC ---

This bug is being made public at the request of LLNL/RICE/U Wisc.  Confidential groups have been removed.

--- Additional comment from Ben Woodard on 2020-09-17 23:34:26 UTC ---

See also https://bugzilla.redhat.com/show_bug.cgi?id=1880069

When you start looking at the code in _dl_runtime_profile it looks like it has been copied from a previous version of the processor where there were no Q registers in the V register group. In other words the V registers were only uint64_t's not uint128_t's. 

x29 ends up being loaded with a pointer to a struct of type la_aarch64_regs 

/* Registers for entry into PLT on AArch64.  */
typedef struct La_aarch64_regs
{
  uint64_t lr_xreg[8];
  uint64_t lr_dreg[8];
  uint64_t lr_sp;
  uint64_t lr_lr;
} La_aarch64_regs;

This doesn't include space for x8 the indirect result register. Nor does it have space for the full sized v registers. Furthermore when you look at line 136 where it describes the stack frame layout

	   Stack frame layout:
	   [sp,   #...] lr
	   [sp,   #...] &PLTGOT[n]
	   [sp,    #96] La_aarch64_regs
	   [sp,    #48] La_aarch64_retval
	   [sp,    #40] frame size return from pltenter
	   [sp,    #32] dl_profile_call saved x1
	   [sp,    #24] dl_profile_call saved x0
	   [sp,    #16] t1
	   [sp,     #0] x29, lr   <- x29

what _dl_runtime_resolve actually pushes on the stack the La_aarch64_regs part doesn't match the what _dl_runtime_profile is expecting. The size of the structure is quite different.

To be able to support those the structure would have to be:

/* Registers for entry into PLT on AArch64.  */
typedef struct La_aarch64_regs
{
  uint64_t    lr_xreg[10];
  __uint128_t lr_vreg[8]; /* or uint64_t lr_vreg[16]; */
  uint64_t    lr_sp;
  uint64_t    lr_lr;
} La_aarch64_regs;

The fact that the number of X regs that are pushed onto the stack is 10 rather than 9 becomes even more perplexing. Reading the ABI spec I would have thought that there is no reason to push x9.

Even though x9 is on the stack at the time when _dl_runtime_resolve is called my reading is that it doesn't have to be in the La_aarch64_regs structure so the structure could be:

/* Registers for entry into PLT on AArch64.  */
typedef struct La_aarch64_regs
{
  uint64_t    lr_xreg[9];
  __uint128_t lr_vreg[8]; /* or uint64_t lr_vreg[16]; */
  uint64_t    lr_sp;
  uint64_t    lr_lr;
} La_aarch64_regs;

Personally, I would use a sto rather than a stp up in _dl_profile_resolve and just push x8 by itself.

In that modification it should be noted that I changed the name of the NEON registers from dreg to vreg since that more accurately names that part of the register file.

even this is inadequate to contain the registers which are specified by SVE/SVE2

There may be some ABI implications to changing this structure but I would argue that this only impacts users of LD_AUDIT and that the fact that these bugs have been latent for so long suggests that the users who reported this to me are the first to make use of this interface.

--- Additional comment from Ben Woodard on 2020-09-18 04:23:37 UTC ---

This is an untested patch which suggests the approach that I think is needed to fix this problem. I will try to test it tomorrow. However, someone with much more experience with ARM assembly code needs to look at it.

--- Additional comment from Ben Woodard on 2020-09-18 14:15:23 UTC ---



--- Additional comment from Ben Woodard on 2020-09-18 23:10:11 UTC ---

This document is important in understanding the topic: 
https://developer.arm.com/documentation/100986/0000/

--- Additional comment from Ben Woodard on 2020-09-18 23:15:16 UTC ---

Solving the problem for SVE is more challenging due to the large and variable length registers. 
I think that I would like to do something like this:

typedef struct La_sve_bits
{
  __uint128_t lr_zreg[8];
  uint16_t    lr_preg[3];
}; /* 134 bytes in size */

typedef struct La_aarch64_regs
{
  unsigned int version:8;
  unsigned int sve_len:8;
  uint64_t lr_xreg[9];
  uint64_t lr_sp;
  uint64_t lr_lr;
  union {
    __uint128_t        lr_vreg[8];
    struct La_sve_bits lr_sve_reg[1];
  };
} La_aarch64_regs;

There are several ideas here:
1) The structure would be versioned so that if we need to update it at some point you can read the version number of the structure and know how to interpret it. God help us if we ever need more than 2-3 versions of this structure but let's give it a whole 8 bits.
2) I moved the SP and the LR up in the structure so that I could change the size of the SVE registers without overwriting them. 
3) The lr_vreg would be what I've got now. The NEON quad sized floats. However, now it is stuffed in an anonymous union.
4) SVE obviously has different sized registers depending on the processor implementation. However the internal size is a multiple of len and so I put sve_len as one of the fields. Currently it can be  1-16. Let's be generous and give it a whole 8 bits. In the non-sve case the sve_len will be 0; this can happen on an individual function call basis. Most functions will not be using the SVE registers. We don't want to be throwing around more than 2K unless we really have to.
5) Then in the case where we do use SVE sve_len will be set to the processor's true SVE register length in bits/128.
So in the case of a processor like A64FX which has a SVE register length of 512 the sve_len would be set to 4 and there would then be 3 more lr_sve_reg structures immediately following the one found inside the La_aarch64_regs structure.
Admittedly this would take some fiddling to reconstruct the full registers used to pass parameters to SVE but it would seem to me to be a better approach then making all the La_aarch64_regs 2376 bytes long.

In the default case the size of the structure would only be 226 bytes long. IMHO reasonable.
In the case of the A64FX it would be 628 bytes.

--- Additional comment from Ben Woodard on 2020-09-22 01:38:23 UTC ---

I'll squash these into one patch later. This fixes the alignment problem that I introduced and addresses a place problem where missed renaming a structure member.

--- Additional comment from Ben Woodard on 2020-09-22 05:59:55 UTC ---

This patch was tested to work on the x8 reproducer attached to this bug.

$ LD_AUDIT=../break-glibc/arm-audit-bug/simple/libauditor.so ../break-glibc/arm-audit-bug/simple/main
Segmentation fault (core dumped)
$ builddir=`pwd` LD_AUDIT=../break-glibc/arm-audit-bug/simple/libauditor.so CONV_PATH="${builddir}"/iconvdata LOCPATH="${builddir}"/localedata LC_ALL=C  "${builddir}"/elf/ld-linux-aarch64.so.1 --library-path "${builddir}":"${builddir}"/math:"${builddir}"/elf:"${builddir}"/dlfcn:"${builddir}"/nss:"${builddir}"/nis:"${builddir}"/rt:"${builddir}"/resolv:"${builddir}"/mathvec:"${builddir}"/support:"${builddir}"/crypt:"${builddir}"/nptl ../break-glibc/arm-audit-bug/simple/main
$

The NEON Vector register problem is also fixed but a reproducer has not been developed yet.
The analogous SVE problem is not yet solved.

--- Additional comment from Ben Woodard on 2020-09-23 01:05:56 UTC ---

The La_aarch64_retval structure also needed to be expanded to accommodate the quad sized V registers at the same time, the ABI also specifies that additional registers can be used for return values, therefore these were also added to the La_aarch64_retval structure.

--- Additional comment from Ben Woodard on 2020-09-24 19:40:28 UTC ---



--- Additional comment from Ben Woodard on 2020-09-24 19:42:20 UTC ---

Current upstream status - paused pending upstream consideration about what to do with SVE

--- Additional comment from Chris Tatman on 2020-09-24 22:29:59 UTC ---

Removing Arm Confidential group. I didn't realise that this was already open to the public when I added them.

--- Additional comment from Jon Masters on 2020-09-28 15:49:00 UTC ---

Note that AAPCS64 says:

"The first eight registers, v0-v7, are used to pass argument values into a subroutine and to return result values from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).

Registers v8-v15 must be preserved by a callee across subroutine calls; the remaining registers (v0-v7, v16-v31) do not need to be preserved (or should be preserved by the caller). Additionally, only the bottom 64 bits of each value stored in v8-v15 need to be preserved 7; it is the responsibility of the caller to preserve larger values."

Thus you don't need a 128-bit quantity to save those V registers, only the lower 64-bits. It looks like just the indirect result register is at issue here?

--- Additional comment from Jon Masters on 2020-09-28 16:23:42 UTC ---

Ok. Having talked about this some more, it seems there are two separate issues:

1). The existing profile trampoline entry/exit isn't doing the needed save/restore on x86 on the local stack. Non-controversial issue.

2). A separate INTERNAL GLIBC ABI nothing to do with AAPCS64 issue in the audit interface where users desire to log x86 plus full SIMD/SVE register state, requiring a change to the INTERNAL GLIBC audit structures.

We should split this issue into the trivial fix, and the audit fix. As separate issues.

Jon.

--- Additional comment from Jon Masters on 2020-09-28 16:24:08 UTC ---

Oops - x8 not x86 :)

--- Additional comment from RHEL Program Management on 2020-09-30 10:47:56 UTC ---

pm_ack is no longer used for this product. The flag has been reset.

See https://issues.redhat.com/browse/PTT-1821 for additional details or contact lmiksik if you have any questions.

--- Additional comment from Matt Newsome on 2021-02-01 15:00:14 UTC ---

Adding Triaged keyword to help with SST stats on certain dashboards

--- Additional comment from Matt Newsome on 2021-02-01 15:00:34 UTC ---

Adding Triaged keyword to help with SST stats on certain dashboards

--- Additional comment from Red Hat Bugzilla on 2021-05-30 12:50:08 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-05-30 12:50:30 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-05-30 12:50:44 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Ben Woodard on 2021-08-19 20:24:24 UTC ---

This problem as well as the related one with regards to SVE have been looked at upstream. They have been proposed for glibc-2.35 but they haven't been extensively tested yet nor have they been accepted. 

git clone https://sourceware.org/git/glibc.git
git checkout --track remotes/origin/azanella/ld-audit-fixes

Most of the work was done in: 
0020-elf-Fix-runtime-linker-auditing-on-aarch64-BZ-26643.patch                                                                                                                                        
0021-elf-Add-SVE-support-for-aarch64-rtld-audit.patch  

It just needs to be backported to RHEL8 and RHEL9.

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:52:09 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:52:14 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:52:22 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:52:27 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:52:33 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

--- Additional comment from Red Hat Bugzilla on 2021-09-08 14:53:07 UTC ---

remove performed by PnT Account Manager <pnt-expunge>

Comment 4 Carlos O'Donell 2022-02-24 14:38:19 UTC
I'm going to use this bug to track the full rtld-audit backport to 9.1.0. Retitling as appropriate.

Comment 28 errata-xmlrpc 2022-11-15 11:11:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8272

Comment 29 Red Hat Bugzilla 2023-11-16 04:25:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days


Note You need to log in before you can comment on or make changes to this bug.