Bug 1929836 - ppc64le: severe ptrace, tracing, and audit regression
Summary: ppc64le: severe ptrace, tracing, and audit regression
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://git.kernel.org/torvalds/c/d72...
Whiteboard:
Depends On: PPCTracker
Blocks: 1930252
Reported: 2021-02-17 18:32 UTC by Dmitry V. Levin
Modified: 2022-09-05 08:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1930252 1962971 (view as bug list)
Environment:
Last Closed: 2022-09-05 08:10:08 UTC
Type: Bug



Description Dmitry V. Levin 2021-02-17 18:32:37 UTC
Hello, strace upstream is speaking. :)

There seems to be a severe ppc64le kernel regression detected by strace %check (625 of 972 tests failed).  This ruins strace and blocks any strace update.

To me, it looks as if the kernel function syscall_get_error has started returning 0 instead of the error code.

The logs are in the latest strace build task that failed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=62171493

The kernel used for build there was
Linux buildvm-ppc64le-14.iad2.fedoraproject.org 5.10.15-200.fc33.ppc64le #1 SMP Wed Feb 10 17:31:04 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux

For comparison, the same version of strace passes all tests on all other rawhide architectures, as well as on Debian ppc64le (kernel 5.10.13-1).

The latest successful strace build to rawhide was about 2 months ago:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1658216

The kernel used for build there was
Linux buildvm-ppc64le-38.iad2.fedoraproject.org 5.9.11-200.fc33.ppc64le #1 SMP Tue Nov 24 18:02:53 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux

Comment 1 Eugene Syromiatnikov 2021-02-18 13:45:03 UTC
Fedora's 5.10.7-200.fc33.ppc64le (on buildvm-ppc64le-19.iad2.fedoraproject.org) was broken as well, for the record.

Comment 2 Dmitry V. Levin 2021-03-14 00:26:35 UTC
I'd like to reiterate: this kernel bug breaks strace badly, making it almost unusable.

Comment 3 Dmitry V. Levin 2021-04-07 16:25:50 UTC
This is the reason why the latest strace release didn't get into Fedora, see
https://koji.fedoraproject.org/koji/buildinfo?buildID=1711422
https://koji.fedoraproject.org/koji/buildinfo?buildID=1711415

Until this bug is fixed, no strace updates in Fedora are possible.

Comment 4 Dmitry V. Levin 2021-05-14 10:20:36 UTC
Using the latest strace that can decode the structure returned by PTRACE_GETREGSET,
one can see what's wrong in the data returned by the kernel.

Linux buildvm-ppc64le-22.iad2.fedoraproject.org 5.11.10-200.fc33.ppc64le
on exiting chdir("") syscall did the following:

ptrace(PTRACE_GETREGSET, 702867, NT_PRSTATUS, {iov_base={gpr=[0xc, 0x7fffd8c3f250, 0x7fffa60b7000, 0xfffffffffffffffe, 0x7fffd8c3f158, 0, 0x8, 0x20, 0xfffffffe7fffffff, 0, 0, 0, 0, 0x7fffa615a390, 0, 0, 0, 0x7fffd8c3fa58, 0x7fffd8c3f308, 0x7fffd8c3f304, 0x1228f4f68, 0x1228f4fb0, 0xab98c, 0xab991, 0x7fffa61533a0, 0x1228f5180, 0x1228f51b8, 0x122910010, 0, 0xab98c, 0x7fffa5c6fff0, 0x100000], nip=0x7fffa5fb4ad4, msr=0x800000000000d033, orig_gpr3=0x1228f52e8, ctr=0, link=0, xer=0, ccr=0x44000278, softe=0x1, trap=0x3000, dar=0x7fffa5fb4aa8, dsisr=0x40000000, result=0xfffffffffffffffe, ...}, iov_len=384}) = 0

As you can see, unlike any normal ppc64le kernel, here ccr does not have the 0x10000000 bit set, and gpr[3] equals 0xfffffffffffffffe (== -2L) instead of the expected 0x2 (== ENOENT).
So yes, this kernel has an obviously broken ptrace ABI.
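
To illustrate the point above, here is a minimal sketch of how a tracer that relies on the classic 'sc' convention would read these two registers. The function name decode_sc is hypothetical and this is not strace's or the kernel's actual code; the register values are copied from the dump above.

```python
# Sketch of the classic 'sc' return convention on powerpc:
# an error is flagged by the 0x10000000 bit of ccr, and gpr[3] then holds
# a positive errno value.
SO_BIT = 0x10000000


def decode_sc(ccr, gpr3):
    """Return (is_error, value) as a tracer using the 'sc' convention sees it.

    Hypothetical helper for illustration only.
    """
    if ccr & SO_BIT:
        return True, gpr3   # gpr3 is a positive errno
    return False, gpr3      # gpr3 is taken as a plain return value


# ccr and gpr[3] exactly as reported by PTRACE_GETREGSET above.
observed = decode_sc(0x44000278, 0xFFFFFFFFFFFFFFFE)

# What the comment above expects from a normal kernel for chdir(""):
# the error bit set and gpr[3] == 2 (ENOENT).
expected = decode_sc(0x44000278 | SO_BIT, 0x2)

print(observed)
print(expected)
```

With the observed values the error bit is clear, so the huge unsigned number in gpr[3] is treated as a successful return value rather than as ENOENT.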

You can find this and much more in the scratch build log at https://koji.fedoraproject.org/koji/taskinfo?taskID=67867999

Comment 5 Dan Horák 2021-05-14 13:38:41 UTC
for the record, from my F-32 system (bare-metal Power9, not a VM) with kernel-5.11.19-100.fc32.ppc64le, strace built from master branch

============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS:  905
# SKIP:  97
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0

Comment 6 Dan Horák 2021-05-14 18:02:11 UTC
and I get these from a rawhide VM with kernel-5.13.0-0.rc1.20210512git88b06399c9c7.15.fc35.ppc64le

============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS:  232
# SKIP:  102
# XFAIL: 0
# FAIL:  668
# XPASS: 0
# ERROR: 0


but in an F-32 VM with kernel-5.11.19-100.fc32.ppc64le everything looks good

============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS:  905
# SKIP:  97
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
and I get the same good result in the F-32 VM with kernel-5.12.3-300.fc34.ppc64le

F-32 VM with kernel-5.13.0-0.rc1.20210512git88b06399c9c7.15.fc35.ppc64le then gives
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS:  901
# SKIP:  101
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================


It makes me think that the failures are not related to the kernel version, but to something else in the rawhide buildroot, perhaps glibc?

Comment 7 Dan Horák 2021-05-14 18:05:25 UTC
but kernel-headers-5.11.19-100.fc32.ppc64le was always used for the F-32 tests (if this should make the difference, will check ...)

Comment 8 Dmitry V. Levin 2021-05-15 01:03:06 UTC
(In reply to Dan Horák from comment #6)
> It makes me think that the failures are not related to the kernel version,
> but to something else in the rawhide buildroot, perhaps glibc?

If it's not the kernel but something else, then this something else is also capable of meddling both with struct ptrace_syscall_info returned by ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, info) as described in the first comment, and with struct pt_regs returned by syscall(__NR_ptrace, PTRACE_GETREGSET, pid, NT_PRSTATUS, iov) in a very specific way described in #c4; in other words, this something is changing both ccr and gpr[3] in the kernel for all ptraced processes.
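
As a rough illustration of what the tracer receives from PTRACE_GET_SYSCALL_INFO on syscall exit, the sketch below unpacks the exit variant of struct ptrace_syscall_info from a byte buffer. The field layout follows the ptrace(2) manual page; the buffer is synthetic (built here, not captured from a real tracee), the instruction and stack pointer values are borrowed from the dump in comment 4, and the helper names are hypothetical.

```python
import struct

PTRACE_SYSCALL_INFO_EXIT = 2
AUDIT_ARCH_PPC64LE = 0xC0000015  # assumed arch value for ppc64le

# Little-endian layout of the exit variant:
# op (u8), 3 pad bytes, arch (u32), instruction_pointer (u64),
# stack_pointer (u64), exit.rval (s64), exit.is_error (u8)
EXIT_LAYOUT = struct.Struct("<B3xIQQqB")


def make_exit_record(rval, is_error):
    """Build a synthetic syscall-exit record (illustration only)."""
    return EXIT_LAYOUT.pack(PTRACE_SYSCALL_INFO_EXIT, AUDIT_ARCH_PPC64LE,
                            0x7FFFA5FB4AD4, 0x7FFFD8C3F250, rval, is_error)


def parse_exit(buf):
    """Return (rval, is_error) from a syscall-exit record."""
    op, _arch, _ip, _sp, rval, is_error = EXIT_LAYOUT.unpack_from(buf)
    if op != PTRACE_SYSCALL_INFO_EXIT:
        raise ValueError("not a syscall-exit record")
    return rval, bool(is_error)


def render_naively(rval, is_error):
    """Hypothetical rendering: errors as errno, everything else unsigned."""
    if is_error:
        return "-1 (errno %d)" % -rval
    return str(rval & 0xFFFFFFFFFFFFFFFF)


# is_error cleared although rval carries -ENOENT: the failure mode at issue.
print(render_naively(*parse_exit(make_exit_record(-2, 0))))
# The same record with is_error set, as a correct kernel would report it.
print(render_naively(*parse_exit(make_exit_record(-2, 1))))
```

The only difference between the two records is the is_error byte, which is why a wrong value there is enough to turn an error into a very large "successful" return value.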

Comment 9 Dan Horák 2021-05-18 14:05:24 UTC
after updating to glibc-2.33.9000-6.fc35 in my F-32 VM, where the tests were passing, I now get the failures

============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS:  232
# SKIP:  102
# XFAIL: 0
# FAIL:  668
# XPASS: 0
# ERROR: 0
============================================================================

Comment 10 Dan Horák 2021-05-18 15:15:01 UTC
let's switch to glibc for their feedback ...

Comment 11 Florian Weimer 2021-05-18 20:14:57 UTC
Is it possible to reproduce this issue with an already-built strace binary?

We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to <sys/ptrace.h>, maybe that causes different execution paths in strace to be taken?

Comment 12 Carlos O'Donell 2021-05-18 20:19:39 UTC
(In reply to Florian Weimer from comment #11)
> Is it possible to reproduce this issue with an already-built strace binary?
> 
> We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to
> <sys/ptrace.h>, maybe that causes different execution paths in strace to be
> taken?

We also have the 'sc' vs. 'svc' instruction selection in the same timeframe for power?

Comment 13 Carlos O'Donell 2021-05-18 20:20:48 UTC
(In reply to Carlos O'Donell from comment #12)
> (In reply to Florian Weimer from comment #11)
> > Is it possible to reproduce this issue with an already-built strace binary?
> > 
> > We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to
> > <sys/ptrace.h>, maybe that causes different execution paths in strace to be
> > taken?
> 
> We also have the 'sc' vs. 'svc' instruction selection in the same timeframe
> for power?

Sorry, 'scv'.

Comment 14 Carlos O'Donell 2021-05-18 21:16:04 UTC
A recent scratch build failed on a POWER9 box, which would be the hardware under which glibc would switch to 'scv' for syscalls.

Just two quick snippets:

--- exp 2021-05-18 20:26:11.715277248 +0000
+++ log 2021-05-18 20:26:11.715277248 +0000
-chdir(0x7fffb053ffe0) = -1 EFAULT (Bad address)
-chdir(0x7fffb053ffe1) = -1 EFAULT (Bad address)
+chdir(0x7fffb053ffe0) = 18446744073709551602
+chdir(0x7fffb053ffe1) = 18446744073709551602
FAIL printpath-umovestr-legacy.test (exit status: 1)
FAIL: printstrn-umoven-legacy

--- exp 2021-05-18 20:26:44.524009554 +0000
+++ log 2021-05-18 20:26:44.524009554 +0000
@@ -1,2 +1,2 @@
-writev(-1, [{iov_base="", iov_len=0}, {iov_base="f", iov_len=1}, {iov_base="ef", iov_len=2}, {iov_base="def", iov_len=3}, {iov_base="cdef", iov_len=4}, {iov_base="bcdef", iov_len=5}, {iov_base="abcdef", iov_len=6}, {iov_base="zabcdef", iov_len=7}, {iov_base="yzabcdef", iov_len=8}, {iov_base="xyzabcdef", iov_len=9}, {iov_base="wxyzabcdef", iov_len=10}, {iov_base="vwxyzabcdef", iov_len=11}, {iov_base="uvwxyzabcdef", iov_len=12}, {iov_base="tuvwxyzabcdef", iov_len=13}, {iov_base="stuvwxyzabcdef", iov_len=14}, {iov_base="rstuvwxyzabcdef", iov_len=15}, {iov_base="qrstuvwxyzabcdef", iov_len=16}, {iov_base="pqrstuvwxyzabcdef", iov_len=17}, {iov_base="opqrstuvwxyzabcdef", iov_len=18}, {iov_base="nopqrstuvwxyzabcdef", iov_len=19}, {iov_base="mnopqrstuvwxyzabcdef", iov_len=20}, {iov_base="lmnopqrstuvwxyzabcdef", iov_len=21}, {iov_base="klmnopqrstuvwxyzabcdef", iov_len=22}, {iov_base="jklmnopqrstuvwxyzabcdef", iov_len=23}, {iov_base="ijklmnopqrstuvwxyzabcdef", iov_len=24}, {iov_base="hijklmnopqrstuvwxyzabcdef", iov_len=25}, {iov_base="ghijklmnopqrstuvwxyzabcdef", iov_len=26}, {iov_base="fghijklmnopqrstuvwxyzabcdef", iov_len=27}, {iov_base="efghijklmnopqrstuvwxyzabcdef", iov_len=28}, {iov_base="defghijklmnopqrstuvwxyzabcdef", iov_len=29}, {iov_base="cdefghijklmnopqrstuvwxyzabcdef", iov_len=30}, {iov_base="bcdefghijklmnopqrstuvwxyzabcdef", iov_len=31}], 32) = -1 EBADF (Bad file descriptor)
+writev(-1, [{iov_base="", iov_len=0}, {iov_base="f", iov_len=1}, {iov_base="ef", iov_len=2}, {iov_base="def", iov_len=3}, {iov_base="cdef", iov_len=4}, {iov_base="bcdef", iov_len=5}, {iov_base="abcdef", iov_len=6}, {iov_base="zabcdef", iov_len=7}, {iov_base="yzabcdef", iov_len=8}, {iov_base="xyzabcdef", iov_len=9}, {iov_base="wxyzabcdef", iov_len=10}, {iov_base="vwxyzabcdef", iov_len=11}, {iov_base="uvwxyzabcdef", iov_len=12}, {iov_base="tuvwxyzabcdef", iov_len=13}, {iov_base="stuvwxyzabcdef", iov_len=14}, {iov_base="rstuvwxyzabcdef", iov_len=15}, {iov_base="qrstuvwxyzabcdef", iov_len=16}, {iov_base="pqrstuvwxyzabcdef", iov_len=17}, {iov_base="opqrstuvwxyzabcdef", iov_len=18}, {iov_base="nopqrstuvwxyzabcdef", iov_len=19}, {iov_base="mnopqrstuvwxyzabcdef", iov_len=20}, {iov_base="lmnopqrstuvwxyzabcdef", iov_len=21}, {iov_base="klmnopqrstuvwxyzabcdef", iov_len=22}, {iov_base="jklmnopqrstuvwxyzabcdef", iov_len=23}, {iov_base="ijklmnopqrstuvwxyzabcdef", iov_len=24}, {iov_base="hijklmnopqrstuvwxyzabcdef", iov_len=25}, {iov_base="ghijklmnopqrstuvwxyzabcdef", iov_len=26}, {iov_base="fghijklmnopqrstuvwxyzabcdef", iov_len=27}, {iov_base="efghijklmnopqrstuvwxyzabcdef", iov_len=28}, {iov_base="defghijklmnopqrstuvwxyzabcdef", iov_len=29}, {iov_base="cdefghijklmnopqrstuvwxyzabcdef", iov_len=30}, {iov_base="bcdefghijklmnopqrstuvwxyzabcdef", iov_len=31}], 32) = 18446744073709551607
FAIL umovestr_cached.test (exit status: 1)

The last successful strace build is with glibc-2.32.9000-20.fc34.ppc64le, which is ~4 builds before 'scv' is introduced.

All builds after 'scv' is introduced in glibc-2.32.9000-24 (2021-01-08) appear to fail.

I don't see any 'scv' enablement in strace for ppc64le.
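
As an aside on the two snippets above: the large numbers in the failing output are the expected error codes read as unsigned 64-bit integers. A small sketch (helper names are hypothetical, errno names come from Python's errno module on the machine running it):

```python
import errno

MAX_ERRNO = 4095  # range conventionally reserved for negated errno values


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def errno_name(raw):
    """Return the errno name hidden in a raw return value, or None."""
    signed = as_signed64(raw)
    if -MAX_ERRNO <= signed < 0:
        return errno.errorcode.get(-signed)
    return None


# Raw return values printed by the failing tests above.
for raw in (18446744073709551602, 18446744073709551607):
    print(raw, as_signed64(raw), errno_name(raw))
```

The two values map back to -14 and -9, which correspond to the EFAULT and EBADF results that the tests expected.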

Comment 15 Tulio Magno Quites Machado Filho 2021-05-18 21:18:21 UTC
(In reply to Carlos O'Donell from comment #14)
> I don't see any 'scv' enablement in strace for ppc64le.

I agree with you. strace upstream is missing support for scv.
e.g. in this function https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
We can't trust the value of CR0 anymore.  An error is returned when r3 is negative.

Comment 16 Dmitry V. Levin 2021-05-18 22:40:03 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #15)
> (In reply to Carlos O'Donell from comment #14)
> > I don't see any 'scv' enablement in strace for ppc64le.
> 
> I agree with you. strace upstream is missing support for scv.
> e.g. in this function
> https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
> We can't trust the value of CR0 anymore.  An error is returned when r3 is
> negative.

Does this mean that the kernel function syscall_get_error and all its users including the PTRACE_GET_SYSCALL_INFO API are broken when scv is used?

Comment 17 Dmitry V. Levin 2021-05-18 22:56:59 UTC
(In reply to Dmitry V. Levin from comment #16)
> (In reply to Tulio Magno Quites Machado Filho from comment #15)
> > (In reply to Carlos O'Donell from comment #14)
> > > I don't see any 'scv' enablement in strace for ppc64le.
> > 
> > I agree with you. strace upstream is missing support for scv.
> > e.g. in this function
> > https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
> > We can't trust the value of CR0 anymore.  An error is returned when r3 is
> > negative.
> 
> Does this mean that the kernel function syscall_get_error and all its users
> including PTRACE_GET_SYSCALL_INFO API are broken when scv is used?

Looks like the answer is yes: the kernel commit v5.9-rc1~100^2~164 that introduced scv support was incomplete, and all users of ccr in arch/powerpc/include/asm/ptrace.h and arch/powerpc/include/asm/syscall.h are broken when scv is used.

So while the idea of changing the error handling convention is questionable,
the bug is on the kernel side, which failed to fully implement the new error handling convention.
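
A minimal sketch of the difference, written in Python for brevity rather than as kernel code. It assumes that a trap value of 0x3000 identifies a syscall entered via scv (the value seen in the register dump in comment 4) and that errors under scv are values of gpr[3] in the range -4095..-1; the function names are hypothetical.

```python
SO_BIT = 0x10000000   # error flag in ccr under the 'sc' convention
MAX_ERRNO = 4095
TRAP_SCV = 0x3000     # assumption: trap value of an scv syscall


def error_ccr_only(ccr, gpr3):
    """Error lookup that only knows the 'sc' convention."""
    return gpr3 if ccr & SO_BIT else 0


def error_scv_aware(trap, ccr, gpr3):
    """Error lookup that picks the convention from the trap value.

    Returns a positive errno, or 0 when the syscall succeeded.
    """
    if trap == TRAP_SCV:
        signed = gpr3 - (1 << 64) if gpr3 >> 63 else gpr3
        return -signed if -MAX_ERRNO <= signed < 0 else 0
    return error_ccr_only(ccr, gpr3)


# Register values from the chdir("") exit shown in comment 4.
trap, ccr, gpr3 = 0x3000, 0x44000278, 0xFFFFFFFFFFFFFFFE
print(error_ccr_only(ccr, gpr3))         # no error detected
print(error_scv_aware(trap, ccr, gpr3))  # errno 2
```

The ccr-only lookup reports no error for these registers, which matches the symptom described in the original report.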

Comment 18 Dmitry V. Levin 2021-05-18 23:32:22 UTC
(In reply to Tulio Magno Quites Machado Filho from comment #15)
> (In reply to Carlos O'Donell from comment #14)
> > I don't see any 'scv' enablement in strace for ppc64le.
> 
> I agree with you. strace upstream is missing support for scv.
> e.g. in this function
> https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c

Please note that this file is not used when PTRACE_GET_SYSCALL_INFO is working properly,
there is a runtime test for this, see

$ strace -d -enone / 2>&1 |grep PTRACE_GET_SYSCALL_INFO

However, there is going to be an issue with src/linux/powerpc/set_error.c and its users, e.g. syscall tampering.
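
To make the tampering concern concrete, here is a sketch of what injecting an error would have to write under each convention. It is an illustration under the assumptions discussed in this bug (error bit 0x10000000 in ccr for 'sc', negated errno in gpr[3] for scv), not the contents of set_error.c; inject_error is a hypothetical name.

```python
SO_BIT = 0x10000000
MASK64 = (1 << 64) - 1


def inject_error(ccr, err, scv):
    """Return the (ccr, gpr3) pair that reports errno `err` to the tracee.

    Hypothetical helper: `scv` says which convention the tracee used
    to enter the syscall.
    """
    if scv:
        # scv: ccr is left alone, gpr3 carries the negated errno.
        return ccr, (-err) & MASK64
    # sc: set the error bit, gpr3 carries the positive errno.
    return ccr | SO_BIT, err


print([hex(v) for v in inject_error(0x44000278, 2, scv=False)])
print([hex(v) for v in inject_error(0x44000278, 2, scv=True)])
```

A tracer that always writes the 'sc' form would hand an scv caller a small positive number in gpr[3], which that caller would read as success.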

> We can't trust the value of CR0 anymore.  An error is returned when r3 is
> negative.

I hope somebody who knows scv semantics would contribute missing parts of scv support to strace and the kernel before strace is kicked out from Fedora due to ftbfs.

Comment 19 Dmitry V. Levin 2021-05-20 03:47:43 UTC
There was a short discussion in linux-api mailing list that identified two bugs:
1. incomplete scv ABI support in the kernel that breaks users of asm/syscall.h and asm/ptrace.h, including the ptrace, tracing, and audit subsystems; detected by hundreds of tests from the strace test suite; the proposed patch can be found at https://lore.kernel.org/linux-api/1621385544.nttlk5qugb.astroid@bobo.none/ ;
2. a glibc bug detected by the signal test from the strace test suite; the proposed patch can be found at https://lore.kernel.org/linux-api/1621400263.gf0mbqhkrf.astroid@bobo.none/ .

Feel free to split this bug report into two if necessary.

Comment 20 Dmitry V. Levin 2021-05-20 23:06:30 UTC
Since the glibc bug is now tracked by #1962971, switching this bug back to the kernel component.

Meanwhile, Nicholas Piggin submitted a patch that's likely to be the final one, see
https://lore.kernel.org/linuxppc-dev/20210520111931.2597127-2-npiggin@gmail.com/

I'd like to remind everyone that this bug was found by the strace test suite and that it blocks strace updates in Fedora.

I hope the fix will be applied before strace is kicked out from Fedora due to ftbfs, which has already lasted 19 weeks.

Comment 21 Dmitry V. Levin 2021-05-23 17:43:31 UTC
(In reply to Dmitry V. Levin from comment #20)
> Meanwhile, Nicholas Piggin submitted a patch that's likely to be the final one,
> see
> https://lore.kernel.org/linuxppc-dev/20210520111931.2597127-2-npiggin@gmail.com/

This patch has been merged upstream, see
https://git.kernel.org/torvalds/c/d72500f992849d31ebae8f821a023660ddd0dcc2

Comment 22 Justin M. Forbes 2021-05-24 14:19:12 UTC
That should be in the 5.13-rc3 build today for rawhide. I have also cherry picked it for 5.12, so it should land in stable fedora with 5.12.7 and newer.

Comment 23 Dan Horák 2021-06-14 15:35:34 UTC
Dmitry, is there still something blocking you from updating strace? I believe we already have all the kernel/glibc bits included for F-34+.

Comment 24 Dmitry V. Levin 2021-06-16 12:06:57 UTC
Everything should have been fixed now, but I cannot tell for sure because something else is broken: a scratch build of strace HEAD failed with the following diagnostics: BuildrootError: could not init mock buildroot, mock exited with status 30; see root.log for more information

The corresponding root.log contains the following diagnostics:
DEBUG util.py:444:  Error: 
DEBUG util.py:444:   Problem: package util-linux-2.37-2.fc35.x86_64 requires util-linux-core = 2.37-2.fc35, but none of the providers can be installed
DEBUG util.py:444:    - conflicting requests
DEBUG util.py:444:    - nothing provides libpcre2-posix.so.2()(64bit) needed by util-linux-core-2.37-2.fc35.x86_64
DEBUG util.py:446:  (try to add '--skip-broken' to skip uninstallable packages)
DEBUG util.py:598:  Child return code was: 1

This looks as if all builds to fc35 are currently broken.
For more details see the scratch build task: https://koji.fedoraproject.org/koji/taskinfo?taskID=70224999

Comment 25 Dan Horák 2021-06-16 12:17:33 UTC
The buildroot should be OK again, there was a pcre2 update with unexpected consequences, but it has been untagged.

Comment 26 Dan Horák 2021-06-16 14:59:25 UTC
for the record, my local F-34 rebuild of strace HEAD with kernel-5.12.10-300.fc34.ppc64le and glibc-2.33-16.fc34.ppc64le was successful

Comment 27 Dmitry V. Levin 2021-06-16 23:19:42 UTC
(In reply to Dan Horák from comment #26)
> for the record, my local F-34 rebuild of strace HEAD with
> kernel-5.12.10-300.fc34.ppc64le and glibc-2.33-16.fc34.ppc64le was successful

My scratch build to fc34 was successful, too:
https://koji.fedoraproject.org/koji/taskinfo?taskID=70227524

Comment 28 Martin Cermak 2022-09-05 08:05:57 UTC
Maybe this one could be flipped to MODIFIED or further?  Thanks!

Comment 29 Dan Horák 2022-09-05 08:10:08 UTC
It was fixed a long time ago, closing.

