Hello, strace upstream is speaking. :)
There seems to be a severe ppc64le kernel regression detected by strace %check (625 of 972 tests failed). This ruins strace and blocks any strace update. To me it looks as if the kernel function syscall_get_error started to return 0 instead of the error code.
The logs are in the latest strace build task that failed:
https://koji.fedoraproject.org/koji/taskinfo?taskID=62171493
The kernel used for the build there was
Linux buildvm-ppc64le-14.iad2.fedoraproject.org 5.10.15-200.fc33.ppc64le #1 SMP Wed Feb 10 17:31:04 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux
For comparison, the same version of strace passes all tests on all other rawhide architectures, as well as on Debian ppc64le (kernel 5.10.13-1).
The latest successful strace build to rawhide was about 2 months ago:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1658216
The kernel used for the build there was
Linux buildvm-ppc64le-38.iad2.fedoraproject.org 5.9.11-200.fc33.ppc64le #1 SMP Tue Nov 24 18:02:53 UTC 2020 ppc64le ppc64le ppc64le GNU/Linux
Fedora's 5.10.7-200.fc33.ppc64le (on buildvm-ppc64le-19.iad2.fedoraproject.org) was broken as well, for the record.
I'd like to reiterate: this kernel bug breaks strace badly, making it almost unusable.
This is the reason why the latest strace release didn't get into Fedora, see https://koji.fedoraproject.org/koji/buildinfo?buildID=1711422 https://koji.fedoraproject.org/koji/buildinfo?buildID=1711415 Until this bug is fixed, no strace updates in Fedora are possible.
Using the latest strace that can decode the structure returned by PTRACE_GETREGSET, one can see what's wrong with the data returned by the kernel. On Linux buildvm-ppc64le-22.iad2.fedoraproject.org 5.11.10-200.fc33.ppc64le, on exiting a chdir("") syscall, strace did the following:
ptrace(PTRACE_GETREGSET, 702867, NT_PRSTATUS, {iov_base={gpr=[0xc, 0x7fffd8c3f250, 0x7fffa60b7000, 0xfffffffffffffffe, 0x7fffd8c3f158, 0, 0x8, 0x20, 0xfffffffe7fffffff, 0, 0, 0, 0, 0x7fffa615a390, 0, 0, 0, 0x7fffd8c3fa58, 0x7fffd8c3f308, 0x7fffd8c3f304, 0x1228f4f68, 0x1228f4fb0, 0xab98c, 0xab991, 0x7fffa61533a0, 0x1228f5180, 0x1228f51b8, 0x122910010, 0, 0xab98c, 0x7fffa5c6fff0, 0x100000], nip=0x7fffa5fb4ad4, msr=0x800000000000d033, orig_gpr3=0x1228f52e8, ctr=0, link=0, xer=0, ccr=0x44000278, softe=0x1, trap=0x3000, dar=0x7fffa5fb4aa8, dsisr=0x40000000, result=0xfffffffffffffffe, ...}, iov_len=384}) = 0
As you can see, unlike on any normal ppc64le kernel, ccr here does not have the 0x10000000 bit set, and gpr[3] equals 0xfffffffffffffffe (== -2L) instead of the expected 0x2 (== ENOENT). So yes, this kernel has an obviously broken ptrace ABI.
You can find this and much more in the scratch build log at https://koji.fedoraproject.org/koji/taskinfo?taskID=67867999
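To make the classic `sc` convention concrete: a tracer that decodes results from registers treats the syscall as failed only when CR0.SO (the 0x10000000 bit of ccr) is set, in which case gpr[3] holds the positive errno. The sketch below is an illustration under that rule only; `ppc_regs_view` and `sc_get_error` are hypothetical names (this is not `struct pt_regs` and not strace's code), fed with the ccr and gpr[3] values from the dump above.

```c
#include <stdint.h>

/* CR0 summary-overflow bit as it appears in the ccr register image. */
#define PPC_CR0_SO UINT64_C(0x10000000)

/* Hypothetical minimal view of the two registers involved; not struct pt_regs. */
struct ppc_regs_view {
	uint64_t ccr;
	uint64_t gpr3;
};

/*
 * Decode a syscall result under the classic 'sc' convention:
 * CR0.SO set means failure and gpr3 holds a positive errno.
 * Returns the errno (> 0) on failure, 0 on success; *rval receives gpr3.
 */
static int sc_get_error(const struct ppc_regs_view *r, uint64_t *rval)
{
	*rval = r->gpr3;
	if (r->ccr & PPC_CR0_SO)
		return (int) r->gpr3;
	return 0;
}
```

Fed the dump above (ccr=0x44000278, gpr[3]=0xfffffffffffffffe), this rule reports success with a return value of 0xfffffffffffffffe, which is the kind of misreport the failing tests show.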
for the record, from my F-32 system (bare-metal Power9, not a VM) with kernel-5.11.19-100.fc32.ppc64le, strace built from master branch
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS: 905
# SKIP: 97
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
and I get these from a rawhide VM with kernel-5.13.0-0.rc1.20210512git88b06399c9c7.15.fc35.ppc64le
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS: 232
# SKIP: 102
# XFAIL: 0
# FAIL: 668
# XPASS: 0
# ERROR: 0
but in an F-32 VM with kernel-5.11.19-100.fc32.ppc64le everything looks good
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS: 905
# SKIP: 97
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
and I get the same good result in the F-32 VM with kernel-5.12.3-300.fc34.ppc64le
The F-32 VM with kernel-5.13.0-0.rc1.20210512git88b06399c9c7.15.fc35.ppc64le then gives
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS: 901
# SKIP: 101
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
============================================================================
It makes me think that the failures are not related to the kernel version, but to something else in the rawhide buildroot, perhaps glibc?
but kernel-headers-5.11.19-100.fc32.ppc64le was always used for the F-32 tests (will check whether that makes a difference ...)
(In reply to Dan Horák from comment #6)
> It makes me think that the failures are not related to the kernel version,
> but to something else in the rawhide buildroot, perhaps glibc?
If it's not the kernel but something else, then this something else is also capable of meddling both with struct ptrace_syscall_info returned by ptrace(PTRACE_GET_SYSCALL_INFO, pid, size, info) as described in the first comment, and with struct pt_regs returned by syscall(__NR_ptrace, PTRACE_GETREGSET, pid, NT_PRSTATUS, iov) in a very specific way described in #c4; in other words, this something is changing both ccr and gpr[3] in the kernel for all ptraced processes.
after updating to glibc-2.33.9000-6.fc35 in my F-32 VM, where the tests were passing, I now get the failures
============================================================================
Testsuite summary for strace 5.12.0.54.05c8
============================================================================
# TOTAL: 1002
# PASS: 232
# SKIP: 102
# XFAIL: 0
# FAIL: 668
# XPASS: 0
# ERROR: 0
============================================================================
let's switch to glibc for their feedback ...
Is it possible to reproduce this issue with an already-built strace binary? We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to <sys/ptrace.h>, maybe that causes different execution paths in strace to be taken?
(In reply to Florian Weimer from comment #11)
> Is it possible to reproduce this issue with an already-built strace binary?
>
> We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to
> <sys/ptrace.h>, maybe that causes different execution paths in strace to be
> taken?
We also have the 'sc' vs. 'svc' instruction selection in the same timeframe for power?
(In reply to Carlos O'Donell from comment #12)
> (In reply to Florian Weimer from comment #11)
> > Is it possible to reproduce this issue with an already-built strace binary?
> >
> > We added definitions of PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP to
> > <sys/ptrace.h>, maybe that causes different execution paths in strace to be
> > taken?
>
> We also have the 'sc' vs. 'svc' instruction selection in the same timeframe
> for power?
Sorry, 'scv'.
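Whether a process ends up on the `scv` path depends on what the kernel advertises to it. A minimal sketch, assuming glibc keys its runtime sc/scv choice off the `PPC_FEATURE2_SCV` bit in AT_HWCAP2 (the numeric value below is taken from memory of the kernel's uapi `asm/cputable.h` and should be double-checked), shows how a test could query that bit; `scv_advertised` is a hypothetical helper name.

```c
#include <sys/auxv.h>

#ifndef AT_HWCAP2
#define AT_HWCAP2 26                /* value from <elf.h>; fallback for old headers */
#endif

#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000 /* assumed value from the kernel's asm/cputable.h */
#endif

/* Returns 1 if the kernel advertises 'scv' support to this process, else 0. */
static int scv_advertised(void)
{
#if defined(__powerpc64__)
	return (getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV) != 0;
#else
	return 0;                   /* the bit is only meaningful on powerpc64 */
#endif
}
```

On a POWER9 box running a kernel with scv support this would be expected to return 1, which would explain why only some ppc64le builders reproduce the failures.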
A recent scratch build failed on a POWER9 box, which would be the hardware on which glibc would switch to 'scv' for syscalls. Just two quick snippets:
--- exp 2021-05-18 20:26:11.715277248 +0000
+++ log 2021-05-18 20:26:11.715277248 +0000
-chdir(0x7fffb053ffe0) = -1 EFAULT (Bad address)
-chdir(0x7fffb053ffe1) = -1 EFAULT (Bad address)
+chdir(0x7fffb053ffe0) = 18446744073709551602
+chdir(0x7fffb053ffe1) = 18446744073709551602
FAIL printpath-umovestr-legacy.test (exit status: 1)
FAIL: printstrn-umoven-legacy
--- exp 2021-05-18 20:26:44.524009554 +0000
+++ log 2021-05-18 20:26:44.524009554 +0000
@@ -1,2 +1,2 @@
-writev(-1, [{iov_base="", iov_len=0}, {iov_base="f", iov_len=1}, {iov_base="ef", iov_len=2}, {iov_base="def", iov_len=3}, {iov_base="cdef", iov_len=4}, {iov_base="bcdef", iov_len=5}, {iov_base="abcdef", iov_len=6}, {iov_base="zabcdef", iov_len=7}, {iov_base="yzabcdef", iov_len=8}, {iov_base="xyzabcdef", iov_len=9}, {iov_base="wxyzabcdef", iov_len=10}, {iov_base="vwxyzabcdef", iov_len=11}, {iov_base="uvwxyzabcdef", iov_len=12}, {iov_base="tuvwxyzabcdef", iov_len=13}, {iov_base="stuvwxyzabcdef", iov_len=14}, {iov_base="rstuvwxyzabcdef", iov_len=15}, {iov_base="qrstuvwxyzabcdef", iov_len=16}, {iov_base="pqrstuvwxyzabcdef", iov_len=17}, {iov_base="opqrstuvwxyzabcdef", iov_len=18}, {iov_base="nopqrstuvwxyzabcdef", iov_len=19}, {iov_base="mnopqrstuvwxyzabcdef", iov_len=20}, {iov_base="lmnopqrstuvwxyzabcdef", iov_len=21}, {iov_base="klmnopqrstuvwxyzabcdef", iov_len=22}, {iov_base="jklmnopqrstuvwxyzabcdef", iov_len=23}, {iov_base="ijklmnopqrstuvwxyzabcdef", iov_len=24}, {iov_base="hijklmnopqrstuvwxyzabcdef", iov_len=25}, {iov_base="ghijklmnopqrstuvwxyzabcdef", iov_len=26}, {iov_base="fghijklmnopqrstuvwxyzabcdef", iov_len=27}, {iov_base="efghijklmnopqrstuvwxyzabcdef", iov_len=28}, {iov_base="defghijklmnopqrstuvwxyzabcdef", iov_len=29}, {iov_base="cdefghijklmnopqrstuvwxyzabcdef", iov_len=30}, {iov_base="bcdefghijklmnopqrstuvwxyzabcdef", iov_len=31}], 32) = -1 EBADF (Bad file descriptor)
+writev(-1, [{iov_base="", iov_len=0}, {iov_base="f", iov_len=1}, {iov_base="ef", iov_len=2}, {iov_base="def", iov_len=3}, {iov_base="cdef", iov_len=4}, {iov_base="bcdef", iov_len=5}, {iov_base="abcdef", iov_len=6}, {iov_base="zabcdef", iov_len=7}, {iov_base="yzabcdef", iov_len=8}, {iov_base="xyzabcdef", iov_len=9}, {iov_base="wxyzabcdef", iov_len=10}, {iov_base="vwxyzabcdef", iov_len=11}, {iov_base="uvwxyzabcdef", iov_len=12}, {iov_base="tuvwxyzabcdef", iov_len=13}, {iov_base="stuvwxyzabcdef", iov_len=14}, {iov_base="rstuvwxyzabcdef", iov_len=15}, {iov_base="qrstuvwxyzabcdef", iov_len=16}, {iov_base="pqrstuvwxyzabcdef", iov_len=17}, {iov_base="opqrstuvwxyzabcdef", iov_len=18}, {iov_base="nopqrstuvwxyzabcdef", iov_len=19}, {iov_base="mnopqrstuvwxyzabcdef", iov_len=20}, {iov_base="lmnopqrstuvwxyzabcdef", iov_len=21}, {iov_base="klmnopqrstuvwxyzabcdef", iov_len=22}, {iov_base="jklmnopqrstuvwxyzabcdef", iov_len=23}, {iov_base="ijklmnopqrstuvwxyzabcdef", iov_len=24}, {iov_base="hijklmnopqrstuvwxyzabcdef", iov_len=25}, {iov_base="ghijklmnopqrstuvwxyzabcdef", iov_len=26}, {iov_base="fghijklmnopqrstuvwxyzabcdef", iov_len=27}, {iov_base="efghijklmnopqrstuvwxyzabcdef", iov_len=28}, {iov_base="defghijklmnopqrstuvwxyzabcdef", iov_len=29}, {iov_base="cdefghijklmnopqrstuvwxyzabcdef", iov_len=30}, {iov_base="bcdefghijklmnopqrstuvwxyzabcdef", iov_len=31}], 32) = 18446744073709551607
FAIL umovestr_cached.test (exit status: 1)
The last successful strace build is with glibc-2.32.9000-20.fc34.ppc64le, which is ~4 builds before 'scv' was introduced. All builds after 'scv' was introduced in glibc-2.32.9000-24 (2021-01-08) appear to fail.
I don't see any 'scv' enablement in strace for ppc64le.
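The odd return values in these diffs are negative errno values printed as unsigned 64-bit integers: 2^64 - 14 = 18446744073709551602 (EFAULT) and 2^64 - 9 = 18446744073709551607 (EBADF). In other words, the tracer was told the call succeeded and was handed -errno as its return value. A small sketch with a hypothetical, simplified printer (`format_exit` is not strace's printing code) reproduces both strings from an (is_error, rval) pair:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Render a syscall exit the way a tracer might when it trusts an is_error flag:
 * errors as "-1 (errno N)" (rval holds -errno), successes as an unsigned 64-bit value.
 */
static void format_exit(int is_error, int64_t rval, char *buf, size_t len)
{
	if (is_error)
		snprintf(buf, len, "-1 (errno %" PRId64 ")", -rval);
	else
		snprintf(buf, len, "%" PRIu64, (uint64_t) rval);
}
```

With is_error wrongly reported as 0 and rval = -14, this prints 18446744073709551602, matching the `+chdir(...)` lines above.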
(In reply to Carlos O'Donell from comment #14)
> I don't see any 'scv' enablement in strace for ppc64le.
I agree with you. strace upstream is missing support for scv, e.g. in this file:
https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
We can no longer rely on the value of CR0. An error is returned when r3 is negative.
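The `scv` rule above ("an error is returned when r3 is negative") can be sketched as follows. One assumption is made explicit: the sketch bounds "negative" to the -4095..-1 errno range, the same unsigned range test the kernel's IS_ERR_VALUE() uses, rather than treating every value with the top bit set as an error. `scv_get_error` is a hypothetical name, not strace or kernel code.

```c
#include <stdint.h>

/* Largest errno value returned by the kernel (mirrors the kernel's MAX_ERRNO). */
#define MAX_ERRNO 4095

/*
 * Decode a syscall result under the 'scv' convention, where CR0 is not used:
 * a gpr3 value in [-MAX_ERRNO, -1] is a negated errno, anything else is success.
 * Returns the errno (> 0) on failure, 0 on success.
 */
static int scv_get_error(uint64_t gpr3)
{
	/* Unsigned comparison: true only for the top MAX_ERRNO values. */
	if (gpr3 >= (uint64_t) -MAX_ERRNO)
		return (int) -(int64_t) gpr3;
	return 0;
}
```

Applied to the gpr[3] from the earlier register dump (0xfffffffffffffffe), this yields 2 (ENOENT), the result the chdir("") test expected.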
(In reply to Tulio Magno Quites Machado Filho from comment #15)
> (In reply to Carlos O'Donell from comment #14)
> > I don't see any 'scv' enablement in strace for ppc64le.
>
> I agree with you. strace upstream is missing support for scv, e.g. in this file:
> https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
> We can no longer rely on the value of CR0. An error is returned when r3 is
> negative.
Does this mean that the kernel function syscall_get_error and all its users, including the PTRACE_GET_SYSCALL_INFO API, are broken when scv is used?
(In reply to Dmitry V. Levin from comment #16)
> (In reply to Tulio Magno Quites Machado Filho from comment #15)
> > (In reply to Carlos O'Donell from comment #14)
> > > I don't see any 'scv' enablement in strace for ppc64le.
> >
> > I agree with you. strace upstream is missing support for scv, e.g. in this file:
> > https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
> > We can no longer rely on the value of CR0. An error is returned when r3 is
> > negative.
>
> Does this mean that the kernel function syscall_get_error and all its users,
> including the PTRACE_GET_SYSCALL_INFO API, are broken when scv is used?
Looks like the answer is yes: the kernel commit v5.9-rc1~100^2~164 that introduced scv support was incomplete, and all users of ccr in arch/powerpc/include/asm/ptrace.h and arch/powerpc/include/asm/syscall.h are broken when scv is used. So while the idea of changing the error handling convention is questionable, the bug is on the kernel side, which failed to implement the new error handling convention.
(In reply to Tulio Magno Quites Machado Filho from comment #15)
> (In reply to Carlos O'Donell from comment #14)
> > I don't see any 'scv' enablement in strace for ppc64le.
>
> I agree with you. strace upstream is missing support for scv, e.g. in this file:
> https://github.com/strace/strace/blob/master/src/linux/powerpc/get_error.c
Please note that this file is not used when PTRACE_GET_SYSCALL_INFO is working properly; there is a runtime test for this, see
$ strace -d -enone / 2>&1 |grep PTRACE_GET_SYSCALL_INFO
However, there is going to be an issue with src/linux/powerpc/set_error.c and its users, e.g. syscall tampering.
> We can no longer rely on the value of CR0. An error is returned when r3 is
> negative.
I hope somebody who knows the scv semantics will contribute the missing parts of scv support to strace and the kernel before strace is kicked out of Fedora due to FTBFS.
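The set_error.c concern is the mirror image of the decoding problem: to fake a failure (or a success) when tampering with a syscall, a tracer has to write the result in whichever convention the tracee used. A minimal sketch, assuming a hypothetical register view with an `is_scv` flag (how a tracer learns that is left open here), might look like this; `set_error` and `set_success` are illustrative names, not strace's functions.

```c
#include <stdint.h>

#define PPC_CR0_SO UINT64_C(0x10000000)

/* Hypothetical subset of the ppc64 register image; not struct pt_regs. */
struct ppc_regs_view {
	uint64_t gpr3;
	uint64_t ccr;
	int is_scv;   /* nonzero if the syscall was entered via 'scv' */
};

/*
 * Make the tracee's syscall appear to fail with errno 'err' (> 0).
 * 'sc':  gpr3 = +err and CR0.SO set.
 * 'scv': gpr3 = -err; CR0 is not part of this convention, so it is left alone.
 */
static void set_error(struct ppc_regs_view *r, int err)
{
	if (r->is_scv) {
		r->gpr3 = (uint64_t) -(int64_t) err;
	} else {
		r->gpr3 = (uint64_t) err;
		r->ccr |= PPC_CR0_SO;
	}
}

/* Make the syscall appear to succeed with return value 'val'. */
static void set_success(struct ppc_regs_view *r, int64_t val)
{
	r->gpr3 = (uint64_t) val;
	if (!r->is_scv)
		r->ccr &= ~PPC_CR0_SO;   /* 'sc': clear the error flag */
}
```

Writing the `sc` form into an `scv` syscall would make the tracee see a small positive return value instead of an error, which is why the convention has to be known at tampering time.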
There was a short discussion on the linux-api mailing list that identified two bugs:
1. incomplete scv ABI support in the kernel that breaks users of asm/syscall.h and asm/ptrace.h; these users include the ptrace, tracing, and audit subsystems. It was detected by hundreds of tests from the strace test suite; a proposed patch can be found at https://lore.kernel.org/linux-api/1621385544.nttlk5qugb.astroid@bobo.none/
2. a glibc bug detected by the signal test from the strace test suite; a proposed patch can be found at https://lore.kernel.org/linux-api/1621400263.gf0mbqhkrf.astroid@bobo.none/
Feel free to split this bug report into two if necessary.
Since the glibc bug is now tracked by #1962971, switching this bug back to the kernel component. Meanwhile, Nicholas Piggin submitted a patch that's likely to be final, see https://lore.kernel.org/linuxppc-dev/20210520111931.2597127-2-npiggin@gmail.com/
I'd like to remind everyone that this bug was found by the strace test suite and it blocks strace updates in Fedora. I hope the fix will be applied before strace is kicked out of Fedora due to FTBFS, which has lasted 19 weeks already.
(In reply to Dmitry V. Levin from comment #20)
> Meanwhile, Nicholas Piggin submitted a patch that's likely to be final,
> see
> https://lore.kernel.org/linuxppc-dev/20210520111931.2597127-2-npiggin@gmail.com/
This patch has been merged upstream, see https://git.kernel.org/torvalds/c/d72500f992849d31ebae8f821a023660ddd0dcc2
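The shape of such a fix can be modeled in userspace: pick the convention from the trap value (the register dump earlier in this report shows trap=0x3000 for an scv syscall), then fill in an exit record roughly the way PTRACE_GET_SYSCALL_INFO does, where a nonzero error sets is_error and otherwise the raw return value is reported. This is a hedged model, not the merged kernel code: `ppc_regs_view`, `get_error`, `exit_info` and `get_exit_info` are hypothetical names, and treating the low trap bits as flags (and 0xc00 as the `sc` trap value used in the usage example) are assumptions.

```c
#include <stdint.h>

#define PPC_CR0_SO UINT64_C(0x10000000)
#define MAX_ERRNO  4095
#define TRAP_SCV   0x3000   /* trap value seen for 'scv' in the dump above */

/* Hypothetical subset of the ppc64 register image; not struct pt_regs. */
struct ppc_regs_view {
	uint64_t gpr3;
	uint64_t ccr;
	uint64_t trap;
};

/* Negative errno on failure, 0 on success; the convention is chosen by trap. */
static int64_t get_error(const struct ppc_regs_view *r)
{
	/* Low trap bits assumed to be flags, masked off before comparing. */
	if ((r->trap & ~UINT64_C(0x1f)) == TRAP_SCV)
		return r->gpr3 >= (uint64_t) -MAX_ERRNO ? (int64_t) r->gpr3 : 0;
	/* 'sc': CR0.SO flags an error and gpr3 holds a positive errno. */
	return (r->ccr & PPC_CR0_SO) ? -(int64_t) r->gpr3 : 0;
}

/* Exit-stop record shaped like the exit part of struct ptrace_syscall_info. */
struct exit_info {
	int64_t rval;
	uint8_t is_error;
};

static struct exit_info get_exit_info(const struct ppc_regs_view *r)
{
	struct exit_info info;

	info.rval = get_error(r);
	info.is_error = info.rval != 0;
	if (!info.is_error)
		info.rval = (int64_t) r->gpr3;   /* success: gpr3 is the return value */
	return info;
}
```

With the scv registers from the dump (gpr[3]=0xfffffffffffffffe, trap=0x3000), this model reports is_error=1 and rval=-2, i.e. ENOENT, instead of a bogus success.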
That should be in the 5.13-rc3 build today for rawhide. I have also cherry-picked it for 5.12, so it should land in stable Fedora with 5.12.7 and newer.
Dmitry, is there still something blocking you from updating strace? I believe we already have all the kernel/glibc bits included for F-34+.
Everything should be fixed by now, but I cannot tell for sure because something else is broken: a scratch build of strace HEAD failed with the following diagnostics:
BuildrootError: could not init mock buildroot, mock exited with status 30; see root.log for more information
The corresponding root.log contains the following diagnostics:
DEBUG util.py:444: Error:
DEBUG util.py:444: Problem: package util-linux-2.37-2.fc35.x86_64 requires util-linux-core = 2.37-2.fc35, but none of the providers can be installed
DEBUG util.py:444: - conflicting requests
DEBUG util.py:444: - nothing provides libpcre2-posix.so.2()(64bit) needed by util-linux-core-2.37-2.fc35.x86_64
DEBUG util.py:446: (try to add '--skip-broken' to skip uninstallable packages)
DEBUG util.py:598: Child return code was: 1
This looks as if all builds to fc35 are currently broken. For more details see the scratch build task: https://koji.fedoraproject.org/koji/taskinfo?taskID=70224999
The buildroot should be OK again, there was a pcre2 update with unexpected consequences, but it has been untagged.
for the record, my local F-34 rebuild of strace HEAD with kernel-5.12.10-300.fc34.ppc64le and glibc-2.33-16.fc34.ppc64le was successful
(In reply to Dan Horák from comment #26)
> for the record, my local F-34 rebuild of strace HEAD with
> kernel-5.12.10-300.fc34.ppc64le and glibc-2.33-16.fc34.ppc64le was successful
My scratch build to fc34 was successful, too: https://koji.fedoraproject.org/koji/taskinfo?taskID=70227524
Maybe this one could be flipped to MODIFIED or further? Thanks!
It was fixed a long time ago, closing.