Bug 2218998

Summary: strace: strace--relative-timestamps-* tests expect a limited elapsed time and fail in CI.
Product: [Fedora] Fedora Reporter: Carlos O'Donell <codonell>
Component: straceAssignee: Dmitry V. Levin <ldv>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 39CC: esyr, fweimer, ldv
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Carlos O'Donell 2023-06-30 21:19:10 UTC
The glibc Fedora Rawhide CI tests rebuild strace to ensure that we don't break it when synchronizing with upstream glibc development.

I have noticed that 2 strace tests fail:

++ printf '#%d: %s\n' 2 '[ ]{5}(1\.[01]|0\.9)[[:digit:]]{2} \+\+\+ exited with 0 \+\+\+'
#2: [ ]{5}(1\.[01]|0\.9)[[:digit:]]{2} \+\+\+ exited with 0 \+\+\+
++ cnt=3
++ read -r pattern
++ test -z 1
++ echo 'Actual output:'
Actual output:
++ cat
     0.000 execve("../sleep", ["../sleep", "1"], 0x7fffe6c50798 /* 61 vars */) = 0
     1.330 +++ exited with 0 +++
++ fail_ '../../src/strace --relative-timestamps=ms -eexecve ../sleep 1 output mismatch'
++ warn_ 'strace--relative-timestamps-ms.gen.test: failed test: ../../src/strace --relative-timestamps=ms -eexecve ../sleep 1 output mismatch'
++ printf '%s\n' 'strace--relative-timestamps-ms.gen.test: failed test: ../../src/strace --relative-timestamps=ms -eexecve ../sleep 1 output mismatch'
strace--relative-timestamps-ms.gen.test: failed test: ../../src/strace --relative-timestamps=ms -eexecve ../sleep 1 output mismatch
++ exit 1
FAIL strace--relative-timestamps-ms.gen.test (exit status: 1)
FAIL: strace--relative-timestamps-us.gen

++ printf '#%d: %s\n' 2 '[ ]{5}(1\.[01]|0\.9)[[:digit:]]{5} \+\+\+ exited with 0 \+\+\+'
#2: [ ]{5}(1\.[01]|0\.9)[[:digit:]]{5} \+\+\+ exited with 0 \+\+\+
++ cnt=3
++ read -r pattern
++ test -z 1
++ echo 'Actual output:'
Actual output:
++ cat
     0.000000 execve("../sleep", ["../sleep", "1"], 0x7fffd4940f08 /* 61 vars */) = 0
     1.790001 +++ exited with 0 +++
++ fail_ '../../src/strace --relative-timestamps=us -eexecve ../sleep 1 output mismatch'
++ warn_ 'strace--relative-timestamps-us.gen.test: failed test: ../../src/strace --relative-timestamps=us -eexecve ../sleep 1 output mismatch'
++ printf '%s\n' 'strace--relative-timestamps-us.gen.test: failed test: ../../src/strace --relative-timestamps=us -eexecve ../sleep 1 output mismatch'
strace--relative-timestamps-us.gen.test: failed test: ../../src/strace --relative-timestamps=us -eexecve ../sleep 1 output mismatch
++ exit 1
FAIL strace--relative-timestamps-us.gen.test (exit status: 1)

============================================================================
Testsuite summary for strace 6.4
============================================================================
# TOTAL: 1304
# PASS:  1150
# SKIP:  152
# XFAIL: 0
# FAIL:  2
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to strace-devel.io
============================================================================
make[5]: Leaving directory '/builddir/build/BUILD/strace-6.4/tests'
make[2]: Nothing to be done for 'check-am'.
RPM build errors:
make[5]: *** [Makefile:17652: test-suite.log] Error 1
make[4]: *** [Makefile:17760: check-TESTS] Error 2
make[3]: *** [Makefile:17827: check-am] Error 2
make[2]: *** [Makefile:17830: check] Error 2
make[1]: *** [Makefile:596: check-recursive] Error 1
make: *** [Makefile:901: check] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.GixbZK (%check)
    Bad exit status from /var/tmp/rpm-tmp.GixbZK (%check)
Child return code was: 1

Is there any way we can make these tests less prone to false positives in the Fedora Rawhide CI infrastructure?

Reproducible: Always

Comment 1 Dmitry V. Levin 2023-06-30 22:05:50 UTC
strace 6.4 was built to Rawhide just a few days ago.
Has something changed with the CI infrastructure so that sleep(1) now takes so long?

Comment 2 Carlos O'Donell 2023-07-03 03:03:58 UTC
(In reply to Dmitry V. Levin from comment #1)
> strace 6.4 was built to Rawhide just a few days ago.
> Has something changed with the CI infrastructure so that sleep(1) now takes
> so long?

Let me explain a little more about the glibc Fedora Rawhide CI.

This is the bodhi update:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-fd52ab13db

The failing glibc Fedora Rawhide CI was "baseos-qe.koji-build.scratch-build.validation"

This is the glibc Fedora Rawhide CI test:
https://osci-jenkins-1.ci.fedoraproject.org/job/scratch-build-test/1013/console

The test in question builds key packages in a side tag with the new glibc, and it does so using Koji to build scratch builds.

What I think we are seeing is a loaded koji server where sleep takes longer to complete.

There is no limit on how long sleep will take in this case and the tests will be prone to false positive.

I want to keep strace in the list of packages we build as reverse dependencies because we have broken it in the past.

Dmitry, What might you recommend to make the test results more stable under load?

Comment 3 Fedora Release Engineering 2023-08-16 08:11:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.