Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1362283

Summary:	[RHELSA-7.3] perf-trace-test: process attach failed
Product:	Red Hat Enterprise Linux 7	Reporter:	PaulB <pbunyan>
Component:	kernel-aarch64	Assignee:	Christopher Covington <cov>
kernel-aarch64 sub component:	Platform Enablement	QA Contact:	Jeff Bastian <jbastian>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	medium
Priority:	medium	CC:	ernunes, jbastian, jfeeney, mlangsdo, mpetlan, pbunyan
Version:	7.3
Target Milestone:	rc
Target Release:	---
Hardware:	aarch64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-08-09 17:57:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description PaulB 2016-08-01 19:55:58 UTC

Description of problem:
 The following failure was seen during perf/Sanity/basic-test-aarch64 
KT1 testing:
 Regexp not found: "sleep"
 Regexp not found: "close"
 Regexp not found: "exit"

Version-Release number of selected component (if applicable):
 distro: RHEL-7.3-20160707.2
 kernel: 4.5.0-0.47.el7

How reproducible:
 Consistently on target vendor hardware listed in comment#1

Steps to Reproduce:
1. Install target vendor hardware listed in comment#1 with RHEL-7.3-20160707.2 
2. Install kernel-4.5.0-0.47.el7
3. run /tools/perf/Sanity/basic-test-aarch64 task

Actual results:
https://beaker.engineering.redhat.com/recipes/2900823#task43526204
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2016/07/14167/1416721/2900823/43526204/taskout.log
---<-snip->---
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: perf trace test
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [  BEGIN   ] :: Running setup for trace :: actually running './setup.sh'
[1;32m## [ PASS ] ##[m perf_trace :: setup SUMMARY
:: [[0;32m   PASS   [0m] :: Running setup for trace (Expected 0, got 0)
:: [  BEGIN   ] :: Runnning test test_basic.sh :: actually running './test_basic.sh > /mnt/tests/tools/perf/Sanity/basic-test-aarch64/LOGS/trace/test_basic.sh.log'
:: [[0;31m   FAIL   [0m] :: Runnning test test_basic.sh (Expected 0, got 1)
-- [ SKIP ] -- perf_trace :: test_basic :: help message :: testcase skipped
-- [ PASS ] -- perf_trace :: test_basic :: basic execution
-- [ PASS ] -- perf_trace :: test_basic :: duration threshold
-- [ PASS ] -- perf_trace :: test_basic :: systemwide
-- [ PASS ] -- perf_trace :: test_basic :: full timestamp
-- [ PASS ] -- perf_trace :: test_basic :: summary
Regexp not found: "sleep"
Regexp not found: "close"
Regexp not found: "exit"
-- [ FAIL ] -- perf_trace :: test_basic :: attach process (command exitcode + output regexp parsing)
## [ FAIL ] ## perf_trace :: test_basic SUMMARY :: 1 failures found
:: [  BEGIN   ] :: Runnning test test_overhead.sh :: actually running './test_overhead.sh > /mnt/tests/tools/perf/Sanity/basic-test-aarch64/LOGS/trace/test_overhead.sh.log'
:: [[0;32m   PASS   [0m] :: Runnning test test_overhead.sh (Expected 0, got 0)
-- [ PASS ] -- perf_trace :: test_overhead :: systemwide
## [ PASS ] ## perf_trace :: test_overhead SUMMARY
:: [  BEGIN   ] :: Runnning test test_record.sh :: actually running './test_record.sh > /mnt/tests/tools/perf/Sanity/basic-test-aarch64/LOGS/trace/test_record.sh.log'
:: [[0;32m   PASS   [0m] :: Runnning test test_record.sh (Expected 0, got 0)
-- [ PASS ] -- perf_trace :: test_record :: basic execution - record
## [ PASS ] ## perf_trace :: test_record SUMMARY
:: [  BEGIN   ] :: Running clean-up for trace :: actually running './cleanup.sh'
[1;32m## [ PASS ] ##[m perf_trace :: cleanup SUMMARY
:: [[0;32m   PASS   [0m] :: Running clean-up for trace (Expected 0, got 0)
** perf-trace-test FAIL Score:/var/tmp/beakerlib-43526204/journal.txt
Uploading resultoutputfile.log .done
---<-snip->---

Expected results:
perf trace test completes successfully

Additional info:

Comment 3 Mark Langsdorf 2016-08-02 18:52:05 UTC

Cov,

We've only reproduced this on Hawker. Can you look into it, please?

Comment 4 Christopher Covington 2016-08-02 20:25:11 UTC

Are the actual log files like /mnt/tests/tools/perf/Sanity/basic-test-aarch64/LOGS/trace/test_basic.sh.log available anywhere?

Comment 5 Jeff Bastian 2016-08-02 20:40:17 UTC

Yes, they're in the Beaker job:
https://beaker.engineering.redhat.com/recipes/2900823#task43526204

Click the "+3" link next to taskout.log and you'll see 3 more logs including logs.tar.bz2.  Here's a direct link:
https://beaker.engineering.redhat.com/recipes/2900823/tasks/43526204/logs/logs.tar.bz2

Extract that and you'll find trace/test_basic.sh.log

Comment 6 Jeff Bastian 2016-08-02 20:47:11 UTC

Michael, can you take a look at this perf failure too?  Thanks!

Comment 7 Christopher Covington 2016-08-03 13:44:50 UTC

Manually re-running the task, I get the following basic_attach.log

    0.000 ( 0.000 ms):  ... [continued]: nanosleep()) = 0
     0.047 ( 0.011 ms): close(fd: 1                                                           ) = 0
     0.064 ( 0.008 ms): close(fd: 2                                                           ) = 0
     0.073 ( 0.000 ms): exit_group(

Comment 8 Michael Petlan 2016-08-05 14:35:15 UTC

(In reply to Christopher Covington from comment #7)
> Manually re-running the task, I get the following basic_attach.log
> 
>     0.000 ( 0.000 ms):  ... [continued]: nanosleep()) = 0
>      0.047 ( 0.011 ms): close(fd: 1                                         
> ) = 0
>      0.064 ( 0.008 ms): close(fd: 2                                         
> ) = 0
>      0.073 ( 0.000 ms): exit_group(

The test does not fail with this log, does it.


This test should attach perf trace to a sleep command and check whether all the syscalls are present. The "*sleep" one needs to be there, since it is finished when perf has been already attached. The "exit*" one should appear there and be unfinished. I am not sure whether "close" needs to be there too on all architectures, however, it seems to always be.

Regexp not found: "sleep"
Regexp not found: "close"
Regexp not found: "exit"
-- [ FAIL ] -- perf_trace :: test_basic :: attach process (command exitcode + output regexp parsing)

^^^ This failure means that none of the three was found. Also, it means that perf exited with non-zero exitcode, so maybe the attach failed, or something else happened to it. However, stderr is not redirected anywhere in this particular testcase, so the fact we don't see anything in the logs, is a bit strange. Can you reproduce the issue manually on that box? Just by running:

sleep 10 &
perf trace -p $! -o trace.log

You also might try adding verbosity level by '-vv' option. This is how it looks like on an aarch64 box when it passes, so you may compare:

------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           0x11
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  mmap                             1
  comm                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1944  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           0x10
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1944  cpu -1  group_fd -1  flags 0x8
mmap size 589824B
perf event ring buffer mmapped per thread


You may also try to compare perf trace against strace, just attach strace to an already runnining sleep command.

Comment 9 Jeff Bastian 2016-08-08 16:29:45 UTC

(In reply to Michael Petlan from comment #8)
> Can you reproduce the issue manually on that box? Just by running:
> 
> sleep 10 &
> perf trace -p $! -o trace.log

[root@qualcomm-hawker-01 ~]# uname -rn
qualcomm-hawker-01.ml3.eng.bos.redhat.com 4.5.0-3.el7.aarch64

[root@qualcomm-hawker-01 ~]# sleep 10 &
[1] 13943

[root@qualcomm-hawker-01 ~]# perf trace -p $! -o trace.log
[1]+  Done                    sleep 10

[root@qualcomm-hawker-01 ~]# cat trace.log
     0.000 ( 0.000 ms):  ... [continued]: nanosleep()) = 0
     0.130 ( 0.025 ms): close(fd: 1                                                           ) = 0
     0.169 ( 0.018 ms): close(fd: 2                                                           ) = 0
     0.192 ( 0.000 ms): exit_group(                                                           

###########################################################################

[root@qualcomm-hawker-01 ~]# sleep 10 &
[1] 14100

[root@qualcomm-hawker-01 ~]# perf trace -vv -p $! -o trace.log
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           0x11
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  mmap                             1
  comm                             1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 14100  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             2
  size                             112
  config                           0x10
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|TIME|ID|CPU|PERIOD|RAW
  read_format                      ID
  disabled                         1
  sample_id_all                    1
  exclude_guest                    1
  { wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 14100  cpu -1  group_fd -1  flags 0x8
mmap size 589824B
perf event ring buffer mmapped per thread
[1]+  Done                    sleep 10

[root@qualcomm-hawker-01 ~]# cat trace.log
     0.000 ( 0.000 ms):  ... [continued]: nanosleep()) = 0
     0.113 ( 0.025 ms): close(fd: 1                                                           ) = 0
     0.151 ( 0.017 ms): close(fd: 2                                                           ) = 0
     0.173 ( 0.000 ms): exit_group(                                                           

###########################################################################

[root@qualcomm-hawker-01 ~]# sleep 10 &
[1] 14188

[root@qualcomm-hawker-01 ~]# strace -p $!
Process 14188 attached
restart_syscall(<... resuming interrupted call ...>) = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
[1]+  Done                    sleep 10

Comment 10 Jeff Bastian 2016-08-08 16:59:19 UTC

The /tools/perf/Sanity/basic-test-aarch64 task runs a much shorter 'sleep 0.1', and when I try that it fails like this:

[root@qualcomm-hawker-01 ~]# sleep 0.1 & perf trace -vv -p $! -o trace.log
[1] 32040
couldn't open /proc/32040/status
[1]+  Done                    sleep 0.1

[root@qualcomm-hawker-01 ~]# cat trace.log
Problems initializing symbol libraries!

Comment 11 Mark Langsdorf 2016-08-08 17:05:48 UTC

I get the same failure with a sleep of 0.1 on x86_64 with Fedora 24.

Comment 12 Jeff Bastian 2016-08-08 19:36:49 UTC

I cloned the /tools/perf/Sanity/basic-test-aarch64 task to a personal git repo and modified the sleep times to be a bit longer:
   -export CMD_BASIC_SLEEP="sleep 0.1"
   -export CMD_QUICK_SLEEP="sleep 0.01"
   -export CMD_LONGER_SLEEP="sleep 2"
   +export CMD_BASIC_SLEEP="sleep 10"
   +export CMD_QUICK_SLEEP="sleep 1"
   +export CMD_LONGER_SLEEP="sleep 20"

See http://git.engineering.redhat.com/git/users/jbastian/perf-tests.git/commit/?id=36e6284a1dc7


This time, the test passed with no reg-exp failures:

[root@qualcomm-hawker-01 trace]# cat test_basic.sh.log
-- [ SKIP ] -- perf_trace :: test_basic :: help message :: testcase skipped
-- [ PASS ] -- perf_trace :: test_basic :: basic execution
-- [ PASS ] -- perf_trace :: test_basic :: duration threshold
-- [ PASS ] -- perf_trace :: test_basic :: systemwide
-- [ PASS ] -- perf_trace :: test_basic :: full timestamp
-- [ PASS ] -- perf_trace :: test_basic :: summary
-- [ PASS ] -- perf_trace :: test_basic :: attach process
## [ PASS ] ## perf_trace :: test_basic SUMMARY


The test took about 15 minutes to run with this modification which is about the same as the original version.
https://beaker.engineering.redhat.com/recipes/2944129#task44133904

Comment 13 Michael Petlan 2016-08-08 22:56:15 UTC

You mean that the test runs for the same time when the sleep commands are 10 times longer, don't you?

AFAIK the test should take less time than 15 minutes... It could be 15 minutes if the full tracepoint check was enabled.

---

However, I would just replace the CMD_BASIC_SLEEP by CMD_LONGER_SLEEP in that particular place. I think that should be sufficient. Ideas?

Comment 14 Michael Petlan 2016-08-09 00:27:24 UTC

A strange thing is that the "sleep 0.1" was always OK and it has stopped working quite recently. That could point on a performance flaw somewhere.

Comment 15 Jeff Bastian 2016-08-09 13:52:26 UTC

Oh, I'm sorry, I was reading the timestamps incorrectly yesterday.  It took 5 minutes in the original version, but 15 minutes with the longer sleep times.

Original:  https://beaker.engineering.redhat.com/recipes/2900811#task43525827
   Start:  +05:27:08
     End:  +05:32:22
 Elapsed:  5m 14s

Modified:  https://beaker.engineering.redhat.com/recipes/2944129#task44133904
   Start:  +00:37:40
     End:  +00:52:19
 Elapsed:  14m 39s


I will try again with CMD_LONGER_SLEEP instead of the 10x longer sleeps.

Comment 16 Jeff Bastian 2016-08-09 16:32:31 UTC

I tried it again with CMD_LONGER_SLEEP (and original sleep values) and it took 10 minutes to run:

     Task:  https://beaker.engineering.redhat.com/recipes/2948217#task44168770
    Start:  +00:33:20
      End:  +00:43:12
  Elapsed:  9m 52s

Comment 17 Jeff Bastian 2016-08-09 16:33:03 UTC

(In reply to Jeff Bastian from comment #16)
> I tried it again with CMD_LONGER_SLEEP (and original sleep values) and it
> took 10 minutes to run:

Oh, and the most important thing: it was successful!

Comment 18 Jeff Bastian 2016-08-09 16:37:38 UTC

Instead of attaching to a background process, why not launch the process with perf?  This works much better:

[root@qualcomm-hawker-02 ~]# perf trace -o trace.log sleep 0.1
[root@qualcomm-hawker-02 ~]# cat trace.log
     0.790 ( 0.026 ms): sleep/32174 brk(                                                                  ) = 0x16a20000
     0.956 ( 0.051 ms): sleep/32174 faccessat(dfd: CWD, filename: 0x847ca1a0, mode: 4                     ) = -1 ENOENT No such file or directory
     1.028 ( 0.048 ms): sleep/32174 openat(dfd: CWD, filename: 0x847ca048, flags: CLOEXEC                 ) = 4
     1.064 ( 0.020 ms): sleep/32174 newfstat(fd: 4, statbuf: 0x3ffcf899c10                                ) = 0
     1.138 ( 0.094 ms): sleep/32174  ... [continued]: mmap()) = 0x3ff84780000
     1.171 ( 0.017 ms): sleep/32174 close(fd: 4                                                           ) = 0
     1.336 ( 0.126 ms): sleep/32174 openat(dfd: CWD, filename: 0x847e2228, flags: CLOEXEC                 ) = 4
     1.383 ( 0.031 ms): sleep/32174 read(fd: 4, buf: 0x3ffcf899e00, count: 832                            ) = 832
     1.419 ( 0.018 ms): sleep/32174 newfstat(fd: 4, statbuf: 0x3ffcf899c90                                ) = 0
     1.489 ( 0.051 ms): sleep/32174 mmap(arg0: 0, arg1: 1528816, arg2: 5, arg3: 2050, arg4: 4, arg5: 0    ) = 0x3ff84600000
     1.580 ( 0.076 ms): sleep/32174 mmap(arg0: 4395973869568, arg1: 131072, arg2: 3, arg3: 2066, arg4: 4, arg5: 1376256) = 0x3ff84760000
     1.745 ( 0.023 ms): sleep/32174 close(fd: 4                                                           ) = 0
     1.988 ( 0.065 ms): sleep/32174 mprotect(start: 0x3ff84760000, len: 65536, prot: READ                 ) = 0
     2.069 ( 0.047 ms): sleep/32174 mprotect(start: 0x410000, len: 65536, prot: READ                      ) = 0
     2.147 ( 0.048 ms): sleep/32174 mprotect(start: 0x3ff847d0000, len: 65536, prot: READ                 ) = 0
     2.208 ( 0.046 ms): sleep/32174 munmap(addr: 0x3ff84780000, len: 36257                                ) = 0
     2.658 ( 0.021 ms): sleep/32174 brk(                                                                  ) = 0x16a20000
     2.709 ( 0.038 ms): sleep/32174 brk(brk: 0x16a50000                                                   ) = 0x16a50000
     2.740 ( 0.015 ms): sleep/32174 brk(                                                                  ) = 0x16a50000
     2.850 ( 0.061 ms): sleep/32174 openat(dfd: CWD, filename: 0x84733e10, flags: CLOEXEC                 ) = 4
     2.902 ( 0.022 ms): sleep/32174 newfstat(fd: 4, statbuf: 0x3ff84771848                                ) = 0
     2.961 ( 0.045 ms): sleep/32174 mmap(arg0: 0, arg1: 109675168, arg2: 1, arg3: 2, arg4: 4, arg5: 0     ) = 0x3ff7dd60000
     3.020 ( 0.018 ms): sleep/32174 close(fd: 4                                                           ) = 0
   103.384 (100.170 ms): sleep/32174 nanosleep(rqtp: 0x3ffcf89a928                                         ) = 0
   103.478 ( 0.027 ms): sleep/32174 close(fd: 1                                                           ) = 0
   103.523 ( 0.021 ms): sleep/32174 close(fd: 2                                                           ) = 0
   103.549 ( 0.000 ms): sleep/32174 exit_group(

Comment 19 Michael Petlan 2016-08-09 17:18:31 UTC

(In reply to Jeff Bastian from comment #18)
> Instead of attaching to a background process, why not launch the process
> with perf? 

Because the particular test tests the perf-trace's ability to attach something already running.

Comment 20 Michael Petlan 2016-08-09 17:36:08 UTC

Testsuite fixed upstream by the following commit [1], which has been also backported to both RHEL versions. This makes the testsuite run longer by 2 seconds, since the CMD_LONGER_SLEEP is used only where it is really necessary. And, the most important thing, it should pass next time... ;-)

I think we might close this bug as NOTABUG, thoughts?


-------------------------------------------------------------

[1] commit f86cc2ad1609723a3ae5fe8419f465be3082516e
Author: Michael Petlan <mpetlan>
Date:   Tue Aug 9 19:22:54 2016 +0200

    base_trace/test_basic.sh: fix the trace attach test to be more robust
    
    When trying to attach to $CMD_BASIC_SLEEP, there may be too long latency,
    so the sleep command does not run or is being finished when perf starts.
    Thus, the testcase often fails on some environments.
    
    That's why we need $CMD_LONGER_SLEEP instead.

diff --git a/base_trace/test_basic.sh b/base_trace/test_basic.sh
index 91a3593..9bcf036 100755
--- a/base_trace/test_basic.sh
+++ b/base_trace/test_basic.sh
@@ -112,7 +112,7 @@ print_results $PERF_EXIT_CODE $CHECK_EXIT_CODE "summary"
 ### attach process
 
 # perf-trace should be able to attach an existing process by '-p PID'
-$CMD_BASIC_SLEEP &
+$CMD_LONGER_SLEEP &
 $CMD_PERF trace -p $! -o $LOGS_DIR/basic_attach.log
 PERF_EXIT_CODE=$?

Comment 21 Jeff Bastian 2016-08-09 17:56:06 UTC

(In reply to Michael Petlan from comment #19)
> (In reply to Jeff Bastian from comment #18)
> > Instead of attaching to a background process, why not launch the process
> > with perf? 
> 
> Because the particular test tests the perf-trace's ability to attach
> something already running.

Oh...  I guess that's a pretty good reason!  Sorry for the stupid question. :)

Comment 22 Jeff Bastian 2016-08-09 17:57:58 UTC

(In reply to Michael Petlan from comment #20)
> I think we might close this bug as NOTABUG, thoughts?

Agreed.  It's been working well for me with the longer sleep time.

We can re-open it if it still occurs with this change.