Bug 1892179 - System crashes at 'utrace_report_syscall_exit+0x81' while running systemtap script
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Developer Toolset
Classification: Red Hat
Component: systemtap
Version: DTS 10.1 RHEL 7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: alpha
Assignee: Frank Ch. Eigler
QA Contact: Martin Cermak
URL:
Whiteboard:
Depends On:
Blocks: 1898288
Reported: 2020-10-28 05:57 UTC by Daniel Kwon
Modified: 2024-03-25 16:50 UTC
CC List: 10 users

Fixed In Version: devtoolset-10-systemtap-4.4-4.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1898288
Environment:
Last Closed: 2021-06-03 11:21:38 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Last Updated
Red Hat Issue Tracker   DTS-69          2022-01-05 11:55:59 UTC
Red Hat Product Errata  RHBA-2021:2223  2021-06-03 11:21:45 UTC

Comment 4 Frank Ch. Eigler 2020-11-10 19:13:16 UTC
Hi, thanks for the report. Yeah, from inspecting stp_utrace.c, there is a code path that can dereference a NULL pointer. We will fix that upstream, and hand-patching the appropriate runtime/... file should get you going again.

In the meantime, try:

% stap -DUTRACE_TASK_WORK_POOL_SIZE=nnnn

where nnnn is a number much larger than 288 (the default in runtime/stp_utrace.c).
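For readers unfamiliar with how a `-D` flag changes a limit like this: `stap -D NAME=VALUE` adds a C preprocessor definition to the generated module's build, so a runtime constant written as an overridable default picks up the new value at compile time. The sketch below assumes that pattern. It is a hypothetical illustration, not the literal code in runtime/stp_utrace.c, and `configured_pool_size` is an invented helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: a compile-time default that
 * "stap -DUTRACE_TASK_WORK_POOL_SIZE=nnnn" could override.
 * 288 is the default quoted in the comment; this is not the
 * literal code from runtime/stp_utrace.c. */
#ifndef UTRACE_TASK_WORK_POOL_SIZE
#define UTRACE_TASK_WORK_POOL_SIZE 288
#endif

/* Reject a nonsensical override at compile time (C11 static_assert). */
static_assert(UTRACE_TASK_WORK_POOL_SIZE > 0,
              "UTRACE_TASK_WORK_POOL_SIZE must be positive");

/* Hypothetical helper: report the pool size this file was built with. */
size_t configured_pool_size(void)
{
    return (size_t)UTRACE_TASK_WORK_POOL_SIZE;
}
```

Built without any flag, `configured_pool_size()` returns 288. Built with `cc -DUTRACE_TASK_WORK_POOL_SIZE=1500`, it returns 1500, which is the same mechanism the `stap -D` option relies on.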

Comment 5 Frank Ch. Eigler 2020-11-11 03:18:07 UTC
I believe this patch should fix the particular problems you encountered,
though you should still pass -DUTRACE_TASK_WORK_POOL_SIZE=nnnn to avoid
the resource exhaustion. Can you describe your workload, to help us tune
this parameter better?

https://sourceware.org/git/?p=systemtap.git;a=commitdiff;h=34e62f15da5adf06361ac66489936d0ffa1cc430

It's unlikely we can ship a RHEL7 base OS fix. But please try applying the
patch to your copy of the runtime sources (under /usr/share/systemtap/runtime).

Comment 8 Frank Ch. Eigler 2020-11-12 22:52:39 UTC
Thanks for the test report; so the crash is indeed fixed. I don't have a good theory for why it runs out of resources on your workload (with which I'm unfamiliar). How high a value for UTRACE_TASK_WORK_POOL_SIZE did you need? For example, I estimate that UTRACE_TASK_WORK_POOL_SIZE=1200 would increase runtime memory consumption by about 192 kB, which may be small enough to make it the default.

Comment 9 Daniel Kwon 2020-11-12 23:01:21 UTC
I also couldn't see any specific reason why only this customer's systems are showing this issue. One thing I noticed is that those systems have 'intel_fpga*' modules, but that might not be related.

Anyway, the customer has tested with a size of 1500. I can ask them to try 1200, but could we go with 1500 if that value is OK with you? It has already been tested on the customer's system and is larger than 1200. I reckon the memory usage would be around 240 kB?
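The 240 kB figure follows if the 192 kB estimate is read as the total pool footprint at 1200 entries and scaled linearly with the pool size. The per-entry cost below is inferred from those two numbers rather than measured, and kB is taken as 1000 bytes.

```latex
\[
\frac{192\,000\ \text{B}}{1200\ \text{entries}} = 160\ \text{B/entry},
\qquad
1500\ \text{entries} \times 160\ \text{B/entry} = 240\,000\ \text{B} \approx 240\ \text{kB}
\]
```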

Comment 11 Frank Ch. Eigler 2020-11-13 16:45:19 UTC
> In general, what is using UTRACE_TASK_WORK_POOL?

This is related to a part of the systemtap runtime that tracks thread lifecycle and system call events. Some of that work must be done in process context and cannot be done immediately, so it is temporarily deferred. If those events (system calls, thread creation and exit) arrive faster than the deferred work completes, these queues can be exhausted.

We can raise the default in the code somewhat, and users are welcome to use the -D flag to find a value that works well for their workload. (We cannot raise it very high, because the pool uses static kernel memory, and we cannot allocate dynamically from that context.)
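The constraint described above (a fixed pool in static memory, no dynamic allocation at event time, and exhaustion when events outpace completions) can be sketched in user-space C. This is a hypothetical illustration of the general pattern under those assumptions: `struct work_item`, `pool_get`, `pool_put`, and `report_event` are invented names, and the code is not the systemtap runtime's implementation or the upstream patch. It shows why every caller must check for an exhausted pool rather than use the returned pointer blindly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef UTRACE_TASK_WORK_POOL_SIZE
#define UTRACE_TASK_WORK_POOL_SIZE 288
#endif

/* Hypothetical deferred-work item; the real runtime's structure differs. */
struct work_item {
    int event;    /* e.g. a syscall number or lifecycle event code */
    bool in_use;  /* true while the deferred work is still pending */
};

/* Statically sized pool: nothing is allocated when an event fires. */
static struct work_item pool[UTRACE_TASK_WORK_POOL_SIZE];

/* Claim a free slot, or return NULL when every slot is still pending. */
struct work_item *pool_get(int event)
{
    for (size_t i = 0; i < UTRACE_TASK_WORK_POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].event = event;
            return &pool[i];
        }
    }
    return NULL; /* exhausted: the caller must not dereference this */
}

/* Release a slot once its deferred work has completed. */
void pool_put(struct work_item *w)
{
    assert(w != NULL && w->in_use);
    w->in_use = false;
}

/* Hypothetical event hook: defer work if a slot is free, else skip it. */
bool report_event(int event)
{
    struct work_item *w = pool_get(event);
    if (w == NULL)
        return false; /* skip the event instead of using a NULL pointer */
    /* A real runtime would queue w here and call pool_put() later. */
    return true;
}
```

Raising UTRACE_TASK_WORK_POOL_SIZE in this sketch only delays exhaustion. The NULL check in `report_event` is what keeps an exhausted pool from turning into a crash.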

Comment 12 Frank Ch. Eigler 2020-11-16 14:41:45 UTC
Given RHEL7's current lifecycle stage, it is unlikely that we can fix this in base RHEL7. Reassigning to DTS.

Comment 15 Frank Ch. Eigler 2021-01-29 18:05:52 UTC
For the new devtoolset 10.1 systemtap build, this fix was backported from gcc-toolset-10 via bug #1898288.

Comment 20 errata-xmlrpc 2021-06-03 11:21:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (devtoolset-10-systemtap bug fix and
enhancement update), and where to find the updated files, follow the
link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2223

