Bug 1986206 - stap: Ruby stp crashes with "user string copy fault"
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: systemtap
Version: 8.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: beta
Target Release: ---
Assignee: Frank Ch. Eigler
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1993504
Reported: 2021-07-26 23:25 UTC by Jarek Prokop
Modified: 2021-12-16 09:58 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-08 19:22:12 UTC
Type: Bug
Target Upstream Version:



Comment 1 Frank Ch. Eigler 2021-07-28 01:59:29 UTC
I have not been able to reproduce this problem on the ruby-2.5.9-107.module* build, 4.18.0-323 kernel, on my local VM.  Please note that it is possible to encounter errors like that due to transient low memory conditions where parts of a program are not yet paged in.  In such a case, there is no bug to solve.

See also [man error::fault].  This man page gives advice about how to make systemtap scripts more robust in light of such problems.
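
As a rough userspace analogy only (this is Python, not SystemTap code), the sketch below shows the general idea of tolerating a faulting string copy instead of aborting: it reads a NUL-terminated string from an address in its own process and returns a placeholder when the kernel reports EFAULT. The helper name read_c_string, the "<fault>" placeholder, and the pipe-based probing trick are all assumptions made for illustration; none of them come from the systemtap tapsets or the man page.

```python
import ctypes
import errno
import os

# Assumption: a POSIX libc is loadable through ctypes (true on Linux).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.write.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_size_t]
_libc.write.restype = ctypes.c_ssize_t


def read_c_string(addr, limit=256, placeholder="<fault>"):
    """Hypothetical helper: read a NUL-terminated string at addr in this
    process; return placeholder instead of crashing on a bad address."""
    rfd, wfd = os.pipe()
    out = bytearray()
    try:
        for offset in range(limit):
            # write(2) makes the kernel copy one byte from addr + offset.
            # An unreadable address yields EFAULT rather than a SIGSEGV.
            if _libc.write(wfd, addr + offset, 1) != 1:
                if ctypes.get_errno() == errno.EFAULT:
                    return placeholder
                raise OSError(ctypes.get_errno(), "pipe write failed")
            byte = os.read(rfd, 1)
            if byte == b"\0":
                break
            out += byte
    finally:
        os.close(rfd)
        os.close(wfd)
    return out.decode("utf-8", errors="replace")


if __name__ == "__main__":
    buf = ctypes.create_string_buffer(b"RubyVM::FrozenCore")
    print(read_c_string(ctypes.addressof(buf)))  # readable address
    print(read_c_string(0))                      # NULL is not readable
```

The one-byte-per-syscall loop is slow and only meant to make the fault visible. The point is that a bad address turns into a per-event placeholder rather than ending the whole run.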

Comment 2 Jarek Prokop 2021-07-28 15:15:13 UTC
> I have not been able to reproduce this problem on the ruby-2.5.9-107.module*

This happens only with Ruby 3.0.

I have not encountered the failure with Ruby 2.5 or Ruby 2.7.

Comment 8 Stan Cox 2021-11-05 20:23:32 UTC
On RHEL 8.6 with ruby-3.0.2-140.el8, I noticed bogus values in the cmethod.entry.*set_method_alias lines, e.g.:
 vm_call_cfunc_with_frame ruby.cmethod.entry -> @n?V(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/platform.rb:140
(I added the function and the classname address to the probe output.)
After building libruby with semaphores turned off, I still see what appear to be bogus values:
 vm_call_cfunc_with_frame ruby.cmethod.entry -> Akinori MUSHA(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/dependency.rb:207
(Note "Akinori MUSHA" as the classname.)  So I suspect this is not a semaphore problem; perhaps the GC frees the string before it is properly displayed (or something else is going on).

With kernel-5.13.6-200.fc34 and ruby-3.0.2-149.fc34 I don't see that behavior and see fewer invocations of that probe:
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:422
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:431
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:587
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:709
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:9
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:19

(So no smoking gun yet)
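
To triage larger captures of this output, a small filter can flag the lines whose classname does not look like a Ruby constant path. The sketch below is only a heuristic under stated assumptions: the line layout is inferred from the samples quoted above, the names PROBE_LINE, CONSTANT_PATH and suspicious_entries are hypothetical, and legitimate anonymous or singleton classes would also be flagged.

```python
import re

# Layout inferred from the probe output quoted in this comment:
#   <function> <probe> -> <class>(0x<addr>)::<method> <file>:<line>
PROBE_LINE = re.compile(
    r"^\s*(?P<func>\S+) (?P<probe>\S+) -> "
    r"(?P<cls>.*)\((?P<addr>0x[0-9a-f]+)\)::(?P<method>\S+) "
    r"(?P<file>.+):(?P<line>\d+)\s*$"
)

# Heuristic: a plausible classname is a constant path such as
# RubyVM::FrozenCore.
CONSTANT_PATH = re.compile(r"^[A-Z]\w*(::[A-Z]\w*)*$")


def suspicious_entries(lines):
    """Return (classname, file, line) for entries with an odd classname."""
    hits = []
    for text in lines:
        m = PROBE_LINE.match(text)
        if m and not CONSTANT_PATH.match(m.group("cls")):
            hits.append((m.group("cls"), m.group("file"), int(m.group("line"))))
    return hits


if __name__ == "__main__":
    prefix = "vm_call_cfunc_with_frame ruby.cmethod.entry -> "
    sample = [
        prefix + "@n?V(0xffffb96901c7e350)::core#set_method_alias "
                 "/usr/share/rubygems/rubygems/platform.rb:140",
        prefix + "RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias "
                 "<internal:ractor>:422",
    ]
    for entry in suspicious_entries(sample):
        print(entry)
```

Lines that do not match the assumed layout at all are silently skipped, so a real run would probably want to count those separately.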

Comment 9 Stan Cox 2021-11-22 17:24:09 UTC
On RHEL 8.5 running kernel-4.18.0-326.el8.kpq1.x86_64 and ruby-3.0.2-140.el8.x86_64, I was unable to reproduce the user string copy fault but occasionally saw unusual values.  To display them, I built ruby with this vm_insnhelper.patch, which prints the same values as stap (while admittedly adding to the code flow).  Even outside systemtap I sometimes see off values, e.g. with the patch and not run with stap:

   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa841a0) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa84088) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
=> DBG (vm_call_cfunc_with_frame) SIGSEGV 0x55ad5fa4e01e 0x90 0x80
   DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fdf8) expand_path(0x55ad5fad7378) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
   DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fd58) dirname(0x55ad5fad7198) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
...
   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fbf1bf0) private_class_method(0x55ad5fafd320) /usr/share/rubygems/rubygems/platform.rb(0x55ad5fc58250)
   DBG (vm_call_cfunc_with_frame) SIGSEGV 0x7f7f5c5eede0 0x7f7f5c098a00 0x7ffc0ae109f0

and I will sometimes see odd values at the same place when run with stap (the gcc options are a random string):
DBG (vm_call_cfunc_with_frame) Kernel(0x561be52674f0) freeze(0x561be506f698) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
DBG (vm_call_cfunc_with_frame) -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation...(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
DBG (RUBY_DTRACE_METHOD_HOOK) RUBY_DTRACE_CMETHOD_ENTRY -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation ...(0x561be5096760) core#set_method_alias(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb:167(0x561be51b91f0)

Comment 11 Frank Ch. Eigler 2021-12-07 20:23:28 UTC
Please let us know whether you've been able to make any progress on this ruby issue, and whether we need to keep this bz open, just in case there is a systemtap element too.

Comment 12 Vít Ondruch 2021-12-08 14:08:14 UTC
@Frank I have spent some quality time with GDB and Ruby and these are my findings:

https://bugs.ruby-lang.org/issues/18257#note-8

1) I don't know whether this is a Ruby issue.
2) I don't know if the Ruby 3.0 issue is different than the Ruby 2.0.0 one.
3) I still don't understand why the issue is more prominent than it used to be.
4) I'd appreciate some Ruby upstream feedback prior to digging into this once again.

Comment 13 Frank Ch. Eigler 2021-12-08 19:22:12 UTC
Thanks for talking to upstream!  Much of that discussion is over my head, but the "previously swept by GC" part, if true, sounds like a prerequisite problem to fix.  I hope you don't mind me closing this bug, until y'all get the ruby VM into a shape where the inserted printfs before the stap probes all produce correct data.  Let's reopen when it passes that test and if the stap problems stay.

