
Bug 1986206

Summary: stap: Ruby stp crashes with "user string copy fault"
Product: Red Hat Enterprise Linux 8
Reporter: Jarek Prokop <jprokop>
Component: systemtap
Assignee: Frank Ch. Eigler <fche>
systemtap sub component: system-version
QA Contact: qe-baseos-tools-bugs
Status: CLOSED CANTFIX
Severity: urgent
Priority: unspecified
CC: fche, lberk, lzachar, mcermak, mjw, scox, vondruch
Version: 8.5
Keywords: Triaged
Target Milestone: beta
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2021-12-08 19:22:12 UTC
Type: Bug
Bug Blocks: 1993504

Comment 1 Frank Ch. Eigler 2021-07-28 01:59:29 UTC
I have not been able to reproduce this problem on the ruby-2.5.9-107.module* build, 4.18.0-323 kernel, on my local VM.  Please note that it is possible to encounter errors like that due to transient low memory conditions where parts of a program are not yet paged in.  In such a case, there is no bug to solve.

See also [man error::fault].  This man page gives advice about how to make systemtap scripts more robust in light of such problems.

Comment 2 Jarek Prokop 2021-07-28 15:15:13 UTC
> I have not been able to reproduce this problem on the ruby-2.5.9-107.module*

This happens only with Ruby 3.0.

I have not encountered failure with Ruby 2.5 or Ruby 2.7.

Comment 8 Stan Cox 2021-11-05 20:23:32 UTC
On RHEL 8.6 with ruby-3.0.2-140.el8 I noticed bogus values in the cmethod.entry.*set_method_alias lines, e.g.:
 vm_call_cfunc_with_frame ruby.cmethod.entry -> @n?V(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/platform.rb:140
(I added the function and classname address to probe output)
But after building libruby with semaphores turned off, I still see what appear to be bogus values:
 vm_call_cfunc_with_frame ruby.cmethod.entry -> Akinori MUSHA(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/dependency.rb:207
(Akinori MUSHA as the class name.)  So I suspect this is not a semaphore problem; perhaps the GC frees the object before it is properly displayed (or something else).

With kernel-5.13.6-200.fc34 and ruby-3.0.2-149.fc34 I don't see that behavior and see fewer invocations of that probe:
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:422
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:431
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:587
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:709
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:9
vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:19

(So no smoking gun yet)

Comment 9 Stan Cox 2021-11-22 17:24:09 UTC
On RHEL 8.5 running kernel-4.18.0-326.el8.kpq1.x86_64 and ruby-3.0.2-140.el8.x86_64 I was unable to reproduce the user string copy fault, but occasionally saw unusual values.  To display them I built ruby with this vm_insnhelper.patch, which prints the same values as stap (while admittedly adding to the code flow).  Even outside systemtap I sometimes see off values, e.g. with the patch, not run with stap:

   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa841a0) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa84088) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
=> DBG (vm_call_cfunc_with_frame) SIGSEGV 0x55ad5fa4e01e 0x90 0x80
   DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fdf8) expand_path(0x55ad5fad7378) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
   DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fd58) dirname(0x55ad5fad7198) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
...
   DBG (vm_call_cfunc_with_frame) Module(0x55ad5fbf1bf0) private_class_method(0x55ad5fafd320) /usr/share/rubygems/rubygems/platform.rb(0x55ad5fc58250)
   DBG (vm_call_cfunc_with_frame) SIGSEGV 0x7f7f5c5eede0 0x7f7f5c098a00 0x7ffc0ae109f0

And I will sometimes see odd values at the same place when run with stap (the gcc options are a random string):
DBG (vm_call_cfunc_with_frame) Kernel(0x561be52674f0) freeze(0x561be506f698) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
DBG (vm_call_cfunc_with_frame) -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation...(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
DBG (RUBY_DTRACE_METHOD_HOOK) RUBY_DTRACE_CMETHOD_ENTRY -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation ...(0x561be5096760) core#set_method_alias(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb:167(0x561be51b91f0)

Comment 11 Frank Ch. Eigler 2021-12-07 20:23:28 UTC
Please let us know whether you've been able to make any progress on this ruby issue, and whether we need to keep this bz open, just in case there is a systemtap element too.

Comment 12 Vít Ondruch 2021-12-08 14:08:14 UTC
@Frank I have spent some quality time with GDB and Ruby and these are my findings:

https://bugs.ruby-lang.org/issues/18257#note-8

1) I don't know whether this is a Ruby issue.
2) I don't know if the Ruby 3.0 issue is different than the Ruby 2.0.0 one.
3) I still don't understand why the issue is more prominent than it used to be.
4) I'd appreciate some Ruby upstream feedback prior to digging into this once again.

Comment 13 Frank Ch. Eigler 2021-12-08 19:22:12 UTC
Thanks for talking to upstream!  Much of that discussion is over my head, but the "previously swept by GC" part, if true, sounds like a prerequisite problem to fix.  I hope you don't mind me closing this bug, until y'all get the ruby VM into a shape where the inserted printfs before the stap probes all produce correct data.  Let's reopen when it passes that test and if the stap problems stay.