Bug 1986206
| Summary: | stap: Ruby stp crashes with "user string copy fault" | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Jarek Prokop <jprokop> |
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
| systemtap sub component: | system-version | QA Contact: | qe-baseos-tools-bugs |
| Status: | CLOSED CANTFIX | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | unspecified | CC: | fche, lberk, lzachar, mcermak, mjw, scox, vondruch |
| Version: | 8.5 | Keywords: | Triaged |
| Target Milestone: | beta | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-12-08 19:22:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1993504 | ||
Comment 1
Frank Ch. Eigler
2021-07-28 01:59:29 UTC
> I have not been able to reproduce this problem on the ruby-2.5.9-107.module*

This happens only with Ruby 3.0. I have not encountered the failure with Ruby 2.5 or Ruby 2.7.
On RHEL 8.6 with ruby-3.0.2-140.el8 I noticed bogus values for the cmethod.entry.*set_method_alias lines, e.g. (I added the function and class name address to the probe output):

    vm_call_cfunc_with_frame ruby.cmethod.entry -> @n?V(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/platform.rb:140

But even after building libruby with semaphores turned off, I still see what appear to be bogus values (here "Akinori MUSHA" as the class name):

    vm_call_cfunc_with_frame ruby.cmethod.entry -> Akinori MUSHA(0xffffb96901c7e350)::core#set_method_alias /usr/share/rubygems/rubygems/dependency.rb:207

So I suspect this is not a semaphore problem, but perhaps the GC freeing the objects before they are properly displayed (or something else).

With kernel-5.13.6-200.fc34 and ruby-3.0.2-149.fc34 I don't see that behavior, and I see fewer invocations of that probe:

    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:422
    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:431
    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:587
    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:ractor>:709
    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:9
    vm_call_cfunc_with_frame ruby.cmethod.entry -> RubyVM::FrozenCore(0xffffb9dd8294e350)::core#set_method_alias <internal:prelude>:19

(So no smoking gun yet.)

On RHEL 8.5 running kernel-4.18.0-326.el8.kpq1.x86_64 and ruby-3.0.2-140.el8.x86_64 I was unable to reproduce the user string copy fault, but occasionally saw unusual values. To display them, I built Ruby with this vm_insnhelper.patch, which prints the same values stap reads (while admittedly adding to the code flow). Even outside SystemTap I sometimes see odd values, e.g. with the patch applied but not run under stap:

    DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa841a0) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
    DBG (vm_call_cfunc_with_frame) Module(0x55ad5fa84088) attr_reader(0x55ad5fb0d158) /usr/share/rubygems/rubygems/errors.rb(0x55ad5fbd63e0)
    => DBG (vm_call_cfunc_with_frame) SIGSEGV 0x55ad5fa4e01e 0x90 0x80
    DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fdf8) expand_path(0x55ad5fad7378) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
    DBG (vm_call_cfunc_with_frame) File(0x55ad5fa7fd58) dirname(0x55ad5fad7198) /usr/share/rubygems/rubygems.rb(0x55ad5fba3880)
    ...
    DBG (vm_call_cfunc_with_frame) Module(0x55ad5fbf1bf0) private_class_method(0x55ad5fafd320) /usr/share/rubygems/rubygems/platform.rb(0x55ad5fc58250)
    DBG (vm_call_cfunc_with_frame) SIGSEGV 0x7f7f5c5eede0 0x7f7f5c098a00 0x7ffc0ae109f0

and I sometimes see odd values at the same place when run under stap (the gcc options are a random string):

    DBG (vm_call_cfunc_with_frame) Kernel(0x561be52674f0) freeze(0x561be506f698) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
    DBG (vm_call_cfunc_with_frame) -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation...(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb(0x561be51b91f0)
    DBG (RUBY_DTRACE_METHOD_HOOK) RUBY_DTRACE_CMETHOD_ENTRY -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation ...(0x561be5096760) core#set_method_alias(0x561be506eb80) /usr/share/rubygems/rubygems/version.rb:167(0x561be51b91f0)

Please let us know whether you've been able to make any progress on this Ruby issue, and whether we need to keep this bz open, just in case there is a SystemTap element too.

@Frank I have spent some quality time with GDB and Ruby, and these are my findings: https://bugs.ruby-lang.org/issues/18257#note-8

1) I don't know whether this is a Ruby issue.
2) I don't know whether the Ruby 3.0 issue is different from the Ruby 2.0.0 one.
3) I still don't understand why the issue is more prominent than it used to be.
4) I'd appreciate some Ruby upstream feedback before digging into this once again.

Thanks for talking to upstream! Much of that discussion is over my head, but the "previously swept by GC" part, if true, sounds like a prerequisite problem to fix. I hope you don't mind me closing this bug until y'all get the Ruby VM into a shape where the printfs inserted before the stap probes all produce correct data. Let's reopen it when it passes that test, if the stap problems remain.
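The "user string copy fault" in the summary is the error SystemTap reports when it cannot copy a string from the traced process's memory at the address a probe argument points to. As a rough user-space illustration of that failure mode (this is neither the SystemTap implementation nor the vm_insnhelper.patch, which is not reproduced in this report), the hypothetical C helper `safe_copy_string()` below copies a NUL-terminated string one byte at a time through a pipe. Because the kernel validates each source address, an unreadable byte makes `write()` fail with `EFAULT` instead of the process dying with SIGSEGV. The helper's name and return conventions are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Ruby or SystemTap: copy a
 * NUL-terminated string that starts at a possibly bad address in this
 * process. Each byte is pushed through a pipe, so the kernel checks the
 * source address and write() fails with EFAULT for an unreadable byte.
 *
 * Returns 0 on success (output truncated to len - 1 bytes if needed),
 * or a negative errno value (-EFAULT for an unreadable address).
 * On failure, buf is left as an empty string.
 */
static int safe_copy_string(const char *addr, char *buf, size_t len)
{
    int fds[2];
    int rc = 0;
    size_t i = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (pipe(fds) != 0)
        return -errno;

    for (; i + 1 < len; i++) {
        /* Integer arithmetic avoids pointer math on a bogus pointer. */
        const char *src = (const char *)((uintptr_t)addr + i);
        char c;
        ssize_t n = write(fds[1], src, 1);

        if (n != 1) {
            rc = (n < 0) ? -errno : -EIO;   /* EFAULT: unreadable byte */
            break;
        }
        if (read(fds[0], &c, 1) != 1) {
            rc = -EIO;
            break;
        }
        buf[i] = c;
        if (c == '\0')
            break;                          /* end of string reached */
    }

    if (rc != 0)
        buf[0] = '\0';                      /* no partial strings on error */
    else if (i + 1 == len)
        buf[i] = '\0';                      /* terminate (possibly truncated) */

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

For example, `safe_copy_string(NULL, buf, sizeof buf)` returns `-EFAULT` rather than crashing. Note that this only catches addresses that are unmapped or otherwise unreadable. A dangling pointer into memory the allocator has freed but not unmapped still reads successfully and yields whatever bytes now live there. If the GC hypothesis discussed in this thread is right, that would be one way to end up with a bogus class name such as "Akinori MUSHA" or a string of gcc options instead of a copy fault.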