Bug 1195206 - Ruby's TestException#test_machine_stackoverflow fails very often
Summary: Ruby's TestException#test_machine_stackoverflow fails very often
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-02-23 11:22 UTC by Vít Ondruch
Modified: 2016-11-24 12:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-04 15:56:17 UTC
Type: Bug
Embargoed:


Links
System  ID    Private  Priority  Status  Summary  Last Updated
Ruby    9198  0        None      None    None     Never

Description Vít Ondruch 2015-02-23 11:22:32 UTC
Description of problem:
This is quite a longstanding problem. I reported this issue upstream [1], but they asked me to consult the glibc maintainers about it. Since this seems to happen more often with glibc 2.21.90 than it used to (see the Koschei builds [2]; every other build fails due to this issue), is there any chance of fixing it in glibc? Or is there something to recommend to Ruby upstream to eliminate it?

[1] https://bugs.ruby-lang.org/issues/9198
[2] http://koschei.cloud.fedoraproject.org/package/ruby


Version-Release number of selected component (if applicable):
glibc-2.21.90-3.fc23.1

Comment 1 Carlos O'Donell 2015-02-23 14:45:43 UTC
Can you put together a minimal Ruby reproducer that will let me debug this?

For example, a Ruby file and a command to run that shows the problem would be perfect. That way we can debug this and see what's going on during the segfault.

Comment 2 Vít Ondruch 2015-02-23 15:20:36 UTC
Sure, this is the test case:

$ ruby --disable-gem -e "h = {a: ->{h[:a].call}}; h[:a].call"

It should not segfault according to upstream. It should be handled properly and raise a SystemStackError exception instead. E.g. it works when wrapped in a thread:

$ ruby --disable-gem -e "t = Thread.new do; begin h = {a: ->{h[:a].call}}; h[:a].call; rescue => e; puts e; end; end; t.join"
-e:1: stack level too deep (SystemStackError)
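
A side note on the threaded example above: `SystemStackError` inherits directly from `Exception`, so a bare `rescue => e` (which only catches `StandardError`) would not intercept it; the message shown is presumably produced when `t.join` re-raises the thread's exception. The following is a minimal sketch of rescuing the error explicitly. The method names `recurse_forever` and `overflow_result` are made up for illustration, and plain method recursion may exercise a different overflow path than the lambda-based reproducer, so it is not a substitute for it.

```ruby
# Sketch only: `recurse_forever` and `overflow_result` are hypothetical names.

# Recurse without a base case until the interpreter gives up.
def recurse_forever(depth = 0)
  recurse_forever(depth + 1)
end

# Trigger the overflow and return the exception object instead of crashing.
def overflow_result
  recurse_forever
rescue SystemStackError => e
  # SystemStackError is a direct subclass of Exception, so it has to be
  # named explicitly; a bare `rescue => e` only covers StandardError.
  e
end

err = overflow_result
puts "#{err.class}: #{err.message}"
```

If the interpreter handles the overflow as upstream intends, the script prints the exception class and message and exits normally rather than segfaulting.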

Comment 3 Jan Kurik 2015-07-15 14:30:33 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could affect also pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 4 Vít Ondruch 2015-12-21 14:34:17 UTC
ping ... any progress here?

Comment 5 Carlos O'Donell 2015-12-21 15:32:09 UTC
(In reply to Vít Ondruch from comment #4)
> ping ... any progress here?

[carlos@athas ~]$ ruby --disable-gem -e "h = {a: ->{h[:a].call}}; h[:a].call"
-e:1:in `block in <main>': stack level too deep (SystemStackError)
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	 ... 9864 levels...
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `<main>'
[carlos@athas ~]$ echo $?
1

[carlos@athas ~]$ ruby --disable-gem -e "h = {a: ->{h[:a].call}}; h[:a].call"
-e:1:in `call': stack level too deep (SystemStackError)
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	 ... 9871 levels...
	from -e:1:in `call'
	from -e:1:in `block in <main>'
	from -e:1:in `call'
	from -e:1:in `<main>'
[carlos@athas ~]$ echo $?
1

This is F23 with glibc-2.22-6.fc23.x86_64

How do I reproduce the segfault?

Comment 6 Vít Ondruch 2015-12-22 08:48:08 UTC
TBH, I don't know of a minimal reproducer :/ But when trying to build the stable update of Ruby, it has failed every time so far. You can also take a look at Koschei; the error happens quite often.

Actually, looking at the Koschei results, it might be an issue only on 32-bit platforms, i.e. i386 and ARM. I'll try to reproduce it on i386 once more.

Comment 7 Vít Ondruch 2015-12-22 12:06:09 UTC
Actually, I have a slightly different reproducer. This test fails:

https://github.com/ruby/ruby/blob/trunk/test/ruby/test_exception.rb#L588

and the similar reproducer I am using from the command line is:

bash -c 'set -e; for i in {1..1000}; do echo $i; ruby -e "define_method(:foo) {self.foo}; self.foo" |& grep SystemStackError; done'

This should raise the SystemStackError exception in Ruby, and it usually does. But it is not very reliable; it quite often just segfaults. It is much worse on F24 (glibc-2.22.90-25.fc24.x86_64) than it used to be on F23 (glibc-2.22-6.fc23.x86_64).
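
The loop above stops at the first run that does not print SystemStackError. To estimate how often the failure occurs, the runs could instead be tallied. A rough sketch, assuming the crash terminates the child process via a signal (SIGSEGV or SIGABRT); `classify` and `tally` are hypothetical helper names, not part of Ruby or its test suite:

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: label one child run from its combined stdout/stderr
# and its Process::Status.
def classify(output, status)
  return :signaled if status.signaled?                 # killed by a signal
  return :stack_error if output.include?("SystemStackError")
  :other                                               # anything else
end

# Hypothetical helper: run `script` in a fresh interpreter `runs` times
# and count how often each outcome occurs.
def tally(script, runs)
  counts = Hash.new(0)
  runs.times do
    # RbConfig.ruby is the path of the interpreter running this script.
    output, status = Open3.capture2e(RbConfig.ruby, "-e", script)
    counts[classify(output, status)] += 1
  end
  counts
end
```

For example, `p tally("define_method(:foo) {self.foo}; self.foo", 1000)` would mirror the 1000 iterations of the loop above and print the count for each outcome.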

Comment 8 Vít Ondruch 2015-12-22 12:11:52 UTC
Interestingly, it seems that upstream gave up on these tests in upcoming Ruby:

https://github.com/ruby/ruby/commit/af775f2b2c841f71e2574ce37e327ab810aaeb78

Comment 9 Carlos O'Donell 2015-12-22 17:33:53 UTC
(In reply to Vít Ondruch from comment #8)
> Interestingly, it seems that upstream gave up on these tests in upcoming
> Ruby:
> 
> https://github.com/ruby/ruby/commit/af775f2b2c841f71e2574ce37e327ab810aaeb78

The test still doesn't fail for me in F23 on a 4 core machine.

The tests may invoke undefined behaviour. I haven't looked deeply enough to determine this; if I can't reproduce it, I can't look much deeper.

Shall we mark this CLOSED/WORKSFORME for now? You can always reopen it if you have further problems. In the meantime you'd have to ignore these potentially unreliable tests in the current Fedora releases.

Comment 10 Vít Ondruch 2016-01-04 14:17:27 UTC
It seems there must be some change on ARM since glibc-2.22.90-26.fc24

https://apps.fedoraproject.org/koschei/package/ruby

Comment 11 Carlos O'Donell 2016-01-04 15:56:17 UTC
(In reply to Vít Ondruch from comment #10)
> It seems there must be some change on ARM since glibc-2.22.90-26.fc24
> 
> https://apps.fedoraproject.org/koschei/package/ruby

Could you diff the two sources and see if there are any changes to files in the `arm` subdirectories? There haven't been that many changes for 32-bit ARM. The toolchain is a complex moving target though, and compiler and linker changes could alter things enough that the problem is no longer triggered but is still there.
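
One way to do that comparison, sketched under the assumption that both source versions have been unpacked into separate directories: the helper below lists files under any directory named `arm` that were added, removed, or changed between the two trees. `changed_arm_files` is a hypothetical name, and the sketch makes no assumption about where the `arm` directories sit in the tree.

```ruby
require "fileutils"

# Hypothetical helper: return the relative paths of files under any `arm`
# directory that differ between two unpacked source trees.
def changed_arm_files(old_root, new_root)
  # Collect regular files below any directory named `arm`, relative to root.
  arm_files = lambda do |root|
    Dir.glob("**/arm/**/*", base: root)
       .select { |path| File.file?(File.join(root, path)) }
  end

  (arm_files.call(old_root) | arm_files.call(new_root)).sort.reject do |path|
    old_file = File.join(old_root, path)
    new_file = File.join(new_root, path)
    # Keep the path unless it exists on both sides with identical content.
    File.file?(old_file) && File.file?(new_file) &&
      FileUtils.identical?(old_file, new_file)
  end
end
```

Calling it with the two unpacked directories and printing the result with `puts` gives a short list of candidates to inspect with a regular diff.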

I'm going to mark this CLOSED/INSUFFICIENT DATA, and if you can dig up what might be the problem we'll reopen it.

