Bug 2040380 - libffi: Ruby fiddle corrupts dynamically allocated closures with call to fork
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: libffi
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: DJ Delorie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2123772
Blocks: 1990553
 
Reported: 2022-01-13 15:42 UTC by Vít Ondruch
Modified: 2023-07-11 13:14 UTC (History)

Fixed In Version: libffi-3.4.4-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-11 13:14:16 UTC
Type: Bug
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ruby fiddle issues 102 | 0 | None | open | Ruby test suite failures with libffi-3.4.2 | 2022-01-13 15:50:04 UTC

Description Vít Ondruch 2022-01-13 15:42:14 UTC
Description of problem:
Since libffi-3.4.2 landed in Fedora, the Ruby test suite fails on several occasions:

~~~
... snip ...

  1) Failure:
TestAutoload#test_autoload_fork [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_autoload.rb:380]:
[ruby-core:86410] [Bug #14634].
Expected #<Test::Unit::AssertionFailedError: Expected #<Process::Status: pid 3249430 SIGABRT (signal 6) (core dumped)> to be success?.> to be nil.
  2) Failure:
TestAutoload#test_autoload_fork [/builddir/build/BUILD/ruby-3.1.0/tool/lib/zombie_hunter.rb:6]:
Expected [[3249431, #<Process::Status: pid 3249431 SIGABRT (signal 6) (core dumped)>]] to be empty.
  3) Failure:
TestRand#test_fork_shuffle [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_rand.rb:276]:
#<Process::Status: pid 3249809 SIGABRT (signal 6) (core dumped)>.
Expected #<Process::Status: pid 3249809 SIGABRT (signal 6) (core dumped)> to be success?.
  4) Failure:
TestRand#test_rand_reseed_on_fork [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_rand.rb:306]:
[ruby-core:41209]
pid 3249817 killed by SIGABRT (signal 6) (core dumped)
  5) Failure:
TestIO#test_copy_stream_socket7 [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_io.rb:995]:
Expected #<Process::Status: pid 3251775 SIGABRT (signal 6) (core dumped)> to be success?.
  6) Failure:
JSONGeneratorTest#test_broken_bignum [/builddir/build/BUILD/ruby-3.1.0/test/json/json_generator_test.rb:305]:
Failed assertion, no message given.
  7) Failure:
TestBeginEndBlock#test_internal_errinfo_at_exit [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_beginendblock.rb:175]:
Expected #<Process::Status: pid 3252235 SIGABRT (signal 6) (core dumped)> to not be signaled?.
  8) Failure:
TestProcess#test_signals_work_after_exec_fail [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_process.rb:2428]:
Expected #<Process::Status: pid 3252558 SIGABRT (signal 6) (core dumped)> to be success?.
  9) Failure:
TestProcess#test_threading_works_after_exec_fail [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_process.rb:2464]:
Expected #<Process::Status: pid 3252910 SIGABRT (signal 6) (core dumped)> to be success?.
 10) Failure:
TestProcess#test_process_detach [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_process.rb:2348]:
Expected #<Process::Status: pid 3253011 SIGABRT (signal 6) (core dumped)> to be success?.
 11) Failure:
TestThread#test_blocking_mutex_unlocked_on_fork [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_thread.rb:1223]:
[ruby-core:55102] [Bug #8433].
<false> expected but was
<nil>.
 12) Failure:
TestThread#test_fork_in_thread [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_thread.rb:1243]:
[ruby-core:62070] [Bug #9751]
pid 3253221 killed by SIGABRT (signal 6) (core dumped).
Expected #<Process::Status: pid 3253221 SIGABRT (signal 6) (core dumped)> to not be signaled?.
 13) Failure:
TestThread#test_fork_while_locked [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_thread.rb:1254]:
[ruby-core:85940] [Bug #14578].
Expected #<Process::Status: pid 3253241 SIGABRT (signal 6) (core dumped)> to be success?.
 14) Failure:
TestThread#test_fork_while_locked [/builddir/build/BUILD/ruby-3.1.0/tool/lib/zombie_hunter.rb:6]:
Expected [[3253249, #<Process::Status: pid 3253249 SIGABRT (signal 6) (core dumped)>],
 [3253250, #<Process::Status: pid 3253250 SIGABRT (signal 6) (core dumped)>]] to be empty.
Finished tests in 885.270988s, 24.2186 tests/s, 3106.1596 assertions/s.
21440 tests, 2749793 assertions, 14 failures, 0 errors, 56 skips
ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
make: *** [uncommon.mk:822: yes-test-all] Aborted (core dumped)
~~~


Version-Release number of selected component (if applicable):
$ rpm -q libffi
libffi-3.4.2-6.fc36.x86_64

Ruby 3.1.0
Ruby 3.0.0

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
Ruby test suite is failing


Expected results:
Ruby test suite is passing


Additional info:



[1] https://koschei.fedoraproject.org/package/ruby?collection=f36

Comment 1 Vít Ondruch 2022-01-13 15:46:58 UTC
So far, I was able to reduce the test case to this shape:

~~~
$ gdb --args ./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems "./test/runner.rb" --ruby="./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems" --excludes-dir=./test/excludes --name='!/memory_leak/'  test/fiddle/test_import.rb test/ruby/test_autoload.rb -v -n '/TestAutoload#test_autoload_fork/'
GNU gdb (GDB) Fedora 11.1-6.fc36
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./miniruby...
warning: File "/builddir/build/BUILD/ruby-3.1.0/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /builddir/build/BUILD/ruby-3.1.0/.gdbinit
line to your configuration file "/builddir/.config/gdb/gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/builddir/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) r
Starting program: /builddir/build/BUILD/ruby-3.1.0/miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems ./test/runner.rb --ruby=./miniruby\ -I./lib\ -I.\ -I.ext/common\ \ ./tool/runruby.rb\ --extout=.ext\ \ --\ --disable-gems --excludes-dir=./test/excludes --name=\!/memory_leak/ test/fiddle/test_import.rb test/ruby/test_autoload.rb -v -n /TestAutoload\#test_autoload_fork/
Download failed: No route to host.  Continuing without debug info for /builddir/build/BUILD/ruby-3.1.0/system-supplied DSO at 0x7ffff7fc4000.
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 1987 is executing new program: /builddir/build/BUILD/ruby-3.1.0/ruby
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34.9000-36.fc36.x86_64 gmp-6.2.1-1.fc36.x86_64 libxcrypt-4.4.27-1.fc36.x86_64 zlib-1.2.11-30.fc35.x86_64
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
11111111111111111111111
Run options: 
  --seed=39139
  "--ruby=./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems"
  --excludes-dir=./test/excludes
  --name=!/memory_leak/
  -v
  -n
  /TestAutoload#test_autoload_fork/

# Running tests:

[Detaching after vfork from child process 2024]
[1/0] TestAutoload#test_autoload_fork[New Thread 0x7ffff4ccf640 (LWP 2025)]
[New Thread 0x7ffff4bae640 (LWP 2026)]
[New Thread 0x7ffff4a8d640 (LWP 2027)]
[New Thread 0x7ffff496c640 (LWP 2028)]
[New Thread 0x7ffff484b640 (LWP 2029)]
[New Thread 0x7ffff472a640 (LWP 2030)]
[Detaching after fork from child process 2031]
[Detaching after fork from child process 2032]
[Detaching after fork from child process 2033]
 = 0.42 s

  1) Failure:
TestAutoload#test_autoload_fork [/builddir/build/BUILD/ruby-3.1.0/test/ruby/test_autoload.rb:380]:
[ruby-core:86410] [Bug #14634].
Expected #<Test::Unit::AssertionFailedError: Expected #<Process::Status: pid 2032 SIGABRT (signal 6) (core dumped)> to be success?.> to be nil.

  2) Failure:
TestAutoload#test_autoload_fork [/builddir/build/BUILD/ruby-3.1.0/tool/lib/zombie_hunter.rb:6]:
Expected [[2033, #<Process::Status: pid 2033 SIGABRT (signal 6) (core dumped)>]] to be empty.

Finished tests in 0.429828s, 2.3265 tests/s, 11.6326 assertions/s.
1 tests, 5 assertions, 2 failures, 0 errors, 0 skips

ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]

Thread 1 "ruby" received signal SIGABRT, Aborted.
0x00007ffff78a764c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34.9000-36.fc36.x86_64 gmp-6.2.1-1.fc36.x86_64 libxcrypt-4.4.27-1.fc36.x86_64 zlib-1.2.11-30.fc35.x86_64
~~~

where I have reduced test/fiddle/test_import.rb to this form:

~~~
$ cat test/fiddle/test_import.rb
# coding: US-ASCII
# frozen_string_literal: true
begin
  require_relative 'helper'
  require 'fiddle/import'
rescue LoadError
end

module Fiddle
  module LIBC
    extend Importer
    dlload LIBC_SO, LIBM_SO

    CallCallback = bind("void call_callback(void*, void*)"){ | ptr1, ptr2|
#      f = Function.new(ptr1.to_i, [TYPE_VOIDP], TYPE_VOID)
#      f.call(ptr2)
    }
  end


end if defined?(Fiddle)
~~~

So far, I believe this issue has something to do with `exec` and `fork`, because `miniruby` actually sets up the environment so that `ruby` can be executed. When I omit this step and execute just Ruby, the issue is not reproducible:

~~~
$ LD_LIBRARY_PATH=. ./ruby -I./lib -I. -I./tool/lib -I.ext/common --disable-gems -rtest/fiddle/test_import.rb -rtest/ruby/test_autoload.rb -e ''  -- -v -n '/TestAutoload#test_autoload_fork/'
Run options: 
  --seed=28080
  -v
  -n
  /TestAutoload#test_autoload_fork/

# Running tests:

[1/0] TestAutoload#test_autoload_fork = 0.31 s
Finished tests in 0.313873s, 3.1860 tests/s, 12.7440 assertions/s.
1 tests, 4 assertions, 0 failures, 0 errors, 0 skips

ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
~~~

Comment 2 Vít Ondruch 2022-01-13 17:06:11 UTC
Looking around, I have found commits in the alternative Ruby ffi library:

https://github.com/ffi/ffi/commit/9f257af19e6088c2986f85dea17455e52a2be405
https://github.com/ffi/ffi/commit/94441aa5f8b694b62f70528011b32c5db3d42dd4

which gets me to:

https://github.com/ffi/ffi/issues/621

and ultimately to issues such as:

https://bugs.python.org/issue25653
https://bugzilla.redhat.com/show_bug.cgi?id=1977410

So is this the same issue or a different one? There actually already was a closure-related issue in Ruby:

https://bugzilla.redhat.com/show_bug.cgi?id=1727832

Comment 3 Vít Ondruch 2022-01-13 17:28:37 UTC
I have simplified the test case even further:

~~~
$ LD_LIBRARY_PATH=. gdb --args ./ruby ./tool/runruby.rb -- --disable-gems "./test/runner.rb" test/fiddle/test_import.rb test/ruby/test_beginendblock.rb -v -n '/TestBeginEndBlock#test_internal_errinfo_at_exit/'
GNU gdb (GDB) Fedora 11.1-6.fc36
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ruby...
warning: File "/builddir/build/BUILD/ruby-3.1.0/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /builddir/build/BUILD/ruby-3.1.0/.gdbinit
line to your configuration file "/builddir/.config/gdb/gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/builddir/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) r
Starting program: /builddir/build/BUILD/ruby-3.1.0/ruby ./tool/runruby.rb -- --disable-gems ./test/runner.rb test/fiddle/test_import.rb test/ruby/test_beginendblock.rb -v -n /TestBeginEndBlock\#test_internal_errinfo_at_exit/
Download failed: No route to host.  Continuing without debug info for /builddir/build/BUILD/ruby-3.1.0/system-supplied DSO at 0x7ffff7fc4000.
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
`RubyGems' were not loaded.
`error_highlight' was not loaded.
`did_you_mean' was not loaded.
process 2967 is executing new program: /builddir/build/BUILD/ruby-3.1.0/ruby
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34.9000-36.fc36.x86_64 gmp-6.2.1-1.fc36.x86_64 libxcrypt-4.4.27-1.fc36.x86_64 zlib-1.2.11-30.fc35.x86_64
Download failed: No route to host.  Continuing without debug info for /lib64/libz.so.1.
Download failed: No route to host.  Continuing without debug info for /lib64/libgmp.so.10.
Download failed: No route to host.  Continuing without debug info for /lib64/libcrypt.so.2.
Download failed: No route to host.  Continuing without debug info for /lib64/libm.so.6.
Download failed: No route to host.  Continuing without debug info for /lib64/libc.so.6.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Run options: 
  --seed=9286
  -v
  -n
  /TestBeginEndBlock#test_internal_errinfo_at_exit/

# Running tests:

[Detaching after vfork from child process 3004]
[1/0] TestBeginEndBlock#test_internal_errinfo_at_exit[Detaching after fork from child process 3005]
 = 0.00 s
Finished tests in 0.007636s, 130.9668 tests/s, 654.8339 assertions/s.
1 tests, 5 assertions, 0 failures, 0 errors, 0 skips

ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]

Program received signal SIGABRT, Aborted.
0x00007ffff78a764c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34.9000-36.fc36.x86_64 gmp-6.2.1-1.fc36.x86_64 libxcrypt-4.4.27-1.fc36.x86_64 zlib-1.2.11-30.fc35.x86_64
(gdb) 
~~~

Comment 4 Vít Ondruch 2022-01-13 17:50:30 UTC
And even shorter:

~~~
$ RUBYLIB=.:lib:.ext/common:.ext/x86_64-linux LD_LIBRARY_PATH=. ./ruby -e 'exec "./ruby", *ARGV'  -- --disable-gems "./test/runner.rb" test/fiddle/test_import.rb test/ruby/test_beginendblock.rb -v -n '/TestBeginEndBlock#test_internal_errinfo_at_exit/'
Run options: 
  --seed=24431
  -v
  -n
  /TestBeginEndBlock#test_internal_errinfo_at_exit/

# Running tests:

[1/0] TestBeginEndBlock#test_internal_errinfo_at_exit = 0.00 s
Finished tests in 0.006300s, 158.7318 tests/s, 793.6592 assertions/s.
1 tests, 5 assertions, 0 failures, 0 errors, 0 skips

ruby -v: ruby 3.1.0p0 (2021-12-25 revision fb4df44d16) [x86_64-linux]
Aborted (core dumped)
~~~

Comment 5 Vít Ondruch 2022-01-13 18:16:28 UTC
And here is probably the most minimal from Ruby POV:

~~~
$ cat fiddle_fork.rb 
require 'fiddle/import'

module Fiddle
  module LIBC
    extend Importer
    dlload "libc.so.6", "libm.so.6"

    CallCallback = bind("void call_callback(void*, void*)"){ | ptr1, ptr2| }
  end
end

error, pid, status = IO.pipe do |r, w|
  pid = fork {}
  w.close
  [r.read, *Process.wait2(pid)]
end


$ RUBYLIB=.:lib:.ext/common:.ext/x86_64-linux:tool/lib LD_LIBRARY_PATH=. ./ruby fiddle_fork.rb
Aborted (core dumped)
~~~
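The abort in the reproducer above is visible only through the child's exit status. As a minimal sketch of how that status can be inspected, in the same spirit as the `success?` and `signaled?` assertions in the failing tests, the snippet below forks a block and reports how the child ended. The helper name `fork_status` is hypothetical and not part of Ruby or its test suite, and the child that sends itself SIGABRT only stands in for the libffi-triggered abort.

```ruby
# Hypothetical helper, not part of Ruby or its test suite: run a block in a
# forked child and describe how the child ended.
def fork_status
  pid = fork { yield }
  _pid, status = Process.wait2(pid)
  if status.signaled?
    "killed by SIG#{Signal.signame(status.termsig)}"
  else
    "exited with #{status.exitstatus}"
  end
end

# A child that returns normally.
puts fork_status { nil }

# A child that aborts itself, standing in for the libffi-triggered abort.
puts fork_status {
  Process.setrlimit(:CORE, 0)           # avoid leaving a core file behind
  Process.kill(:ABRT, Process.pid)
  sleep                                 # wait for the signal to be delivered
}
```

Wrapping the `fork {}` call from fiddle_fork.rb in a helper like this would print the signal name instead of relying on the shell's "Aborted (core dumped)" message.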

Comment 6 Carlos O'Donell 2022-01-13 23:06:20 UTC
(In reply to Vít Ondruch from comment #5)
> And here is probably the most minimal from Ruby POV:
> 
> ~~~
> $ cat fiddle_fork.rb 
> require 'fiddle/import'
> 
> module Fiddle
>   module LIBC
>     extend Importer
>     dlload "libc.so.6", "libm.so.6"
> 
>     CallCallback = bind("void call_callback(void*, void*)"){ | ptr1, ptr2| }
>   end
> end
> 
> error, pid, status = IO.pipe do |r, w|
>   pid = fork {}
>   w.close
>   [r.read, *Process.wait2(pid)]
> end
> 
> 
> $ RUBYLIB=.:lib:.ext/common:.ext/x86_64-linux:tool/lib LD_LIBRARY_PATH=.
> ./ruby fiddle_fork.rb
> Aborted (core dumped)
> ~~~

Thanks for reducing this to the smallest reproducer.

Closures may be inherited by the child, and so the parent and child must coordinate the usage.

This has been a known issue for 13+ years, particularly when using SELinux:
https://sourceware.org/legacy-ml/libffi-discuss/2009/msg00320.html

DJ Delorie reviewed this bug and confirmed it's the same issue as the issue you quote.

In modern libffi 3.4.2 we use memfd_create quite eagerly for SELinux protected systems and such closures are shared between the parent and child process.

Currently the static trampolines are disabled due to ghc and gobject-introspection bugs.

I would need to double check if the static trampolines could work across a fork, but they might be one solution.
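As a rough illustration of why sharing matters, here is a Ruby analogy; it is not libffi code and does not reproduce the bug. A temporary file stands in for the memfd-backed closure mapping, and an ordinary string stands in for private heap memory. The method name `shared_vs_private` and the "live"/"freed" markers are assumptions made up for this sketch.

```ruby
require 'tempfile'

# Analogy only: the file plays the role of memory that stays shared across
# fork, the string plays the role of memory that is copied on fork.
def shared_vs_private
  private_state = +"live"            # ordinary heap object, copied on fork
  shared = Tempfile.new("closure")   # file-backed state, shared across fork
  shared.write("live")
  shared.flush

  pid = fork do
    private_state.replace("freed")   # changes only the child's copy
    File.write(shared.path, "freed") # the parent sees this change too
    exit!(0)                         # skip at_exit handlers and finalizers
  end
  Process.wait(pid)

  { private: private_state, shared: File.read(shared.path) }
ensure
  shared&.close!                     # remove the temporary file
end

p shared_vs_private
```

In the parent, the private string is untouched while the file-backed state reflects the child's write. That is the kind of cross-process interference that uncoordinated, shared closure memory makes possible.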

Comment 7 Vít Ondruch 2022-01-14 09:58:57 UTC
I wonder what my options are, because I need a fresh build of Ruby. So:

1) Should I just disable the failing test cases for the moment and will this be fixed in near future?
2) Is there some workaround? If it worked with libffi 3.1, there should be a way.

Comment 8 Miro Hrončok 2022-01-14 10:20:03 UTC
> Is there some workaround?

Build against libffi 3.1?


$ fedpkg request-side-tag
$ koji tag f36-build-side-XXXXX libffi-3.1-29.fc35 python3.10-3.10.1-1.fc36 glib2-2.70.2-2.fc36 p11-kit-0.23.22-4.fc35 guile22-2.2.7-3.fc35 wayland-1.20.0-1.fc36 # possibly more depending on Ruby's deps
$ koji wait-repo f36-build-side-XXXXX --build wayland-1.20.0-1.fc36
$ fedpkg build --target=f36-build-side-XXXXX
...
$ koji tag f36-updates-candidate ruby-...fc36

Comment 9 Carlos O'Donell 2022-01-14 17:02:46 UTC
(In reply to Miro Hrončok from comment #8)
> > Is there some workaround?
> 
> Build against libffi 3.1?

That's very temporary though, and it still has the same issue in some cases.

(In reply to Vít Ondruch from comment #7)
> I wonder what are my options, because I need fresh build of Ruby. So
> 
> 1) Should I just disable the failing test cases for the moment and will this
> be fixed in near future?

Yes, if you want a clean testsuite run then disable the failed fork-related test in the short term.

> 2) Is there some workaround? If it worked with libffi 3.1, there should be a
> way.

I have tested libffi 3.4.2 with static trampolines enabled, and rebuilt ruby, all in a mock
chroot with SELinux enabled (F35 host).

sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33

libffi 3.4.2 with static trampolines scratch build here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=81240960

Ruby build looks clean:
...
[ 5292/21271] TestAutoload#test_autoload_deprecate_constant = 0.04 s
[ 5293/21271] TestAutoload#test_autoload_deprecate_constant_before_autoload = 0.08 s
[ 5294/21271] TestAutoload#test_autoload_fork = 0.35 s
[ 5295/21271] TestAutoload#test_autoload_p = 0.00 s
[ 5296/21271] TestAutoload#test_autoload_private_constant = 0.04 s
[ 5297/21271] TestAutoload#test_autoload_private_constant_before_autoload = 0.08 s
[ 5298/21271] TestAutoload#test_autoload_same_file = 0.12 s
[ 5299/21271] TestAutoload#test_autoload_same_file_with_raise = 0.04 s
[ 5300/21271] TestAutoload#test_autoload_so = 0.01 s
[ 5301/21271] TestAutoload#test_autoload_while_autoloading = 0.00 s
[ 5302/21271] TestAutoload#test_autoload_with_unqualified_file_name = 0.00 s
[ 5303/21271] TestAutoload#test_bug_13526 = 0.04 s
[ 5304/21271] TestAutoload#test_nameerror_when_autoload_did_not_define_the_constant = 0.00 s
[ 5305/21271] TestAutoload#test_non_realpath_in_loadpath = 0.01 s
[ 5306/21271] TestAutoload#test_override_autoload = 0.00 s
[ 5307/21271] TestAutoload#test_override_while_autoloading = 0.50 s
[ 5308/21271] TestAutoload#test_require_explicit = 0.00 s
[ 5309/21271] TestAutoload#test_require_implemented_in_ruby_is_called = 0.00 s
[ 5310/21271] TestAutoload#test_source_location = 0.04 s
[ 5311/21271] TestAutoload#test_threaded_accessing_constant = 0.50 s
[ 5312/21271] TestAutoload#test_threaded_accessing_inner_constant = 0.50 s
...
Finished tests in 645.855723s, 32.9176 tests/s, 4236.6877 assertions/s.
21260 tests, 2736289 assertions, 0 failures, 0 errors, 57 skips

So if we can turn on static trampolines I think we can fix Ruby.

Comment 10 Carlos O'Donell 2022-01-14 17:37:01 UTC
OK, so f36 would need this fix in gobject-introspection:
https://gitlab.gnome.org/GNOME/gobject-introspection/-/merge_requests/301

We don't have a released gobject-introspection with this fix yet, and so it blocks turning on the libffi static trampolines.

OK, so f36 would need this fix in ghc:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/6155

We don't have a released ghc 8.10.5 or 8.10.7 with this fix backported.

So it looks like gobject-introspection and ghc are going to block me turning on the static trampolines.

I have requested a ghc backport to 8.10.5:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/6155#note_402130

I have requested a gobject-introspection backport to 1.70.0:
https://gitlab.gnome.org/GNOME/gobject-introspection/-/merge_requests/301#note_1355560

If the upstream backports happen, then we should be able to update Fedora 36, and then turn static trampolines on again.

Comment 11 Vít Ondruch 2022-01-14 18:23:54 UTC
(In reply to Carlos O'Donell from comment #9)
> > 2) Is there some workaround? If it worked with libffi 3.1, there should be a
> > way.
> 
> I have tested libffi 3.4.2 with static trampolines enabled, and rebuilt
> ruby, all in a mock
> chroot with SELinux enabled (F35 host).
> 
> sestatus 
> SELinux status:                 enabled
> SELinuxfs mount:                /sys/fs/selinux
> SELinux root directory:         /etc/selinux
> Loaded policy name:             targeted
> Current mode:                   enforcing
> Mode from config file:          enforcing
> Policy MLS status:              enabled
> Policy deny_unknown status:     allowed
> Memory protection checking:     actual (secure)
> Max kernel policy version:      33
> 
> libffi 3.4.2 with static trampolines scratch build here:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=81240960

I can confirm it also passes in my local mock.

(In reply to Carlos O'Donell from comment #10)
> If the upstream backports happen, then we should be able to update Fedora
> 36, and then turn static trampolines on again.

Both PRs were merged upstream, so we could apply downstream patches, couldn't we?

Comment 12 Carlos O'Donell 2022-01-14 18:26:43 UTC
(In reply to Vít Ondruch from comment #11)
> Both PRs were merged upstream, so we could apply downstream patches, can't
> we?

That's true, we could. The upstream backports make it easier for our maintainers.

Would you like to file bugs for them in Fedora 36 for gobject-introspection and ghc?

Comment 13 Vít Ondruch 2022-01-27 12:09:29 UTC
BTW I suspect this breaks rubygem-ffi in similar manner:

https://koschei.fedoraproject.org/package/rubygem-ffi
https://koji.fedoraproject.org/koji/taskinfo?taskID=81678260

Comment 14 Ben Cotton 2022-02-08 20:28:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 36 development cycle.
Changing version to 36.

Comment 15 Vít Ondruch 2022-02-24 13:50:00 UTC
Ping, any progress here?

Comment 16 Carlos O'Donell 2022-02-24 20:49:04 UTC
(In reply to Vít Ondruch from comment #15)
> Ping, any progress here?

Somewhat. It appears a potential solution is for ghc to bundle the older libffi until we move Fedora onto ghc 9.0/9.2 which has the fix.

So next steps could be:
- Work with Jens (ghc maintainer) to bundle libffi3.1
- Once ghc is uploaded with bundled libffi3.1 then turn static-trampolines on for libffi 3.4.2.

Then we have a more robust libffi and ruby is fixed.

Eventually we would upgrade to ghc 9.0/9.2 and the upstream fix is incorporated.

Thoughts?

Comment 17 Miro Hrončok 2022-02-24 22:37:24 UTC
Why bundle the old libffi when we have it packaged in libffi3.1?

Comment 18 Carlos O'Donell 2022-03-22 20:33:34 UTC
(In reply to Miro Hrončok from comment #17)
> Why bundle the old libffi when we have it packaged in libffi3.1?

The packaged libffi3.1 is there only for compatibility purposes and cannot be linked against.

This is on purpose to encourage package migration to the new libffi.

I don't want to enable libffi3.1 for use in linking new applications.

I think it is technically simpler if ghc bundles a libffi that works, and eventually removes it when needed.

This allows us to drop libffi3.1 as a distinct step without coordination with ghc.

It's not optimal, but it avoids the confusion we would create by enabling libffi3.1 for linking new applications.

Comment 19 Carlos O'Donell 2022-05-31 13:46:54 UTC
Given the GHC compiler in Fedora 36 is not easily fixable, we aren't going to fix this in Fedora 36.

In Fedora 37 I expect the new GHC compiler will have all the fixes we need to turn on static trampolines.

I'm moving this bug to Fedora 37 (Rawhide) for tracking here.

Comment 20 Ben Cotton 2022-08-09 13:12:28 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 37 development cycle.
Changing version to 37.

Comment 21 Jarek Prokop 2022-08-29 16:54:31 UTC
Ping, any progress?

Comment 22 Carlos O'Donell 2022-08-29 17:40:11 UTC
(In reply to Jarek Prokop from comment #21)
> Ping, any progress?

No. The status is that Fedora 37 will still have GHC8:
https://fedoraproject.org/wiki/Changes/Haskell_GHC_8.10.7

One possible solution is that GHC8 bundles the non-static-trampoline libffi, GHC9 uses the system libffi, and we change the libffi default.

I haven't yet followed up with Jens to see about having GHC8 bundle libffi and if there are any technical consequences to that.

Comment 23 Jun Aruga 2022-09-01 15:19:35 UTC
Why did this issue start to appear in Ruby with libffi-3.4.2? The previous version in rpms/libffi was 3.1.[1]
So, is there a commit between the tags v3.1 and v3.4.2 that caused it?[2] Why not apply a patch to Fedora's rpms/libffi that reverts that upstream commit?

```
$ git clone https://github.com/libffi/libffi.git
$ cd libffi
$ git log v3.1..v3.4.2
```

[1] https://src.fedoraproject.org/rpms/libffi/blob/rawhide/f/libffi.spec#_146
[2] https://github.com/libffi/libffi.git

Comment 24 Carlos O'Donell 2022-09-01 15:38:58 UTC
(In reply to Jun Aruga from comment #23)
> Why does this issue started to appear in Ruby from libffi-3.4.2? And the
> previous version in rpms/libffi was 3.1.[1]
> So, is there the caused commit between the tag v3.1 and v3.4.2?[2] Why don't
> you apply the patch to revert the upstream commit to the Fedora rpms/libffi?

The "issue" has been present for 13+ years; it is present in v3.1 also.

The solution is to switch to "static trampolines", but switching without
breaking ghc requires some additional work in ghc.

The new libffi is an ABI change, so it cannot be "reverted" without a mass
rebuild of all dependent binaries.

Notes:
- Change Request: https://fedoraproject.org/wiki/Changes/LIBFFI34
- I say "issue" in quotes because upstream considers inherited closures in
  the child process to be the responsibility of the parent process to
  coordinate. Few applications did this coordination so the solution is to
  make the closures more robust and avoid the issue altogether by using
  static trampolines. This is a high quality approach IMO because it makes
  the closures easier to use.

Comment 25 Vít Ondruch 2022-09-02 15:06:58 UTC
(In reply to Carlos O'Donell from comment #24)
> The solution is to switch to "static trampolines", but switching without
> breaking ghc requires some additional work in ghc.

I have reported bug 2123772 against GHC, requesting to bundle libffi.

Comment 26 Fedora Update System 2022-09-02 16:58:32 UTC
FEDORA-2022-f04a3c5e20 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-f04a3c5e20

Comment 27 Fedora Update System 2022-09-02 17:00:50 UTC
FEDORA-2022-f04a3c5e20 has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 28 Jarek Prokop 2022-09-02 17:03:12 UTC
This is not fixed; the update only comments out more test cases related to FFI closures, hopefully resulting in a more stable test suite.

Bodhi understands either "Resolves: ..." in the changelog or simply "rhbz#1234"; I forgot to take the latter into account :/

Sorry for the noise.

Comment 29 Carlos O'Donell 2022-10-25 13:52:33 UTC
Reviewed bug 2123772. We are waiting for GHC8 to bundle libffi so we can change the system libffi to static trampolines.

Comment 31 Ben Cotton 2023-02-07 14:52:45 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 32 Carlos O'Donell 2023-02-28 14:22:28 UTC
This is now fixed in ghc8.10 for Fedora 38, where libffi is bundled.

We are going to fix this in Fedora Rawhide first and test things out.

This is ready to move forward to fix the ruby issues.

Comment 33 Carlos O'Donell 2023-05-09 13:11:22 UTC
We are in the process of transitioning to libffi using static trampolines to fix this.

DJ is doing the transition and found that cjs broke, but it has just been fixed (thank you to the maintainers for the partial upstream rebase).

We are tracking this in the following system-wide change request:
https://fedoraproject.org/wiki/Changes/LIBFFI34_static_trampolines

Fedora Devel proposal:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/7PPI2WF3M5WJA6IKTVYN3U2I4GTCXQMI/#QAV5CBDW25EKFU4QFBYPW7IA6FQZPI2O

Comment 34 Carlos O'Donell 2023-07-11 13:14:16 UTC
This is now fixed with static trampolines:
https://bodhi.fedoraproject.org/updates/FEDORA-2023-b4549bce25

