Bug 2171888

Summary: ocaml-dune segfault on i386 when inlining
Product: [Fedora] Fedora
Reporter: Jerry James <loganjerry>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: aoliva, awilliam, dmalcolm, fweimer, jakub, jlaw, jwakely, mcermak, mpolacek, msebor, nickc, rjones, robatino, sipoyare
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Unspecified
Whiteboard: AcceptedFreezeException
Fixed In Version: gcc-13.0.1-0.5.fc38
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2023-02-23 00:45:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 2083910, 2083911

Description Jerry James 2023-02-20 18:15:32 UTC
Description of problem:
The ocaml-dune package started failing to build from source, on i386 only, when gcc-13.0.1-0.4 landed in the repos (see bug 2168932).  The immediate problem is a segfault inside a call to pthread_sigmask:

pthread_sigmask(SIG_SETMASK, &saved_procmask, NULL);

Although saved_procmask is a local variable, gdb shows that pthread_sigmask receives the value 9 as its second argument rather than the address of saved_procmask.  The first and third arguments are passed correctly.  Downgrading to gcc-13.0.1-0.3 makes the issue go away.  Adding -fno-inline-functions to the build flags also makes the issue go away.
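
For readers unfamiliar with the pattern, here is a minimal C sketch of the call shape described above: block signals, keep the previous mask in a local variable, create a child, then restore the mask in the parent. It is not the code from vendor/spawn/src/spawn_stubs.c. The function names are hypothetical, and the use of vfork() is an assumption based on comment 4 below, which attributes the miscompilation to code that uses vfork. On a correctly working compiler this should simply run; it is not a reproducer.

```c
#define _DEFAULT_SOURCE 1   /* expose vfork() under strict -std= modes (glibc) */
#include <assert.h>
#include <signal.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration; NOT the code from spawn_stubs.c.
   Assumption: signals are blocked around vfork(), as comment 4 suggests. */
int run_child_and_restore_mask(void)
{
    sigset_t all_signals, saved_procmask;
    int status = 0;
    pid_t pid;

    sigfillset(&all_signals);
    /* Block signals and keep the previous mask in a local variable. */
    if (pthread_sigmask(SIG_SETMASK, &all_signals, &saved_procmask) != 0)
        return -1;

    pid = vfork();
    if (pid == 0)
        _exit(0);   /* child: only _exit() or exec*() are safe after vfork() */

    /* Parent: same call shape as in the report. The second argument must be
       the address of the local variable, not a stale value such as 9. */
    pthread_sigmask(SIG_SETMASK, &saved_procmask, NULL);

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Self-check: the child exits with status 0 and the parent survives. */
void check_run_child(void)
{
    assert(run_child_and_restore_mask() == 0);
}
```

Calling check_run_child() from a test program is enough to exercise the path. With the affected compiler, the expectation from this report is a crash at the second pthread_sigmask call, but only in the real dune code; this sketch is not known to trigger it.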

I tried to make a reduced test case to illustrate the problem.  The code in question (vendor/spawn/src/spawn_stubs.c) has calls into the OCaml runtime, mostly for memory handling.  I tried to replace all such calls with generic C library functionality.  The resulting program does not segfault.  I suspect that means that the OCaml C API practice of converting back and forth between pointers and integers, and bit shifting in the process, somehow plays a role.  At this point, I don't know how to produce a small reproducer.
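
As background on the pointer/integer conversions mentioned above, the following C sketch shows the tagging scheme the OCaml runtime uses for immediate integers: the integer is shifted left by one bit and the low bit is set, while aligned heap pointers keep a low bit of 0. The helpers are stand-ins modelled on the Val_long, Long_val and Is_long macros from caml/mlvalues.h; the sketch_ names are hypothetical and the code does not include any OCaml header. Whether this encoding actually contributes to the miscompilation is only the reporter's suspicion.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OCaml's `value` type: one machine word holding either a
   tagged integer or a pointer. All sketch_ names are hypothetical. */
typedef intptr_t sketch_value;

/* Encode an integer: shift left by one and set the low (tag) bit.
   Converting back to a signed type relies on GCC's wrap-around behaviour. */
sketch_value sketch_val_long(intptr_t n)
{
    return (sketch_value)(((uintptr_t)n << 1) + 1);
}

/* Decode an integer; assumes an arithmetic right shift, as GCC provides. */
intptr_t sketch_long_val(sketch_value v)
{
    return v >> 1;
}

/* A word is an immediate integer if its low bit is set. */
int sketch_is_long(sketch_value v)
{
    return (v & 1) != 0;
}

/* Self-check of the encoding. */
void check_tagging(void)
{
    int word = 0;

    assert(sketch_val_long(9) == 19);                    /* (9 << 1) + 1 */
    assert(sketch_long_val(sketch_val_long(-5)) == -5);  /* round trip */
    assert(sketch_is_long(sketch_val_long(9)));
    /* An aligned pointer has a low bit of 0, so it is not an integer. */
    assert(!sketch_is_long((sketch_value)(intptr_t)&word));
}
```

The point of the sketch is that the same machine word is reinterpreted as an integer or a pointer depending on one bit, which is the kind of code the reporter removed when building the reduced test case.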

Version-Release number of selected component (if applicable):
gcc-13.0.1-0.4.fc39.i686

How reproducible:
Always

Steps to Reproduce:
1. Build the ocaml-dune package for Rawhide on i386

Actual results:
The bootstrap dune binary segfaults.

Expected results:
No segfaults.

Additional info:

Comment 1 Richard W.M. Jones 2023-02-20 21:15:20 UTC
I can't even get dune to compile on x86-64.  It fails at:

+ /usr/bin/make -O -j24 V=1 VERBOSE=1 release
ocamlc -output-complete-exe -w -24 -g -o .duneboot.exe -I boot unix.cma boot/libs.ml boot/duneboot.ml
./.duneboot.exe
Internal error, please report upstream including the contents of _build/log.
Description:
  ("Unexpected find result", { found = Not_found; lib.name = "pp" })
Raised at Stdune__code_error.raise in file
  "otherlibs/stdune/src/code_error.ml", line 11, characters 30-62
Called from Fiber__scheduler.exec in file "otherlibs/fiber/src/scheduler.ml",
  line 73, characters 8-11
-> required by ("<unnamed>", ())
-> required by ("<unnamed>", ())
-> required by ("<unnamed>", ())
-> required by ("load-dir", In_build_dir "default/src/fsevents/bin")
-> required by ("toplevel", ())

I must not crash.  Uncertainty is the mind-killer. Exceptions are the
little-death that brings total obliteration.  I will fully express my cases. 
Execution will pass over me and through me.  And when it has gone past, I
will unwind the stack along its path.  Where the cases are handled there will
be nothing.  Only I will remain.
make: *** [Makefile:47: release] Error 1

Comment 2 Richard W.M. Jones 2023-02-20 21:19:35 UTC
As for the original bug, I don't think there's an easy way to reproduce it
without a full i686 environment.  In theory this might work:

$ setarch i386 fedpkg local

but in practice it fails because of lack of OCaml dependencies compiled for
i686 like ocaml-pp.

Comment 3 Fedora Update System 2023-02-22 08:43:45 UTC
FEDORA-2023-213a7f0317 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-213a7f0317

Comment 4 Fedora Blocker Bugs Application 2023-02-22 10:05:44 UTC
Proposed as a Blocker for 38-beta by Fedora user jakub using the blocker tracking app because:

 An important wrong-code bug has been introduced in gcc-13.0.1-0.4.fc38 (tagged into stable Feb 17), which results in likely miscompilation of all code that uses vfork.  I'm afraid every day the fixed gcc-13.0.1-0.5.fc38 isn't tagged into f38 means risk of further miscompiled packages.

Comment 5 Jakub Jelinek 2023-02-22 10:08:08 UTC
Proposed as F38 beta blocker.
Details in https://gcc.gnu.org/PR108868 and https://gcc.gnu.org/PR108691

Comment 6 Fedora Blocker Bugs Application 2023-02-22 10:33:03 UTC
Proposed as a Freeze Exception for 38-beta by Fedora user frantisekz using the blocker tracking app because:

 Proposing as an FE.  This will have the same effect as a Blocker in this case (we have a fix), but it makes the wording easier procedurally.

Comment 7 Fedora Update System 2023-02-22 13:32:33 UTC
FEDORA-2023-213a7f0317 has been pushed to the Fedora 38 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-213a7f0317

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Adam Williamson 2023-02-22 17:15:49 UTC
+4 FE in https://pagure.io/fedora-qa/blocker-review/issue/1045 , marking accepted FE.

Comment 9 Fedora Update System 2023-02-23 00:45:23 UTC
FEDORA-2023-213a7f0317 has been pushed to the Fedora 38 stable repository.
If the problem still persists, please make a note of it in this bug report.