Bug 2168932

Summary: debugedit not producing file lists for some OCaml packages
Product: [Fedora] Fedora
Reporter: Jerry James <loganjerry>
Component: ocaml-dune
Assignee: Andy Li <andy>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 39
CC: andy, loganjerry, mjw, rjones
Last Closed: 2024-11-27 21:04:30 UTC
Type: Bug
Attachments:
Compiled OCaml object (flags: none)

Description Jerry James 2023-02-10 15:53:02 UTC
Description of problem:
During the mass rebuild, several OCaml packages failed to build like this:

Processing files: ocaml-version-debugsource-3.6.0-2.fc38.x86_64

RPM build errors:
error: Empty %files file /builddir/build/BUILD/ocaml-version-3.6.0/debugsourcefiles.list
    Empty %files file /builddir/build/BUILD/ocaml-version-3.6.0/debugsourcefiles.list

These packages previously built successfully.  It appears that debugedit has started failing to generate file lists.  The affected packages include at least the following:

- ocaml-astring
- ocaml-jane-street-headers
- ocaml-jst-config
- ocaml-odoc
- ocaml-version

Each of them now has something like this at the top to disable the debug package:

%ifnarch %{ocaml_native_compiler}
%global debug_package %{nil}
%endif

If you remove that and try a Rawhide mock build, you will see the failure mentioned.  I just tried again with ocaml-version to see if the bug has been fixed, but it persists.

Version-Release number of selected component (if applicable):
debugedit-5.0-7.fc38.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Enable debug packages for ocaml-version
2. Do a Rawhide mock build of ocaml-version

Actual results:
Debugedit fails to generate a file list.

Expected results:
Debugedit should generate a file list like it does in F37 and earlier.

Additional info:

Comment 1 Mark Wielaard 2023-02-10 17:32:12 UTC
Would you be able to attach one of these ocaml generated files?

Comment 2 Jerry James 2023-02-10 18:50:01 UTC
Created attachment 1943383 [details]
Compiled OCaml object

I'm attaching a cmxs file, the OCaml equivalent of a shared library.  It ought to refer to source file ocaml_version.ml, unless something has gone wrong with generating debug information in the OCaml toolchain, of course.

Comment 3 Mark Wielaard 2023-02-10 22:09:14 UTC
Thanks, looking at this file we find:

DWARF section [25] '.debug_info' at offset 0x13390:
 [Offset]
 Compilation unit at offset 0:
 Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [     b]  compile_unit         abbrev: 1
           stmt_list            (data4) 0
           low_pc               (addr) +0x0000000000006b30 <camlOcaml_version__v_273>
           high_pc              (addr) +0x000000000000b6d9 <camlOcaml_version__code_end>
           name                 (strp) "ocaml_version.ml"
           comp_dir             (strp) "/workspace_root"
           producer             (strp) "GNU AS 2.39"
           language             (data2) Mips_Assembler (32769)

So here it would be looking for a file/path called /workspace_root/ocaml_version.ml, which I expect doesn't really exist.

Similarly, the debug_line directory table is empty, so all file names are resolved against the comp_dir:

Directory table:

File name table:
 Entry Dir   Time      Size      Name
 1     0     0         0         ocaml_version.ml
 2     0     0         0         string.ml
 3     0     0         0         scanf.ml

So what we need to figure out is what this comp_dir "/workspace_root" is and if it is expected.
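
The resolution rule described above can be sketched in a few lines of Python. This is only an illustrative model of how a DWARF (pre-version-5) consumer combines comp_dir, the directory table, and the file name table; the function name resolve_source_paths is hypothetical and is not part of debugedit or elfutils.

```python
import posixpath


def resolve_source_paths(comp_dir, directories, files):
    """Resolve line-table file entries to full paths.

    directories: include directories, numbered from 1 in the table.
    files: (dir_index, name) pairs; dir_index 0 means "use comp_dir".
    """
    resolved = []
    for dir_index, name in files:
        if posixpath.isabs(name):
            # Absolute file names are used as-is.
            resolved.append(name)
            continue
        base = comp_dir if dir_index == 0 else directories[dir_index - 1]
        if not posixpath.isabs(base):
            # A relative include directory is itself relative to comp_dir.
            base = posixpath.join(comp_dir, base)
        resolved.append(posixpath.normpath(posixpath.join(base, name)))
    return resolved


# Values taken from the dump above: empty directory table, Dir 0 everywhere.
files = [(0, "ocaml_version.ml"), (0, "string.ml"), (0, "scanf.ml")]
paths = resolve_source_paths("/workspace_root", [], files)
for p in paths:
    print(p)
```

With an empty directory table, every entry ends up directly under /workspace_root, so nothing points into the real build tree.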

Comment 4 Jerry James 2023-02-10 22:15:00 UTC
I see "workspace_root" in the ocaml-dune sources.  Dune, what have you done?!?  Let me poke through the dune sources and see if I can figure out what that is for.  Thanks for the hint!

Comment 5 Mark Wielaard 2023-02-10 22:18:53 UTC
(In reply to Jerry James from comment #4)
> I see "workspace_root" in the ocaml-dune sources.  Dune, what have you
> done?!?  Let me poke through the dune sources and see if I can figure out
> what that is for.  Thanks for the hint!

Please compare with the Fedora 37 version of ocaml.
Did it change versions?
Please take a look at what eu-readelf --debug-dump=info says for the comp_dir in the debuginfo.

Comment 6 Jerry James 2023-02-10 22:40:11 UTC
For the Fedora 37 package, I get this:

$ eu-readelf --debug-dump=info ocaml_version.cmxs-3.5.0-2.fc37.x86_64.debug

DWARF section [25] '.debug_info' at offset 0x390:
 [Offset]
 Compilation unit at offset 0:
 Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
 [     b]  compile_unit         abbrev: 1
           stmt_list            (data4) 0
           low_pc               (addr) +0x0000000000006b40 <camlOcaml_version__v_273>
           high_pc              (addr) +0x000000000000af06 <camlOcaml_version__code_end>
           name                 (strp) "ocaml_version.ml"
           comp_dir             (strp) "/usr/src/debug/ocaml-version-3.5.0-2.fc37.x86_64/_build/default"
           producer             (strp) "GNU AS 2.38"
           language             (data2) Mips_Assembler (32769)

It appears to me that workspace_root is a dune variable that is supposed to expand to the absolute path of the build directory; in a mock build, it would probably be something like /builddir/build/BUILD/ocaml-version-3.6.0/_build/default.
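
For checking more than one file, the comparison can be scripted. The following is a rough sketch that scans the text printed by eu-readelf --debug-dump=info and reports any comp_dir outside a set of expected prefixes. The helper names and the prefix list are assumptions made for illustration, and the regular expression only targets the output layout shown in this report.

```python
import re

# Matches lines such as:   comp_dir             (strp) "/workspace_root"
COMP_DIR_RE = re.compile(r'^\s*comp_dir\s+\(\w+\)\s+"([^"]*)"', re.MULTILINE)


def comp_dirs(readelf_text):
    """Return every comp_dir string found in the dump text."""
    return COMP_DIR_RE.findall(readelf_text)


def unexpected_comp_dirs(readelf_text, expected_prefixes):
    """Return comp_dir values that are not under any expected prefix."""
    def is_under(path, prefix):
        prefix = prefix.rstrip("/")
        return path == prefix or path.startswith(prefix + "/")

    return [d for d in comp_dirs(readelf_text)
            if not any(is_under(d, p) for p in expected_prefixes)]


# Assumed prefixes: the mock build tree and the installed debug sources.
PREFIXES = ["/builddir/build/BUILD", "/usr/src/debug"]

F38_DUMP = '''
           name                 (strp) "ocaml_version.ml"
           comp_dir             (strp) "/workspace_root"
'''
F37_DUMP = '''
           name                 (strp) "ocaml_version.ml"
           comp_dir             (strp) "/usr/src/debug/ocaml-version-3.5.0-2.fc37.x86_64/_build/default"
'''

print(unexpected_comp_dirs(F38_DUMP, PREFIXES))
print(unexpected_comp_dirs(F37_DUMP, PREFIXES))
```

In practice the text would come from running eu-readelf on each .cmxs or .debug file and passing its standard output to unexpected_comp_dirs.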

My mental timeline was a little off.  This did not show up during the mass rebuild, but shortly after.  The mass rebuild unfortunately mangled the OCaml stack, so Richard Jones rebuilt all OCaml packages on January 24, which is when this issue first appeared.

Regarding the ocaml-version package, I built it on January 20 with no trouble.  When Richard Jones built it 4 days later, the build failed as noted above.  Differences in packages in the build root between the two builds:
ocaml: 4.14.0-2.fc37 vs. 4.14.0-5.fc38
ocaml-dune: 3.6.1-1.fc38 vs. 3.6.1-3.fc38

So the same base versions.  It looks like the rebuilt ocaml-dune package is now misbehaving, and failing to expand the workspace_root variable in some circumstances.  That sounds fun to track down.

In any case, it appears that I unfairly blamed debugedit.  This looks like an OCaml toolchain problem after all.

Comment 7 Mark Wielaard 2023-02-12 13:44:01 UTC
(In reply to Jerry James from comment #6)
> For the Fedora 37 package, I get this:
> 
> $ eu-readelf --debug-dump=info ocaml_version.cmxs-3.5.0-2.fc37.x86_64.debug
> 
> DWARF section [25] '.debug_info' at offset 0x390:
>  [Offset]
>  Compilation unit at offset 0:
>  Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
>  [     b]  compile_unit         abbrev: 1
>            stmt_list            (data4) 0
>            low_pc               (addr) +0x0000000000006b40
> <camlOcaml_version__v_273>
>            high_pc              (addr) +0x000000000000af06
> <camlOcaml_version__code_end>
>            name                 (strp) "ocaml_version.ml"
>            comp_dir             (strp)
> "/usr/src/debug/ocaml-version-3.5.0-2.fc37.x86_64/_build/default"
>            producer             (strp) "GNU AS 2.38"
>            language             (data2) Mips_Assembler (32769)
> 
> It appears to me that workspace_root is a dune variable that is supposed to
> expand to the absolute path of the build directory; in a mock build, it
> would probably be something like
> /builddir/build/BUILD/ocaml-version-3.6.0/_build/default.

Yes, that is what find-debuginfo and debugedit rely on.

> My mental timeline was a little off.  This did not show up during the mass
> rebuild, but shortly after.  The mass rebuild unfortunately mangled the
> OCaml stack, so Richard Jones rebuilt all OCaml packages on January 24,
> which is when this issue first appeared.
> 
> Regarding the ocaml-version package, I built it on January 20 with no
> trouble.  When Richard Jones built it 4 days later, the build failed as
> noted above.  Differences in packages in the build root between the two
> builds:
> ocaml: 4.14.0-2.fc37 vs. 4.14.0-5.fc38
> ocaml-dune: 3.6.1-1.fc38 vs. 3.6.1-3.fc38
> 
> So the same base versions.  It looks like the rebuilt ocaml-dune package is
> now misbehaving, and failing to expand the workspace_root variable in some
> circumstances.  That sounds fun to track down.
> 
> In any case, it appears that I unfairly blamed debugedit.  This looks like
> an OCaml toolchain problem after all.

No worries. This is a common issue with DWARF debuginfo. It requires absolute paths (which are then rewritten by debugedit) and has no concept of a relative "working directory".
In general it wouldn't be clear what such a directory refers to, since build and source dirs can be separate and sometimes "source" is generated and placed in yet another directory like /tmp, even though something like that (e.g. a workspace_root) would seem to be useful.

Let's move this bug to ocaml-dune. But let me know if I can help.
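
To make the failure mode concrete, here is a simplified Python model of the prefix rewrite that debugedit performs when given a base directory and a destination directory. It is a sketch of the idea only, not debugedit's actual implementation; rewrite_paths is a hypothetical name, and the destination directory below is illustrative.

```python
def rewrite_paths(source_paths, base_dir, dest_dir):
    """Rewrite paths under base_dir to live under dest_dir.

    Returns (rewritten_paths, file_list). Only paths that were under
    base_dir contribute to file_list, which stands in for the contents
    of debugsourcefiles.list.
    """
    base = base_dir.rstrip("/")
    dest = dest_dir.rstrip("/")
    rewritten, file_list = [], []
    for path in source_paths:
        if path.startswith(base + "/"):
            rel = path[len(base) + 1:]
            rewritten.append(dest + "/" + rel)
            file_list.append(rel)
        else:
            # Not under the build directory: left alone and not listed.
            rewritten.append(path)
    return rewritten, file_list


BASE = "/builddir/build/BUILD/ocaml-version-3.6.0"
DEST = "/usr/src/debug/ocaml-version-3.6.0-2.fc38.x86_64"  # illustrative

good = [BASE + "/_build/default/ocaml_version.ml"]
bad = ["/workspace_root/ocaml_version.ml"]

good_rewritten, good_list = rewrite_paths(good, BASE, DEST)
bad_rewritten, bad_list = rewrite_paths(bad, BASE, DEST)
print(good_list)
print(bad_list)
```

Because no path starts with the build directory in the /workspace_root case, the file list stays empty, which matches the "Empty %files file" error in the original report.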

Comment 8 Jerry James 2023-02-18 15:59:01 UTC
I see this in the dune 3.7.0 changelog:

Add map_workspace_root dune-project stanza to allow disabling of mapping of workspace root to /workspace_root. (#6988, fixes #6929, @richardlford)

That gives me hope this was a dune 3.6 bug and is now fixed.  I'll update dune in Rawhide and try rebuilding the affected packages.

Comment 9 Jerry James 2023-02-19 04:03:48 UTC
For anyone following along, the ocaml-dune 3.7.0 build failed on i386 only, with the dreaded "warning: relocation in read-only section `.text'" warning; see https://github.com/ocaml/ocaml/issues/9800.  I can inject -Wl,-z,notext into the linker flags as upstream suggests.  That leads to a successful link, but then the dune binary segfaults when run.  The segfault happens in vendor/spawn/src/spawn_stubs.c line 481, function spawn_unix:

pthread_sigmask(SIG_SETMASK, &saved_procmask, NULL);

The saved_procmask variable is a local variable.  However, GDB shows that the value 9 was passed into pthread_sigmask as the second argument.

None of this happens if I build dune-3.6.1 in Rawhide, so something in the dune-3.7.0 sources is triggering this behavior.  I'm open to suggestions on what to look for.

Comment 10 Jerry James 2023-02-19 04:10:01 UTC
Argh, no, muscle memory betrayed me.  I did an ocaml-dune-3.6.1 build for x86_64.  Doing an ocaml-dune-3.6.1 build for i386 shows exactly the same issue.  So the issue is somewhere in the toolchain.  I'll try some selective downgrades and see if I can figure out what change broke us.

Comment 11 Mark Wielaard 2023-02-19 13:11:07 UTC
(In reply to Jerry James from comment #8)
> I see this in the dune 3.7.0 changelog:
> 
> Add map_workspace_root dune-project stanza to allow disabling of mapping of
> workspace root to /workspace_root. (#6988, fixes #6929)

This seems to refer to:
https://github.com/ocaml/dune/pull/6988
https://github.com/ocaml/dune/issues/6929

Which seems to describe exactly what we are seeing.

So you have to add a map_workspace_root false to every dune project file for any project built using ocaml-dune (note that if missing, it defaults to true, so not having it will produce garbage debuginfo and break the build).

Comment 12 Jerry James 2023-02-20 04:42:43 UTC
(In reply to Mark Wielaard from comment #11)
> So you have to add a map_workspace_root false to every dune project file for
> any project build using ocaml-dune (note that if missing, it defaults to
> true, so not having it will produce garbage debuginfo and break the build).

I think anyone doing serious OCaml development work will use opam. The Fedora RPMs are, I believe, intended to support building RPMs of user-facing applications that are written in OCaml.  If that is true, it would make sense for us to simply change the dune default, and document the change of course.

(In reply to Jerry James from comment #10)
> Argh, no, muscle memory betrayed me.  I did an ocaml-dune-3.6.1 build for
> x86_64.  Doing an ocaml-dune-3.6.1 build for i386 shows exactly the same
> issue.  So the issue is somewhere in the toolchain.  I'll try some selective
> downgrades and see if I can figure out what change broke us.

Downgrading gcc from 13.0.1-0.4 to 13.0.1-0.3 makes the i386 build work again, so I believe an i386-specific bug has been introduced in 0.4.  Forcing the optimization level down to -O1 (with sed -i 's/"-c"; "-g"/&; "-ccopt"; "-O1"/' boot/duneboot.ml) also makes the i386 build work again.  Tomorrow I will try to distill the code down to a small reproducer and file a bug.
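
For readers unfamiliar with the "&" in the sed replacement, here is a rough Python equivalent of that substitution. The sample input line is hypothetical (the real contents of boot/duneboot.ml are not shown in this report), and unlike sed without the g flag, re.sub replaces every occurrence rather than only the first one on each line.

```python
import re


def force_o1(duneboot_source):
    """Append '"-ccopt"; "-O1"' after each '"-c"; "-g"' flag pair."""
    # \g<0> plays the role of sed's "&": the whole matched text.
    return re.sub(r'"-c"; "-g"', r'\g<0>; "-ccopt"; "-O1"', duneboot_source)


sample = '[ "-c"; "-g" ]'  # hypothetical OCaml list of compiler flags
patched = force_o1(sample)
print(patched)
```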

Richard, I am adding you on CC so you are aware of the source of the debuginfo problem you encountered when you last built all of the OCaml packages.  What is your opinion on the question of changing the dune default to produce good debuginfo?

Comment 13 Richard W.M. Jones 2023-02-20 08:32:21 UTC
I'm using Fedora packages, not opam for development.  opam is quite
unsuitable for making tested, distributable binaries and libraries
because it rebuilds everything in the home directory.

I'm not sure I follow the purpose behind the dune change, but
I have asked a question on https://github.com/ocaml/dune/issues/6929

Comment 14 Jerry James 2023-02-20 16:24:50 UTC
Okay, I won't take any action until we have a solution that works for you.

In the meantime, I have discovered that simply adding -fno-inline to the CFLAGS is enough to avoid the i386 segfault.  Next I'll try to find a small C reproducer for submitting to the GCC developers.

Comment 15 Jerry James 2023-02-20 18:16:39 UTC
I am running out of time to work on this today.  So far my efforts to produce a small reproducer have failed.  I filed bug 2171888 anyway in hopes that the GCC maintainers will have more insight into the problem than I have.

Comment 16 Jerry James 2023-03-21 16:24:34 UTC
The GCC problems noted above have been fixed.  I am going to add a patch to change the map_workspace_root default to false so that we can produce good debuginfo.  Projects are still free to set map_workspace_root to true or false as they please; this just changes the default.  Note that the default was true only for projects that set their dune lang value to 3.0 or higher, which explains why this issue didn't affect all dune-using packages in Fedora.
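
The rule described above (mapping on by default only for dune lang 3.0 or higher, unless the project says otherwise) can be sketched as a small check over the text of a dune-project file. This models the upstream default before the Fedora patch. The stanza spelling (map_workspace_root false) is assumed from the changelog entry quoted earlier, the function name is hypothetical, and the regular expressions do not handle comments or unusual formatting.

```python
import re

LANG_RE = re.compile(r'\(\s*lang\s+dune\s+(\d+)\.(\d+)\s*\)')
MAP_RE = re.compile(r'\(\s*map_workspace_root\s+(true|false)\s*\)')


def maps_workspace_root(dune_project_text):
    """Guess whether dune would map the workspace root to /workspace_root."""
    explicit = MAP_RE.search(dune_project_text)
    if explicit:
        # An explicit setting overrides the default.
        return explicit.group(1) == "true"
    lang = LANG_RE.search(dune_project_text)
    if lang is None:
        # No recognizable lang line: treat as unaffected in this sketch.
        return False
    version = (int(lang.group(1)), int(lang.group(2)))
    return version >= (3, 0)


print(maps_workspace_root("(lang dune 3.6)\n(name ocaml-version)\n"))
print(maps_workspace_root("(lang dune 2.9)\n"))
print(maps_workspace_root("(lang dune 3.7)\n(map_workspace_root false)\n"))
```

Running such a check over the dune-project files of the affected packages would be one way to confirm that only projects declaring lang 3.0 or higher were hit.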

Comment 17 Fedora Release Engineering 2023-08-16 07:06:45 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 18 Aoife Moloney 2024-11-08 10:46:11 UTC
This message is a reminder that Fedora Linux 39 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 39 on 2024-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '39'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 39 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 19 Aoife Moloney 2024-11-27 21:04:30 UTC
Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26.

Fedora Linux 39 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.