Bug 2168932
Summary: | debugedit not producing file lists for some OCaml packages
---|---
Product: | [Fedora] Fedora
Reporter: | Jerry James <loganjerry>
Component: | ocaml-dune
Assignee: | Andy Li <andy>
Status: | CLOSED EOL
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Priority: | unspecified
Version: | 39
CC: | andy, loganjerry, mjw, rjones
Hardware: | Unspecified
OS: | Unspecified
Last Closed: | 2024-11-27 21:04:30 UTC
Type: | Bug

Description
Jerry James
2023-02-10 15:53:02 UTC
Would you be able to attach one of these OCaml-generated files?

Created attachment 1943383
Compiled OCaml object

I'm attaching a cmxs file, the OCaml equivalent of a shared library. It ought to refer to the source file ocaml_version.ml, unless something has gone wrong with generating debug information in the OCaml toolchain, of course.

Thanks, looking at this file we find:

  DWARF section [25] '.debug_info' at offset 0x13390:
   [Offset]
   Compilation unit at offset 0:
   Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
   [     b]  compile_unit  abbrev: 1
             stmt_list     (data4) 0
             low_pc        (addr) +0x0000000000006b30 <camlOcaml_version__v_273>
             high_pc       (addr) +0x000000000000b6d9 <camlOcaml_version__code_end>
             name          (strp) "ocaml_version.ml"
             comp_dir      (strp) "/workspace_root"
             producer      (strp) "GNU AS 2.39"
             language      (data2) Mips_Assembler (32769)

So here it would be looking for a file called /workspace_root/ocaml_version.ml, which I expect doesn't really exist. Similarly, the debug_line directory table is empty, so all file names are resolved against the comp_dir:

  Directory table:

  File name table:
   Entry  Dir  Time  Size  Name
   1      0    0     0     ocaml_version.ml
   2      0    0     0     string.ml
   3      0    0     0     scanf.ml

So what we need to figure out is what this comp_dir "/workspace_root" is and whether it is expected.

I see "workspace_root" in the ocaml-dune sources. Dune, what have you done?!? Let me poke through the dune sources and see if I can figure out what that is for. Thanks for the hint!

(In reply to Jerry James from comment #4)
> I see "workspace_root" in the ocaml-dune sources. Dune, what have you
> done?!? Let me poke through the dune sources and see if I can figure out
> what that is for. Thanks for the hint!

Please compare with the Fedora 37 version of ocaml. Did it change versions? Please take a look at what eu-readelf --debug-dump=info says for the comp_dir in the debuginfo.

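The resolution rule described in the analysis above can be sketched in a few lines of Python. This is only an illustration, assuming the pre-DWARF 5 convention in which directory index 0 stands for the compilation directory; `resolve_file` and its parameters are hypothetical names and are not part of elfutils or debugedit.

```python
import posixpath

def resolve_file(comp_dir, directories, dir_index, name):
    """Resolve a line-table file entry to a full path (DWARF 2-4 convention).

    Directory index 0 stands for the compilation directory; any other
    index is a 1-based position in the directory table.
    """
    if posixpath.isabs(name):
        return name
    if dir_index == 0:
        base = comp_dir
    else:
        base = directories[dir_index - 1]
        if not posixpath.isabs(base):
            # Relative directory entries are themselves relative to comp_dir.
            base = posixpath.join(comp_dir, base)
    return posixpath.join(base, name)

# Values from the dump in this thread: empty directory table, every entry uses Dir 0.
comp_dir = "/workspace_root"
files = [(0, "ocaml_version.ml"), (0, "string.ml"), (0, "scanf.ml")]
resolved = [resolve_file(comp_dir, [], d, n) for d, n in files]
for path in resolved:
    print(path)
```

With an empty directory table, every source path ends up under whatever comp_dir holds, so a placeholder there makes all of the paths point at files that do not exist.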
For the Fedora 37 package, I get this:

$ eu-readelf --debug-dump=info ocaml_version.cmxs-3.5.0-2.fc37.x86_64.debug

  DWARF section [25] '.debug_info' at offset 0x390:
   [Offset]
   Compilation unit at offset 0:
   Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
   [     b]  compile_unit  abbrev: 1
             stmt_list     (data4) 0
             low_pc        (addr) +0x0000000000006b40 <camlOcaml_version__v_273>
             high_pc       (addr) +0x000000000000af06 <camlOcaml_version__code_end>
             name          (strp) "ocaml_version.ml"
             comp_dir      (strp) "/usr/src/debug/ocaml-version-3.5.0-2.fc37.x86_64/_build/default"
             producer      (strp) "GNU AS 2.38"
             language      (data2) Mips_Assembler (32769)

It appears to me that workspace_root is a dune variable that is supposed to expand to the absolute path of the build directory; in a mock build, it would probably be something like /builddir/build/BUILD/ocaml-version-3.6.0/_build/default.

My mental timeline was a little off. This did not show up during the mass rebuild, but shortly after. The mass rebuild unfortunately mangled the OCaml stack, so Richard Jones rebuilt all OCaml packages on January 24, which is when this issue first appeared.

Regarding the ocaml-version package, I built it on January 20 with no trouble. When Richard Jones built it 4 days later, the build failed as noted above. Differences in packages in the build root between the two builds:
  ocaml: 4.14.0-2.fc37 vs. 4.14.0-5.fc38
  ocaml-dune: 3.6.1-1.fc38 vs. 3.6.1-3.fc38

So the same base versions. It looks like the rebuilt ocaml-dune package is now misbehaving, and failing to expand the workspace_root variable in some circumstances. That sounds fun to track down.

In any case, it appears that I unfairly blamed debugedit. This looks like an OCaml toolchain problem after all.

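To illustrate why an unexpanded `/workspace_root` matters, here is a rough Python sketch of the kind of prefix rewriting that turns a build-tree comp_dir into a `/usr/src/debug/...` path. It is not debugedit's actual code: `rewrite_comp_dir` is a hypothetical helper, and both directories below are made-up values modelled on the paths quoted in this comment.

```python
def rewrite_comp_dir(comp_dir, base_dir, dest_dir):
    """Replace a leading base_dir with dest_dir; return None if it does not match."""
    if comp_dir == base_dir:
        return dest_dir
    if comp_dir.startswith(base_dir + "/"):
        return dest_dir + comp_dir[len(base_dir):]
    return None  # path lies outside the build tree, so there is nothing to rewrite

# Hypothetical directories, modelled on the paths quoted in this thread.
BASE = "/builddir/build/BUILD/ocaml-version-3.6.0"
DEST = "/usr/src/debug/ocaml-version-3.6.0"

good = rewrite_comp_dir(BASE + "/_build/default", BASE, DEST)
bad = rewrite_comp_dir("/workspace_root", BASE, DEST)
print(good)
print(bad)
```

In this sketch the placeholder path never matches the build directory, which is consistent with the symptom in the bug title: no usable source file list comes out of the debuginfo step.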
(In reply to Jerry James from comment #6)
> For the Fedora 37 package, I get this:
>
> $ eu-readelf --debug-dump=info ocaml_version.cmxs-3.5.0-2.fc37.x86_64.debug
>
> DWARF section [25] '.debug_info' at offset 0x390:
> [Offset]
> Compilation unit at offset 0:
> Version: 3, Abbreviation section offset: 0, Address size: 8, Offset size: 4
> [ b] compile_unit abbrev: 1
> stmt_list (data4) 0
> low_pc (addr) +0x0000000000006b40 <camlOcaml_version__v_273>
> high_pc (addr) +0x000000000000af06 <camlOcaml_version__code_end>
> name (strp) "ocaml_version.ml"
> comp_dir (strp) "/usr/src/debug/ocaml-version-3.5.0-2.fc37.x86_64/_build/default"
> producer (strp) "GNU AS 2.38"
> language (data2) Mips_Assembler (32769)
>
> It appears to me that workspace_root is a dune variable that is supposed to
> expand to the absolute path of the build directory; in a mock build, it
> would probably be something like
> /builddir/build/BUILD/ocaml-version-3.6.0/_build/default.

Yes, that is what find-debuginfo and debugedit rely on.

> My mental timeline was a little off. This did not show up during the mass
> rebuild, but shortly after. The mass rebuild unfortunately mangled the
> OCaml stack, so Richard Jones rebuilt all OCaml packages on January 24,
> which is when this issue first appeared.
>
> Regarding the ocaml-version package, I built it on January 20 with no
> trouble. When Richard Jones built it 4 days later, the build failed as
> noted above. Differences in packages in the build root between the two
> builds:
> ocaml: 4.14.0-2.fc37 vs. 4.14.0-5.fc38
> ocaml-dune: 3.6.1-1.fc38 vs. 3.6.1-3.fc38
>
> So the same base versions. It looks like the rebuilt ocaml-dune package is
> now misbehaving, and failing to expand the workspace_root variable in some
> circumstances. That sounds fun to track down.
>
> In any case, it appears that I unfairly blamed debugedit. This looks like
> an OCaml toolchain problem after all.

No worries. This is a common issue with DWARF debuginfo.
It requires absolute paths (which are then rewritten by debuginfo) and has no concept of a relative "working directory". In general it wouldn't really be clear what such a directory refers to, since build and source dirs can be separate and sometimes "source" is generated and placed in yet another directory like /tmp, even though something like that (e.g. a workspace_root) would seem to be useful.

Let's move this bug to ocaml-dune. But let me know if I can help.

I see this in the dune 3.7.0 changelog:

  Add map_workspace_root dune-project stanza to allow disabling of mapping of workspace root to /workspace_root. (#6988, fixes #6929, @richardlford)

That gives me hope this was a dune 3.6 bug and is now fixed. I'll update dune in Rawhide and try rebuilding the affected packages.

For anyone following along, the ocaml-dune 3.7.0 build failed on i386 only, with the dreaded "warning: relocation in read-only section `.text'" warning; see https://github.com/ocaml/ocaml/issues/9800. I can inject -Wl,-z,notext into the linker flags as upstream suggests. That leads to a successful link, but then the dune binary segfaults when run. The segfault happens in vendor/spawn/src/spawn_stubs.c line 481, function spawn_unix:

  pthread_sigmask(SIG_SETMASK, &saved_procmask, NULL);

The saved_procmask variable is a local variable. However, GDB shows that the value 9 was passed into pthread_sigmask as the second argument. None of this happens if I build dune-3.6.1 in Rawhide, so something in the dune-3.7.0 sources is triggering this behavior. I'm open to suggestions on what to look for.

Argh, no, muscle memory betrayed me. I did an ocaml-dune-3.6.1 build for x86_64. Doing an ocaml-dune-3.6.1 build for i386 shows exactly the same issue. So the issue is somewhere in the toolchain. I'll try some selective downgrades and see if I can figure out what change broke us.

(In reply to Jerry James from comment #8)
> I see this in the dune 3.7.0 changelog:
>
> Add map_workspace_root dune-project stanza to allow disabling of mapping of
> workspace root to /workspace_root. (#6988, fixes #6929)

This seems to refer to:
https://github.com/ocaml/dune/pull/6988
https://github.com/ocaml/dune/issues/6929

Which seems to describe exactly what we are seeing. So you have to add a map_workspace_root false to every dune project file for any project built using ocaml-dune (note that if missing, it defaults to true, so not having it will produce garbage debuginfo and break the build).

(In reply to Mark Wielaard from comment #11)
> So you have to add a map_workspace_root false to every dune project file for
> any project build using ocaml-dune (note that if missing, it defaults to
> true, so not having it will produce garbage debuginfo and break the build).

I think anyone doing serious OCaml development work will use opam. The Fedora RPMs are, I believe, intended to support building RPMs of user-facing applications that are written in OCaml. If that is true, it would make sense for us to simply change the dune default, and document the change of course.

(In reply to Jerry James from comment #10)
> Argh, no, muscle memory betrayed me. I did an ocaml-dune-3.6.1 build for
> x86_64. Doing an ocaml-dune-3.6.1 build for i386 shows exactly the same
> issue. So the issue is somewhere in the toolchain. I'll try some selective
> downgrades and see if I can figure out what change broke us.

Downgrading gcc from 13.0.1-0.4 to 13.0.1-0.3 makes the i386 build work again, so I believe an i386-specific bug has been introduced in 0.4. Forcing the optimization level down to -O1 (with sed -i 's/"-c"; "-g"/&; "-ccopt"; "-O1"/' boot/duneboot.ml) also makes the i386 build work again. Tomorrow I will try to distill the code down to a small reproducer and file a bug.

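For the per-project workaround discussed in these comments, the following Python sketch shows one way a packaging script might force the setting in a dune-project file. It rests on assumptions: the stanza spelling `(map_workspace_root false)` is inferred from the changelog entry and the wording in this thread, the stanza would only be understood by dune 3.7 or later, and `disable_workspace_mapping` and the example project text are hypothetical.

```python
import re

# Assumed spelling of the stanza, inferred from the dune 3.7.0 changelog entry.
STANZA = "(map_workspace_root false)"

def disable_workspace_mapping(dune_project_text):
    """Return dune-project text with map_workspace_root set to false."""
    pattern = re.compile(r"\(\s*map_workspace_root\s+(true|false)\s*\)")
    if pattern.search(dune_project_text):
        # An explicit setting already exists: force it to false.
        return pattern.sub(STANZA, dune_project_text)
    # No setting present: append the stanza on its own line.
    if not dune_project_text.endswith("\n"):
        dune_project_text += "\n"
    return dune_project_text + STANZA + "\n"

# Hypothetical project file contents.
example = "(lang dune 3.7)\n(name ocaml-version)\n"
print(disable_workspace_mapping(example), end="")
```

Patching every project this way is exactly the per-package burden the comments above object to, which is why changing the default in the distribution's dune package is the alternative being weighed.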
Richard, I am adding you on CC so you are aware of the source of the debuginfo problem you encountered when you last built all of the OCaml packages. What is your opinion on the question of changing the dune default to produce good debuginfo?

I'm using Fedora packages, not opam for development. opam is quite unsuitable for making tested, distributable binaries and libraries because it rebuilds everything in the home directory. I'm not sure I follow the purpose behind the dune change, but I have asked a question on https://github.com/ocaml/dune/issues/6929

Okay, I won't take any action until we have a solution that works for you. In the meantime, I have discovered that simply adding -fno-inline to the CFLAGS is enough to avoid the i386 segfault. Next I'll try to find a small C reproducer for submitting to the GCC developers.

I am running out of time to work on this today. So far my efforts to produce a small reproducer have failed. I filed bug 2171888 anyway in hopes that the GCC maintainers will have more insight into the problem than I have.

The GCC problems noted above have been fixed. I am going to add a patch to change the map_workspace_root default to false so that we can produce good debuginfo. Projects are still free to set map_workspace_root to true or false as they please; this just changes the default. Note that the default was true only for projects that set their dune lang value to 3.0 or higher, which explains why this issue didn't affect all dune-using packages in Fedora.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.

This message is a reminder that Fedora Linux 39 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora Linux 39 on 2024-11-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a 'version' of '39'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, change the 'version' to a later Fedora Linux version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora Linux 39 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora Linux, you are encouraged to change the 'version' to a later version prior to this bug being closed.

Fedora Linux 39 entered end-of-life (EOL) status on 2024-11-26. Fedora Linux 39 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux please feel free to reopen this bug against that version. Note that the version field may be hidden. Click the "Show advanced fields" button if you do not see the version field. If you are unable to reopen this bug, please file a new report against an active release.

Thank you for reporting this bug and we are sorry it could not be fixed.