malloc/tst-malloc-stats-cancellation hangs when I use the following configure line:
../configure CFLAGS="-v -w -g -O2 -iplugindir=/usr/lib/gcc/armv7hl-redhat-linux-gnueabi/11/plugin -fplugin=annobin" --prefix=/usr --with-nonshared-cflags="-fplugin=annobin -fplugin-arg-annobin-disable" --disable-werror
...but not when I use this:
../configure CFLAGS="-v -w -g -O2" --prefix=/usr --disable-werror
I'm not sure how the hang is related to annobin, but this is what happens: a child thread is cancelled, but the cancellation does not occur cleanly and a lock on stderr is not released. The parent then tries to acquire that lock after the child's cancellation and ends up waiting on it until the test times out.
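When re-running the test under different build configurations, it can help to separate a real hang from an ordinary failure. A minimal sketch, assuming coreutils `timeout` is available; the function name `classify_run` and the deadlines are illustrative and not part of the glibc test suite:

```shell
#!/bin/sh
# classify_run DEADLINE_SECONDS COMMAND [ARGS...]
# Prints "pass", "fail" or "hang" depending on how COMMAND ends.
# (Hypothetical helper, not something glibc provides.)
classify_run() {
    deadline=$1
    shift
    # coreutils timeout exits with status 124 when the deadline expires
    # and it has to terminate the command.
    timeout "$deadline" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo pass ;;
        124) echo hang ;;
        *)   echo fail ;;
    esac
}

# Stand-in commands, used here instead of the real glibc test:
classify_run 5 true
classify_run 5 false
classify_run 1 sleep 10
```

A command that itself exits with status 124 would be misreported as a hang, so this is only a rough triage aid. In a glibc build tree the command would be whatever is used to run malloc/tst-malloc-stats-cancellation.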
In theory annobin should have no effect on the execution of any binary to which it has been applied. The plugin just creates a non-loadable note section and some extra symbols in the symbol table. In practice those extra symbols can sometimes be problematic, and maybe that is the case in this particular scenario.
Without knowing more about why the lock is being held, it is hard to say any more. But a possible place to look is any ARM specific code in the thread library. In particular is there any code that scans the symbol table of ARM binaries, possibly looking for function symbols or the like ?
ARM EABI uses non-DWARF exception handling. Perhaps that's why it's disturbed by annobin data and the extra symbols?
If it is the annobin symbols that are causing a problem, then you *might* be able to make the test work by stripping them out. For example:
objcopy --strip-unneeded a.out a.stripped
Of course this might also break the ARM unwinder by removing symbols that it needs, so no guarantees that it won't make things worse...
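To check whether the strip actually removed the annobin symbols, the symbol tables of the two files can be compared. A rough sketch, assuming that annobin's extra symbols contain the string "annobin" in their names (worth confirming by eye in the readelf output); `count_annobin_syms` is a made-up helper name:

```shell
#!/bin/sh
# count_annobin_syms: reads `readelf --syms --wide` output on stdin and
# prints how many symbol names contain "annobin".
# ASSUMPTION: annobin's extra symbols have "annobin" in their names.
count_annobin_syms() {
    # Field 8 of a readelf symbol line is the symbol name.
    awk '$8 ~ /annobin/ { n++ } END { print n + 0 }'
}

# Compare the original and the stripped copy, if both exist.
for f in a.out a.stripped; do
    if [ -f "$f" ]; then
        printf '%s: ' "$f"
        readelf --syms --wide "$f" | count_annobin_syms
    fi
done
```

A count of zero for a.stripped only shows that the symbols are gone. It says nothing about whether the unwinder still has what it needs.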
Proposed as a Blocker for 35-beta by Fedora user pbrobinson using the blocker tracking app because:
This is actually a mass rebuild blocker but we don't have the ability to add that so adding it here so it's tracked somewhere.
This may be fixed by annobin-9.72-1.fc35. Arjun - please can you check ?
(In reply to Nick Clifton from comment #5)
> This may be fixed by annobin-9.72-1.fc35. Arjun - please can you check ?
Thanks, Nick! I'm on it.
"This is actually a mass rebuild blocker but we don't have the ability to add that so adding it here so it's tracked somewhere."
That's what the prioritized bug tracker is for:
Given your recent results, I think that there were actually two problems:
1. The hang in pthread cancellation. This I think was not caused
by the annobin problem (below) but rather something else. A
recent commit to the glibc sources appears to have fixed the
problem, even if annobin is used when compiling the sources.
2. When a relocatable link is performed on ARM object files that
have been annotated by the annobin plugin, the resulting
unwind information is corrupt. I think that this has been
fixed in the annobin-9.72-1.fc35 build.
Do you agree ? If so, then I think that we can close this BZ. If 1)
is true but 2) is not, then it would be better to open a separate BZ
for it. But if 1) is false, then more investigation is needed,
although I am not sure where.
So, I tested with "-Wl,--force-group-allocation" for libc_pic.os and
that seems to remove the hang. i.e.:
* Without the option but with annobin turned on: it hangs
* With the option and with annobin turned on: it does not hang
Note that this is at a glibc commit that was already hanging.
What we know now:
1. A hang started occurring at glibc commit "C1" (say).
2. Any *one* of three events appears to remove the hang:
* turning off annobin
* building libc_pic.os with --force-group-allocation
* fast-forwarding glibc to commit "C2"
Does this pinpoint any more about where bug #1 might lie?
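The observations above can be written down as a small truth table. This is only a sketch of what was reported: the thread covers the baseline and each single change, so the rows where two or more changes are combined are an assumption (that they also do not hang). The name `hang_observed` and its argument values are illustrative:

```shell
#!/bin/sh
# hang_observed ANNOBIN GROUP_ALLOC COMMIT
#   ANNOBIN      on | off            (annobin plugin enabled?)
#   GROUP_ALLOC  default | forced    (forced = --force-group-allocation)
#   COMMIT       C1 | C2             (placeholder commit names from above)
# Prints "hang" only for the one configuration reported as hanging.
hang_observed() {
    if [ "$1" = on ] && [ "$2" = default ] && [ "$3" = C1 ]; then
        echo hang
    else
        # ASSUMPTION: combinations of several changes behave like
        # each single change, i.e. no hang.
        echo ok
    fi
}

# Print the full matrix.
for annobin in on off; do
    for group_alloc in default forced; do
        for commit in C1 C2; do
            printf '%-4s %-8s %-3s %s\n' "$annobin" "$group_alloc" \
                "$commit" "$(hang_observed "$annobin" "$group_alloc" "$commit")"
        done
    done
done
```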
> Does this pinpoint any more about where bug #1 might lie?
Yes - I think that it is safe to say that there is a latent problem with ARM unwind information and annobin annotated code. Commit C1 exposed this problem (which presumably has existed for a long time, but is only now coming to light), and commit C2 has hidden it again.
I had really hoped that annobin-9.73 would fix this problem, as it contains ARM specific code to disable the generation of section groups. (I believe annobin's use of section groups to be the underlying cause of the problem).
So back to the drawing board for me I guess.
In today's Prioritized Bugs meeting, we accepted this as a Prioritized Bug.
If anyone has additional input or can do additional testing, please comment.
This bug appears to have been reported against 'rawhide' during the Fedora 35 development cycle.
Changing version to 35.
In today's Prioritized Bugs meeting, we agreed that this bug is no longer a prioritized bug as the mass rebuild seems to have completed successfully without a fix.
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.
Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.
If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.
If you are unable to reopen this bug, please file a new report against an
Thank you for reporting this bug and we are sorry it could not be fixed.