Bug 1789149

Summary:

redhat-rpm-config changes to fix broken autoconf configure scripts in-place

Product:

[Fedora] Fedora

Reporter:

Jeff Law <law>

Component:

redhat-rpm-config

Assignee:

Florian Festi <ffesti>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

high

Docs Contact:

Priority:

high

Version:

CC:

ajax, ffesti, fweimer, igor.raits, john.j5live, jonathan, j, mcatanza, pmatilai, praiskup, yaneti

Target Milestone:

---

Target Release:

---

Hardware:

All

OS:

All

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-07-08 16:14:22 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
Patch to have redhat-rpm-config fix broken configure files in place	none

Description Jeff Law 2020-01-08 21:31:14 UTC

Created attachment 1650843 [details]
Patch to have redhat-rpm-config fix broken configure files in place

Description of problem:
This is the final patch to redhat-rpm-config for LTO enablement (and I suspect the most controversial).

One of the things we found when evaluating LTO builds for Fedora was that LTO has a tendency to compromise various autoconf tests.

This can result in packages failing to build because they think the system libc provides some facility that doesn't actually exist.   This can also result in packages that don't behave properly.  Worse yet, some tests do things like check for stack growth direction and if they get it wrong the package will just segfault.

We were quite concerned when we discovered this issue, so we twiddled our package tester to run multiple builds (with and without LTO) and capture the generated config.h files.   Those generated config.h files are then compared and the build is marked as a failure if they differ.  From that we were able to determine all the configure related issues we needed to solve to prevent mis-behavior due to mis-configured packages.  To give you a sense of scale, there are hundreds of packages this affects -- IIRC it was 700+ packages that had these kinds of problems.

--

Some of the packages assumed ancient versions of autoconf and getting them to use modern versions is nontrivial.  Thankfully the most common issues are relatively easy to fix with sed, so for those common issues we have the %configure macro in redhat-rpm-config run sed scripts to fix the configure files in-place.

Note I didn't try to fix every configure issue via sed, only those which showed up commonly.  Many package specific configure issues are going to be fixed in the packages themselves.  I hope that at least some of the sed hacks can be revisited/retired as packages get proper updates or get dropped from Fedora completely.

The most common cases are things like this:

char (*f) = somefunction;

main ()
{
  if (f != somefunction)
    do something;
  exit (0);
}

LTO will realize that "f" never varies and replace the use of "f" with "somefunction" in main.  At that point "f" is dead and it gets removed resulting in something like this:

main ()
{
  if (somefunction != somefunction)
    do something
  exit (0);
}


Then the optimizer realizes that the test is always false and removes it resulting in:

main ()
{
  exit (0);
}


Which compiles successfully -- which is an indication to configure that "somefunction" has a definition -- regardless of whether or not the function really exists in libc.  This code is never run, autoconf is just looking to see if there's a link error or not to determine if the function is available.

The fix is pretty simple.  We insert an __attribute__ ((used)) onto the definition of "f".  That results in the reference to "somefunction" never being removed and we get a link error if "somefunction" isn't defined (which is what we want).  There's a couple different variants of this due to different versions of autoconf in use.

Another case we handle is testing for certain floating point properties (do NaNs signal?).  LTO optimizes away the floating point computations since it knows what the result will be -- which inhibits the SIGFPE we were trying to generate.  Well placed "volatiles" prevent that behavior.  Similarly we inject some volatile qualifiers to inhibit optimization of global values which are modified within signal handlers to detect how they behave (do they need re-arming?).

Another nasty issue is some codes want to test the direction of stack growth.  The routine they use to do that can not be inlined as it depends on comparing addresses of local variables.  So the sed scripts add __attribute__ ((noinline,noclone)) to prevent this optimization.

The last one is the most horrific.  One of the things autoconf sometimes wants to do is interpret the output of NM and turn that back into C code.  Prior to LTO a fairly simple script could do that at the expense of getting the symbol types wrong in various ways (functions in the original turn into integers).  In an LTO world getting the symbol type wrong results in an undesirable compile-time error.  Thankfully autoconf knew how to handle HPUX where symbol types are always important -- so we replace the simplistic version autoconf normally uses with the HPUX variant which is far more accurate.

Finally, we expose a macro with all the sed fixes for use by %configure *and* by packages which call configure directly rather than using %configure.  Packages which call configure directly add a one-line fix to update their configure files 
%{_fix_broken_configure_for_lto}

I do _not_ expect further sed hackery to fix configure scripts.  I've got a pretty good handle on the configure situation at this point.  Instead I expect to remove some of the sed scripts over time as packages get updated or even dropped from Fedora.

This could be installed now as it will not adversely impact non-LTO builds with the exception of triggering doc rebuilds in some packages (docs sometimes depend on the generated configure files) and some packages fail to list their dependency on texinfo.  I'll submit a pull-request momentarily.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Jason Tibbitts 2020-01-08 21:57:15 UTC

I think this is quite clever.  I'll leave it up to the true experts to decide whether it's too clever.

One thing that bothers me, though is that we really should avoid performing unexpected deep magic silently.  This does have the capability (however small) to break things, and for many maintainers figuring out just what got broken could potentially be quite difficult.  It seems that it shouldn't be too terribly difficult to save a copy of the original configure file and emit a warning if it was changed.

I'm assuming since this is redhat-rpm-config and not RPM upstream, we don't particularly care about requiring GNU sed.

Comment 2 Jeff Law 2020-01-08 22:05:32 UTC

I'm not sure it's clever -- I do consider it practical though.

WRT saving a copy and possibly emitting a diagnostic.  Sure that seems easy enough.  In retrospect having the before/after would probably have saved me some time.  I suspect any diagnostic we emit would get lost in all the other noise of our builds, but it's easy enough to do.

WRT GNU sed.  Yea, it's not upstream and I'd like to avoid it ever going upstream.  I want to get over this hump to get LTO up, then go back and start removing these hacks one by one.

Comment 3 Jason Tibbitts 2020-01-08 22:39:10 UTC

It's true that the average build generates a bewildering amount of output, but many packagers are sensitive to it and do make note of new warning messages, especially if those are sent to STDERR and prefixed with "* WARNING" (which is what brp-mangle-shebangs and check-rpaths do currently).  Plus it gives us something to grep for.  There's a significant potential for mining build logs even if the maintainers don't notice themselves.  And in three years or whatever, having that output stored away in koji logs will be quite useful in determining whether hacks like this are still needed.

Comment 4 Ben Cotton 2020-02-11 17:36:49 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 5 Michael Catanzaro 2020-07-07 12:41:17 UTC

Can this be closed?

Comment 6 Jeff Law 2020-07-08 16:14:22 UTC

Yes.  These changes have been committed and this BZ can be closed.