Bug 2231727 - rpm: Use main arch %optflags if subarch flags are missing for --target noarch
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL: https://kojipkgs.fedoraproject.org//w...
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-13 15:49 UTC by marcdeop
Modified: 2023-09-18 00:15 UTC (History)

Fixed In Version: rpm-4.18.99-1.fc39
Clone Of:
: 2233093 (view as bug list)
Environment:
Last Closed: 2023-09-18 00:15:56 UTC
Type: ---
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Fedora Pagure | koji issue 3880 | 0 | None | None | None | 2023-08-21 12:13:43 UTC
Github | rpm-software-management rpm pull 2622 | 0 | None | open | Behave more consistently when target arch %optflags are not defined | 2023-08-21 11:22:02 UTC

Description marcdeop 2023-08-13 15:49:32 UTC
While doing chainbuilds with KDE packages, I noticed that my noarch builds were failing.

There seems to be some problem when building noarch packages on x86_64-v4 hosts.

Building packages plasma-workspace-wallpapers and breeze-gtk as examples. Logs:

https://kojipkgs.fedoraproject.org//work/tasks/4578/104324578/build.log
https://kojipkgs.fedoraproject.org//work/tasks/4873/104524873/build.log

I guess it has something to do with the value of CFLAGS (compare to values found in a successful build: https://kojipkgs.fedoraproject.org//work/tasks/591/104500591/build.log )



Reproducible: Sometimes

Steps to Reproduce:
1. Trigger a noarch build on koji (breeze-gtk, plasma-workspace-wallpapers and others)
2. The build fails whenever -march=x86-64-v4 shows up in the logs


Expected Results:  
Successful build

Comment 1 Michal Domonkos 2023-08-14 10:07:13 UTC
This appears to be fallout from the addition of the new x86-64 architecture levels in 4.19:
https://github.com/rpm-software-management/rpm/pull/2315

The CFLAGS value indeed is a good catch and most likely the reason for this error.  It's set by RPM from its internal optflags value (stored in /usr/lib/rpm/rpmrc) and is actually correct for the (newly detected) x86-64-v4 architecture.  However, on Fedora, we ship an extended variant of optflags for each platform (through redhat-rpm-config, see /usr/lib/rpm/redhat/rpmrc), and as it turns out, those values haven't been adapted for these new arch levels.  Thus, RPM ends up using the plain, built-in value of "-O2 -g -march=x86-64-v4", i.e. the same as it would for any other architecture if redhat-rpm-config wasn't installed.

I'm not yet sure what the actual fix would look like (just copy the optflags line for x86_64_v2, v3 and v4?) but I believe it has to be done in redhat-rpm-config, therefore reassigning for now.

Comment 2 Florian Weimer 2023-08-14 10:45:31 UTC
(In reply to Michal Domonkos from comment #1)
> I'm not yet sure what the actual fix would look like (just copy the optflags
> line for x86_64_v2, v3 and v4?) but I believe it has to be done
> redhat-rpm-config, therefore reassigning for now.

I'm not sure how a noarch package is even supposed to obtain correct settings for %build_cflags. But I guess we need to support this somehow. I'm not too thrilled to fix it by adding more optflags entries. The RPM change is supposed to be backwards-compatible.

Comment 3 Neal Gompa 2023-08-14 10:52:46 UTC
The x86_64 subarches aren't the biggest problem. The biggest problem is that all the multiarch macros are resolving incorrectly. Note that everything that's supposed to be /usr/lib64 is resolving to /usr/lib, and that breaks CMake and everything else.

This breaks noarch KDE builds because it can't find things like Qt anymore.

Comment 4 Neal Gompa 2023-08-14 10:53:52 UTC
I'm fairly certain this is an RPM bug since we don't even define x86_64-v4 in redhat-rpm-config.

Comment 5 Neal Gompa 2023-08-14 10:55:10 UTC
(And yes, while part of the solution may be defining stuff in redhat-rpm-config, there's still a significant breakage in RPM itself.)

Comment 6 Michal Domonkos 2023-08-14 11:32:35 UTC
Indeed, the improper CFLAGS value may just be the tip of the iceberg.  I suppose it shouldn't affect the configuration step in cmake anyway, which is what's actually failing.

The first warning in the build log happens in /usr/share/ECM/kde-modules/KDEInstallDirsCommon.cmake:39.  Looking at that file, CMAKE_SIZEOF_VOID_P is apparently undefined for some reason.

Comment 7 Michal Domonkos 2023-08-14 12:04:21 UTC
(In reply to Michal Domonkos from comment #6)
> I suppose it shouldn't affect the configuration step in cmake anyway, which is
> what's actually failing.

Which is not entirely true:

CMAKE_SIZEOF_VOID_P
This is set to the size of a pointer on the target machine, and is determined by a try compile. If a 64-bit size is found, then the library search path is modified to look for 64-bit libraries first.

Comment 8 Neal Gompa 2023-08-14 12:09:02 UTC
This issue goes back to all the file paths being broken. It can't do anything meaningful right now.

Comment 9 Michal Domonkos 2023-08-14 12:16:51 UTC
$ grep ^%_lib /usr/lib/rpm/platform/x86_64_v4-linux/macros
%_libexecdir		%{_exec_prefix}/libexec
%_lib			lib64
%_libdir		%{_prefix}/lib64

This looks alright.

Comment 10 Panu Matilainen 2023-08-14 12:19:56 UTC
This is all severely fishy. A noarch package is SUPPOSED TO get /usr/lib as the path (see /usr/lib/rpm/platform/noarch-linux/macros), ie it can't rely on having some arch-specific paths available. So I would *expect* all those %cmake macros to be broken on noarch packages when building with --target=noarch as koji does (IIRC). But that doesn't explain why v4 behaves differently.

Since when has this been happening?

Comment 11 Kalev Lember 2023-08-14 12:33:10 UTC
Could it be that the compile test that cmake executes to determine the pointer size fails because the builder hosts don't support the x86_64-v4 instruction set?

Comment 12 Michal Domonkos 2023-08-14 12:38:26 UTC
It could, however as Panu just noted in our team chat, it might as well be as simple as the -m64 flag not being used, due to RPM not being able to match the platform in /usr/lib/rpm/redhat/rpmrc for x86_64_v4 (since it's just not defined there).

Comment 13 Michal Domonkos 2023-08-14 12:39:56 UTC
In any case, it seems like the root cause here is that, despite it being a noarch build, cmake in fact needs to determine the pointer size for some reason, and fails to do that, given the wrong (insufficient) CFLAGS, due to the reason mentioned previously.

Comment 14 Michal Domonkos 2023-08-14 14:26:02 UTC
FWIW, this is locally reproducible with:

$ mock --arch x86_64_v4 -r fedora-rawhide-x86_64 plasma-workspace-wallpapers-5.27.7-1.fc40.src.rpm

Comment 15 Michal Domonkos 2023-08-14 18:22:52 UTC
OK, after digging into this further, the root cause really is quite simple and what we suspected at the beginning, which is that CFLAGS is not set properly.

The reason it's causing a cmake failure for these particular packages is that the KDE cmake module (KDEInstallDirsCommon.cmake) uses CMAKE_SIZEOF_VOID_P in order to decide between the /lib vs. /lib64 paths for its own purposes.  CMAKE_SIZEOF_VOID_P is set in cmake by doing a native compilation, that is, it happens even when we're otherwise doing a noarch build.  And for that, CFLAGS and LDFLAGS need to be set properly for that platform, otherwise this kind of misdetection happens.
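
The decision described above can be modelled roughly as follows. This is a Python sketch for illustration only, not the CMake code from KDEInstallDirsCommon.cmake; the function name `choose_libdir` and the fallback to "lib" for an undefined pointer size are assumptions made for the example.

```python
from typing import Optional


def choose_libdir(sizeof_void_p: Optional[int]) -> str:
    """Pick a library directory name from the detected pointer size.

    sizeof_void_p stands in for CMAKE_SIZEOF_VOID_P; None models the case
    where the try-compile failed and the variable was left undefined.
    """
    if sizeof_void_p == 8:
        return "lib64"
    # Undefined or 32-bit pointer size: plain "lib" (assumed for this sketch).
    return "lib"


print(choose_libdir(8))     # try-compile succeeded on a 64-bit host
print(choose_libdir(None))  # try-compile failed, pointer size unknown
```

If the native try-compile breaks because of unsuitable CFLAGS/LDFLAGS, the second case applies, which would match the /usr/lib64 paths degrading to /usr/lib as noted in comment 3.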

CFLAGS is initialized from %build_cflags which is initialized from %optflags, which in turn is initialized from the optflags values stored in the platform-specific RPM configuration files.  The thing is, RPM now internally treats the new architecture levels as separate architectures, e.g. x86_64_v4 instead of x86_64, and uses that key to match the respective optflags in the configuration files.

LDFLAGS is initialized from %build_ldflags and does not (seem to) depend on the architecture detected.  This macro is only defined in redhat-rpm-config's macros file.

Now, the newly added x86_64_v* optflags are listed in the stock RPM config file (/usr/lib/rpm/rpmrc), however they're not listed in the config file shipped by redhat-rpm-config (which overrides the former).  Thus, RPM simply ends up using the default optflags for these new levels, which is (in the case of v4):

  -O2 -g -march=x86-64-v4

Whereas the original x86_64 architecture has this in redhat-rpm-config's rpmrc file:

  %{__global_compiler_flags} -m64 %{__cflags_arch_x86_64} -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection %{_frame_pointers_cflags} %{_frame_pointers_cflags_x86_64}

In fact, the actual missing flag from the former example is just -fPIC/-fPIE.  Adding it to the line fixes the build failures for me.  In the redhat-rpm-config version, it gets expanded through a bunch of macros starting with %{__global_compiler_flags}.
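
To make the override behaviour concrete, here is a small Python model of it. It is a sketch under stated assumptions, not how RPM parses its rpmrc files: the dictionaries are abbreviated, hypothetical stand-ins for /usr/lib/rpm/rpmrc and /usr/lib/rpm/redhat/rpmrc, the stock x86_64 value is a placeholder, and `effective_optflags` is an invented helper name.

```python
from typing import Optional

# Abbreviated stand-in for the stock rpmrc (x86_64 value is a placeholder).
STOCK_RPMRC = {
    "x86_64": "-O2 -g",
    "x86_64_v4": "-O2 -g -march=x86-64-v4",
}

# Abbreviated stand-in for redhat-rpm-config's rpmrc: no x86_64_v* entries.
REDHAT_RPMRC = {
    "x86_64": "%{__global_compiler_flags} -m64 -mtune=generic",
}


def effective_optflags(arch: str) -> Optional[str]:
    """Return the optflags for arch after the distro file overrides the stock one."""
    merged = {**STOCK_RPMRC, **REDHAT_RPMRC}  # later file wins, key by key
    return merged.get(arch)


print(effective_optflags("x86_64"))     # distro-specific flags
print(effective_optflags("x86_64_v4"))  # falls through to the stock flags
```

Because the override is per architecture key, only x86_64 picks up the Fedora flags; x86_64_v4 keeps the plain upstream value.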

What's interesting is that when I remove the redhat-rpm-config's macros file that contains the definition of %build_ldflags, thus rendering LDFLAGS empty, it also fixes the problem.  So I suspect that it's the resulting combination of the plain CFLAGS and empty LDFLAGS that's causing some kind of mismatch somewhere (in GCC?).

Currently, I can see three options to fix this:

1) Revert the patch adding the new arch levels
2) Add new x86_64_v* optflags to redhat-rpm-config
3) Make it so that RPM treats x86_64_v* as x86_64 internally

The first two options seem quite ugly, the last might go against the very idea of the feature and would probably require code changes in rpmrc.c (in the get_x86_64_level() function) which I'm not quite comfortable touching myself.

Any thoughts?

Comment 16 Michal Domonkos 2023-08-14 18:27:07 UTC
Another option would be:

4) Add -fPIE to the stock optflags for the new x86_64 levels, or rather -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 (which is also used by redhat-rpm-config's optflags for x86_64)

Comment 17 Michal Domonkos 2023-08-14 18:30:37 UTC
Argh, got tangled up here a little bit, correction for:

(In reply to Michal Domonkos from comment #15)
> combination of the plain CFLAGS and empty LDFLAGS that's causing some kind

*non-empty* LDFLAGS

Comment 18 Michal Domonkos 2023-08-14 18:31:43 UTC
> 4) Add -fPIE to the stock optflags for the new x86_64 levels, or rather
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 (which is also used by
> redhat-rpm-config's optflags for x86_64)

Hmm, this wouldn't work of course, since it would require redhat-rpm-config...

Comment 19 Panu Matilainen 2023-08-15 07:33:43 UTC
(In reply to Michal Domonkos from comment #14)
> FWIW, this is locally reproducible with:
> 
> $ mock --arch x86_64_v4 -r fedora-rawhide-x86_64
> plasma-workspace-wallpapers-5.27.7-1.fc40.src.rpm

What actually makes this comprehensible is that it equally fails with "--arch noarch", which is what is actually happening on koji. So I think that currently all noarch builds that rely on OPTFLAGS and happen to land on x86_64 are failing on koji, and _v4 specifically has nothing to do with it. It's just that there aren't that many of such packages. Which is good, because technically OPTFLAGS on noarch should be empty (ultimately maybe even an error) because no other value is meaningful.

It seems the rpm-correct solution for such packages would be to drop the BuildArch from the main package, and create the content in a sub-package-specific "BuildArch: noarch" instead.

But, redhat-rpm-config should nevertheless provide proper configuration for the detected architectures. It defaults to x86_64 but the point of those subarches is to allow building optimized versions, and for those we'd currently supply a very incomplete (even buggy, in Fedora setting) optflags. And once those are added, these packages will start to build again even if technically for the wrong reasons.

Anyhow, the right thing here is to add the new sub-architectures to redhat-rpm-config to support the new architectures, just like there are configs for i386, i486, i586 and so on.

Comment 20 Panu Matilainen 2023-08-15 08:34:37 UTC
On a related note: https://github.com/rpm-software-management/rpm/pull/2615

Comment 21 Fedora Release Engineering 2023-08-16 08:14:20 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 22 Florian Weimer 2023-08-21 08:53:21 UTC
Why does rpmbuild select x86_64_v4 as the _arch for --target noarch? It doesn't do that in other cases. What's the benefit of doing this? As we've seen, it creates a maintenance hazard for the rest of the build system because it needs to be taught about new RPM architectures immediately, rather than when the distribution starts using the subarchitectures explicitly.

It's not quite clear why the bug is assigned to me. I can duplicate the flag setting, but it's not clear to me if we'd actually build x86-64-v4 packages for x86_64_v4, or if we'd stick to the official architecture flags instead. I'm leaning towards the latter.

Comment 23 Panu Matilainen 2023-08-21 09:06:29 UTC
It's not just v4, it's really all these new x86_64 subarchitectures. It's all more than a little "rpm internals crazy", but basically since noarch does not define optflags of its own, --target noarch ends up "inheriting" whatever the host happens to be (and that cannot be changed because it then breaks worse things). And since these new subarches are not overridden in redhat-rpm-config, they get the upstream optflags definitions for those, which are nowhere near adequate in the Fedora setting, hence we have this bug.

So we really need to duplicate the optflags for the v2, v3 and v4 subarches, whether we actually build packages for them or not.

Comment 24 Florian Weimer 2023-08-21 09:13:16 UTC
(In reply to Panu Matilainen from comment #23)
> It's not just v4, it's really all these new x86_64 subarchitectures. It's
> all more than a little "rpm internals crazy", but basically since noarch
> does not define optflags of it's own, --target noarch ends up "inheriting"
> whatever the host happens to be (and that cannot be changed because it then
> breaks worse things). And since these new subarches are not overridden in
> redhat-rpm-config, they get the upstream optflags definitions for those
> which are nowhere near adequate in the Fedora setting, hence we have this
> bug.
> 
> So we really need to duplicate the optflags for the v2, v3 and v4 subarches,
> whether we actually build packages for them or not.

Why doesn't RPM use buildarchtranslate to initialize the architecture name when --target noarch is used? Why switch to something that is completely unrelated to what the user has requested?

As I said, this RPM behavior is a maintenance hazard.

Comment 25 Panu Matilainen 2023-08-21 09:35:28 UTC
%_arch does get set to noarch on --target noarch; this is really about optflags only. It doesn't "switch to" something unrelated, it's more a consequence of *not* switching to anything at all because that anything is not defined. Why buildarchtranslate doesn't apply is anybody's guess, probably because nothing that applies it is being explicitly invoked, or something like that, and so the optflags of the detected arch fall through.

Look, I'm not defending the nutty rpm behavior here, but it's also not new behavior. Just saying that the quickest and the safest way to address this is to just add the definitions for all the detected architectures to redhat-rpm-config.

Comment 26 Panu Matilainen 2023-08-21 10:21:09 UTC
Hmm okay, this would *seem* to do the trick of applying buildarchtranslate to the optflags we get in the case where none are defined for the requested target. The behavior doesn't make a whole lot of sense, but the existing behavior makes even less...

diff --git a/lib/rpmrc.c b/lib/rpmrc.c
index 8a829709b..0badb57c2 100644
--- a/lib/rpmrc.c
+++ b/lib/rpmrc.c
@@ -1685,6 +1685,10 @@ static void rpmRebuildTargetVars(rpmrcCtx ctx,
  * XXX Make sure that per-arch optflags is initialized correctly.
  */
   { const char *optflags = rpmGetVarArch(ctx, RPMVAR_OPTFLAGS, ca);
+    /* Fall back to current buildarchtranslate'd optflags if not defined */
+    if (optflags == NULL) {
+       optflags = rpmGetVarArch(ctx, RPMVAR_OPTFLAGS, NULL);
+    }
     if (optflags != NULL) {
        rpmPopMacro(NULL, "optflags");
        rpmPushMacro(NULL, "optflags", NULL, optflags, RMIL_RPMRC);

Whether that's the right thing to do in all situations I have no idea.
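
For readers who don't want to trace rpmrc.c, the effect of the patch can be modelled like this. It is a Python sketch of the behaviour discussed in this bug, not a translation of the C code: `target_optflags`, the `patched` switch and the table contents are hypothetical, and the x86_64 flags are a placeholder string.

```python
from typing import Optional

# Hypothetical, abbreviated configuration; noarch defines no optflags.
OPTFLAGS = {
    "x86_64": "<Fedora x86_64 flags>",
    "x86_64_v4": "-O2 -g -march=x86-64-v4",
}
# Assumed mapping in the spirit of buildarchtranslate.
BUILDARCHTRANSLATE = {"x86_64_v4": "x86_64"}


def target_optflags(target: str, host: str, patched: bool) -> str:
    """Model which value %optflags ends up with for a given --target."""
    # %optflags as first initialised for the detected host arch.
    inherited = OPTFLAGS[host]
    flags: Optional[str] = OPTFLAGS.get(target)
    if flags is None and patched:
        # Patched: fall back to the buildarchtranslate'd arch's optflags.
        flags = OPTFLAGS.get(BUILDARCHTRANSLATE.get(host, host))
    # Unpatched with nothing defined: the host's value is left in place.
    return flags if flags is not None else inherited


print(target_optflags("noarch", "x86_64_v4", patched=False))
print(target_optflags("noarch", "x86_64_v4", patched=True))
```

In this model an explicit --target x86_64_v4 still yields the v4 flags with or without the patch; only the undefined-target case changes.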

Comment 27 Panu Matilainen 2023-08-21 11:22:03 UTC
Let's give this a spin on rawhide and see what breaks.

Comment 28 Panu Matilainen 2023-08-21 11:24:14 UTC
Note that I'd still add those _v4 definitions to redhat-rpm-config, if only to let people experiment.

Comment 29 Florian Weimer 2023-08-21 11:36:17 UTC
(In reply to Panu Matilainen from comment #28)
> Note that I'd still add those _v4 definitions to redhat-rpm-config, if only
> to let people experiment.

I'll see what I can do. There's another PR we should merge (the License: update), then I'll look into this.

I don't think we can use -march=x86-64-v4 for building noarch packages. They might contain firmware that has to run on all CPUs. So I'm just going to replicate the flags.

Maybe the real bug here is to build noarch packages with --target noarch, instead of a real architecture?

Comment 30 Panu Matilainen 2023-08-21 11:44:05 UTC
> Maybe the real bug here is to build noarch packages with --target noarch, instead of a real architecture?

That's exactly what koji does. And that's also exactly the crack that this bug falls through, those same noarch packages build fine when --target noarch is NOT explicitly specified.

But, this should be "fixed" in rpm-4.18.92-2.fc40 now: the arch-specific stuff that ends up in %optflags in case of "--target noarch" is the same as without it, that is, with buildarchtranslate applied. So you don't *have* to do anything in redhat-rpm-config. 

> I don't think we can use -march=x86-64-v4 for building noarch packages. They might contain firmware that has to run on all CPUs. So I'm just going to replicate the flags.

With the fix in rpm now, you'll only ever get -v4 by explicitly requesting it, all normal builds fall back to plain x86_64 as per buildarchtranslate.

Comment 31 Florian Weimer 2023-08-21 12:13:43 UTC
(In reply to Panu Matilainen from comment #30)
> > Maybe the real bug here is to build noarch packages with --target noarch, instead of a real architecture?
> 
> That's exactly what koji does. And that's also exactly the crack that this
> bug falls through, those same noarch packages build fine when --target
> noarch is NOT explicitly specified.

I've filed a Koji issue regarding this.

> But, this should be "fixed" in rpm-4.18.92-2.fc40 now: the arch-specific
> stuff that ends up in %optflags in case of "--target noarch" is the same as
> without it, that is, with buildarchtranslate applied. So you don't *have* to
> do anything in redhat-rpm-config. 

Thanks.

> > I don't think we can use -march=x86-64-v4 for building noarch packages. They might contain firmware that has to run on all CPUs. So I'm just going to replicate the flags.
> 
> With the fix in rpm now, you'll only ever get -v4 by explicitly requesting
> it, all normal builds fall back to plain x86_64 as per buildarchtranslate.

I think it will break your workaround because %optflags is no longer unset. So we will build with -march=x86-64-v4 instead on Fedora builders.

I agree supporting --target x86_64_v4 is useful, so I'll implement that.

Comment 32 Florian Weimer 2023-08-21 12:22:13 UTC
Sorry, missed the bug reassignment.

Comment 33 Panu Matilainen 2023-08-21 12:30:31 UTC
(In reply to Florian Weimer from comment #31)
> (In reply to Panu Matilainen from comment #30)
> > > Maybe the real bug here is to build noarch packages with --target noarch, instead of a real architecture?
> > 
> > That's exactly what koji does. And that's also exactly the crack that this
> > bug falls through, those same noarch packages build fine when --target
> > noarch is NOT explicitly specified.
> 
> I've filed a Koji issue regarding this.

No no, koji does the right thing!

> 
> > But, this should be "fixed" in rpm-4.18.92-2.fc40 now: the arch-specific
> > stuff that ends up in %optflags in case of "--target noarch" is the same as
> > without it, that is, with buildarchtranslate applied. So you don't *have* to
> > do anything in redhat-rpm-config. 
> 
> Thanks.
> 
> > > I don't think we can use -march=x86-64-v4 for building noarch packages. They might contain firmware that has to run on all CPUs. So I'm just going to replicate the flags.
> > 
> > With the fix in rpm now, you'll only ever get -v4 by explicitly requesting
> > it, all normal builds fall back to plain x86_64 as per buildarchtranslate.
> 
> I think it will break your workaround because %optflags is no longer unset.
> So we will build with -march=x86-64-v4 instead on Fedora builders.

No, the thing with unset %optflags that triggers this all is noarch. Adding proper definitions for v4 is not supposed to break anything at all.

> 
> I agree supporting --target x86_64_v4 is useful, so I'll implement that.

Comment 34 Panu Matilainen 2023-08-21 13:06:11 UTC
> Maybe the real bug here is to build noarch packages with --target noarch, instead of a real architecture?

I misread this a bit initially, in case you wondered about the response. For any primary BuildArch: the correct way to build is with --target, it's more a bunch of forever bugs and missing features in rpm that it doesn't do this automatically on BuildArch, so you can end up with bizarrely different results depending on whether you passed eg --target noarch or not. Like here.

The underlying issue here is ambiguity over what all those macros mean in a --target context. As per rpm's definition they refer to the *target*, and from that it follows that a package which depends on *host* values should not be noarch package to begin with, instead it should just produce the allegedly arch-independent content in a noarch sub-package which is handled very differently than a top-level buildarch. But of course that's ugly, weird and error-prone and whatnot as well.

This can all only be fixed by separating target and host macros and having packages explicitly adopt them. Needless to say, that is outside the scope of this bug :)

Thanks @fweimer for pushing back a bit on this, making the odd corner case of --target noarch behave more consistently wrt optflags is a good thing.

Comment 35 Panu Matilainen 2023-08-21 13:12:18 UTC
Oh and forgot to add: as usual it's best not to try to fix ambiguity by tweaking individual behaviors or defaults, it'll accomplish nothing other than breaking the other case in return. The patch I added doesn't address the ambiguity but an inconsistency in applying buildarchtranslate, which could be considered a roughly 25-year-old bug.

Comment 36 Panu Matilainen 2023-08-22 07:49:57 UTC
That said, my proposed fix from yesterday moves %optflags in the exact opposite direction of where it should be going by kinda legitimizing this use-case. It should *always* represent the target architecture %optflags. Only there is no %_host_optflags which is what compilation in a noarch package should actually use.

Comment 37 Fedora Update System 2023-08-24 06:45:55 UTC
FEDORA-2023-d1971fb6db has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-d1971fb6db

Comment 38 Fedora Update System 2023-09-05 02:01:14 UTC
FEDORA-2023-067d943f23 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-067d943f23`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-067d943f23

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 39 Fedora Update System 2023-09-18 00:15:56 UTC
FEDORA-2023-067d943f23 has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make a note of it in this bug report.

