Bug 2231727

Summary: noarch builds fail when running on x86-64-v4 hosts
Product: [Fedora] Fedora
Reporter: marcdeop
Component: redhat-rpm-config
Assignee: Florian Weimer <fweimer>
Status: ASSIGNED
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 39
CC: ajax, carl, code, davide, ffesti, fweimer, igor.raits, j, klember, mdomonko, mhroncok, michel, ngompa13, nickc, packaging-team-maint, pmatilai, sipoyare, torsava
Hardware: x86_64   
OS: Linux   
URL: https://kojipkgs.fedoraproject.org//work/tasks/4873/104524873/build.log

Description marcdeop 2023-08-13 15:49:32 UTC
While doing chain builds of KDE packages, I noticed that my noarch builds were failing.

There seems to be a problem when building noarch packages on x86_64-v4 hosts.

Examples are the plasma-workspace-wallpapers and breeze-gtk builds. Logs:

https://kojipkgs.fedoraproject.org//work/tasks/4578/104324578/build.log
https://kojipkgs.fedoraproject.org//work/tasks/4873/104524873/build.log

I suspect it has something to do with the value of CFLAGS (compare with the values in a successful build: https://kojipkgs.fedoraproject.org//work/tasks/591/104500591/build.log ).



Reproducible: Sometimes

Steps to Reproduce:
1. Trigger a noarch build on koji (breeze-gtk, plasma-workspace-wallpapers and others)
2. The build fails whenever -march=x86-64-v4 shows up in the log


Expected Results:
Successful build

Comment 1 Michal Domonkos 2023-08-14 10:07:13 UTC
This appears to be fallout from the addition of the new x86-64 architecture levels in RPM 4.19:
https://github.com/rpm-software-management/rpm/pull/2315

The CFLAGS value is indeed a good catch and most likely the reason for this error.  It's set by RPM from its internal optflags value (stored in /usr/lib/rpm/rpmrc) and is actually correct for the (newly detected) x86-64-v4 architecture.  However, on Fedora, we ship an extended variant of optflags for each platform (through redhat-rpm-config, see /usr/lib/rpm/redhat/rpmrc), and as it turns out, those values haven't been adapted for these new arch levels.  Thus, RPM ends up using the plain, built-in value of "-O2 -g -march=x86-64-v4", i.e. the same as it would for any other architecture if redhat-rpm-config weren't installed.

I'm not yet sure what the actual fix would look like (just copy the optflags line for x86_64_v2, v3 and v4?), but I believe it has to be done in redhat-rpm-config, therefore reassigning for now.

Comment 2 Florian Weimer 2023-08-14 10:45:31 UTC
(In reply to Michal Domonkos from comment #1)
> I'm not yet sure what the actual fix would look like (just copy the optflags
> line for x86_64_v2, v3 and v4?) but I believe it has to be done
> redhat-rpm-config, therefore reassigning for now.

I'm not sure how a noarch package is even supposed to obtain correct settings for %build_cflags. But I guess we need to support this somehow. I'm not too thrilled to fix it by adding more optflags entries. The RPM change is supposed to be backwards-compatible.

Comment 3 Neal Gompa 2023-08-14 10:52:46 UTC
The x86_64 subarches aren't the biggest problem. The biggest problem is that all the multiarch macros are resolving incorrectly. Note that everything that's supposed to be /usr/lib64 is resolving to /usr/lib, and that breaks CMake and everything else.

This breaks noarch KDE builds because CMake can no longer find things like Qt.

Comment 4 Neal Gompa 2023-08-14 10:53:52 UTC
I'm fairly certain this is an RPM bug since we don't even define x86_64-v4 in redhat-rpm-config.

Comment 5 Neal Gompa 2023-08-14 10:55:10 UTC
(And yes, while part of the solution may be defining stuff in redhat-rpm-config, there's still a significant breakage in RPM itself.)

Comment 6 Michal Domonkos 2023-08-14 11:32:35 UTC
Indeed, the improper CFLAGS value may just be the tip of the iceberg.  I suppose it shouldn't affect the configuration step in cmake anyway, which is what's actually failing.

The first warning in the build log happens in /usr/share/ECM/kde-modules/KDEInstallDirsCommon.cmake:39.  Looking at that file, CMAKE_SIZEOF_VOID_P is apparently undefined for some reason.

Comment 7 Michal Domonkos 2023-08-14 12:04:21 UTC
(In reply to Michal Domonkos from comment #6)
> I suppose it shouldn't affect the configuration step in cmake anyway, which is
> what's actually failing.

Which is not entirely true:

CMAKE_SIZEOF_VOID_P
This is set to the size of a pointer on the target machine, and is determined by a try compile. If a 64-bit size is found, then the library search path is modified to look for 64-bit libraries first.

Comment 8 Neal Gompa 2023-08-14 12:09:02 UTC
This issue goes back to all the file paths being broken. It can't do anything meaningful right now.

Comment 9 Michal Domonkos 2023-08-14 12:16:51 UTC
$ grep ^%_lib /usr/lib/rpm/platform/x86_64_v4-linux/macros
%_libexecdir		%{_exec_prefix}/libexec
%_lib			lib64
%_libdir		%{_prefix}/lib64

This looks alright.

Comment 10 Panu Matilainen 2023-08-14 12:19:56 UTC
This is all severely fishy. A noarch package is SUPPOSED TO get /usr/lib as the path (see /usr/lib/rpm/platform/noarch-linux/macros), i.e. it can't rely on having any arch-specific paths available. So I would *expect* all those %cmake macros to be broken for noarch packages when building with --target=noarch, as koji does (IIRC). But that doesn't explain why v4 behaves differently.

Since when has this been happening?
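Comments 9 and 10 hinge on how %_libdir expands for each platform. Here is a simplified Python model, not RPM's actual macro engine, that parses "%name body" lines and recursively expands %{name} references. The x86_64_v4 lines are copied from the grep output in comment 9; the noarch definitions and the /usr defaults for %_prefix and %_exec_prefix are assumptions for illustration.

```python
import re

# x86_64_v4 excerpt copied from the grep output in comment 9.
X86_64_V4_MACROS = """\
%_libexecdir\t\t%{_exec_prefix}/libexec
%_lib\t\t\tlib64
%_libdir\t\t%{_prefix}/lib64
"""

# Assumed noarch excerpt, based on comment 10 ("noarch ... gets /usr/lib").
NOARCH_MACROS = """\
%_lib\t\t\tlib
%_libdir\t\t%{_prefix}/lib
"""

# Assumed global defaults shared by every platform.
BASE_MACROS = {"_prefix": "/usr", "_exec_prefix": "%{_prefix}"}

MACRO_REF = re.compile(r"%\{(\w+)\}")


def parse_macros(text):
    """Parse '%name<whitespace>body' lines into a dict (simplified syntax)."""
    macros = {}
    for line in text.splitlines():
        match = re.match(r"^%(\w+)\s+(.*)$", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
    return macros


def expand(value, macros, depth=0):
    """Recursively expand %{name}; undefined names are left untouched."""
    if depth > 16:
        raise RecursionError("macro expansion too deep")

    def repl(match):
        name = match.group(1)
        if name not in macros:
            return match.group(0)
        return expand(macros[name], macros, depth + 1)

    return MACRO_REF.sub(repl, value)


def libdir_for(platform_text):
    """Layer a platform's macros over the defaults and expand %{_libdir}."""
    macros = dict(BASE_MACROS)
    macros.update(parse_macros(platform_text))
    return expand("%{_libdir}", macros)


print(libdir_for(X86_64_V4_MACROS))  # /usr/lib64
print(libdir_for(NOARCH_MACROS))     # /usr/lib
```

In this model the per-arch macros look fine, as comment 9 observed; a noarch target simply never sees the lib64 definitions, which is the behaviour comment 10 describes as expected.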

Comment 11 Kalev Lember 2023-08-14 12:33:10 UTC
Could it be that the compile test that cmake runs to determine the pointer size fails because the builder hosts don't support the x86-64-v4 instruction set?

Comment 12 Michal Domonkos 2023-08-14 12:38:26 UTC
It could; however, as Panu just noted in our team chat, it might just as well be as simple as the -m64 flag not being used, because RPM cannot match the platform in /usr/lib/rpm/redhat/rpmrc for x86_64_v4 (since it's just not defined there).

Comment 13 Michal Domonkos 2023-08-14 12:39:56 UTC
In any case, the root cause seems to be that, even though this is a noarch build, cmake still needs to determine the pointer size for some reason, and fails to do so because CFLAGS is wrong (insufficient), for the reason mentioned previously.

Comment 14 Michal Domonkos 2023-08-14 14:26:02 UTC
FWIW, this is locally reproducible with:

$ mock --arch x86_64_v4 -r fedora-rawhide-x86_64 plasma-workspace-wallpapers-5.27.7-1.fc40.src.rpm

Comment 15 Michal Domonkos 2023-08-14 18:22:52 UTC
OK, after digging into this further, the root cause really is quite simple and is what we suspected at the beginning: CFLAGS is not set properly.

The reason it's causing a cmake failure for these particular packages is that the KDE cmake module (KDEInstallDirsCommon.cmake) uses CMAKE_SIZEOF_VOID_P to decide between the /lib and /lib64 paths for its own purposes.  cmake sets CMAKE_SIZEOF_VOID_P by doing a native compilation, which happens even when we're otherwise doing a noarch build.  For that to work, CFLAGS and LDFLAGS need to be set properly for the platform; otherwise, this kind of misdetection happens.
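A hypothetical Python sketch of that misdetection: probe_pointer_size stands in for cmake's try-compile and choose_lib_dir for the lib vs. lib64 decision in KDEInstallDirsCommon.cmake. Both names are invented for illustration and the real ECM logic may differ; the point is only that a failed probe (an undefined CMAKE_SIZEOF_VOID_P, as seen in comment 6) quietly yields "lib" instead of "lib64".

```python
import struct
from typing import Optional


def probe_pointer_size(compile_ok: bool) -> Optional[int]:
    """Hypothetical stand-in for cmake's try-compile.

    Returns the native pointer size, or None when the test program could not
    be built, e.g. because CFLAGS/LDFLAGS did not match the platform."""
    if not compile_ok:
        return None
    # struct.calcsize("P") is the size of a C void* for this Python build.
    return struct.calcsize("P")


def choose_lib_dir(sizeof_void_p: Optional[int]) -> str:
    """Hypothetical lib vs. lib64 choice; not the actual ECM code."""
    if sizeof_void_p == 8:
        return "lib64"
    # An undefined (None) or 4-byte pointer size falls back to plain "lib".
    return "lib"


print(choose_lib_dir(8))                                     # lib64
print(choose_lib_dir(probe_pointer_size(compile_ok=False)))  # lib
```

On Fedora x86_64, Qt lives under /usr/lib64, so the second result is consistent with the noarch KDE builds no longer finding it.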

CFLAGS is initialized from %build_cflags which is initialized from %optflags, which in turn is initialized from the optflags values stored in the platform-specific RPM configuration files.  The thing is, RPM now internally treats the new architecture levels as separate architectures, e.g. x86_64_v4 instead of x86_64, and uses that key to match the respective optflags in the configuration files.

LDFLAGS is initialized from %build_ldflags and does not (seem to) depend on the architecture detected.  This macro is only defined in redhat-rpm-config's macros file.

Now, the newly added x86_64_v* optflags are listed in the stock RPM config file (/usr/lib/rpm/rpmrc); however, they're not listed in the config file shipped by redhat-rpm-config (which overrides the former).  Thus, RPM simply ends up using the default optflags for these new levels, which is (in the case of v4):

  -O2 -g -march=x86-64-v4

Whereas the original x86_64 architecture has this in redhat-rpm-config's rpmrc file:

  %{__global_compiler_flags} -m64 %{__cflags_arch_x86_64} -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection %{_frame_pointers_cflags} %{_frame_pointers_cflags_x86_64}

In fact, the actual missing flag from the former example is just -fPIC/-fPIE.  Adding it to the line fixes the build failures for me.  In the redhat-rpm-config version, it gets expanded through a bunch of macros starting with %{__global_compiler_flags}.
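To make the override behaviour concrete, here is a simplified Python model, not RPM's actual rpmrc parser: each config file is reduced to a dict of optflags keyed by arch, and redhat-rpm-config's table overrides the stock one only for the keys it defines. The x86_64_v4 value and the redhat x86_64 line are copied from this comment; the stock x86_64 value "-O2 -g" is an assumption.

```python
# Simplified model of per-arch optflags lookup with a per-key override.

# Stock RPM config (/usr/lib/rpm/rpmrc). The x86_64 value is an assumption;
# the x86_64_v4 value is the one quoted above.
STOCK_RPMRC = {
    "x86_64": "-O2 -g",
    "x86_64_v4": "-O2 -g -march=x86-64-v4",
}

# redhat-rpm-config (/usr/lib/rpm/redhat/rpmrc): only plain x86_64 is defined.
REDHAT_RPMRC = {
    "x86_64": (
        "%{__global_compiler_flags} -m64 %{__cflags_arch_x86_64} "
        "-mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection "
        "-fcf-protection %{_frame_pointers_cflags} %{_frame_pointers_cflags_x86_64}"
    ),
}


def resolve_optflags(arch, *configs):
    """Merge configs in order (later wins per key), then look up the arch."""
    merged = {}
    for config in configs:
        merged.update(config)
    return merged.get(arch)


for arch in ("x86_64", "x86_64_v4"):
    flags = resolve_optflags(arch, STOCK_RPMRC, REDHAT_RPMRC)
    # Without -m64 and the hardening macros, a v4 host gets only bare defaults.
    print(f"{arch}: -m64 present={'-m64' in flags}: {flags}")
```

In this model, option 2 below amounts to adding x86_64_v2/v3/v4 keys to REDHAT_RPMRC.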

What's interesting is that when I remove redhat-rpm-config's macros file, which contains the definition of %build_ldflags, thus rendering LDFLAGS empty, it also fixes the problem.  So I suspect that it's the resulting combination of the plain CFLAGS and empty LDFLAGS that's causing some kind of mismatch somewhere (in GCC?).

Currently, I can see three options to fix this:

1) Revert the patch adding the new arch levels
2) Add new x86_64_v* optflags to redhat-rpm-config
3) Make it so that RPM treats x86_64_v* as x86_64 internally

The first two options seem quite ugly; the last might go against the very idea of the feature and would probably require code changes in rpmrc.c (in the get_x86_64_level() function), which I'm not quite comfortable touching myself.

Any thoughts?
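Option 3 can be illustrated with a hypothetical lookup that falls back from an x86-64 level key (x86_64_v2/v3/v4) to the base x86_64 key. This Python sketch is not how rpmrc.c behaves today and does not model get_x86_64_level(); lookup_candidates and lookup_optflags are invented names, and the ordering (any redhat-rpm-config entry before any stock entry) is an assumed design choice.

```python
import re

# Matches the x86-64 level names and captures the base arch.
LEVEL_SUFFIX = re.compile(r"^(x86_64)_v[2-4]$")


def lookup_candidates(arch):
    """Arch keys to try, most specific first (hypothetical fallback chain)."""
    candidates = [arch]
    match = LEVEL_SUFFIX.match(arch)
    if match:
        candidates.append(match.group(1))
    return candidates


def lookup_optflags(arch, redhat_rpmrc, stock_rpmrc):
    """Prefer any redhat-rpm-config entry along the chain, then stock RPM."""
    for table in (redhat_rpmrc, stock_rpmrc):
        for key in lookup_candidates(arch):
            if key in table:
                return table[key]
    return None


redhat = {"x86_64": "%{__global_compiler_flags} -m64 -mtune=generic"}
stock = {"x86_64_v4": "-O2 -g -march=x86-64-v4"}
print(lookup_candidates("x86_64_v4"))               # ['x86_64_v4', 'x86_64']
print(lookup_optflags("x86_64_v4", redhat, stock))  # the redhat x86_64 line
```

This also shows the trade-off mentioned above: the v4 host regains -m64 and the Fedora flags, but the level-specific -march=x86-64-v4 from the stock entry is lost.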

Comment 16 Michal Domonkos 2023-08-14 18:27:07 UTC
Another option would be:

4) Add -fPIE to the stock optflags for the new x86_64 levels, or rather -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 (which is also used by redhat-rpm-config's optflags for x86_64)

Comment 17 Michal Domonkos 2023-08-14 18:30:37 UTC
Argh, got tangled up here a little bit, correction for:

(In reply to Michal Domonkos from comment #15)
> combination of the plain CFLAGS and empty LDFLAGS that's causing some kind

*non-empty* LDFLAGS

Comment 18 Michal Domonkos 2023-08-14 18:31:43 UTC
> 4) Add -fPIE to the stock optflags for the new x86_64 levels, or rather
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 (which is also used by
> redhat-rpm-config's optflags for x86_64)

Hmm, this wouldn't work of course, since it would require redhat-rpm-config...

Comment 19 Panu Matilainen 2023-08-15 07:33:43 UTC
(In reply to Michal Domonkos from comment #14)
> FWIW, this is locally reproducible with:
> 
> $ mock --arch x86_64_v4 -r fedora-rawhide-x86_64
> plasma-workspace-wallpapers-5.27.7-1.fc40.src.rpm

What makes this comprehensible is that it fails equally with "--arch noarch", which is what actually happens on koji. So I think that currently all noarch builds that rely on OPTFLAGS and happen to land on x86_64 are failing on koji, and _v4 specifically has nothing to do with it. It's just that there aren't many such packages. Which is good, because technically OPTFLAGS on noarch should be empty (ultimately maybe even an error), since no other value is meaningful.

It seems the rpm-correct solution for such packages would be to drop BuildArch from the main package and instead put the content into a subpackage with its own "BuildArch: noarch".

But redhat-rpm-config should nevertheless provide proper configuration for the detected architectures. It defaults to x86_64, but the point of those subarches is to allow building optimized versions, and for those we'd currently supply a very incomplete (even buggy, in the Fedora setting) optflags value. And once those are added, these packages will start to build again, even if technically for the wrong reasons.

Anyhow, the right thing here is to add the new sub-architectures to redhat-rpm-config, just like there are configs for i386, i486, i586 and so on.

Comment 20 Panu Matilainen 2023-08-15 08:34:37 UTC
On a related note: https://github.com/rpm-software-management/rpm/pull/2615

Comment 21 Fedora Release Engineering 2023-08-16 08:14:20 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.