Bug 1799842

Summary: pacemaker: FTBFS in Fedora rawhide/f32
Product: [Fedora] Fedora Reporter: Fedora Release Engineering <releng>
Component: pacemaker Assignee: Jan Pokorný [poki] <jpokorny>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 32 CC: andrew, anprice, dan, hannsj_uhl, jpokorny, lhh, mhroncok
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-2.0.3-4.fc33 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-06 17:32:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1799903, 1811158    
Bug Blocks: 485231, 1750908, 1785415, 1792464, 1803234    
Attachments:
Description Flags
build.log none
root.log none
state.log none

Description Fedora Release Engineering 2020-02-06 19:07:43 UTC
pacemaker failed to build from source in Fedora rawhide/f32

https://koji.fedoraproject.org/koji/taskinfo?taskID=41320191


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_32_Mass_Rebuild
Please fix pacemaker at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
pacemaker will be orphaned. Before branching of Fedora 33,
pacemaker will be retired, if it still fails to build.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-02-06 19:07:45 UTC
Created attachment 1660228 [details]
build.log

Comment 2 Fedora Release Engineering 2020-02-06 19:07:47 UTC
Created attachment 1660229 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-02-06 19:07:48 UTC
Created attachment 1660230 [details]
state.log

Comment 4 Jan Pokorný [poki] 2020-02-07 16:07:48 UTC
re [comment 2]:

> DEBUG util.py:596:  Error:
> DEBUG util.py:596:   Problem: package publican-4.3.2-14.fc31.noarch requires fop, but none of the providers can be installed
> DEBUG util.py:596:    - package fop-2.2-4.fc30.noarch requires avalon-framework >= 4.1.4, but none of the providers can be installed
> DEBUG util.py:596:    - conflicting requests
> DEBUG util.py:596:    - nothing provides mvn(avalon-logkit:avalon-logkit) needed by avalon-framework-4.3-24.fc31.noarch

This is a chained dependency problem, immediately coming from publican,
see the respective [bug 1799903 comment 4] for a more complete story.

Marking this immediate dependency here -- nothing we can do about the
FTBFS state right away (well, except for excluding documentation from
what we ship in Fedora, which is rather an extreme workaround).

Comment 5 Ben Cotton 2020-02-11 16:37:45 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 6 Fedora Release Engineering 2020-02-16 04:28:35 UTC
Dear Maintainer,

your package has not been built successfully in 32. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

A week before the mass branching of Fedora 33 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 31 will be
retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule

Comment 7 Miro Hrončok 2020-02-23 13:57:04 UTC
pacemaker fails to build with Python 3.9.0a3 due to this Python-unrelated FTBFS with gcc 10:

...
/usr/bin/ld: pacemaker_attrd-attrd_utils.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_alerts.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: multiple definition of `attributes'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_alerts.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_elections.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: multiple definition of `attrd_cluster'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:109: first defined here
/usr/bin/ld: pacemaker_attrd-attrd_elections.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: multiple definition of `attributes'; pacemaker_attrd-pacemaker-attrd.o:/builddir/build/BUILD/pacemaker-Pacemaker-2.0.3/daemons/attrd/pacemaker-attrd.h:110: first defined here
/usr/bin/ld: warning: /usr/lib/gcc/x86_64-redhat-linux/10/../../../../lib64/libqb.so contains output sections; did you forget -T?
collect2: error: ld returned 1 exit status

See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/RYVPP45PMWPXYBBDKXO3CI7YGL7CDQG6/ and https://gcc.gnu.org/gcc-10/porting_to.html#common for more information about the failure.
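
As background on the linker errors above: per the GCC 10 porting notes linked in this comment, GCC 10 defaults to -fno-common, so a header that defines a global object (rather than only declaring it) now yields one definition in every object file that includes it. The usual remedy is to declare the object with extern in the header and to define it in exactly one source file. Below is a minimal single-file sketch of that pattern. The names demo_cluster_state, demo_attributes and demo_bump_state, as well as their types, are hypothetical and are not taken from pacemaker-attrd.h.

```c
#include <stddef.h>

/* Part 1: what would live in the shared header (hypothetical).
 * 'extern' makes these declarations only, so including the header in
 * many source files does not create many definitions. */
extern int demo_cluster_state;
extern const char *demo_attributes;

/* Part 2: what would live in exactly one .c file.
 * These are the single definitions the linker resolves references to. */
int demo_cluster_state = 0;
const char *demo_attributes = NULL;

/* Part 3: any other .c file just uses the names via the header. */
int demo_bump_state(void)
{
    return ++demo_cluster_state;
}
```

As a stopgap, the same porting notes mention that legacy code relying on the old behavior can be compiled with -fcommon, although fixing the declarations is the more durable option.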


For the build logs, see:
https://copr-be.cloud.fedoraproject.org/results/@python/python3.9/fedora-rawhide-x86_64/01248360-pacemaker/

For all our attempts to build pacemaker with Python 3.9, see:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/pacemaker/

Testing and mass rebuild of packages is happening in copr. You can follow these instructions to test locally in mock if your package builds with Python 3.9:
https://copr.fedorainfracloud.org/coprs/g/python/python3.9/

Let us know here if you have any questions.

Python 3.9 will be included in Fedora 33. To make that update smoother, we're building Fedora packages with early pre-releases of Python 3.9.
A build failure prevents us from testing all dependent packages (transitive [Build]Requires), so if this package is required a lot, it's important for us to get it fixed soon.
We'd appreciate help from the people who know this package best, but if you don't want to work on this now, let us know so we can try to work around it on our side.

Comment 8 Fedora Release Engineering 2020-03-01 04:27:40 UTC
Dear Maintainer,

your package has not been built successfully in 32. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

A week before the mass branching of Fedora 33 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 31 will be
retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule

Comment 9 Jan Pokorný [poki] 2020-03-04 16:48:57 UTC
re [comment 7]:

Miro, sorry for blocking you. pacemaker was stalled on a train
wreck of build dependencies ([comment 4]), which didn't move forward
until some two weeks ago (if I trace it back to [bug 1799365] correctly).

To add insult to injury, without being unblocked on build prereqs first,
the standard workflow (free of cutting down some features at configure
time) didn't allow me to anticipate and fix further problems related to
GCC 10. These had actually been discovered by that point and fixed
in the master branch, but not in the 2.0.3 release of pacemaker proper.
So this time around, everything was rather painful and took more iteration
rounds than usual.

And, unfortunately, s390x build failed without any explanatory message
from the linker command, which was the culprit.  Hence, s390x arch
is disabled at the moment.

Let me know if you see any further problems, e.g. Python related.

Comment 10 Jan Pokorný [poki] 2020-03-04 17:40:03 UTC
Marking this bug as blocking F-ExcludeArch-s390x for the reason
just mentioned:

> And, unfortunately, s390x build failed [...]

Comment 11 Dan Horák 2020-03-04 17:47:13 UTC
The problem is that all output goes into /dev/null; otherwise one could see the following:

[sharkcz@devel10 pengine]$ gcc -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../libltdl -I../../libltdl -DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -I/usr/include/heartbeat -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat-security -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c utils.c -o libpe_status_la-utils.o
In file included from ../../include/crm_internal.h:21,
                 from utils.c:10:
In function ‘pe_action_set_reason’,
    inlined from ‘custom_action’ at utils.c:605:13:
../../include/crm/common/logging.h:235:13: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
  235 |             qb_log_from_external_source(__func__, __FILE__, fmt, level,     \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  236 |                                         __LINE__, converted_tag , ##args);  \
      |                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../include/crm/pengine/internal.h:19:43: note: in expansion of macro ‘crm_log_tag’
   19 | #  define pe_rsc_trace(rsc, fmt, args...) crm_log_tag(LOG_TRACE, rsc ? rsc->id : "<NULL>", fmt, ##args)
      |                                           ^~~~~~~~~~~
utils.c:2502:9: note: in expansion of macro ‘pe_rsc_trace’
 2502 |         pe_rsc_trace(action->rsc, "Changing %s reason from '%s' to '%s'", action->uuid, action->reason, reason);
      |         ^~~~~~~~~~~~
utils.c: In function ‘custom_action’:
utils.c:2502:69: note: format string is defined here
 2502 |         pe_rsc_trace(action->rsc, "Changing %s reason from '%s' to '%s'", action->uuid, action->reason, reason);
      |                                                                     ^~
cc1: all warnings being treated as errors
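
For readers unfamiliar with this diagnostic: GCC has concluded that, on the inlined path from custom_action, one of the arguments to a '%s' conversion is NULL (the caret at column 69 appears to point at the last '%s', i.e. the new reason). Passing NULL to '%s' is undefined behavior in C, even though some libc implementations print "(null)". A common defensive pattern is to substitute a placeholder before formatting. The sketch below illustrates that pattern only. It is not the change made in the upstream pull request, and the helper names and placeholder strings are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: map a NULL string to a printable fallback so the
 * result is always safe to hand to a "%s" conversion. */
const char *demo_str_or(const char *s, const char *fallback)
{
    return (s != NULL) ? s : fallback;
}

/* Format a message shaped like the one at utils.c:2502 into buf.
 * Every string argument is guarded, so none of the "%s" conversions can
 * receive NULL. Returns the value of snprintf(). */
int demo_format_reason_change(char *buf, size_t len, const char *uuid,
                              const char *old_reason, const char *new_reason)
{
    return snprintf(buf, len, "Changing %s reason from '%s' to '%s'",
                    demo_str_or(uuid, "<unknown>"),
                    demo_str_or(old_reason, "<none>"),
                    demo_str_or(new_reason, "<none>"));
}
```

Calling demo_format_reason_change() with a NULL old or new reason then yields '<none>' in the message instead of relying on libc-specific behavior.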

Comment 12 Jan Pokorný [poki] 2020-03-04 23:13:21 UTC
Preliminary fix:
https://github.com/ClusterLabs/pacemaker/pull/2004

Comment 13 Miro Hrončok 2020-03-04 23:32:15 UTC
poki, https://copr.fedorainfracloud.org/coprs/g/python/python3.9/package/pacemaker/ is a success, thanks.

Comment 14 Jan Pokorný [poki] 2020-03-05 15:22:31 UTC
Dan or whoever with easy access to s390x machine:

As Dan let me know out-of-band, there is a subtle problem with hidden
stdout/stderr of some compilation commands in pacemaker build
-- coincidentally those that are failing on s390x, making it difficult
to debug.

But I've observed the following troublesome pattern in the build logs
from failed s390x builds.  We are building with the help of libtool,
meaning that each module to be linked into a shared library is
compiled twice:

1. first (standard?) pass, producing output to
   .libs/<LIBNAME>_la_<MODULE>.o

   - this finishes just fine

   - this is run without any stdout/stderr hiding


2. second (libtool-specific?) pass, producing output to
   <LIBNAME>_la_<MODULE>.o

   - this is what fails while the former has already finished
     without problems (?!)

   - this is run with said stdout/stderr hiding (Dan thought
     we are hiding this intentionally -- nope, it's an automatism
     in libtool and it works like this everywhere), likely
     justified by the assumption that when the build per 1. succeeds,
     this one must as well

   - the only spottable difference with this compilation command
     is that it lacks -fPIC -DPIC switches just prior to terminating
     "-o <OUTPUT>" part

I am rather lost as to why said difference would play such a crucial
role in whether the build succeeds or not -- and while the same
difference in "-fPIC -DPIC" presence also occurs on other archs,
it doesn't break the build there as it does on s390x.

I conducted a scratch build with an immediate fix[1] for what Dan
shared ([comment 11]) and the problem manifested at another
location/occasion[2], but because of libtool's flawed
success->success assumption (flawed at least on s390x, presumably),
I can't proceed further.

Therefore, take this as a request for help, please.


[1] https://github.com/ClusterLabs/pacemaker/pull/2004
[2] https://koji.fedoraproject.org/koji/taskinfo?taskID=42209989

or this snippet from respective build.log:

compilation per 1. above:

> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../include
> -I../../include -I../../include -I../../libltdl -I../../libltdl
> -I../.. -I../.. -DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
> -I/usr/include/libxml2 -I/usr/include/heartbeat
> -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12
> -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb
> -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast
> -Wcast-align -Wdeclaration-after-statement -Wendif-labels
> -Wfloat-equal -Wformat-security -Wmissing-prototypes
> -Wmissing-declarations -Wnested-externs -Wno-long-long
> -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings
> -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c
> pcmk_sched_group.c  -fPIC -DPIC -o
> .libs/libpacemaker_la-pcmk_sched_group.o

immediately followed by compilation per 2. above:

> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../include
> -I../../include -I../../include -I../../libltdl -I../../libltdl
> -I../.. -I../.. -DPCMK_TIME_EMERGENCY_CGT -UPCMK_TIME_EMERGENCY_CGT
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
> -I/usr/include/libxml2 -I/usr/include/heartbeat
> -I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -O2 -g -pipe -Wall
> -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12
> -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -ggdb
> -fgnu89-inline -Wall -Waggregate-return -Wbad-function-cast
> -Wcast-align -Wdeclaration-after-statement -Wendif-labels
> -Wfloat-equal -Wformat-security -Wmissing-prototypes
> -Wmissing-declarations -Wnested-externs -Wno-long-long
> -Wno-strict-aliasing -Wpointer-arith -Wwrite-strings
> -Wunused-but-set-variable -Wformat=2 -Wformat-nonliteral -Werror -c
> pcmk_sched_group.c -o libpacemaker_la-pcmk_sched_group.o >/dev/null
> 2>&1

resulting in:

> make[3]: *** [Makefile:733: libpacemaker_la-pcmk_sched_constraints.lo] Error 1

Comment 15 Dan Horák 2020-03-05 17:49:00 UTC
The -fPIC/no-PIC builds could be there for both static and shared libs, or the libtool mode could be set incorrectly. In theory, the two compiles could take different paths in the compiler and thus produce different sets of warnings. I need to look closer, or I can give you access to the s390x machine I use.

One thing is true generally, production builds shouldn't use -Werror.

Comment 16 Jan Pokorný [poki] 2020-03-05 18:23:17 UTC
> One thing is true generally, production builds shouldn't use -Werror.

For us, it's deliberate, I believe:
better safe (broken build) than sorry (crash at run-time).

The first build failure you reported, which has already been patched,
was a real crash-worthy problem (well, depending on the libc
implementation, but nonetheless).

So that's the habit we follow, sparing us additional hassles for
CI purposes, for instance.

TBH, I dare say that nobody will go fishing for warning messages
in automated build systems (like koji), and in big projects, one can
easily become blind to warnings during standard dev edit-recompile cycles.
So this is purposefully enforced, with the ability to opt out.

Also, odds are that this very case exposed a flaw in libtool's
reasoning as mentioned (successful compilation 1. does not necessarily
imply success of compilation 2., so it's counter-productive to disable
any outputs going from 2., as they may actually contain the meat in
form of what failed).

Comment 17 Dan Horák 2020-03-06 12:22:44 UTC
Without going into the details of the build system or libtool, I have a solution below:

diff --git a/pacemaker.spec b/pacemaker.spec
index d7ae4b9..2a24086 100644
--- a/pacemaker.spec
+++ b/pacemaker.spec
@@ -387,6 +387,7 @@ export CPPFLAGS="-UPCMK_TIME_EMERGENCY_CGT $CPPFLAGS"
         %{?with_coverage:      --with-coverage}                                 \
         %{!?with_doc:          --with-brand=}                                   \
         %{?gnutls_priorities:  --with-gnutls-priorities="%{gnutls_priorities}"} \
+        --disable-static \
         --with-initdir=%{_initrddir}                                            \
         --with-runstatedir=%{_rundir}                                           \
         --localstatedir=%{_var}                                                 \


because static libs aren't packaged anyway. It looks like gcc has a problem with the "static part"; combine that with the redirection to /dev/null (maybe libtool expects the same output from the compiler for both the static and the shared compile) and an unexplainable problem appears. Still, it would be good to know why gcc has different opinions on the code between static and shared (no-PIC/PIC).

Comment 18 Jan Pokorný [poki] 2020-03-06 13:11:16 UTC
Sounds like an immediate plan, thanks.

Will you proceed to file a respective libtool bug?

It is my gut feeling that gcc is able to prove some more invariants
in a static-style (non-PIC) compilation than otherwise ... perhaps
unless LTO is applied (which will be really interesting to eventually
enable, since I guess it will discover a whole lot of new potential
problems).

This means that libtool's assumption is indeed flawed and this
part should go away, at least for architectures known to differ,
such as s390x:

>      # Allow error messages only from the first compilation.
>      if test yes = "$suppress_opt"; then
>        suppress_output=' >/dev/null 2>&1'
>      fi

(possibly replaced by some smart middle ground that would compare
stdout/stderr between 1. and 2. and only present those from 2.
if they differ).
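
The compare-and-show idea in the closing parenthesis can be sketched as follows. This is only an illustration in C, not libtool code (libtool itself is a shell script), and the struct and function names are hypothetical. It also assumes, beyond what the comment proposes, that a failing second pass should always have its output shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one compiler invocation: its exit status and the
 * combined stdout/stderr text captured by the caller. */
struct demo_pass {
    int status;          /* 0 on success, non-zero on failure */
    const char *output;  /* captured diagnostics; may be empty, never NULL */
};

/* Decide what to print for the second compilation pass.
 * Returns the second pass's output when it failed (assumption: a failure
 * is never hidden) or when its text differs from the first pass's text;
 * returns NULL when the output is a pure repeat and can be suppressed. */
const char *demo_output_to_show(const struct demo_pass *first,
                                const struct demo_pass *second)
{
    if (second->status != 0) {
        return second->output;
    }
    if (strcmp(first->output, second->output) != 0) {
        return second->output;
    }
    return NULL;
}
```

With a rule like this, the s390x failure in the second pass would have surfaced its diagnostics instead of only "Error 1" from make.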

Comment 19 Dan Horák 2020-03-06 13:37:25 UTC
I'm going to file a gcc bug, because it's the root cause. It should provide consistent warnings across arches.

Comment 20 Jan Pokorný [poki] 2020-03-06 17:32:16 UTC
Thanks, the work here is over, at least for now, I think.
Build using ./configure --disable-static went through.

Not sure what the ramifications are for blocking the
main s390x arch bug -- do as you wish.

If there are any more problems, feel free to reopen.

Comment 21 Dan Horák 2020-03-10 10:11:03 UTC
Jan, do you plan to fix the other warnings in the source code too, as they seem to be valid based on the feedback from the gcc devels in bug 1811158? Let me know if you need access to an s390x machine for reproducing them.

Comment 22 Jan Pokorný [poki] 2020-03-10 10:24:51 UTC
Yeah, I realize, we'd rather deal with the underlying problems
anyway.

My plan is to fiddle around with the extra options suggested in
[bug 1811158 comment 2] to see if that induces the problems first.

If I get stuck, I'll kindly ask for access to s390x, thanks for
the offer.