Bug 869826

Summary: heartbeat 'HA_LIBHBDIR' undeclared with cluster-glue-libs-devel-1.0.5-6.el6. Change also changes file locations in pacemaker-1.1.7-6.el6
Product: [Fedora] Fedora EPEL Reporter: James Hartsock <hartsjc>
Component: heartbeat Assignee: Kevin Fenzi <kevin>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: el6 CC: abeekhof, andrew, hosting, kevin, lars.ellenberg, redhat-bugzilla, robert.scheck
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: heartbeat-3.0.4-2.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-12-18 00:19:19 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1028127    

Description James Hartsock 2012-10-24 22:05:06 UTC
Description of problem:
The heartbeat RPM appears to be impacted by a change in the recent cluster-glue-libs-devel RPMs, which removed some #define lines to eliminate conflicts and allow both the 32-bit and 64-bit versions of the devel RPM to be installed.  As a result, RHEL updated the pacemaker RPM (BZ#808557) to no longer depend on some of these #defines (and also changed the location of some files heartbeat uses).  It appears that the heartbeat RPM now also needs to be updated in a similar way, so that it no longer uses these #defines and picks up the new location of some pacemaker files:
  /usr/lib64/heartbeat/{attrd,cib,crmd,stonithd}
    -to-
  /usr/libexec/pacemaker/{attrd,cib,crmd,stonithd}



Version-Release number of selected component (if applicable):
heartbeat-3.0.4-1.el6.x86_64

cluster-glue-1.0.5-2.el6.x86_64
cluster-glue-libs-1.0.5-2.el6.x86_64
pacemaker-1.1.6-3.el6.x86_64
pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64



How reproducible:
Install the above RPM versions and dependencies; when heartbeat is used, errors like the following appear:
  info: Pacemaker support: respawn
  ERROR: Client child command [/usr/lib64/heartbeat/cib] is not executable
  ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/cib failed
  ERROR: Client child command [/usr/lib64/heartbeat/stonithd] is not executable
  ERROR: Directive respawn root /usr/lib64/heartbeat/stonithd failed 
  ERROR: Client child command [/usr/lib64/heartbeat/attrd] is not executable
  ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/attrd failed
  ERROR: Client child command [/usr/lib64/heartbeat/crmd] is not executable
  ERROR: Directive respawn  hacluster /usr/lib64/heartbeat/crmd failed
  ERROR: Heartbeat not started: configuration error.
  ERROR: Configuration error, heartbeat not started.

Or install the source RPM and try to rpmbuild the package with the latest cluster-glue-* & pacemaker* RPMs for RHEL 6 installed.
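
The runtime errors above mean heartbeat looks for the Pacemaker daemons under /usr/lib64/heartbeat while the newer pacemaker package installs them elsewhere. A minimal sketch to see which directory actually holds executable daemons; the function name `check_daemons` is hypothetical, and the two directories are the ones named in this report:

```shell
#!/bin/sh
# Hypothetical helper: report, for one directory, which of the four
# Pacemaker daemons named in the errors above are present and executable.
# Returns 0 if all four are executable, 1 otherwise.
check_daemons() {
    dir=$1
    missing=0
    for daemon in attrd cib crmd stonithd; do
        if [ -x "$dir/$daemon" ]; then
            echo "ok: $dir/$daemon"
        else
            echo "not executable: $dir/$daemon"
            missing=1
        fi
    done
    return "$missing"
}

# Old and new locations, both taken from this report.
for dir in /usr/lib64/heartbeat /usr/libexec/pacemaker; do
    check_daemons "$dir" || echo "=> $dir is incomplete"
done
```

If only /usr/libexec/pacemaker is complete, the installed heartbeat build is presumably still using the old path.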


Steps to Reproduce:
~~~
# yum upgrade cluster-glue cluster-glue-libs cluster-glue-libs-devel \
              pacemaker pacemaker-cli pacemaker-libs pacemaker-cluster-libs

# rpm -qa | grep -e ^cluster-glue -e ^pacemaker
cluster-glue-1.0.5-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
cluster-glue-libs-devel-1.0.5-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
cluster-glue-libs-1.0.5-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64

# pwd
/root/rpmbuild/SPECS

# rpmbuild -ba heartbeat.spec 
~~~



Actual results:
~~~
# rpmbuild -ba heartbeat.spec 
<snip>
gmake[2]: Entering directory `/root/rpmbuild/BUILD/Heartbeat-3-0-STABLE-3.0.4/heartbeat'
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../include -I../include -I../include -I../linux-ha -I../linux-ha -I../libltdl -I../libltdl  -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -I/usr/include/heartbeat  -Wall -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wdeclaration-after-statement -Wpointer-arith -Wwrite-strings -Wcast-qual -Wcast-align -Wbad-function-cast -Winline -Wmissing-format-attribute -Wformat=2 -Wformat-security -Wformat-nonliteral -Wno-long-long -Wno-strict-aliasing   -ggdb3 -funsigned-char -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -I/usr/include/heartbeat  -Wall -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wdeclaration-after-statement -Wpointer-arith -Wwrite-strings -Wcast-qual -Wcast-align -Wbad-function-cast -Winline -Wmissing-format-attribute -Wformat=2 -Wformat-security -Wformat-nonliteral -Wno-long-long -Wno-strict-aliasing   -ggdb3 -funsigned-char -MT heartbeat-heartbeat.o -MD -MP -MF .deps/heartbeat-heartbeat.Tpo -c -o heartbeat-heartbeat.o `test -f 'heartbeat.c' || echo './'`heartbeat.c
heartbeat.c: In function 'restart_heartbeat':
heartbeat.c:4216: error: 'HA_LIBHBDIR' undeclared (first use in this function)
heartbeat.c:4216: error: (Each undeclared identifier is reported only once
heartbeat.c:4216: error: for each function it appears in.)
heartbeat.c:4216: error: expected ')' before string constant
heartbeat.c:4219: error: too few arguments to function 'execl'
heartbeat.c:4221: error: expected ')' before string constant
heartbeat.c:4222: error: too few arguments to function 'execl'
heartbeat.c:4229: error: expected ')' before string constant
heartbeat.c:4229: error: too few arguments to function 'execl'
heartbeat.c:4231: error: expected ')' before 'HA_LIBHBDIR'
gmake[2]: *** [heartbeat-heartbeat.o] Error 1
gmake[2]: Leaving directory `/root/rpmbuild/BUILD/Heartbeat-3-0-STABLE-3.0.4/heartbeat'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/rpmbuild/BUILD/Heartbeat-3-0-STABLE-3.0.4/heartbeat'
make: *** [all-recursive] Error 1
error: Bad exit status from /var/tmp/rpm-tmp.gUEUnF (%build)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.gUEUnF (%build)
~~~
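
The compile failure can be checked for without a full rebuild by looking at whether any installed header still defines the macro. A minimal sketch; `has_define` is a hypothetical helper, and /usr/include/heartbeat is the include directory taken from the gcc command line above:

```shell
#!/bin/sh
# Hypothetical helper: succeed if any file under DIR contains a
# "#define NAME" line (whitespace around the '#' is tolerated, and
# NAME must be a whole token, not a prefix of a longer macro name).
has_define() {
    name=$1
    dir=$2
    grep -rqE "^[[:space:]]*#[[:space:]]*define[[:space:]]+$name([[:space:]]|\$)" "$dir" 2>/dev/null
}

if has_define HA_LIBHBDIR /usr/include/heartbeat; then
    echo "HA_LIBHBDIR is defined by the installed headers"
else
    echo "HA_LIBHBDIR is not defined; heartbeat.c is expected to fail as shown above"
fi
```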




Expected results:
Expect rpmbuild to succeed, just as it does with the older versions of cluster-glue-libs & pacemaker:
~~~
# rpm -qa | grep -e ^cluster-glue -e ^pacemaker
pacemaker-cli-1.1.6-3.el6.x86_64
cluster-glue-libs-1.0.5-2.el6.x86_64
pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-1.1.6-3.el6.x86_64
cluster-glue-libs-devel-1.0.5-2.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64
cluster-glue-1.0.5-2.el6.x86_64

# rpmbuild -ba heartbeat.spec
<snip>
Processing files: heartbeat-debuginfo-3.0.4-1.el6.x86_64
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/heartbeat-3.0.4-1.el6.x86_64
Wrote: /root/rpmbuild/SRPMS/heartbeat-3.0.4-1.el6.src.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/heartbeat-3.0.4-1.el6.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/heartbeat-libs-3.0.4-1.el6.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/heartbeat-devel-3.0.4-1.el6.x86_64.rpm
Wrote: /root/rpmbuild/RPMS/x86_64/heartbeat-debuginfo-3.0.4-1.el6.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.GNsAzN
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd Heartbeat-3-0-STABLE-3.0.4
+ rm -rf /root/rpmbuild/BUILDROOT/heartbeat-3.0.4-1.el6.x86_64
+ exit 0
~~~



Additional info:

Another application that was impacted is the RHEL supplied pacemaker RPM which was updated (BZ#808557). This fix was actually accepted upstream:
https://github.com/ClusterLabs/pacemaker/commit/44e7ba80e2a259f3b407e2d35cf8353572c6d769

# rpm -q --changelog cluster-glue-libs-devel-1.0.5-6.el6.x86_64 | grep -B1 -A1 multilib
* Fri Mar 30 2012 David Vossel <dvossel> 1.0.5-6
- Fixes multilib conflicts in devel package.
  Resolves: rhbz#805147

# rpm -q --changelog pacemaker | grep -B1 -A1 HB_DAEMON_DIR
* Fri Mar 30 2012 David Vossel <dvossel> - 1.1.7-5
- Use default value for HB_DAEMON_DIR define when clusterglue does not provide one. 
  Resolves: rhbz#808557

# rpm -qp --list pacemaker-1.1.6-3.el6.x86_64.rpm | grep -e /cib$ -e /stonithd$ -e /attrd$ -e /crmd$
/usr/lib64/heartbeat/attrd
/usr/lib64/heartbeat/cib
/usr/lib64/heartbeat/crmd
/usr/lib64/heartbeat/stonithd

# rpm -qp --list pacemaker-1.1.7-6.el6.x86_64.rpm | grep -e /cib$ -e /stonithd$ -e /attrd$ -e /crmd$
/usr/libexec/pacemaker/attrd
/usr/libexec/pacemaker/cib
/usr/libexec/pacemaker/crmd
/usr/libexec/pacemaker/stonithd

Comment 1 Kevin Fenzi 2012-10-25 02:05:30 UTC
ok. I can look at fixing this, but I am about to head out on a trip... 

If someone could provide a patch or if cluster glue maintainer (added to cc) wants to send out a fixed build, feel free.

Comment 2 Andrew Beekhof 2012-10-25 03:39:09 UTC
So there are a few things going on here, but first question... how are you using pacemaker with heartbeat on EL6?  Is this an EPEL rebuild of pacemaker?  Because the one RH ships doesn't have support for Heartbeat compiled in.

Getting back to HB_DAEMON_DIR, if EPEL doesn't care about multilib we can revert David's change to cluster-glue.  Nothing in RHEL needs cluster-glue anymore so we'll be dropping it shortly anyway.

What's the best way to update cluster-glue?  There doesn't seem to be an EPEL branch for it.

Comment 3 James Hartsock 2012-10-25 13:42:21 UTC
As for pacemaker, it is the RHEL RPM, I am not aware of an EPEL build of it.

It appears at least on my system that cluster-glue-lib RPM is needed by pacemaker* & resource-agents from RHEL.

Comment 4 Andrew Beekhof 2012-10-25 19:46:09 UTC
(In reply to comment #3)
> As for pacemaker, it is the RHEL RPM, I am not aware of an EPEL build of it.

That's simply not possible.

> It appears at least on my system that cluster-glue-lib RPM is needed by
> pacemaker* & resource-agents from RHEL.

That will no longer be the case in 6.4

Comment 5 Robert Scheck 2013-11-26 15:20:38 UTC
I am receiving the same make failure ("heartbeat.c:4216: error: 'HA_LIBHBDIR' 
undeclared (first use in this function)") when trying to package the symlink
to solve bug #1028127 on the packaging level.

Comment 6 Robert Scheck 2013-11-26 16:42:22 UTC
Is one of the heartbeat enlightened developers able to help here? :)

(In reply to Andrew Beekhof from comment #2)
> What's the best way to update cluster-glue?  There doesn't seem to be an
> EPEL branch for it.

By the way... as long as RHEL ships cluster-glue, EPEL will not branch it.
Even if RHEL might drop cluster-glue, it's still shipped with RHEL 6.5 as far
as I can see.

Comment 7 Lars Ellenberg 2013-11-27 07:55:34 UTC
cluster glue 1.0.5 is from April 2010.

Upstream glue is 1.0.12 (or 1.0.12 rc something).
If we build heartbeat packages against this on rhel6,
it just works (or so my build logs say).

So: replace your 3.5 years old cluster glue with a more recent one,
and build heartbeat against that.

Comment 8 Robert Scheck 2013-11-27 08:07:09 UTC
As cluster-glue is in RHEL (thus maintained by Red Hat) and heartbeat in
Fedora EPEL (thus maintained by the community) an update is unfortunately
not easily possible. And changes in RHEL are usually long-winded. So if
there is a chance to get this solved or worked around in heartbeat only
this would be likely much faster for the remaining heartbeat users here.

Comment 9 Lars Ellenberg 2013-11-27 08:22:13 UTC
*I* will not even attempt to find workarounds in some other package
because some distribution insists
on shipping 3.5 years old broken devel packages for some dependency,
and some other guidelines insist to not upgrade packages
if shipped by distribution.

That is simply wrong.

We (Linbit) have packages for all combinations of the full stack for rhel6,
i.e. pacemaker + cman, pacemaker + corosync 2, pacemaker + heartbeat.

Nothing special but some spec file massaging required, afaik.
(And, ok, for the most recent resource agents "breakage" of the heartbeat
init script, as also tracked here in that other bug, we still need to
fix the heartbeat init script at the least; in the long run,
I likely need to repackage heartbeat to use libexec as well.
But that's another story.)

But yes, we also provide updated glue and other dependencies.

So if we can do that, you can do it, too.

If you don't *want* to, for political reasons,
but insist on using very old, broken, devel packages
I cannot really help you ;-)

Comment 10 Robert Scheck 2013-11-27 13:47:54 UTC
Lars, thank you very much for your open reply. Thus I am linking this issue
with case 00979415 on the Red Hat customer portal as the cluster-glue thing
can only be resolved by Red Hat as it seems. I am happy to be told that I am
wrong here.

Comment 12 Andrew Beekhof 2013-12-01 23:20:07 UTC
(In reply to Lars Ellenberg from comment #9)
> *I* will not even attempt to find workarounds in some other package
> because some distribution insists
> on shipping 3.5 years old broken devel packages for some dependency,
> and some other guidelines insist to not upgrade packages
> if shipped by distribution.

cluster-glue is not shipped by rhel anymore (possibly since 6.3, my memory is a little hazy).  
So it should be possible to add a newer version of it to EPEL, but I don't know the correct procedure.

Comment 13 Lars Ellenberg 2013-12-02 10:59:08 UTC
Guys,

if you rebuild heartbeat anyways,
please use current mercurial tip
not 3 years old 3.0.4.

Ok, "current" as in, was committed 8 months ago.
(Strange. I thought I wrote those patches together with those other 2012 ones.)

There are several highly relevant fixes.
Flaky network (first packet drop, then communication loss) could
 * potentially cause heartbeat core to eat up 100 % cpu, 
 * potentially preventing heartbeat from ever connecting to that node again
And
 * potentially heartbeat would segfault given bad timing of a node dead event
 * potentially heartbeat would not even notice a node as dead
   if it had massive packet loss just before that
 * in certain situations (again: packet loss helps to trigger it)
   the ccm would not converge, so nodes would not agree on membership

If it helps I can tag that as 3.0.6 "soon".
I'll cross-post this comment in the other bug, too.

Comment 14 Fedora Update System 2013-12-02 16:43:43 UTC
heartbeat-3.0.4-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/heartbeat-3.0.4-2.el6

Comment 15 Fedora Update System 2013-12-03 01:23:25 UTC
Package heartbeat-3.0.4-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing heartbeat-3.0.4-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-12278/heartbeat-3.0.4-2.el6
then log in and leave karma (feedback).

Comment 16 Fedora Update System 2013-12-18 00:19:19 UTC
heartbeat-3.0.4-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 17 Smile hosting 2014-02-11 13:15:19 UTC
Hi almighties,

just applied this minor update to a few of our clusters and guess what -> the clusters are dead. I explain below:

This version update (3.0.4-1.el6 to 3.0.4-2.el6) just broke our clusters' unicast functionality; the cause is the new patch pushed with the version from this bug report.

Related broken patch: heartbeat-3.0.4-duplicate-ucast.patch

The result is that heartbeat cannot start because ucast (used in /etc/ha.d/ha.cf) does not work, with the following errors in the logs:
info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
info: glib: ucast: bound send socket to device: br1
ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not available
ERROR: make_io_childpair: cannot open ucast br1
CRIT: Emergency Shutdown: Master Control process died.
CRIT: Killing pid 11194 with SIGTERM
CRIT: Killing pid 11198 with SIGTERM
CRIT: Killing pid 11199 with SIGTERM
CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

When I downgrade to version 3.0.4-1.el6, everything works again.
So the patch applied in this bug report creates a regression in unicast functionality.
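
A quick way to tell whether a node is hit by this regression is to look for the SO_REUSEPORT failure in the heartbeat log. A minimal sketch; `has_reuseport_error` is a hypothetical helper, and the default log path is an assumption (it depends on the logging settings in ha.cf):

```shell
#!/bin/sh
# Hypothetical helper: succeed if LOGFILE contains the ucast
# SO_REUSEPORT failure quoted in the log excerpt above.
has_reuseport_error() {
    grep -q 'ucast: error setting option SO_REUSEPORT' "$1" 2>/dev/null
}

# Assumed log location; pass the real path as the first argument.
log=${1:-/var/log/ha-log}
if has_reuseport_error "$log"; then
    echo "$log: ucast fails on SO_REUSEPORT; downgrading to 3.0.4-1.el6 is the workaround reported here"
else
    echo "$log: no SO_REUSEPORT error found"
fi
```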

Please roll back or finish/stabilize the patch "heartbeat-3.0.4-duplicate-ucast.patch".

I can test a new version, if you want me to, before you push it to the stable repo.

Regards, Aurelien Lemaire from Smile Hosting.

Comment 18 Lars Ellenberg 2014-02-12 14:31:11 UTC
(In reply to Smile hosting from comment #17)

double posted in the other bug,
answered there:
https://bugzilla.redhat.com/show_bug.cgi?id=1028127#c55

    Lars