Bug 869826
Summary: | heartbeat 'HA_LIBHBDIR' undeclared with cluster-glue-libs-devel-1.0.5-6.el6. Change also changes file locations in pacemaker-1.1.7-6.el6 | | |
---|---|---|---|
Product: | [Fedora] Fedora EPEL | Reporter: | James Hartsock <hartsjc> |
Component: | heartbeat | Assignee: | Kevin Fenzi <kevin> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | el6 | CC: | abeekhof, andrew, hosting, kevin, lars.ellenberg, redhat-bugzilla, robert.scheck |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | heartbeat-3.0.4-2.el6 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-12-18 00:19:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1028127 | | |
Description
James Hartsock
2012-10-24 22:05:06 UTC
OK. I can look at fixing this, but I am about to head out on a trip... If someone could provide a patch, or if the cluster-glue maintainer (added to CC) wants to send out a fixed build, feel free.

So there are a few things going on here, but first question... how are you using pacemaker with heartbeat on EL6? Is this an EPEL rebuild of pacemaker? Because the one RH ships doesn't have support for Heartbeat compiled in.

Getting back to HB_DAEMON_DIR, if EPEL doesn't care about multilib we can revert David's change to cluster-glue. Nothing in RHEL needs cluster-glue anymore so we'll be dropping it shortly anyway.

What's the best way to update cluster-glue? There doesn't seem to be an EPEL branch for it.

As for pacemaker, it is the RHEL RPM; I am not aware of an EPEL build of it. It appears, at least on my system, that the cluster-glue-lib RPM is needed by pacemaker* & resource-agents from RHEL.

(In reply to comment #3)
> As for pacemaker, it is the RHEL RPM, I am not aware of an EPEL build of it.

That's simply not possible.

> It appears at least on my system that cluster-glue-lib RPM is needed by
> pacemaker* & resource-agents from RHEL.

That will no longer be the case in 6.4.

I am receiving the same make failure ("heartbeat.c:4216: error: 'HA_LIBHBDIR' undeclared (first use in this function)") when trying to package the symlink to solve bug #1028127 on the packaging level. Is one of the heartbeat enlightened developers able to help here? :)

(In reply to Andrew Beekhof from comment #2)
> What's the best way to update cluster-glue? There doesn't seem to be an
> EPEL branch for it.

By the way... as long as RHEL ships cluster-glue, EPEL will not branch it. Even if RHEL might drop cluster-glue, it's still shipped with RHEL 6.5 as far as I can see.

cluster glue 1.0.5 is from April 2010. Upstream glue is 1.0.12 (or 1.0.12 rc something). If we build heartbeat packages against this on rhel6, it just works (or so my build logs say).
So: replace your 3.5-year-old cluster glue with a more recent one, and build heartbeat against that.

As cluster-glue is in RHEL (thus maintained by Red Hat) and heartbeat is in Fedora EPEL (thus maintained by the community), an update is unfortunately not easily possible. And changes in RHEL are usually long-winded. So if there is a chance to get this solved or worked around in heartbeat only, this would likely be much faster for the remaining heartbeat users here.

*I* will not even attempt to find workarounds in some other package because some distribution insists on shipping 3.5-year-old broken devel packages for some dependency, and some other guidelines insist on not upgrading packages if shipped by the distribution. That is simply wrong.

We (Linbit) have packages for all combinations of the full stack for rhel6, i.e. pacemaker + cman, pacemaker + corosync 2, pacemaker + heartbeat. Nothing special, but some spec file massaging is required, afaik. (And, ok, for the most recent resource agents "breakage" of the heartbeat init script, as also tracked here in that other bug, we still need to fix the heartbeat init script at the least; in the long run, I likely need to repackage heartbeat to use libexec as well. But that's another story.)

But yes, we also provide updated glue and other dependencies. So if we can do that, you can do it, too. If you don't *want* to, for political reasons, but insist on using very old, broken devel packages, I cannot really help you ;-)

Lars, thank you very much for your open reply. Thus I am linking this issue with case 00979415 on the Red Hat customer portal, as the cluster-glue thing can only be resolved by Red Hat, as it seems. I am happy to be told that I am wrong here.

And now I did it anyways...
https://bugzilla.redhat.com/show_bug.cgi?id=1028127#c33
https://bugzilla.redhat.com/show_bug.cgi?id=1028127#c34
and
https://bugzilla.redhat.com/attachment.cgi?id=830886&action=diff

Cheers,
Lars

(In reply to Lars Ellenberg from comment #9)
> *I* will not even attempt to find workarounds in some other package
> because some distribution insists
> on shipping 3.5 years old broken devel packages for some dependency,
> and some other guidelines insist to not upgrade packages
> if shipped by distribution.

cluster-glue is not shipped by rhel anymore (possibly since 6.3, my memory is a little hazy). So it should be possible to add a newer version of it to EPEL, but I don't know the correct procedure.

Guys, if you rebuild heartbeat anyways, please use the current mercurial tip, not the 3-year-old 3.0.4. Ok, "current" as in, was committed 8 months ago. (Strange. I thought I wrote those patches together with those other 2012 ones.)

There are several highly relevant fixes. Flaky network (first packet drop, then communication loss) could
* potentially cause heartbeat core to eat up 100 % cpu,
* potentially prevent heartbeat from ever connecting to that node again

And
* potentially heartbeat would segfault given bad timing of a node dead event
* potentially heartbeat would not even notice a node as dead if it had massive packet loss just before that
* in certain situations (again: packet loss helps to trigger it) the ccm would not converge, so nodes would not agree on membership

If it helps I can tag that as 3.0.6 "soon". I'll cross-post this comment in the other bug, too.

heartbeat-3.0.4-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/heartbeat-3.0.4-2.el6

Package heartbeat-3.0.4-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=epel-testing heartbeat-3.0.4-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-12278/heartbeat-3.0.4-2.el6
then log in and leave karma (feedback).

heartbeat-3.0.4-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

Hi almighties,

I just applied this minor update to our few clusters and guess what: the clusters are dead. I explain below.

The update from 3.0.4-1.el6 to 3.0.4-2.el6 broke our clusters' unicast functionality. The cause is the new patch pushed with the version from this bug report.

Related broken patch: heartbeat-3.0.4-duplicate-ucast.patch

The result is that heartbeat cannot start, because ucast (used in /etc/ha.d/ha.cf) fails with the following errors in the logs:

info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
info: glib: ucast: bound send socket to device: br1
ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not available
ERROR: make_io_childpair: cannot open ucast br1
CRIT: Emergency Shutdown: Master Control process died.
CRIT: Killing pid 11194 with SIGTERM
CRIT: Killing pid 11198 with SIGTERM
CRIT: Killing pid 11199 with SIGTERM
CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

When I downgrade to version 3.0.4-1.el6, everything works again. So the patch applied in this bug report creates a regression in the unicast functionality.

Please roll back or finish/stabilize the patch "heartbeat-3.0.4-duplicate-ucast.patch". I can test a new version, if you want me to, before you push it to the stable repo.

Regards,
Aurelien Lemaire from Smile Hosting.

(In reply to Smile hosting from comment #17)
double posted in the other bug, answered there:
https://bugzilla.redhat.com/show_bug.cgi?id=1028127#c55

Lars
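
For readers trying to make sense of the ucast failure logged above: "Protocol not available" is the message for errno ENOPROTOOPT, which setsockopt returns when the kernel does not know the requested option at that level. The following is a minimal sketch, in Python rather than heartbeat's C, of treating SO_REUSEPORT as optional instead of fatal. It is an illustration only, not the heartbeat patch or its eventual fix (see the other bug for that); the function name `open_udp_socket_with_optional_reuseport` is hypothetical.

```python
import errno
import socket


def open_udp_socket_with_optional_reuseport():
    """Create a UDP socket and enable SO_REUSEPORT only if it is supported.

    Hypothetical helper for illustration; returns (sock, reuseport_enabled).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # The constant itself may be missing on platforms without the option.
    reuseport = getattr(socket, "SO_REUSEPORT", None)
    if reuseport is None:
        return sock, False

    try:
        sock.setsockopt(socket.SOL_SOCKET, reuseport, 1)
    except OSError as exc:
        # ENOPROTOOPT ("Protocol not available"): the running kernel does not
        # know the option. Carry on without it instead of aborting.
        if exc.errno != errno.ENOPROTOOPT:
            sock.close()
            raise
        return sock, False
    return sock, True
```

Whether continuing without SO_REUSEPORT is acceptable depends on why the option was requested in the first place; the sketch only shows how the error from the log can be detected and handled separately from other socket errors.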