1028127 – Heartbeat not working on centos6 after last update

Bug 1028127 - Heartbeat not working on centos6 after last update [NEEDINFO]

Summary: Heartbeat not working on centos6 after last update

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora EPEL
Classification:	Fedora
Component:	heartbeat
Sub Component:
Version:	el6
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Kevin Fenzi
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1028957
Depends On:	869826
Blocks:

Reported:	2013-11-07 17:44 UTC by qmic
Modified:	2018-12-03 20:35 UTC
CC List:	20 users
Fixed In Version:	heartbeat-3.0.4-2.el6
Clone Of:
Environment:
Last Closed:	2013-12-18 00:19:11 UTC
Type:	Bug
Embargoed:
Flags:	tony.abohwo: needinfo? tony.abohwo: needinfo?

Attachments
make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5 and fix init script (2.84 KB, patch) 2013-11-30 10:01 UTC, Lars Ellenberg

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	638323	0	None	None	None	Never

Description qmic 2013-11-07 17:44:42 UTC

Description of problem:
Heartbeat's init.d scripts stopped working after the update.


Version-Release number of selected component (if applicable):
heartbeat-3.0.4-1.el6.x86_64

How reproducible:
Try to restart service

Steps to Reproduce:
1. service heartbeat status or service heartbeat start

Actual results:
Nothing (empty line) 

Expected results:
heartbeat OK [pid 1377 et al] is running on et [et]...
Or any error. 

Additional info:
There is no related information in the logs.

Comment 1 Robert Scheck 2013-11-07 19:14:33 UTC

I wonder if this is maybe SELinux related? Can you try "setenforce 0"?

Comment 2 Kevin Fenzi 2013-11-07 22:38:36 UTC

Also a 'sh -x /etc/init.d/heartbeat start' output might be good if you can attach it.
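
A minimal sketch of how such a trace could be captured to a file for attaching. The helper name trace_script and the log location in the usage note are assumptions for illustration, not part of heartbeat or of any package.

```shell
# trace_script SCRIPT LOG [ARGS...]
# Hypothetical helper: run SCRIPT under "sh -x" and save its normal output
# together with the execution trace (which sh writes to stderr) in LOG.
trace_script() {
    script="$1"
    log="$2"
    shift 2
    sh -x "$script" "$@" >"$log" 2>&1
}
```

On an affected host this might be called as `trace_script /etc/init.d/heartbeat /tmp/heartbeat-start.trace start`; the function returns the exit status of the traced script.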

Comment 3 Nick Hope 2013-11-08 01:07:44 UTC

This appears to be related to the update of the resource-agents package (resource-agents-3.9.2-21.el6_4.8). The HA_BIN variable is set in /usr/lib/ocf/resource.d/heartbeat/.ocf-directories.

In previous versions of the package, this was set to the following:

: ${HA_BIN:=/usr/lib64/heartbeat}

In resource-agents-3.9.2-21.el6_4.8, HA_BIN is set to the following:

: ${HA_BIN:=/usr/libexec/heartbeat}
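
The `: ${VAR:=default}` idiom only assigns when the variable is unset or empty, so a script that never sets HA_BIN itself simply inherits whatever default the sourced file carries. A small stand-alone sketch of that behaviour (the variable names default_value and kept_value are made up for the demonstration):

```shell
# Case 1: HA_BIN is not set, so the default from the sourced file wins.
unset HA_BIN
: ${HA_BIN:=/usr/libexec/heartbeat}
default_value=$HA_BIN
echo "unset beforehand: $default_value"

# Case 2: HA_BIN already has a value, so the same line leaves it alone.
HA_BIN=/usr/lib64/heartbeat
: ${HA_BIN:=/usr/libexec/heartbeat}
kept_value=$HA_BIN
echo "set beforehand:   $kept_value"
```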

The heartbeat package from EPEL places the heartbeat binary in the /usr/lib64/heartbeat directory. A workaround of symlinking /usr/lib64/heartbeat/heartbeat to /usr/libexec/heartbeat/heartbeat seems to work for now.
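
A rough sketch of the symlink workaround described above, assuming the two paths named in this comment. The function name and the optional ROOT prefix are hypothetical; the prefix exists only so the steps can be tried in a scratch directory before touching a real system.

```shell
# link_heartbeat_compat [ROOT]
# Hypothetical helper, not shipped by any package.
# ROOT is an optional prefix for dry runs; empty means the real filesystem.
link_heartbeat_compat() {
    root="${1:-}"
    src="$root/usr/lib64/heartbeat/heartbeat"   # where the EPEL package puts the binary
    dst_dir="$root/usr/libexec/heartbeat"       # where the new HA_BIN default points
    if [ ! -x "$src" ]; then
        echo "heartbeat binary not found at $src" >&2
        return 1
    fi
    mkdir -p "$dst_dir" || return 1
    # -f replaces an existing link, -n avoids descending into a linked directory
    ln -sfn "$src" "$dst_dir/heartbeat"
}
```

Run as root with no argument, this would create /usr/libexec/heartbeat/heartbeat pointing at the binary in /usr/lib64/heartbeat. It is a stopgap only and does not survive a change of either package's layout.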

Comment 4 qmic 2013-11-08 12:18:05 UTC

This isn't SELinux related.
Nick Hope, thanks that worked!

Comment 5 Kevin Fenzi 2013-11-08 15:35:35 UTC

Would someone be willing to file a new bug or move this over to resource-agents then? 

It would be better if it got addressed there, if possible.

Comment 6 qmic 2013-11-08 18:42:27 UTC

I cannot find resource-agents on components list.

Comment 7 Kevin Fenzi 2013-11-08 20:25:22 UTC

Are you looking under RHEL?

https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%206

Comment 9 Fabio Massimo Di Nitto 2013-11-11 09:00:33 UTC

David, it looks like this is a consequence of multilib support and PATH handling.

Either heartbeat or resource-agents should ship a compatibility symlink.

In any case this bug doesn't affect RHEL since we don't ship or support heartbeat.

Comment 10 Tuomo Soini 2013-11-11 20:10:44 UTC

I'd suggest doing the same change for HA_BIN in heartbeat package.

Comment 11 Robert Scheck 2013-11-11 20:12:00 UTC

Cross-filed case 00979415 on the Red Hat customer portal as it breaks all of
our existing legacy heartbeat setups as it seems. The case requests that it's 
solved either or by the compatibility symlink and if the compat symlink gets
refused for RHEL that Red Hat supports the EPEL package maintainer.

Comment 12 Fabio Massimo Di Nitto 2013-11-11 20:19:03 UTC

(In reply to Robert Scheck from comment #11)
> Cross-filed case 00979415 on the Red Hat customer portal as it breaks all of
> our existing legacy heartbeat setups as it seems. The case requests that
> it's 
> solved either or by the compatibility symlink and if the compat symlink gets
> refused for RHEL that Red Hat supports the EPEL package maintainer.

We have no problem to help the EPEL package maintainer.

The reason why the path was changed was to accommodate multilib requirement for RHEL.

I don't honestly know if using a compatibility symlink will cause multilib issues, but definitely we will find a solution.

As for the customer case, please remember that we do not support heartbeat in RHEL and that even resource-agents for pacemaker (the heartbeat portions) are still TechPreview (not supported) by RHEL.

Comment 13 Robert Scheck 2013-11-11 20:28:32 UTC

(In reply to Fabio Massimo Di Nitto from comment #12)
> I don't honestly know if using a compatibility symlink will cause multilib
> issues, but definitely we will find a solution.

That's why I wonder why you use %{_libexecdir} instead of %{_libdir} now, but
that's out of my target.

> As for the customer case, please remember that we do not support heartbeat
> in RHEL and that even resource-agents for pacemaker (the heartbeat portions)
> are still TechPreview (not supported) by RHEL.

Yes, I am aware about that. But it's still not nice to break things thus I try
to provide valuable feedback - and still care about our own customers at work.

You also should keep in mind that AFAIK Linbit (the DRBD developers) still 
supports Heartbeat and their customers won't be amused about this, I guess.
Correct me but a customer case is the only real chance to get some attention
to IMHO not so well thought package changes during RHEL 6.x (sorry!).

Comment 14 Kevin Fenzi 2013-11-11 20:34:07 UTC

FYI, from my side (heartbeat maintainer in EPEL), I'm happy to try and make changes or add any interested folks who would like to co-maintain that might have more time than I do to help maintain.

Comment 15 Fabio Massimo Di Nitto 2013-11-11 20:37:57 UTC

(In reply to Robert Scheck from comment #13)
> (In reply to Fabio Massimo Di Nitto from comment #12)
> > I don't honestly know if using a compatibility symlink will cause multilib
> > issues, but definitely we will find a solution.
> 
> That's why I wonder why you use %{_libexecdir} instead of %{_libdir} now, but
> that's out of my target.
> 
> > As for the customer case, please remember that we do not support heartbeat
> > in RHEL and that even resource-agents for pacemaker (the heartbeat portions)
> > are still TechPreview (not supported) by RHEL.
> 
> Yes, I am aware about that. But it's still not nice to break things thus I
> try
> to provide valuable feedback - and still care about our own customers at
> work.
> 

Yes we all agree. That's why we will find a fix in one way or another (that being in resource-agents or heartbeat in EPEL).

> You also should keep in mind that AFAIK Linbit (the DRBD developers) still 
> supports Heartbeat and their customers won't be amused about this, I guess.

Linbit ships their own set of packages. I doubt they will be affected by these changes. But then again, we never claimed full support, which allows us to change packages as necessary. EPEL and RHEL packaging guidelines are different.

> Correct me but a customer case is the only real chance to get some attention
> to IMHO not so well thought package changes during RHEL 6.x (sorry!).

Not really, no. The bug was getting attention without the customer case. GSS can't do much either way. They don't maintain EPEL, nor do they provide support for TP components.

We can also agree that this breakage could have been avoided tho.

Comment 16 Fabio Massimo Di Nitto 2013-11-11 20:39:00 UTC

(In reply to Kevin Fenzi from comment #14)
> FYI, from my side (heartbeat maintainer in EPEL), I'm happy to try and make
> changes or add any interested folks who would like to co-maintain that might
> have more time than I do to help maintain.

Let's wait until Wednesday for David to come back, and quickly discuss the correct fix.

Comment 17 Lars Ellenberg 2013-11-12 08:29:32 UTC

My colleague pointed me to this bug.
If you get other bugs regarding heartbeat or resource-agents,
feel free to add me to Cc proactively,
so later I can not pretend I did not know about it ;-)

A new *resource-agents* version, 3.9.6 is overdue,
it was announced to be released last month :-/

As soon as I find the time we'll release that.
(Which should be by the end of this November (honestly!)
 (or someone else takes over)).

Then I can move all binaries and other stuff that does not belong into libdir
(according to what guidelines? can someone shoot me a link please?)
in the *heartbeat* package to libexecdir as well,
also drop the useless legacy init script dependency on the
HA_BIN definition meanwhile split into the resource-agents package,
tag a heartbeat 3.0.6, and have that require resource-agents >= 3.9.6
for good measure.

Meanwhile, symlinks or patching the heartbeat init script is necessary
for recent (newer than ~ July 2013) resource-agents with old heartbeat.

AFAICS, the only heartbeat "dependency" on that variable is in fact the
use of $HA_BIN/heartbeat in the init script, so only patching that
would be an option as well, but the heartbeat package would still
violate the "multilib guidelines", I guess (did I mention I'd like a pointer
as to which guidelines to apply?), by putting executable binaries into libdir.

Other ideas?

    Lars

Comment 18 Fabio Massimo Di Nitto 2013-11-14 10:43:36 UTC

*** Bug 1028957 has been marked as a duplicate of this bug. ***

Comment 19 David Vossel 2013-11-15 16:09:45 UTC

(In reply to Lars Ellenberg from comment #17)
> My colleague pointed me to this bug.
> If you get other bugs regarding heartbeat or resource-agents,
> feel free to add me to Cc proactively,
> so later I can not pretend I did not know about it ;-)
> 
> A new *resource-agents* version, 3.9.6 is overdue,
> it was announced to be released last month :-/
> 
> As soon as I find the time we'll release that.
> (Which should be by the end of this November (honestly!)
>  (or someone else takes over)).
> 
> Then I can move all binaries and other stuff that does not belong into libdir
> (according to what guidelines? can someone shoot me a link please?)

Here are a couple of documents I found searching google.

1. https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s1-filesystem-fhs.html

"/usr/lib, used for object files and libraries that are not designed to be directly utilized by shell scripts or users"

"/usr/libexec, contains small helper programs called by other programs"


2. http://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-filesystem-fhs.html

"lib/ contains object files and libraries that are not designed to be directly utilized by users or shell scripts"

"libexec/ directory contains small helper programs called by other programs"


> in the *heartbeat* package to libexecdir as well,
> also drop the useless legacy init script dependency on the
> HA_BIN definition meanwhile split into the resource-agents package,
> tag a heartbeat 3.0.6, and have that require resource-agents >= 3.9.6
> for good measure.

Sounds great :)

> 
> Meanwhile, symlinks or patching the heartbeat init script is necessary
> for recent (newer than ~ July 2013) resource-agents with old heartbeat.
> 
> AFAICS, the only heartbeat "dependency" on that variable is in fact the
> use of $HA_BIN/heartbeat in the init script, so only patching that
> would be an option as well,

If the init script is the only reference to HA_BIN, then that seems like the best fix.

>  but the heartbeat package would still
> violate the "multilib guidelines", I guess (did I mention I'd like a pointer
> as to which guidelines to apply?), by putting executable binaries into
> libdir.

Have heartbeat install the binaries in /usr/libexec/heartbeat as well. But do not depend on the resource-agent package to create or even use that directory (no coupling of the two packages)

> Other ideas?
>
>     Lars

Comment 20 Christoph Galuschka 2013-11-25 20:19:31 UTC

Would it be worthwhile to also file a bug against EPEL for heartbeat (if not already done)?

Comment 21 David Vossel 2013-11-25 20:49:47 UTC

(In reply to Christoph Galuschka from comment #20)
> Would it be worthwhile to also file a bug against EPEL for heartbeat (if not
> already done)?

Actually, this bug should be moved to heartbeat.

Comment 22 Christoph Galuschka 2013-11-26 10:17:36 UTC

David, can you do that, or is a new bug required?
Thanks

Comment 23 Robert Scheck 2013-11-26 11:13:57 UTC

Is there any workaround or intermediate solution that could go into the EPEL
heartbeat package? I read about the symlink, so is there any reason that we do
not simply put that one into the heartbeat package in EPEL?

Comment 24 David Vossel 2013-11-26 14:43:47 UTC

(In reply to Robert Scheck from comment #23)
> Is there any workaround or intermediate solution that could go into the EPEL
> heartbeat package? I read about the symlink, so is there any reason that we
> do
> not simply put that one into the heartbeat package in EPEL?

I do not maintain that package, but from what I've gathered there's a workaround involving a symlink in the /usr/libexec/heartbeat folder that points to a binary in the /usr/lib/heartbeat folder.

Comment 25 Christoph Galuschka 2013-11-26 14:47:34 UTC

David: Thanks

Comment 26 Robert Scheck 2013-11-26 15:24:40 UTC

(In reply to David Vossel from comment #24)
> I do not maintain that package, but from what I've gathered there's a
> workaround involving a symlink in the /usr/libexec/heartbeat folder that
> points to a binary in the /usr/lib/heartbeat folder.

I tried to rebuild heartbeat with that symlink for the time being, however
any heartbeat rebuild on RHEL 6.5 fails with "error: 'HA_LIBHBDIR' undeclared 
(first use in this function)" due to cluster-glue-libs-devel-1.0.5-6.el6. It
was first reported via bug #869826.

Comment 27 Lars Ellenberg 2013-11-26 16:02:52 UTC

please just fix the heartbeat init script for now.

Something like this
 
sed -i -e 's,\$HA_BIN/heartbeat,$HEARTBEAT,g' heartbeat/init.d/heartbeat.in
sed -i -e $'/^### END INIT INFO/ a\\\n\\\nHEARTBEAT=@libdir@/heartbeat\n\n' heartbeat/init.d/heartbeat.in

[sorry, right now I'm deep in other deep shit ;-)
 or I'd prepare, test and commit upstream this "hotfix" myself]

There is no point using "$HA_BIN" in this init script at all.
Much less so now that the variable is in fact provided by another project
which just happens to be some descendant of something that used to be
part of heartbeat...
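
To see what the two edits do, here is a sketch that applies the same idea to a throwaway file instead of a source checkout. The sample file content is a made-up stand-in for the real init script, and the HEARTBEAT value is an assumption taken from the binary location given in comment 3 rather than the @libdir@ placeholder used above. It relies on GNU sed for `-i`.

```shell
# Build a tiny stand-in for the init script (not the real file).
demo=$(mktemp)
cat >"$demo" <<'EOF'
#!/bin/sh
### BEGIN INIT INFO
# Provides: heartbeat
### END INIT INFO
$HA_BIN/heartbeat -s
EOF

# 1. Replace every use of $HA_BIN/heartbeat with $HEARTBEAT.
sed -i -e 's,\$HA_BIN/heartbeat,$HEARTBEAT,g' "$demo"

# 2. Define HEARTBEAT right after the LSB header block.
#    The path is an assumption for a 64-bit EL6 install.
sed -i -e '/^### END INIT INFO/ a\
HEARTBEAT=/usr/lib64/heartbeat/heartbeat' "$demo"

cat "$demo"
```

After both edits the script no longer depends on HA_BIN being defined by resource-agents at all, which is the point being argued here.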

Comment 28 Robert Scheck 2013-11-26 16:05:52 UTC

As long as heartbeat does not build on RHEL 6 anymore, we are even not able
to ship this workaround, but users have to execute these sed calls themselves.

Comment 29 Lars Ellenberg 2013-11-26 16:08:55 UTC

Uhm, those sed calls were meant to be executed in a heartbeat source checkout ;)
but yes, similar would work to patch the init script in a "live" system.

But you are right, as long as heartbeat does not currently rebuild at all
for your setups, this does not really help much.

Comment 30 Dimitri Maziuk 2013-11-29 19:06:18 UTC

(quote Lars Ellenberg from comment #29)
... 
> But you are right, as long as heartbeat does not currently rebuild at all
> for your setups, this does not really help much.

And if rebuilding it is going to take some work, you might consider adding a little more work while you're at it and repackaging it so it doesn't depend on any of the current linux-ha stuff like cluster-glue and resource-agents.

The rationale is that a) heartbeat is unmaintained legacy code while the current stuff is still a moving target so there's sure to be more incompatible changes coming to break things. And that "things" heartbeat tends to be used for are mission-critical installations and when those break b) people tend to get really pissed off at Fedora, RedHat, and Linux-HA, which is not something anyone wants (including the pissed-off: it's bad for our blood pressure).

Comment 31 Lars Ellenberg 2013-11-29 20:32:33 UTC

(In reply to Dimitri Maziuk from comment #30)
> (quote Lars Ellenberg from comment #29)
> ... 
> > But you are right, as long as heartbeat does not currently rebuild at all
> > for your setups, this does not really help much.
> 
> And if rebuilding it is going to take some work, you might consider adding a
> little more work while you're at it and repackaging it 

Yes, as has been mentioned before,
we likely need to repackage it according to the "guidelines",
put binaries in "libexec", put dynamically loadable stuff in lib{,64}.


> so it doesn't depend
> on any of the current linux-ha stuff like cluster-glue and resource-agents.

Certainly impossible.
That has all been one monolithic package.
When the package split was done,
the messaging and ipc was moved into "glue",
so heartbeat will always have to depend on glue.

And the resource-agents, well, have been moved into "resource-agents",
and what remains in heartbeat is only wrappers to use the "ocf" resource agents
from the old "haresources" ResourceManager script.

Which means this dependency is *also* a hard dependency.

> The rationale is that a) heartbeat is unmaintained legacy code

We very much *do* maintain it.
We do use it in production a lot,
and we do use it in production also with pacemaker on top.

Just that there have not been much code changes in a while
does not mean it is unmaintained.
It's "stable".
It is pretty ugly in its insides at many places, but so are many other projects.
But it does work.

So apart from the occasional bug report about possible misbehaviour
in certain corner cases, there won't be any "development" happening:
it just has no intention to go anywhere, being "feature complete".

> while the current stuff is still a moving target so there's sure to be more
> incompatible changes coming to break things.

Uhm, no.
Glue was no moving target at all, it only tried to keep up with all the
breakage caused by pacemaker progress ;-)

And after having rewritten much everything that used to be glue,
using libqb as messaging layer, using its own re-written-from scratch
lrmd, using its own stonithd and so on,
pacemaker no longer depends on glue in any way
(unless you try to use it on top of heartbeat,
as it then needs to use that messaging layer).

> And that "things" heartbeat
> tends to be used for are  mission-critical installations and when those
> break b) people tend to get really pissed off at Fedora, RedHat, and
> Linux-HA, which is not something anyone wants (including the pissed-off:
> it's bad for our blood pressure).

Never change a running system ;-)

Or use a supported stack.

(You also did notice that this "does not build" breakage is because
of some incompatibility with a 3.5 years old glue-devel,
and that heartbeat *does* build fine against
the current glue devel, right?)

Comment 32 Dimitri Maziuk 2013-11-29 21:01:13 UTC

(In reply to Lars Ellenberg from comment #31)
...
> We very much *do* maintain it.
> We do use it in production a lot,
> and we do use it in production also with pacemaker on top.
> 
> Just that there have not been much code changes in a while
> does not mean it is unmaintained.
> It's "stable".

and

> Or use a supported stack.

When I did my Software Engineering 101 "maintained" meant "supported". (Admittedly it wasn't this century and I sort of stopped paying attention sometime after design patterns. So maybe it doesn't anymore, what do I know.)

> Glue was no moving target at all, it only tried to keep up with all the
> breakage caused by pacemaker progress ;-)

"Keep up" by "not moving" is way too zen for me.

So cluster-glue is "stable" and does not have to keep up with pacemaker progress anymore, great. Could there be a package heartbeat-resource-agents that similarly doesn't have to keep up while standing still?

> Never change a running system ;-)

Why bother releasing updates if I'm supposed to never install them?

Comment 33 Lars Ellenberg 2013-11-30 09:58:29 UTC

(In reply to Dimitri Maziuk from comment #32)
> (In reply to Lars Ellenberg from comment #31)
> ...
> > We very much *do* maintain it.
> > We do use it in production a lot,
> > and we do use it in production also with pacemaker on top.
> > 
> > Just that there have not been much code changes in a while
> > does not mean it is unmaintained.
> > It's "stable".
> 
> and
> 
> > Or use a supported stack.
> 
> When I did my Software Engineering 101 "maintained" meant "supported".
> (Admittedly it wasn't this century and I sort of stopped paying attention
> sometime after design patterns. So maybe it doesn't anymore, what do I know.)

I don't think "maintained" and "supported" are exact synonyms.
But even for your narrow definition:
SuSE supports it.
Linbit supports it.
Others may, too...
 
> > Glue was no moving target at all, it only tried to keep up with all the
> > breakage caused by pacemaker progress ;-)
> 
> "Keep up" by "not moving" is way too zen for me.

Think prey and archer (standing still, but keeping up his aim)
 
> So cluster-glue is "stable" and does not have to keep up with pacemaker
> progress anymore, great. Could there be a package heartbeat-resource-agents
> that similarly doesn't have to keep up while standing still?

Hey, I have been pissed off by all those unnecessary and incompatible
package splits and breakages all the time, believe it or not.

But neither the "unmaintained" nor the "moving target" thing
was your central point, I hope, and the "unmaintained" simply struck
a nerve here, I very much dislike heartbeat being spoken of as not maintained,
when it is.

> > Never change a running system ;-)
> 
> Why bother releasing updates if I'm supposed to never install them?

But it is ok to blame upstream heartbeat for a breakage that happened
in the 3.5 years old RHEL-only glue package?

I was simply trying to get through that
 * you are blaming the wrong package
 * you are suggesting the wrong cure...

But alas, there is no point.
It's never the fault of he who broke it,
it is always he who no longer works.

I'll add an attachment to "make heartbeat compile again" in a minute,
to make you all happy again... who cares for the blame game, after all,
we all want "it" to "just work", right?

*sigh*

Comment 34 Lars Ellenberg 2013-11-30 10:01:15 UTC

Created attachment 830886 [details]
make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5 and fix init script

make heartbeat compile against rhel cluster-glue-libs-devel 1.0.5
and fix init script which was broken by recent HA_BIN redefinition
in resource-agents.

May be incomplete, and resulting package is untested.
But should get you going.
Let me know the final fix you settle on,
so I can push similar in heartbeat upstream, too.

Comment 35 Kevin Fenzi 2013-11-30 17:50:26 UTC

Thanks very much Lars. I actually dug into this earlier in the week, but didn't get a fully building package, I was going to look again today. ;) 

Anyhow, I took your patch and added another one to fix another compile issue and have a scratch build: 

http://koji.fedoraproject.org/koji/taskinfo?taskID=6241293

Could folks please test this and provide feedback? If it looks good, I can push an update.

Comment 36 Christoph Galuschka 2013-11-30 19:09:39 UTC

Kevin: I will try to test those builds next week on monday (hopefully). Thanks for providing them.

Comment 37 Dimitri Maziuk 2013-11-30 21:13:26 UTC

(In reply to Kevin Fenzi from comment #35)

> Could folks please test this and provide feedback? If it looks good, I can
> push an update.

/etc/init.d/ patch gets heartbeat started -- I have it running since the day it broke.

There are 11 other binaries and a bunch of .so's in subdirs in /usr/lib64/heartbeat installed by the heartbeat rpm. There are 3 binaries in /usr/libexec/heartbeat installed by resource-agents.

Does anyone know if anything else in the location formerly known as $HA_BIN is ever used?
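
One way to look for other users of that variable is a plain text search over the installed scripts. This is only a sketch: the helper name is made up, and which directories are worth searching is an assumption. It finds textual references to HA_BIN and nothing else.

```shell
# list_ha_bin_users DIR...
# Hypothetical helper: print, sorted, every file under the given
# directories whose text mentions HA_BIN. Unreadable files are skipped.
list_ha_bin_users() {
    grep -rl -e 'HA_BIN' -- "$@" 2>/dev/null | sort
}
```

A call such as `list_ha_bin_users /etc/init.d /etc/ha.d /usr/lib/ocf` would be a starting point; an empty result for anything other than the init script would support the view that the init script is the only dependency.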

Comment 38 Dimitri Maziuk 2013-11-30 21:36:13 UTC

(In reply to Lars Ellenberg from comment #33)

> SuSE supports it.
> Linbit supports it.

As a centos user filing a bug against epel rpm, I'm glad to hear that. Especially from a guy known to start his replies on linux-ha mailing list with "are you a paying suse customer?"

You have a day job, we get it, so do I, so does Kevin. Until linbit and suse start paying him for his effort, the fact that they support their customers is irrelevant here.

> I was simply trying to get through that
>  * you are blaming the wrong package
>  * you are suggesting the wrong cure...

I am saying that *epel* package X is not being developed or updated, nor supported by anything other than goodness of Kevin's heart. It depends on *rhel* package Y that is being developed as part of redhat's effort to bring software Z into their distro. To me that spells high chance of X being broken by an update to Z, which will be unfixable in the current package layout.

You are suggesting that will never happen -- after all the history of linux-ha and all the splits and forks you yourself aren't happy about. I hope you're right, it'd sure make my life easier.

Comment 39 Christoph Galuschka 2013-12-01 12:01:51 UTC

(In reply to Kevin Fenzi from comment #35)
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6241293
> 
> Could folks please test this and provide feedback? If it looks good, I can
> push an update.

Kevin: I tested today with two VMs (x64 and i386) and 6.5 and it is looking good. I will do some more testing on real iron and for a longer period tomorrow.

Comment 40 Christoph Galuschka 2013-12-02 10:11:40 UTC

Kevin. I now have the new heartbeat also running on real iron machines (6.5 with current resource-agents) where IPs are monitored - so far looking good.

Comment 41 Lars Ellenberg 2013-12-02 10:42:38 UTC

(In reply to Dimitri Maziuk from comment #38)
> (In reply to Lars Ellenberg from comment #33)
> 
> > SuSE supports it.
> > Linbit supports it.
> 
> As a centos user filing a bug against epel rpm, I'm glad to hear that.
> Especially from a guy known to start his replies on linux-ha mailing list
> with "are you a paying suse customer?"

You know, thank you, but I'm the Lars. (Ellenberg)
*That* is *the other* Lars ;-) (Marowsky-Brée).

> You have a day job, we get it, so do I, so does Kevin

You realized that I was certainly NOT fighting with Kevin,
but telling you (Dimitri) that even for your incorrect and too narrow definition
of "maintained", you are wrong?

> Until linbit and suse
> start paying him for his effort, the fact that they support their customers
> is irrelevant here.

Absolutely.

> I am saying that *epel* package X is not being developed or updated,

You are complaining that an unsupported stack stopped working.
And that trying to fix that by rebuilding the no longer working package
does not work either, because yet an other package dropped a define
and heartbeat has not compiled on the platform you chose
*for over three years*

And if I say "don't complain that it broke, you used an unsupported stack.
Your options are to either use a supported stack,
or fix who has broken it (not who was broken by it)",
then you complain even louder, and forbid me to speak?
 :-)

I would have skipped this comment altogether
if you had not mistaken me for lmb; me is lge.
So if you insist on arguing with me further about definitions,
wording, and the blame game, take it to private mail ...

Comment 42 Lars Ellenberg 2013-12-02 10:57:07 UTC

Guys,

if you rebuild heartbeat anyways,
please use current mercurial tip
not 3 years old 3.0.4.

Ok, "current" as in, was committed 8 months ago.
(Strange. I thought I wrote those patches together with those other 2012 ones.)

There are several highly relevant fixes.
Flaky network (first packet drop, then communication loss) could
 * potentially cause heartbeat core to eat up 100 % cpu, 
 * potentially preventing heartbeat from ever connecting to that node again
And
 * potentially heartbeat would segfault given bad timing of a node dead event
 * potentially heartbeat would not even notice a node as dead
   if it had massive packet loss just before that
 * in certain situations (again: packet loss helps to trigger it)
   the ccm would not converge, so nodes would not agree on membership

If it helps I can tag that as 3.0.6 "soon".
I'll cross-post this comment in the other bug, too.

Comment 43 Kevin Fenzi 2013-12-02 16:19:15 UTC

Well, how about we push this old one out with the fix for this issue now... and then when you tag 3.0.6 push it out as soon as it's available?

I'd prefer to get people working as they were before without too many changes in one update...

Comment 44 Fedora Update System 2013-12-02 16:43:30 UTC

heartbeat-3.0.4-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/heartbeat-3.0.4-2.el6

Comment 45 Dimitri Maziuk 2013-12-02 18:21:45 UTC

(In reply to Lars Ellenberg from comment #41)

> You know, thank you, but I'm the Lars. (Ellenberg)
> *That* is *the other* Lars ;-) (Marowsky-Brée).

Sorry, brain fart. Always knew working on weekend's bad for me.

> And if I say "don't complain that it broke, you used an unsupported stack.
> Your options are to either use a supported stack,
> or fix who has broken it (not who was broken by it)",
> then you complain even louder, and forbid me to speak?

No, I'm saying we can't fix who has broken it because it has nothing to do with any of us. It's RHEL pulling in the other stack in order to pull in the other other stack (RDO) -- the situation known as "too many cooks". 

I suppose I can add resource-agents to yum.conf's exclude list...

Comment 46 Dimitri Maziuk 2013-12-02 18:46:30 UTC

> (In reply to Lars Ellenberg from comment #41)

PS. I get it, it's heartbeat's bug: wrong path in /etc/init.d script. This time. Next time RHEL fixes something else in resource-agents and it won't be. What matters is heartbeat will be broke again.

Comment 47 Fedora Update System 2013-12-03 01:23:15 UTC

Package heartbeat-3.0.4-2.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing heartbeat-3.0.4-2.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2013-12278/heartbeat-3.0.4-2.el6
then log in and leave karma (feedback).

Comment 48 Robert Scheck 2013-12-04 22:37:09 UTC

Lars, Kevin, thank you very much for your time and work! The update in EPEL
testing works here fine and as expected. No issues so far. Great work! :)

Comment 49 Christoph Galuschka 2013-12-10 17:16:29 UTC

Kevin: Is it possible the change to heartbeat also changed the behaviour of ifconfig (as it does no longer return the HA-IP)? 'ip addr list' works however.

Comment 50 Dimitri Maziuk 2013-12-10 17:50:13 UTC

(In reply to Christoph Galuschka from comment #49)
> Kevin: Is it possible the change to heartbeat also changed the behaviour of
> ifconfig (as it does no longer return the HA-IP)? 'ip addr list' works
> however.

My guess is it's whatever resource-agent that ends up handling IPAddr that did it. I've a vague memory that the pacemaker's resource agent's been doing that for quite some time, it just probably never made it into redhat until now.

Which is why I was bitching upthread: wrong $HA_BIN location is not the only thing that changed. RHEL is making more changes to resource-agents and they have no reason to maintain compatibility with EPEL's heartbeat RPM.

Comment 51 Kevin Fenzi 2013-12-10 17:55:23 UTC

Can we stop piling on unrelated issues here please?

The ip addr thing I seem to recall was a difference between using: 
/etc/ha.d/resource.d/IPaddr
and
/etc/ha.d/resource.d/IPaddr2
resources, but I don't recall fully. 

If you have another concrete bug, please file a new bug on it. Thanks.

Comment 52 Dimitri Maziuk 2013-12-10 18:21:39 UTC

(In reply to Kevin Fenzi from comment #51)

> The ip addr thing I seem to recall was a difference between using: 
> /etc/ha.d/resource.d/IPaddr
> and
> /etc/ha.d/resource.d/IPaddr2

No. Unless you're saying that Lars got unstuck in space-time and rewrote several haresources files here to invoke IPaddr2 instead of IPaddr while I was yum-updating resource-agents.

Comment 53 Fedora Update System 2013-12-18 00:19:11 UTC

heartbeat-3.0.4-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 54 Smile hosting 2014-02-11 13:16:18 UTC

Hi almighties,

I just applied this minor update to a few of our clusters and guess what: the clusters are dead. I explain below.

This update (3.0.4-1.el6 to 3.0.4-2.el6) broke our clusters' unicast functionality, apparently because of the new patch pushed with the version from this bug report.

Related broken patch: heartbeat-3.0.4-duplicate-ucast.patch

The result is that heartbeat cannot start because ucast (used in /etc/ha.d/ha.cf) does not work, with the following errors in the logs:
info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
info: glib: ucast: bound send socket to device: br1
ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not available
ERROR: make_io_childpair: cannot open ucast br1
CRIT: Emergency Shutdown: Master Control process died.
CRIT: Killing pid 11194 with SIGTERM
CRIT: Killing pid 11198 with SIGTERM
CRIT: Killing pid 11199 with SIGTERM
CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

When I downgrade to version 3.0.4-1.el6, everything works again.
So the patch applied in this bug report creates a regression in unicast functionality.

Please roll back or finish/stabilize the patch "heartbeat-3.0.4-duplicate-ucast.patch".

I can test a new version if you want me to, before you push it to the stable repo.

Regards, Aurelien Lemaire from Smile Hosting.

Comment 55 Lars Ellenberg 2014-02-12 14:29:41 UTC

Actual bug is: SO_REUSEPORT defined by headers, but not supported by kernel.

> --- Comment #17 from Smile hosting <hosting> ---
> I just applied this minor update to a few of our clusters and guess what:
> the clusters are dead. I explain below.
> 
> This update (3.0.4-1.el6 to 3.0.4-2.el6) broke our clusters' unicast
> functionality, apparently because of the new patch pushed with the version
> from this bug report.
> 
> Related broken patch: heartbeat-3.0.4-duplicate-ucast.patch
> 
> The result is that heartbeat cannot start because ucast (used in
> /etc/ha.d/ha.cf) does not work, with the following errors in the logs:
> info: glib: Starting serial heartbeat on tty /dev/ttyS1 (19200 baud)
> info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on br1
> info: glib: ucast: bound send socket to device: br1
> ERROR: glib: ucast: error setting option SO_REUSEPORT(w): Protocol not
> available

> When I downgrade to version 3.0.4-1.el6, everything works again.
> So the patch applied in this bug report creates a regression in unicast
> functionality.

No, it does not.

But at the time that -1 binary package was built, SO_REUSEPORT was not
defined...  when the -2 binary package was built, apparently the define
was there.  But your kernel does not support it (yet).

If you try to rebuild the -1 package now, against the same headers
the -2 package was built against, it will break with a compile-time error.
That compile-time error is what said patch tries to trivially fix.

Only, this then breaks at runtime when compiled against headers that
are too recent but run on a Linux kernel that is too old.

See upstream mercurial for my attempt at fixing this:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/37f57a36a2dd

I suggest you update to upstream mercurial,
or replace your ucast patch with the above.
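
To illustrate the failure mode described above, here is a minimal sketch in Python. It is not heartbeat's C code and not the upstream changeset linked above; it only shows the general technique of treating SO_REUSEPORT as optional, so that a kernel answering "Protocol not available" (ENOPROTOOPT) is tolerated instead of being fatal. The helper name enable_reuseport is hypothetical.

```python
import errno
import socket


def enable_reuseport(sock):
    """Try to set SO_REUSEPORT; return True if set, False if unsupported."""
    # The constant may be missing entirely (built against old headers).
    option = getattr(socket, "SO_REUSEPORT", None)
    if option is None:
        return False
    try:
        sock.setsockopt(socket.SOL_SOCKET, option, 1)
    except OSError as exc:
        # The constant exists, but the running kernel does not know the
        # option: this is the "Protocol not available" case from the log.
        if exc.errno == errno.ENOPROTOOPT:
            return False
        # Any other error is unexpected and should not be hidden.
        raise
    return True
```

A caller would create its UDP socket as usual, call enable_reuseport(sock), and carry on with the bind whether the result is True or False.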


Cheers,
	Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

Comment 56 Smile hosting 2014-02-12 17:12:51 UTC

Hi Lars,

Thanks for your answer.
I now owe you the context:

I'm on a fully vanilla, up-to-date CentOS 6 (which is what this bug report is about), without any home-cooked rebuild of any package, and with the vanilla EL6 repo for the heartbeat packages.

uname -a  : Linux HOSTNAME 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Vanilla packages:
me-filer5:~# rpm -qa |egrep 'heartbeat|kernel-'
heartbeat-libs-3.0.4-1.el6.x86_64 (working version)
heartbeat-3.0.4-1.el6.x86_64  (working version)
kernel-2.6.32-358.14.1.el6.x86_64


The EPEL EL6 repo currently proposes an update of both heartbeat packages to:
heartbeat-libs-3.0.4-2.el6.x86_64 (not working version)
heartbeat-3.0.4-2.el6.x86_64 (not working version)

The patch I referred to is the one I noticed in a diff of the -1 and -2 SRC.rpm of the heartbeat package, made by the EPEL heartbeat package maintainer.

I confirm that the -2 version of those packages no longer works on a vanilla EL6 with the vanilla EL6 kernel, which I suppose is not intended but unfortunate.

I hope this helps in understanding the situation.

Regards, Aurelien Lemaire

Comment 57 Christoph Galuschka 2014-02-12 17:21:09 UTC

This might solve it: regarding the kernel age, the one you use is from 6.4 and thus from July last year. 2.6.32-431.5.1 would be the current version.
All I can add is that those packages from EPEL work fine for me.
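
A quick way to see which side of that line a host is on is to compare the running kernel release (what uname -r prints, not the list of installed kernel packages) with the newer kernel series. A minimal sketch, assuming 2.6.32-431 as the threshold only because that is the series named as current in this thread; the thread does not say which exact kernel first accepted SO_REUSEPORT. The names MIN_WORKING, parse_release and kernel_is_recent_enough are hypothetical.

```python
import platform
import re

# Assumption: threshold taken from the kernel series named in this thread.
MIN_WORKING = (2, 6, 32, 431)


def parse_release(release):
    """Turn '2.6.32-358.14.1.el6.x86_64' into (2, 6, 32, 358, 14, 1)."""
    # Drop the distribution/architecture suffix, keep the numeric fields.
    numeric_part = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def kernel_is_recent_enough(release, minimum=MIN_WORKING):
    """Compare the parsed release against the minimum, field by field."""
    return parse_release(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    verdict = "ok" if kernel_is_recent_enough(running) else "too old"
    print(running, verdict)
```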

Comment 58 Robert Scheck 2014-02-12 17:25:29 UTC

Please also note that Fedora EPEL 6 officially only supports the latest
version of RHEL/CentOS 6, so currently RHEL 6.5. It might work (or not)
with RHEL 6.4, 6.3, etc. Aside from that, I cannot see any issues here...

Comment 59 Smile hosting 2014-02-13 09:57:41 UTC

Hi,

Now I owe you all my facepalm mea culpa.

My Puppet servant has been excluding my kernel from updates since 2.6.32-358. Thus I was indeed using an old kernel.

In conclusion, after fixing and updating:
with a vanilla UP-TO-DATE kernel the EPEL heartbeat packages work like a charm, with the following package versions:
rpm -qa |egrep 'heartbeat|kernel-2.6'
heartbeat-libs-3.0.4-2.el6.x86_64
kernel-2.6.32-431.el6.x86_64
heartbeat-3.0.4-2.el6.x86_64
kernel-2.6.32-358.14.1.el6.x86_64

My bad.

Regards, Aurélien Lemaire

Comment 60 Tony 2015-01-28 10:20:48 UTC

Hi all,

I have configured heartbeat to run on a 2-node httpd cluster. Heartbeat seems to be running when I check the logs, and I see that node1 comes up on the web page, but when I shut down heartbeat so that node2 takes over, failover does not work. This is the log I see on node1:

tailf /var/log/ha-log
Jan 28 09:48:04 node1 heartbeat: [2420]: info: Configuration validated. Starting heartbeat 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: heartbeat: version 3.0.4
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Heartbeat generation: 1422435302
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 28 09:48:04 node1 heartbeat: [2421]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:48:04 node1 heartbeat: [2421]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Local status now set to: 'up'
Jan 28 09:48:04 node1 heartbeat: [2421]: info: Link node1:eth0 up.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: node node2: is dead
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Comm_now_up(): updating status to active
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Local status now set to: 'active'
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: No STONITH device configured.
Jan 28 09:50:05 node1 heartbeat: [2421]: WARN: Shared disks are not protected.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Resources being acquired from node2.
harc(default)[2433]:	2015/01/28_09:50:05 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[2469]:	2015/01/28_09:50:05 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2469]:	2015/01/28_09:50:05 info: mach_down takeover complete for node node2.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: mach_down takeover complete.
Jan 28 09:50:05 node1 heartbeat: [2421]: info: Initial resource acquisition complete (mach_down)
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2501]:	2015/01/28_09:50:05 INFO:  Resource is stopped
Jan 28 09:50:05 node1 heartbeat: [2434]: info: Local Resource acquisition completed.
harc(default)[2588]:	2015/01/28_09:50:06 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2588]:	2015/01/28_09:50:06 received ip-request-resp 172.31.29.243 OK yes
ResourceManager(default)[2611]:	2015/01/28_09:50:06 info: Acquiring resource group: node1 172.31.29.243 httpd
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2639]:	2015/01/28_09:50:06 INFO:  Resource is stopped
ResourceManager(default)[2611]:	2015/01/28_09:50:06 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start
IPaddr(IPaddr_172.31.29.243)[2737]:	2015/01/28_09:50:06 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[2737]:	2015/01/28_09:50:06 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[2737]:	2015/01/28_09:50:06 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[2723]:	2015/01/28_09:50:06 INFO:  Success
Jan 28 09:50:16 node1 heartbeat: [2421]: info: Local Resource acquisition completed. (none)
Jan 28 09:50:16 node1 heartbeat: [2421]: info: local resource transition completed.





On node2 I see this:

tailf /var/log/ha-log
Jan 28 09:27:22 node2 heartbeat: [1646]: info: Configuration validated. Starting heartbeat 3.0.4
Jan 28 09:27:22 node2 heartbeat: [1647]: info: heartbeat: version 3.0.4
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Heartbeat generation: 1422435301
Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
Jan 28 09:27:22 node2 heartbeat: [1647]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_TriggerHandler: Added signal manual handler
Jan 28 09:27:22 node2 heartbeat: [1647]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Local status now set to: 'up'
Jan 28 09:27:22 node2 heartbeat: [1647]: info: Link node2:eth0 up.
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: node node1: is dead
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Comm_now_up(): updating status to active
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Local status now set to: 'active'
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: No STONITH device configured.
Jan 28 09:29:23 node2 heartbeat: [1647]: WARN: Shared disks are not protected.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Resources being acquired from node1.
Jan 28 09:29:23 node2 heartbeat: [1656]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node2] to acquire.
harc(default)[1655]:	2015/01/28_09:29:23 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[1685]:	2015/01/28_09:29:23 info: Taking over resource group 172.31.29.243
ResourceManager(default)[1712]:	2015/01/28_09:29:23 info: Acquiring resource group: node1 172.31.29.243 httpd
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1740]:	2015/01/28_09:29:23 INFO:  Resource is stopped
ResourceManager(default)[1712]:	2015/01/28_09:29:23 info: Running /etc/ha.d/resource.d/IPaddr 172.31.29.243 start
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:	2015/01/28_09:29:23 INFO:  Success
mach_down(default)[1685]:	2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1685]:	2015/01/28_09:29:23 info: mach_down takeover complete for node node1.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed.
^Z
[1]+  Stopped                 tailf /var/log/ha-log
[root@ip-172-31-29-242 ~]# tailf /var/log/ha-log
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: Adding inet address 172.31.29.243/20 with broadcast address 172.31.31.255 to device eth0
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: Bringing device eth0 up
IPaddr(IPaddr_172.31.29.243)[1838]:	2015/01/28_09:29:23 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.31.29.243 eth0 172.31.29.243 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.31.29.243)[1824]:	2015/01/28_09:29:23 INFO:  Success
mach_down(default)[1685]:	2015/01/28_09:29:23 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[1685]:	2015/01/28_09:29:23 info: mach_down takeover complete for node node1.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: mach_down takeover complete.
Jan 28 09:29:23 node2 heartbeat: [1647]: info: Initial resource acquisition complete (mach_down)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: Local Resource acquisition completed. (none)
Jan 28 09:29:33 node2 heartbeat: [1647]: info: local resource transition completed.
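
Reading the two logs above side by side, node1 logs "node node2: is dead" and node2 logs "node node1: is dead", and each then runs IPaddr start for 172.31.29.243. A minimal sketch for spotting that pattern in a pair of ha-log files follows. The regular expressions are written against the lines pasted above only, and the function names are hypothetical. It flags the symptom; it does not explain why the nodes fail to see each other, and it does not check that the two logs cover the same time window.

```python
import re

# Patterns modelled on the ha-log lines pasted above.
DEAD_RE = re.compile(r"heartbeat: \[\d+\]: WARN: node (\S+): is dead")
ACQUIRE_RE = re.compile(r"Acquiring resource group: (\S+) (\S+)")


def summarize(log_text):
    """Return (nodes declared dead, IPs this node tried to acquire)."""
    dead = set(DEAD_RE.findall(log_text))
    acquired = {ip for _owner, ip in ACQUIRE_RE.findall(log_text)}
    return dead, acquired


def both_claim_same_ip(log_a, log_b):
    """True if each log declares a peer dead and both acquire a common IP."""
    dead_a, ips_a = summarize(log_a)
    dead_b, ips_b = summarize(log_b)
    return bool(dead_a and dead_b and (ips_a & ips_b))
```

Feeding it the contents of /var/log/ha-log from each node returns True for the logs shown here.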
