Bug 1127752

Summary: controller_ips changes during deployment lifetime
Product: Red Hat OpenStack
Reporter: John Eckersberg <jeckersb>
Component: ruby193-rubygem-staypuft
Assignee: John Eckersberg <jeckersb>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: unspecified
Version: Foreman (RHEL 6)
CC: mburns, rhos-maint, sclewis, sseago, yeylon
Target Milestone: ga
Target Release: Installer
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ruby193-rubygem-staypuft-0.2.1-1.el6ost
Doc Type: Bug Fix
Last Closed: 2014-08-21 18:08:26 UTC
Type: Bug

Description John Eckersberg 2014-08-07 13:22:08 UTC
Staypuft uses the controller_ips method in several places to template
class parameters, one of the most important being
pacemaker_cluster_members.  It is imperative that this method return
a consistent list of IP addresses, because it directly drives the
pacemaker configuration file, and any change will break the cluster.

Presently, staypuft generates this list using the Host.ip value for
each host in the deployment.  This is from the Foreman Host model.
Foreman maintains the value of ip based on the value provided in the
ipaddress fact from facter.  In turn, facter takes the value of
ipaddress to be the first address encountered in the output of
ifconfig that does not match /^127./.
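
The selection rule described above can be sketched roughly as follows. This is a simplified illustration, not facter's actual code: first_non_loopback is a hypothetical helper, and the interface list is made-up sample data standing in for parsed ifconfig output. The regex escapes the dot, which is slightly stricter than the pattern quoted above.

```ruby
# Hypothetical helper: return the first address, in listing order, that is
# not a loopback address. It mirrors the rule described in this report and
# is not taken from facter itself.
def first_non_loopback(interfaces)
  entry = interfaces.find { |_name, addr| addr !~ /^127\./ }
  entry && entry[1]
end

# Sample data (assumed): [interface name, IPv4 address] pairs in the order
# ifconfig would print them.
interfaces = [
  ["eth0", "192.168.100.10"],
  ["lo",   "127.0.0.1"],
]

puts first_non_loopback(interfaces)
```

Note that nothing in this rule ties the result to a particular network; whichever non-loopback interface is listed first wins.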

Here's what we've observed happening during an HA deployment:

- All three controllers provision on the management network, and run
  puppet during kickstart %post.  The ipaddress fact for each is the
  management network address, so this is correct so far.

- Hosts reboot and run puppet.  At this point it gets a little ambiguous.
  If the management interface is the first alphabetically, it will
  still be chosen as the ipaddress fact and the Foreman host ip will
  still be correct.  However, it's possible that the host provisioned
  off of eth1, and now the host has brought up eth0 as a DHCP
  interface.  In that case, ipaddress will be the eth0 address, and
  Foreman will update the host ip accordingly.  This will cause
  controller_ips to change on the next puppet run.

- If we manage to get through the first puppet runs correctly across
  all controllers (e.g. eth0 was in fact the management interface),
  then the second time puppet runs, the external bridge address will
  almost certainly take over the host ip, since br-ex sorts before the
  eth interfaces alphabetically.  The next puppet run will cause the
  controller_ips value to change to reflect this, and the pacemaker
  cluster will break.
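
The sequence above can be reproduced with a small simulation. This is a sketch under stated assumptions, not staypuft or facter code: simulated_ipaddress is a hypothetical stand-in for the ipaddress fact, interfaces are assumed to be considered in alphabetical order as described above, and all addresses are invented. For brevity the same controller, provisioned off eth1, is used for all three stages.

```ruby
# Hypothetical stand-in for the ipaddress fact: sort interfaces by name and
# return the first address that is not a loopback address.
def simulated_ipaddress(interfaces)
  interfaces.sort.map { |_name, addr| addr }.find { |addr| addr !~ /^127\./ }
end

# Made-up addresses for one controller at three points in its lifetime.
kickstart    = { "eth1" => "192.168.100.11", "lo" => "127.0.0.1" }
after_reboot = kickstart.merge("eth0" => "10.0.0.57")     # eth0 comes up via DHCP
with_bridge  = kickstart.merge("br-ex" => "172.16.0.11")  # external bridge is created

# Each stage reports a different address for the same host, which is the
# drift in controller_ips that this report describes.
[kickstart, after_reboot, with_bridge].each do |ifaces|
  puts simulated_ipaddress(ifaces)
end
```

In this sketch the management address is reported only during kickstart; once eth0 or br-ex exists, a different address is selected even though the management address itself never changed.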

Comment 3 John Eckersberg 2014-08-07 20:56:49 UTC
https://github.com/theforeman/staypuft/pull/258

Comment 5 Scott Seago 2014-08-11 14:31:24 UTC
Moving back to ON_DEV pending a new fix. We're going to revert staypuft PR 258 (linked in comment 3) and push a fix in the installer instead. The installer fix is already merged:

https://github.com/theforeman/foreman-installer-staypuft/pull/71

Comment 7 Leonid Natapov 2014-08-21 05:17:14 UTC
ruby193-rubygem-staypuft-0.2.5-1.el6ost.noarch
Controller IPs stay the same.

Comment 8 errata-xmlrpc 2014-08-21 18:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1090.html