Created attachment 971758 [details]
logs

Description of problem:
During HA deployment, Foreman suggests which IP addresses will be used for the VIPs and passes them to Puppet, so Puppet can run pcs commands to create the VIPs. I have hit a situation (twice already) where the IP addresses suggested for the VIPs are already in use as the controllers' provisioning IP addresses. For example, I have one controller with IP address 192.168.0.2 and another controller with IP address 192.168.0.3, and during the Puppet run I see the same IP addresses used for VIPs:

 ip-192.168.0.3	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 ip-192.168.0.2	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3

As a result, the HA deployment fails: the host that actually owns the IP loses it and cannot get it back from DHCP because that IP is now used for a VIP, so the cluster members cannot communicate with each other.

I am attaching the foreman log from the Staypuft machine and the messages log from the controllers.

ruby193-rubygem-foreman_openstack_simplify-0.0.6-8.el7ost.noarch
openstack-puppet-modules-2014.2.7-2.el7ost.noarch
openstack-foreman-installer-3.0.8-1.el7ost.noarch

Here is the pcs status output:
-------------------------------
[root@macf04da2732fb1 ~]# pcs status
Cluster name: openstack
Last updated: Sun Dec 21 18:31:04 2014
Last change: Sun Dec 21 18:05:20 2014 via cibadmin on pcmk-macf04da2732fb1
Stack: corosync
Current DC: pcmk-macf04da2732fb1 (3) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
3 Nodes configured
16 Resources configured

Online: [ pcmk-mac848f69fbc4c3 pcmk-macf04da2732fb1 ]
OFFLINE: [ pcmk-mac848f69fbc643 ]

Full list of resources:

 stonith-ipmilan-10.35.160.172	(stonith:fence_ipmilan):	Started pcmk-mac848f69fbc4c3
 stonith-ipmilan-10.35.160.174	(stonith:fence_ipmilan):	Started pcmk-mac848f69fbc4c3
 stonith-ipmilan-10.35.160.170	(stonith:fence_ipmilan):	Started pcmk-macf04da2732fb1
 ip-192.168.0.3	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 ip-10.35.173.157	(ocf::heartbeat:IPaddr2):	Started pcmk-macf04da2732fb1
 ip-10.35.173.158	(ocf::heartbeat:IPaddr2):	Started pcmk-macf04da2732fb1
 ip-192.168.0.2	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 ip-192.168.0.18	(ocf::heartbeat:IPaddr2):	Started pcmk-macf04da2732fb1
 ip-192.168.0.13	(ocf::heartbeat:IPaddr2):	Started pcmk-macf04da2732fb1
 ip-192.168.0.14	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 ip-192.168.0.21	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 ip-10.35.173.150	(ocf::heartbeat:IPaddr2):	Started pcmk-macf04da2732fb1
 ip-10.35.173.155	(ocf::heartbeat:IPaddr2):	Started pcmk-mac848f69fbc4c3
 Clone Set: memcached-clone [memcached]
     Started: [ pcmk-mac848f69fbc4c3 pcmk-macf04da2732fb1 ]
     Stopped: [ pcmk-mac848f69fbc643 ]

PCSD Status:
  pcmk-mac848f69fbc4c3: Online
  pcmk-mac848f69fbc643: Unable to authenticate
  pcmk-macf04da2732fb1: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
-------------------------------

pcs config output:
-------------------------------
[root@mac848f69fbc4c3 ~]# pcs config
Cluster Name: openstack

Corosync Nodes:
 pcmk-mac848f69fbc4c3 pcmk-mac848f69fbc643 pcmk-macf04da2732fb1
Pacemaker Nodes:
 pcmk-mac848f69fbc4c3 pcmk-mac848f69fbc643 pcmk-macf04da2732fb1

Resources:
 Resource: ip-192.168.0.3 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.3 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.3-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.3-stop-timeout-20s) monitor interval=30s (ip-192.168.0.3-monitor-interval-30s)
 Resource: ip-10.35.173.157 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.35.173.157 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-10.35.173.157-start-timeout-20s) stop interval=0s timeout=20s (ip-10.35.173.157-stop-timeout-20s) monitor interval=30s (ip-10.35.173.157-monitor-interval-30s)
 Resource: ip-10.35.173.158 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.35.173.158 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-10.35.173.158-start-timeout-20s) stop interval=0s timeout=20s (ip-10.35.173.158-stop-timeout-20s) monitor interval=30s (ip-10.35.173.158-monitor-interval-30s)
 Resource: ip-192.168.0.2 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.2 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.2-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.2-stop-timeout-20s) monitor interval=30s (ip-192.168.0.2-monitor-interval-30s)
 Resource: ip-192.168.0.18 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.18 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.18-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.18-stop-timeout-20s) monitor interval=30s (ip-192.168.0.18-monitor-interval-30s)
 Resource: ip-192.168.0.13 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.13 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.13-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.13-stop-timeout-20s) monitor interval=30s (ip-192.168.0.13-monitor-interval-30s)
 Resource: ip-192.168.0.14 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.14 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.14-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.14-stop-timeout-20s) monitor interval=30s (ip-192.168.0.14-monitor-interval-30s)
 Resource: ip-192.168.0.21 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.0.21 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.0.21-start-timeout-20s) stop interval=0s timeout=20s (ip-192.168.0.21-stop-timeout-20s) monitor interval=30s (ip-192.168.0.21-monitor-interval-30s)
 Resource: ip-10.35.173.150 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.35.173.150 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-10.35.173.150-start-timeout-20s) stop interval=0s timeout=20s (ip-10.35.173.150-stop-timeout-20s) monitor interval=30s (ip-10.35.173.150-monitor-interval-30s)
 Resource: ip-10.35.173.155 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=10.35.173.155 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-10.35.173.155-start-timeout-20s) stop interval=0s timeout=20s (ip-10.35.173.155-stop-timeout-20s) monitor interval=30s (ip-10.35.173.155-monitor-interval-30s)
 Clone: memcached-clone
  Resource: memcached (class=systemd type=memcached)
   Attributes: start-delay=10s
   Operations: monitor interval=30s (memcached-monitor-interval-30s)

Stonith Devices:
 Resource: stonith-ipmilan-10.35.160.172 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=pcmk-mac848f69fbc643 ipaddr=10.35.160.172 login=root passwd=calvin
  Operations: monitor interval=60s (stonith-ipmilan-10.35.160.172-monitor-interval-60s)
 Resource: stonith-ipmilan-10.35.160.174 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=pcmk-macf04da2732fb1 ipaddr=10.35.160.174 login=root passwd=calvin
  Operations: monitor interval=60s (stonith-ipmilan-10.35.160.174-monitor-interval-60s)
 Resource: stonith-ipmilan-10.35.160.170 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=pcmk-mac848f69fbc4c3 ipaddr=10.35.160.170 login=root passwd=calvin
  Operations: monitor interval=60s (stonith-ipmilan-10.35.160.170-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: stonith-ipmilan-10.35.160.170
    Disabled on: pcmk-mac848f69fbc4c3 (score:-INFINITY) (id:location-stonith-ipmilan-10.35.160.170-pcmk-mac848f69fbc4c3--INFINITY)
  Resource: stonith-ipmilan-10.35.160.172
    Disabled on: pcmk-mac848f69fbc643 (score:-INFINITY) (id:location-stonith-ipmilan-10.35.160.172-pcmk-mac848f69fbc643--INFINITY)
  Resource: stonith-ipmilan-10.35.160.174
    Disabled on: pcmk-macf04da2732fb1 (score:-INFINITY) (id:location-stonith-ipmilan-10.35.160.174-pcmk-macf04da2732fb1--INFINITY)
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-32.el7_0.1-368c726
 pcmk-mac848f69fbc4c3: memcached,rabbitmq
 pcmk-mac848f69fbc643: memcached
 pcmk-macf04da2732fb1: memcached,rabbitmq,haproxy
 rabbitmq: running
This is unrelated to the puppet configuration; the issue is somewhere in Staypuft, which passes the IP addresses to be used for VIPs into the puppet manifests.
*** Bug 1177033 has been marked as a duplicate of this bug. ***
For the record - let's take a look at dhcpd.leases and proxy.log to see what IP addresses Foreman and the Proxy return.
Worked with Lukas and Leonid to get to the root cause:

The installer uses an API call to unused_ip to generate the VIPs for a deployment. The unused_ip call does not create leases in DHCP. This means that a new host in the environment will use DHCP and can get a lease that conflicts with the VIPs.

Some notes: this only affects subnets using a foreman-proxy DHCP server (currently the provisioning network). It can be avoided by discovering *all* hosts prior to creating the deployment.

Proposed fix: in the case that the subnet is using DHCP, we should:
* call unused_ip to get an IP address
* immediately call into DHCP and create a reservation for that IP address
** This is done by calling POST /dhcp/network?mac=xyz&name=abc

A feature request related to this issue has been filed against Foreman to allow doing this in one step: http://projects.theforeman.org/issues/8854
Correction: to create a reservation one needs to provide the MAC, name and IP: POST /dhcp/network?mac=xyz&name=abc&ip=def. When making the code change, leave a note that this can later be refactored into a single call once our feature request is implemented.
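For illustration, here is a minimal sketch of the two-step flow described above (unused_ip, then an immediate reservation on the smart proxy DHCP module). The proxy URL, subnet, MAC and record name are placeholders, the "returned" VIP address is hard-coded, and error handling is minimal; this is not the actual Staypuft change.
-------------------------------
require 'net/http'
require 'uri'

proxy_url = 'http://proxy.example.com:8000'  # placeholder foreman-proxy URL
subnet    = '192.168.0.0'                    # placeholder provisioning subnet

# Step 1: an unused IP as returned by Foreman's unused_ip call
# (the value shown here is a placeholder).
vip_ip = '192.168.0.30'

# Step 2: immediately reserve it via the proxy DHCP endpoint mentioned above,
# so a newly discovered host cannot lease the same address.
uri = URI("#{proxy_url}/dhcp/#{subnet}")
res = Net::HTTP.post_form(uri,
                          'mac'  => '00:1a:4a:00:00:01',       # placeholder virtual MAC
                          'name' => 'deployment1-vip-example',  # placeholder record name
                          'ip'   => vip_ip)
raise "DHCP reservation failed: #{res.code} #{res.body}" unless res.is_a?(Net::HTTPSuccess)
-------------------------------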
What about on deployment deletion? I imagine we also need to clear the reservations for all those MACs. What's the API call for that?
I just realized that this is not that easy. If you add the reservation, Foreman's orchestration code will try to perform it once again, which will fail. You also need to make sure the orchestration does not happen. In app/models/concerns/orchestration/dhcp.rb you need to disable the after_validation hooks:

after_validation :dhcp_conflict_detected?, :queue_dhcp

for the particular Host instance on which you do the pre-reservation. Instead of commenting that out, I think you want to introduce a flag and use it to skip the DHCP validations (which trigger the orchestration code) on a particular instance.

For deletion you don't need to care about this, because our orchestration code will make sure the DHCP record gets deleted automatically (that's the before_destroy :queue_dhcp_destroy line in the same file). But when you do the pre-reservation, make sure to store the returned IP address in the Host record (field name "ip"), otherwise the deletion code will not have enough information to delete the record.
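A rough sketch of the flag-based skip described above, not the actual Foreman patch: the accessor name skip_dhcp_orchestration is made up for illustration, while the callback names come from app/models/concerns/orchestration/dhcp.rb as quoted in this comment. It only runs inside a Rails/Foreman process that includes the concern into Host.
-------------------------------
# Hypothetical sketch only -- illustrates making the DHCP callbacks conditional
# on a per-instance flag instead of commenting them out.
module Orchestration
  module Dhcp
    extend ActiveSupport::Concern

    included do
      attr_accessor :skip_dhcp_orchestration  # made-up flag name

      # Same hooks as today, but skipped when the flag is set on this instance.
      after_validation :dhcp_conflict_detected?, :queue_dhcp,
                       :unless => :skip_dhcp_orchestration
      before_destroy   :queue_dhcp_destroy
    end
  end
end

# Intended usage when pre-reserving a VIP record:
#   host.skip_dhcp_orchestration = true
#   host.ip = vip_ip   # store the reserved IP so deletion can find the record
#   host.save!
-------------------------------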
Lukas, does this still apply if we're using NICs that are not attached to a host? The NICs this is an issue for are all virtual and not attached to any host. Same question for the deletion: is the cleanup done when the NIC is removed, or when the host is removed? If we don't have a host, does that cleanup work?
Right. If you pre-create a reservation on a proxy/DHCP and save the IP address with a host in Foreman, the moment you try to save it the orchestration code sends the very same proxy/DHCP request again (MAC/IP/hostname), which will likely fail and roll the whole transaction back. This paragraph applies only to the primary (provisioning) interface.

Now, for all the other interfaces, you have the option to create "unmanaged" NICs. Those interfaces are not registered against DHCP/DNS. Although there was no user interface for this until the current nightly versions (will be 1.8), you were able to set the flag. It's called "managed", and if you set it to false you can do the DHCP/DNS reservations yourself, but in that case you also need to make sure they get removed upon host deletion. Can't you just keep those interfaces simply unmanaged (set the flag) so they get a "random" IP address and no hostname?
The issue that we were facing is that we have virtual IP addresses that need to be set. They're logical IP addresses that move from host to host in a cluster using pacemaker; there is no physical NIC on a host with that IP. The way we've modeled this in Staypuft (and Scott can correct me if I say it incorrectly) is that we have a set of NICs that are not associated with any physical host. We use those NICs to get an IP address, which gets passed into puppet and used to configure the pacemaker VIPs. The issue in this BZ is that the IPs of these virtual NICs aren't reserved, so new DHCP requests can get the same addresses.

The proposed fix is (a sketch of the deletion step follows below):
* During deployment creation, the virtual NICs are created with no host association.
* A call to unused_ip is made to get an IP address for each such NIC.
* An immediate call is made to DHCP to reserve that IP address.
* On deployment deletion, for each NIC, a call is made to DHCP to delete the reservation for that IP address.
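For the last bullet, a minimal sketch of what the cleanup loop could look like. The DELETE /dhcp/<subnet>/<mac> endpoint is my assumption about the smart-proxy DHCP API (it is not confirmed anywhere in this thread), and the proxy URL, subnet and NIC data are placeholders.
-------------------------------
require 'net/http'
require 'uri'

# Hypothetical cleanup on deployment deletion: drop the DHCP reservation
# created for each host-less virtual NIC (VIP).
def release_vip_reservations(proxy_url, subnet, vip_nics)
  vip_nics.each do |nic|
    # Assumed smart-proxy endpoint: DELETE /dhcp/<subnet>/<mac>
    uri = URI("#{proxy_url}/dhcp/#{subnet}/#{nic[:mac]}")
    req = Net::HTTP::Delete.new(uri)
    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    warn "failed to release #{nic[:ip]}: #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  end
end

# Example usage with placeholder data:
# release_vip_reservations('http://proxy.example.com:8000', '192.168.0.0',
#                          [{ mac: '00:1a:4a:00:00:01', ip: '192.168.0.30' }])
-------------------------------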
Pull request is here: https://github.com/theforeman/staypuft/pull/402
*** Bug 1185107 has been marked as a duplicate of this bug. ***
Ohad, is there a remote chance we can get a quick fix for this, or should we push it to async?
This is going to be much more involved than initially thought. For the PXE network, we essentially need a pool of addresses for VIPs that is separate from the DHCP pool. The challenges include how to split the allocation so that we don't prematurely run out of either VIP IPs or normal host IPs, and coming up with a Staypuft-specific IPAM scheme for PXE-network VIPs, since we won't use the Foreman 'suggest IP' call here.

The workaround is to put all VIP API network traffic types on networks *other than* the PXE provisioning network.
(In reply to Scott Seago from comment #18)
> This is going to be much more involved than initially thought. For the PXE
> network, we essentially need a pool of addresses for VIPs that is separate
> from the DHCP pool. The challenges include how to split the allocation so
> that we don't prematurely run out of either VIP IPs or normal host IPs, and
> coming up with a Staypuft-specific IPAM scheme for PXE-network VIPs, since
> we won't use the Foreman 'suggest IP' call here.
>
> The workaround is to put all VIP API network traffic types on networks
> *other than* the PXE provisioning network.

Or pre-discover all hosts to be used in your environment.
Scott, is this something we could add a deployment validation for, without doing all of the general deployment validation? What we're looking for is a comparison of the VIPs against the IPs on the hosts, to catch conflicts and show a message if one is found.
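A rough sketch of what such a check could look like. The deployment.vips, deployment.hosts and interfaces accessors are made up for illustration; Staypuft's actual model names may differ.
-------------------------------
# Hypothetical validation sketch -- not Staypuft's actual code.
# Returns the VIPs whose address is already assigned to a host interface.
def vip_conflicts(deployment)
  host_ips = deployment.hosts.flat_map { |h| h.interfaces.map(&:ip) }.compact
  deployment.vips.select { |vip| host_ips.include?(vip.ip) }
end

# conflicts = vip_conflicts(deployment)
# unless conflicts.empty?
#   raise "VIP/host IP conflict: #{conflicts.map(&:ip).join(', ')}"
# end
-------------------------------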
*** Bug 1190825 has been marked as a duplicate of this bug. ***
This would require significant re-architecture to resolve. The workaround listed in the release notes avoids the issue completely.
The Doc Text for this issue is not completely correct. I have had the IP address theft/conflict occur even when the Public API network is completely separate. With a Public API network of 192.168.141.0/21 and a pool range of .25 through .225, I have had the auto-assigned IP for a host interface also be assigned to a VIP. My workaround has been to manually assign the interface an address high in the pool range (>200). Are the VIPs picked from a specific range of the pool?