Bug 1298716 - pcs create constraint failed due to wrong naming of the vip resources
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-puppet-modules
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: z4
Target Release: 7.0 (Kilo)
Assigned To: Sofer Athlan-Guyot
QA Contact: yeylon@redhat.com
Keywords: ZStream
Depends On:
Blocks:
Reported: 2016-01-14 14:26 EST by Marius Cornea
Modified: 2016-04-18 03:14 EDT

See Also:
Fixed In Version: openstack-puppet-modules-2015.1.8-36.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, creating an IPv6 resource with puppet-pacemaker produced a resource name containing a character that is invalid in Pacemaker. As a result, it was not possible to create the IPv6 IPaddr2 resource for Pacemaker using puppet-pacemaker's "pacemaker::resource::ip" resource. With this update, the resource name no longer contains ':' and the IPv6 IPaddr2 resource is created properly by the puppet-pacemaker module.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-18 11:44:24 EST
Type: Bug


Attachments
os-collect-config (deleted)
2016-01-14 14:26 EST, Marius Cornea
os-collect-config (400.97 KB, text/plain)
2016-01-14 14:26 EST, Marius Cornea

Description Marius Cornea 2016-01-14 14:26:11 EST
Description of problem:
Creating pcs constraints fails because the constraints refer to the VIP resources by the wrong name. The os-collect-config log shows several messages such as:
Error: Resource 'ip-fd00:fd00:fd00:2000:f816:3eff:fe11:920' does not exist

while the resource name is (after applying the patch in BZ#1298391)

 ip-fd00.fd00.fd00.2000.f816.3eff.fe11.920	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0

Version-Release number of selected component (if applicable):
I'm doing the test following the instructions in:
https://etherpad.openstack.org/p/tripleo-ipv6-support
and enabling pacemaker by passing an additional $THT/environments/puppet-pacemaker.yaml environment file

openstack-puppet-modules-7.0.1-1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
Deploy an IPv6-enabled overcloud.

Actual results:
The deployment fails.

Expected results:
The deployment succeeds.

Additional info:
Attaching the os-collect-config journal where the errors show up.

[root@overcloud-controller-0 ~]# pcs status | grep ip
Cluster name: tripleo_cluster
 ip-192.0.2.23	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-2001.db8.fd00.1000.f816.3eff.fee9.4a21	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-fd00.fd00.fd00.3000.f816.3eff.fe1b.5223	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-fd00.fd00.fd00.2000.f816.3eff.fed8.4b9b	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-fd00.fd00.fd00.4000.f816.3eff.fe55.5f05	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-fd00.fd00.fd00.2000.f816.3eff.fe11.920	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
Comment 1 Marius Cornea 2016-01-14 14:26 EST
Created attachment 1114941 [details]
os-collect-config
Comment 2 Jason Guiditta 2016-01-14 14:36:48 EST
From the pacemaker puppet module resource::ip:

# pcs dislikes colons from IPv6 addresses. Replacing them with dots.
$resource_name = regsubst($ip_address, '(:)', '.', 'G')

When THT creates the constraint, it does not do the same; it simply passes in:
  first_resource    => "ip-${control_vip}",

which causes the mismatch here.  Seems to me that either we munge the ip to make a pcs-compliant name, or explicitly set a name when creating the resource::ip via the name param for that class (and then use the same name in the constraint ref).  Not sure which is better, but I think either would solve the issue.
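
The mismatch can be illustrated with a minimal Python sketch. It mirrors the regsubst call quoted above (every ':' replaced with '.') and compares the result with the name the constraint passes in. The function name pcs_resource_name is an assumption made for illustration only; the real logic lives in the Puppet module, and the address is the one from the error in the description.

```python
def pcs_resource_name(ip_address):
    """Mirror regsubst($ip_address, '(:)', '.', 'G'): replace every ':' with '.'."""
    return ip_address.replace(":", ".")


# Address taken from the error message in the bug description.
control_vip = "fd00:fd00:fd00:2000:f816:3eff:fe11:920"

# Name that resource::ip ends up creating in the cluster.
created_name = "ip-" + pcs_resource_name(control_vip)

# Name that the constraint passes in ("ip-${control_vip}"), colons intact.
constraint_ref = "ip-" + control_vip

print(created_name)
print(constraint_ref)
print(created_name == constraint_ref)
```

The two strings differ only in the separator, which is enough for pcs to report that the resource does not exist.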
Comment 3 Dan Sneddon 2016-01-15 04:25:16 EST
(In reply to Jason Guiditta from comment #2)
> From the pacemaker puppet module resource::ip:
> 
> # pcs dislikes colons from IPv6 addresses. Replacing them with dots.
> $resource_name = regsubst($ip_address, '(:)', '.', 'G')

I'm not sure that we can do the required munging inside TripleO Heat Templates; it might be necessary to do the text replacement inside Puppet. I think Puppet actually takes the VIP and creates this name; I don't think we output it from THT.
Comment 4 Sofer Athlan-Guyot 2016-01-15 06:15:40 EST
I'm going to munge it inside puppet.
Comment 5 Sofer Athlan-Guyot 2016-01-15 08:12:08 EST
It has already been fixed upstream:

https://github.com/redhat-openstack/puppet-pacemaker/commit/01c6000db5040055372021ad5a3231840ccb8bba
Comment 6 Sofer Athlan-Guyot 2016-01-15 08:14:42 EST
Tested the patch and it functions properly:

pacemaker::resource::ip {'ip-2001::59':
  ip_address         => '2001::59',
  nic                => 'eth1',
  cidr_netmask       => '',
  post_success_sleep => 0,
  tries              => 1,
  try_sleep          => 1,
  require            => 'Class[pacemaker::corosync]',
}

pcs resource show ip-2001..59
                                                    
 Resource: ip-2001..59 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=2001::59 nic=eth1 
  Operations: start interval=0s timeout=20s (ip-2001..59-start-interval-0s)
              stop interval=0s timeout=20s (ip-2001..59-stop-interval-0s)
              monitor interval=10s timeout=20s (ip-2001..59-monitor-interval-10s)
Comment 7 Marius Cornea 2016-01-15 08:21:22 EST
(In reply to Sofer Athlan-Guyot from comment #6)
> Tested the patch and it function properly:
> 
> pacemaker::resource::ip {'ip-2001::59':
>   ip_address         => '2001::59',
>   nic                => 'eth1',
>   cidr_netmask       => '',
>   post_success_sleep => 0,
>   tries              => 1,
>   try_sleep          => 1,
>   require            => 'Class[pacemaker::corosync]',
> }
> 
> pcs resource show ip-2001..59
>                                                     
>  Resource: ip-2001..59 (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=2001::59 nic=eth1 
>   Operations: start interval=0s timeout=20s (ip-2001..59-start-interval-0s)
>               stop interval=0s timeout=20s (ip-2001..59-stop-interval-0s)
>               monitor interval=10s timeout=20s (ip-2001..59-monitor-interval-10s)

I already had the patch when testing (see the resource names in the initial description, e.g. ip-2001.db8.fd00.1000.f816.3eff.fee9.4a21). The problem is that the pcs constraint command is run with the name that still contains ':', e.g.:

/Exec[Creating order constraint public_vip-then-haproxy]/returns: change from notrun to 0 failed: /usr/sbin/pcs constraint order start ip-2001:db8:fd00:1000:f816:3eff:fee9:4a21 then start haproxy-clone kind=Optional returned 1 instead of one of [0]
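
One way to picture what is still missing is to normalize the resource name before it reaches the constraint command. The sketch below is Python rather than Puppet and is only an assumption about the shape of such a fix, not the actual patch; the helper names normalize and order_constraint_cmd are hypothetical, and the command line is copied from the failing Exec above.

```python
def normalize(resource):
    """Replace ':' with '.', the same munging resource::ip applies to the name."""
    return resource.replace(":", ".")


def order_constraint_cmd(first, second, kind="Optional"):
    """Build the argument list for the order constraint with normalized names."""
    return [
        "/usr/sbin/pcs", "constraint", "order",
        "start", normalize(first),
        "then", "start", normalize(second),
        "kind=" + kind,
    ]


cmd = order_constraint_cmd(
    "ip-2001:db8:fd00:1000:f816:3eff:fee9:4a21", "haproxy-clone"
)
print(" ".join(cmd))
```

Names without colons, such as haproxy-clone or an IPv4 VIP, pass through unchanged, so the same helper can be applied to every resource reference.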
Comment 8 Sofer Athlan-Guyot 2016-01-15 08:24:29 EST
Yep, I'm working on it right now, thanks for the clarification.
Comment 9 Sofer Athlan-Guyot 2016-01-15 11:16:49 EST
Waiting for review in https://github.com/redhat-openstack/puppet-pacemaker/pull/70

This has been tested with the following manifest:

  class {'::pacemaker::corosync':
    cluster_name    => 'basic_cluster',
    cluster_members => 'node1 node2',
  }
  
  class {'::pacemaker::stonith':
    disable => true,
  }
  
  pacemaker::resource::ip {'ip-2001::59':
      ip_address         => '2001::59',
      nic                => 'eth1',
      cidr_netmask       => '',
      post_success_sleep => 0,
      tries              => 1,
      try_sleep          => 1,
      require            => 'Class[pacemaker::corosync]',
  }
  
  pacemaker::resource::ip {'ip-2001::60':
      ip_address         => '2001::60',
      nic                => 'eth1',
      cidr_netmask       => '',
      post_success_sleep => 0,
      tries              => 1,
      try_sleep          => 1,
      require            => 'Class[pacemaker::corosync]',
  }
  
  # testing location
  pacemaker::constraint::location { 'ipv6-on-node1':
    resource => 'ip-2001::59',
    location => 'node1',
    score    => '100',
  }
  
  # testing colocation
  pacemaker::constraint::colocation { 'ipv6-on-same-node':
    source  => 'ip-2001::60',
    target  => 'ip-2001::59',
    score   => 'INFINITY',
    require => ['Pacemaker::resource::ip[ip-2001::59]', 'Pacemaker::resource::ip[ip-2001::60]'],
  }
  
  # testing order
  pacemaker::constraint::base { 'ipv6-59-before-ipv6-60':
    constraint_type => 'order',
    first_resource  => 'ip-2001::59',
    second_resource => 'ip-2001::60',
    first_action    => 'start',
    second_action   => 'start',
  }


and with this one for the deletion:

  class {'::pacemaker::corosync':
    cluster_name    => 'basic_cluster',
    cluster_members => 'node1 node2',
  }
  
  class {'::pacemaker::stonith':
    disable => true,
  }
  # testing location
  pacemaker::constraint::location { 'ipv6-on-node1':
    ensure   => absent,
    resource => 'ip-2001::59',
    location => 'node1',
    score    => '100',
  }
  
  # testing colocation
  pacemaker::constraint::colocation { 'ipv6-on-same-node':
    ensure => absent,
    source => 'ip-2001::60',
    target => 'ip-2001::59',
    score  => 'INFINITY',
  }
  
  # testing order
  pacemaker::constraint::base { 'ipv6-59-before-ipv6-60':
    ensure          => absent,
    constraint_type => 'order',
    first_resource  => 'ip-2001::59',
    second_resource => 'ip-2001::60',
    first_action    => 'start',
    second_action   => 'start',
  }


Can somebody validate this on an OSP7 deployment?
Comment 11 Marius Cornea 2016-01-19 05:47:59 EST
[root@overcloud-controller-0 ~]# rpm -qa | grep puppet-modules
openstack-puppet-modules-2015.1.8-41.el7ost.noarch

[root@overcloud-controller-0 ~]# pcs constraint list --full | grep ip-
  start ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871 then start haproxy-clone (kind:Optional) (id:order-ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871-haproxy-clone-Optional)
  start ip-192.0.2.6 then start haproxy-clone (kind:Optional) (id:order-ip-192.0.2.6-haproxy-clone-Optional)
  start ip-fd00.fd00.fd00.4000.f816.3eff.fe88.dbfc then start haproxy-clone (kind:Optional) (id:order-ip-fd00.fd00.fd00.4000.f816.3eff.fe88.dbfc-haproxy-clone-Optional)
  start ip-2001.db8.fd00.1000.f816.3eff.feb3.25fb then start haproxy-clone (kind:Optional) (id:order-ip-2001.db8.fd00.1000.f816.3eff.feb3.25fb-haproxy-clone-Optional)
  start ip-fd00.fd00.fd00.2000.f816.3eff.feeb.38af then start haproxy-clone (kind:Optional) (id:order-ip-fd00.fd00.fd00.2000.f816.3eff.feeb.38af-haproxy-clone-Optional)
  start ip-fd00.fd00.fd00.3000.f816.3eff.fe2e.5873 then start haproxy-clone (kind:Optional) (id:order-ip-fd00.fd00.fd00.3000.f816.3eff.fe2e.5873-haproxy-clone-Optional)
  ip-192.0.2.6 with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-192.0.2.6-haproxy-clone-INFINITY)
  ip-fd00.fd00.fd00.3000.f816.3eff.fe2e.5873 with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-fd00.fd00.fd00.3000.f816.3eff.fe2e.5873-haproxy-clone-INFINITY)
  ip-fd00.fd00.fd00.4000.f816.3eff.fe88.dbfc with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-fd00.fd00.fd00.4000.f816.3eff.fe88.dbfc-haproxy-clone-INFINITY)
  ip-2001.db8.fd00.1000.f816.3eff.feb3.25fb with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-2001.db8.fd00.1000.f816.3eff.feb3.25fb-haproxy-clone-INFINITY)
  ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871 with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871-haproxy-clone-INFINITY)
  ip-fd00.fd00.fd00.2000.f816.3eff.feeb.38af with haproxy-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-ip-fd00.fd00.fd00.2000.f816.3eff.feeb.38af-haproxy-clone-INFINITY)
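
A check like the one done by eye on this output could be scripted. The following Python sketch is a hypothetical helper, not part of pcs or puppet-pacemaker: it scans constraint lines for "ip-" resource names and reports any that still contain ':'. The regular expression and the function name names_with_colons are assumptions chosen for this illustration, and the sample lines are copied from the listing above.

```python
import re

# An "ip-" resource name: the prefix followed by hex digits, dots or colons.
IP_RESOURCE = re.compile(r"\bip-[0-9A-Fa-f.:]+")


def names_with_colons(lines):
    """Return the sorted set of ip- resource names that still contain ':'."""
    bad = set()
    for line in lines:
        for name in IP_RESOURCE.findall(line):
            if ":" in name:
                bad.add(name)
    return sorted(bad)


verified = [
    "start ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871 then start haproxy-clone "
    "(kind:Optional) (id:order-ip-fd00.fd00.fd00.2000.f816.3eff.fe4a.871-haproxy-clone-Optional)",
    "ip-192.0.2.6 with haproxy-clone (score:INFINITY) (rsc-role:Started) "
    "(with-rsc-role:Master) (id:colocation-ip-192.0.2.6-haproxy-clone-INFINITY)",
]

# An empty list means every referenced VIP name is in the dotted form.
print(names_with_colons(verified))
```

The colons in fields such as (kind:Optional) are ignored because only tokens starting with "ip-" are inspected.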
Comment 15 errata-xmlrpc 2016-02-18 11:44:24 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0265.html
