Bug 1297850 - Corosync fails to start in an ipv6 deployment
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: y3
Target Release: 7.0 (Kilo)
Assigned To: Jiri Stransky
QA Contact: yeylon@redhat.com
Keywords: ZStream
 
Reported: 2016-01-12 10:45 EST by Marius Cornea
Modified: 2016-04-18 03:13 EDT
CC: 10 users

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-99.el7ost
Doc Type: Bug Fix
Doc Text:
Corosync failed to start in an IPv6-based Overcloud. This was caused by a missing '--ipv6' option when the director set up the Corosync cluster. This fix adds the option to the Controller's Puppet manifest and adds related parameters to the Heat template collection. Corosync now starts successfully in IPv6-based Overclouds.
Last Closed: 2016-02-18 11:48:58 EST
Type: Bug




External Trackers
Tracker ID               Priority  Status  Summary  Last Updated
OpenStack gerrit 267073  None      None    None     2016-01-13 23:13 EST

Description Marius Cornea 2016-01-12 10:45:58 EST
Description of problem:
Corosync fails to start in an IPv6 deployment with the following error:
[MAIN  ] parse error in config: No interfaces defined
[MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.


Version-Release number of selected component (if applicable):
I'm testing by following the instructions in:
https://etherpad.openstack.org/p/tripleo-ipv6-support
and enabling Pacemaker by passing the additional environment file $THT/environments/puppet-pacemaker.yaml.

How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPv6-enabled overcloud with Pacemaker.

Actual results:
Deployment eventually fails.

Expected results:
Deployment completes successfully.

Additional info:
This is the corosync.conf:
totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: overcloud-controller-0
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

overcloud-controller-0 resolves to an IPv6 address:
[root@overcloud-controller-0 ~]# ping6 -n -c1 overcloud-controller-0
PING overcloud-controller-0(fd00:fd00:fd00:2000:f816:3eff:fe45:bec3) 56 data bytes
64 bytes from fd00:fd00:fd00:2000:f816:3eff:fe45:bec3: icmp_seq=1 ttl=64 time=0.032 ms

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-01-12 14:46:01 UTC; 57min ago
  Process: 1004 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 1255 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
 Main PID: 814 (code=exited, status=0/SUCCESS)

Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Starting Corosync Cluster Engine...
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] parse error in config: No interfaces defined
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.
Jan 12 14:46:01 overcloud-controller-0 corosync[1255]: Starting Corosync Cluster Engine (corosync): [FAILED]
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service: control process exited, code=exited status=1
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Failed to start Corosync Cluster Engine.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Unit corosync.service entered failed state.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service failed.
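One plausible reading of the config and resolution output above is an address-family mismatch. ring0_addr is a hostname that resolves to an IPv6 address, but the totem section sets no ip_version, and corosync.conf(5) documents ipv4 as the default. The Python sketch below checks for that mismatch. It is illustrative only. parse_corosync_conf and family_mismatch are hypothetical helpers. The parser handles only the simple `name {` / `key: value` / `}` layout shown above, and the resolved address is passed in rather than looked up.

```python
import ipaddress


def parse_corosync_conf(text):
    """Parse a simple corosync.conf into {block_name: [dict, ...]}.

    Hypothetical helper: handles only 'name {', 'key: value' and '}' lines.
    Nested blocks (e.g. node inside nodelist) are stored under their own name.
    """
    blocks = {}
    stack = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            block = {}
            blocks.setdefault(line[:-1].strip(), []).append(block)
            stack.append(block)
        elif line == "}":
            stack.pop()
        elif ":" in line and stack:
            # Split on the first colon only, so IPv6 values stay intact.
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return blocks


def family_mismatch(conf_text, ring0_ip):
    """True if totem's ip_version disagrees with the family of ring0_ip."""
    totem = parse_corosync_conf(conf_text).get("totem", [{}])[0]
    ip_version = totem.get("ip_version", "ipv4")  # documented default
    family = "ipv6" if ipaddress.ip_address(ring0_ip).version == 6 else "ipv4"
    return ip_version != family


CONF = """
totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: overcloud-controller-0
        nodeid: 1
    }
}
"""

# Address that overcloud-controller-0 resolved to in the ping6 output above.
print(family_mismatch(CONF, "fd00:fd00:fd00:2000:f816:3eff:fe45:bec3"))  # True
```

pcs's --ipv6 option is understood to add `ip_version: ipv6` to the totem section. If it does, the same check returns False for a config set up that way.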
Comment 1 Emilien Macchi 2016-01-12 12:13:42 EST
I think TripleO Heat Templates (THT) is missing some options needed to configure Corosync correctly on the overcloud:
https://github.com/redhat-openstack/puppet-pacemaker/blob/master/manifests/corosync.pp#L25-L27

Looking at THT now, cluster_setup_extras appears to be empty, which could explain why Corosync is configured for IPv4 by default.
Comment 2 Gilles Dubreuil 2016-01-13 18:53:18 EST
Indeed, when using IPv6, cluster_setup_extras must include the --ipv6 option.

For instance:

-------------
class {"pacemaker::corosync":
  cluster_name => "cluster_test",
  cluster_members => "one.pcs.tst two.pcs.tst three.pcs.tst",
  cluster_setup_extras => { '--ipv6' => '' },
}
--------------

With the above, the cluster starts properly.
The option must be added to the deployment parameters.
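The cluster_setup_extras hash in the snippet above carries extra options for the cluster setup step. As a rough illustration only, the Python sketch below shows one way such a hash could be turned into a pcs 0.9-style `pcs cluster setup` command line, treating an empty value as a bare flag. setup_command is a hypothetical helper. The puppet-pacemaker module's actual template may render the hash differently.

```python
def setup_command(cluster_name, members, extras):
    """Build a pcs 0.9-style 'cluster setup' argv.

    Hypothetical helper for illustration: each key in `extras` is treated
    as a flag, and an empty value means the flag takes no argument.
    """
    argv = ["pcs", "cluster", "setup", "--name", cluster_name]
    argv.extend(members.split())  # space-separated member list, as in the Puppet example
    for flag, value in extras.items():
        argv.append(flag)
        if value:  # e.g. {'--ipv6': ''} yields just '--ipv6'
            argv.append(value)
    return argv


cmd = setup_command(
    "cluster_test",
    "one.pcs.tst two.pcs.tst three.pcs.tst",
    {"--ipv6": ""},
)
print(" ".join(cmd))
# pcs cluster setup --name cluster_test one.pcs.tst two.pcs.tst three.pcs.tst --ipv6
```

A hash with empty values keeps the Puppet interface uniform: flags that take arguments and bare switches such as --ipv6 share the same parameter.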
Comment 4 Marius Cornea 2016-01-19 05:52:53 EST
openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-01-19 05:30:34 EST; 22min ago
 Main PID: 25781 (corosync)
   CGroup: /system.slice/corosync.service
           └─25781 corosync

Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: votequorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: quorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] adding new UDPU member {fd00:fd00:fd00:2000:f816:3eff:feeb:3100}
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] A new membership (fd00:fd00:fd00:2000:f816:3eff:feeb:3100:4) was formed. Members joined: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QUORUM] Members[1]: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [MAIN  ] Completed service synchronization, ready to provide service.
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25774]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Jan 19 05:30:34 overcloud-controller-0.localdomain systemd[1]: Started Corosync Cluster Engine.
Comment 8 errata-xmlrpc 2016-02-18 11:48:58 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html
