Bug 1297850 - Corosync fails to start in an ipv6 deployment
Summary: Corosync fails to start in an ipv6 deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: y3
Target Release: 7.0 (Kilo)
Assignee: Jiri Stransky
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-12 15:45 UTC by Marius Cornea
Modified: 2016-04-18 07:13 UTC
CC List: 10 users

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-99.el7ost
Doc Type: Bug Fix
Doc Text:
Corosync failed to start in an IPv6-based Overcloud because the director did not pass the '--ipv6' option when setting up the Corosync cluster. This fix adds the option to the Controller's Puppet manifest and adds related parameters to the Heat template collection. Corosync now starts successfully in IPv6-based Overclouds.
Clone Of:
Environment:
Last Closed: 2016-02-18 16:48:58 UTC
Target Upstream Version:



Links
System ID | Priority | Status | Summary | Last Updated
OpenStack gerrit 267073 | None | MERGED | Allow to enable IPv6 on Corosync | 2020-07-07 09:48:46 UTC
Red Hat Product Errata RHBA-2016:0264 | normal | SHIPPED_LIVE | Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory | 2016-02-18 21:41:29 UTC

Description Marius Cornea 2016-01-12 15:45:58 UTC
Description of problem:
Corosync fails to start in an IPv6 deployment with the following error:
[MAIN  ] parse error in config: No interfaces defined
[MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.


Version-Release number of selected component (if applicable):
I'm testing by following the instructions at:
https://etherpad.openstack.org/p/tripleo-ipv6-support
and enabling Pacemaker by passing the additional $THT/environments/puppet-pacemaker.yaml environment file.

How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPv6-enabled overcloud with Pacemaker.

Actual results:
Deployment eventually fails.

Expected results:
Deployment completes successfully.

Additional info:
This is the corosync.conf:
totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: overcloud-controller-0
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

overcloud-controller-0 resolves to an IPv6 address:
[root@overcloud-controller-0 ~]# ping6 -n -c1 overcloud-controller-0
PING overcloud-controller-0(fd00:fd00:fd00:2000:f816:3eff:fe45:bec3) 56 data bytes
64 bytes from fd00:fd00:fd00:2000:f816:3eff:fe45:bec3: icmp_seq=1 ttl=64 time=0.032 ms

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-01-12 14:46:01 UTC; 57min ago
  Process: 1004 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 1255 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
 Main PID: 814 (code=exited, status=0/SUCCESS)

Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Starting Corosync Cluster Engine...
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] parse error in config: No interfaces defined
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.
Jan 12 14:46:01 overcloud-controller-0 corosync[1255]: Starting Corosync Cluster Engine (corosync): [FAILED]
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service: control process exited, code=exited status=1
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Failed to start Corosync Cluster Engine.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Unit corosync.service entered failed state.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service failed.
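
The failure is consistent with an address-family mismatch: the totem section above sets no ip_version, corosync 2.x treats an unset ip_version as ipv4 (per corosync.conf(5)), and ring0_addr resolves to an IPv6 address. As far as we know, `pcs cluster setup --ipv6` addresses this by writing `ip_version: ipv6` into the totem section. Below is a minimal Python sketch of that check, assuming the flat totem layout shown in this report (no nested interface blocks); totem_ip_version and family_mismatch are hypothetical helpers, not part of corosync or pcs.

```python
import ipaddress
import re


def totem_ip_version(conf_text: str) -> str:
    """Return totem.ip_version, or 'ipv4' (corosync 2.x default) when unset.

    Assumes the totem section contains no nested { } blocks.
    """
    match = re.search(r"totem\s*\{(.*?)\}", conf_text, re.DOTALL)
    if match is None:
        raise ValueError("no totem section found")
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "ip_version":
            return value.strip()
    return "ipv4"


def family_mismatch(conf_text: str, ring0_ip: str) -> bool:
    """True when the ring0 address is IPv6 but totem uses (or defaults to) IPv4."""
    addr = ipaddress.ip_address(ring0_ip)
    return addr.version == 6 and totem_ip_version(conf_text) != "ipv6"


# The totem section from the report, and a copy with ip_version added.
REPORTED_CONF = """totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}
"""
FIXED_CONF = REPORTED_CONF.replace(
    "transport: udpu", "transport: udpu\n    ip_version: ipv6"
)
RING0 = "fd00:fd00:fd00:2000:f816:3eff:fe45:bec3"

print(family_mismatch(REPORTED_CONF, RING0))  # True
print(family_mismatch(FIXED_CONF, RING0))     # False
```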

Comment 1 Emilien Macchi 2016-01-12 17:13:42 UTC
I think TripleO Heat Templates (THT) is missing some options that puppet-pacemaker exposes for configuring Corosync on the overcloud:
https://github.com/redhat-openstack/puppet-pacemaker/blob/master/manifests/corosync.pp#L25-L27

Looking at THT, cluster_setup_extras seems to be empty, which could explain why Corosync is configured for IPv4 by default.

Comment 2 Gilles Dubreuil 2016-01-13 23:53:18 UTC
Indeed, when using IPv6, cluster_setup_extras must include the --ipv6 option.

For instance:

-------------
class {"pacemaker::corosync":
  cluster_name => "cluster_test",
  cluster_members => "one.pcs.tst two.pcs.tst three.pcs.tst",
  cluster_setup_extras => { '--ipv6' => '' },
}
--------------

With the above, the cluster starts properly.
The option must therefore be added to the deployment parameters.
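
The Puppet hash above maps each extra cluster-setup flag to a value, with an empty string for a flag that takes no argument. How puppet-pacemaker actually expands cluster_setup_extras is not shown in this report; the following is a hypothetical Python illustration of one way such a hash could be flattened into a command line, assuming pcs 0.9-style `pcs cluster setup --name <name> <nodes...>` syntax. build_setup_args is an illustrative name, not a puppet-pacemaker function.

```python
from typing import Dict, List


def build_setup_args(cluster_name: str, members: str,
                     extras: Dict[str, str]) -> List[str]:
    """Hypothetical: flatten a cluster_setup_extras-style mapping into argv."""
    # Base command; the --name form follows pcs 0.9 syntax (assumption).
    args = ["pcs", "cluster", "setup", "--name", cluster_name]
    # cluster_members is a space-separated string in the Puppet example.
    args.extend(members.split())
    for flag, value in extras.items():
        args.append(flag)
        if value:  # an empty value means a bare flag such as --ipv6
            args.append(value)
    return args


print(build_setup_args("cluster_test",
                       "one.pcs.tst two.pcs.tst three.pcs.tst",
                       {"--ipv6": ""}))
```

With an empty extras mapping, which matches the situation described in Comment 1, no --ipv6 flag would be emitted and the cluster would be set up with the IPv4 default.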

Comment 4 Marius Cornea 2016-01-19 10:52:53 UTC
openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-01-19 05:30:34 EST; 22min ago
 Main PID: 25781 (corosync)
   CGroup: /system.slice/corosync.service
           └─25781 corosync

Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: votequorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: quorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] adding new UDPU member {fd00:fd00:fd00:2000:f816:3eff:feeb:3100}
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] A new membership (fd00:fd00:fd00:2000:f816:3eff:feeb:3100:4) was formed. Members joined: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QUORUM] Members[1]: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [MAIN  ] Completed service synchronization, ready to provide service.
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25774]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Jan 19 05:30:34 overcloud-controller-0.localdomain systemd[1]: Started Corosync Cluster Engine.

Comment 8 errata-xmlrpc 2016-02-18 16:48:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html
