1297850 – Corosync fails to start in an ipv6 deployment

Summary: Corosync fails to start in an ipv6 deployment

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	7.0 (Kilo)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	y3
Target Release:	7.0 (Kilo)
Assignee:	Jiri Stransky
QA Contact:	yeylon@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-01-12 15:45 UTC by Marius Cornea
Modified:	2016-04-18 07:13 UTC
CC List:	10 users
Fixed In Version:	openstack-tripleo-heat-templates-0.8.6-99.el7ost
Doc Type:	Bug Fix
Doc Text:	Corosync failed to start in an IPv6-based Overcloud. This was due to a missing '--ipv6' option when the director tried to start Corosync. This fix adds the option to the Controller's Puppet manifest and adds related parameters to the Heat template collection. Corosync now starts successfully in IPv6-based Overclouds.
Clone Of:
Environment:
Last Closed:	2016-02-18 16:48:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	267073	0	None	MERGED	Allow to enable IPv6 on Corosync	2020-07-07 09:48:46 UTC
Red Hat Product Errata	RHBA-2016:0264	0	normal	SHIPPED_LIVE	Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory	2016-02-18 21:41:29 UTC

Description Marius Cornea 2016-01-12 15:45:58 UTC

Description of problem:
Corosync fails to start in an IPv6 deployment with the following error:
[MAIN  ] parse error in config: No interfaces defined
[MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.


Version-Release number of selected component (if applicable):
I am testing by following the instructions at:
https://etherpad.openstack.org/p/tripleo-ipv6-support
and enabling Pacemaker by passing the additional environment file $THT/environments/puppet-pacemaker.yaml

How reproducible:
100%

Steps to Reproduce:
1. Deploy an IPv6-enabled overcloud with Pacemaker

Actual results:
Deployment eventually fails.

Expected results:
Deployment completes successfully.

Additional info:
This is the corosync.conf:
totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: overcloud-controller-0
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

overcloud-controller-0 resolves to an IPv6 address:
[root@overcloud-controller-0 ~]# ping6 -n -c1 overcloud-controller-0
PING overcloud-controller-0(fd00:fd00:fd00:2000:f816:3eff:fe45:bec3) 56 data bytes
64 bytes from fd00:fd00:fd00:2000:f816:3eff:fe45:bec3: icmp_seq=1 ttl=64 time=0.032 ms
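
The same resolution check can be scripted instead of reading ping6 output. The following is a minimal sketch, not part of the original report: the helper names address_families and resolve are hypothetical, and it only relies on the Python standard library resolver call socket.getaddrinfo.

```python
import ipaddress
import socket


def address_families(addresses):
    """Group textual IP addresses by family ('ipv4' or 'ipv6')."""
    families = {"ipv4": [], "ipv6": []}
    for addr in addresses:
        # Drop a zone index such as '%eth0' before parsing link-local addresses.
        ip = ipaddress.ip_address(addr.split("%", 1)[0])
        key = "ipv6" if ip.version == 6 else "ipv4"
        if addr not in families[key]:
            families[key].append(addr)
    return families


def resolve(hostname):
    """Return every address the system resolver reports for hostname."""
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the textual address.
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]
```

On a controller node, address_families(resolve("overcloud-controller-0")) would show whether the node name has any IPv4 address at all; in the situation described here only the "ipv6" list would be expected to be populated.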

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-01-12 14:46:01 UTC; 57min ago
  Process: 1004 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 1255 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
 Main PID: 814 (code=exited, status=0/SUCCESS)

Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Starting Corosync Cluster Engine...
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] parse error in config: No interfaces defined
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]:  [MAIN  ] Corosync Cluster Engine exiting with status 8 at main.c:1278.
Jan 12 14:46:01 overcloud-controller-0 corosync[1255]: Starting Corosync Cluster Engine (corosync): [FAILED]
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service: control process exited, code=exited status=1
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Failed to start Corosync Cluster Engine.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Unit corosync.service entered failed state.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service failed.

Comment 1 Emilien Macchi 2016-01-12 17:13:42 UTC

I think TripleO Heat Templates is missing some useful options to enable Corosync on the overcloud:
https://github.com/redhat-openstack/puppet-pacemaker/blob/master/manifests/corosync.pp#L25-L27

Looking at THT, cluster_setup_extras currently appears to be empty, which could be why Corosync is configured for IPv4 by default.
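
For context, Corosync 2.x uses IPv4 when the totem section has no ip_version setting, which is consistent with the corosync.conf in the description. The sketch below is an illustration only (the function name totem_ip_version is hypothetical and the parser handles just the simple brace layout shown in this report); it reports which IP version a given corosync.conf would select.

```python
def totem_ip_version(conf_text, default="ipv4"):
    """Return the ip_version set directly in the totem section, else default."""
    depth = 0
    in_totem = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        if line.endswith("{"):
            depth += 1
            if depth == 1:
                # Remember whether the top-level section just opened is totem.
                in_totem = line[:-1].strip() == "totem"
            continue
        if line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
            continue
        # Only keys directly inside totem count, not nested interface blocks.
        if in_totem and depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            if key.strip() == "ip_version":
                return value.strip()
    return default
```

Fed the corosync.conf from the description, this returns the default, since the totem section contains no ip_version key.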

Comment 2 Gilles Dubreuil 2016-01-13 23:53:18 UTC

Indeed, when using IPv6, cluster_setup_extras must include the --ipv6 option.

For instance:

-------------
class {"pacemaker::corosync":
  cluster_name => "cluster_test",
  cluster_members => "one.pcs.tst two.pcs.tst three.pcs.tst",
  cluster_setup_extras => { '--ipv6' => '' },
}
--------------

With the above, the cluster starts properly.
The option must be added to the deployment parameters.
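
To make the role of cluster_setup_extras concrete, here is a rough sketch of how such a mapping of extra flags could end up on a "pcs cluster setup" command line. This is an assumption-laden illustration, not the puppet-pacemaker module's actual code: the function name pcs_setup_command is hypothetical, and a flag mapped to an empty string is treated as a bare switch, as in the '--ipv6' => '' example above.

```python
def pcs_setup_command(cluster_name, members, extras=None):
    """Build an argv list for 'pcs cluster setup' from a flag mapping."""
    argv = ["pcs", "cluster", "setup", "--name", cluster_name]
    argv.extend(members.split())  # members is a space-separated string
    for flag, value in (extras or {}).items():
        argv.append(flag)
        if value != "":
            # Flags with a non-empty value are followed by that value.
            argv.append(str(value))
    return argv
```

With an empty mapping no --ipv6 switch is emitted, which matches the behaviour suspected in comment 1; " ".join(pcs_setup_command("cluster_test", "one.pcs.tst two.pcs.tst three.pcs.tst", {"--ipv6": ""})) ends with the --ipv6 switch.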

Comment 4 Marius Cornea 2016-01-19 10:52:53 UTC

Verified with openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch:

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-01-19 05:30:34 EST; 22min ago
 Main PID: 25781 (corosync)
   CGroup: /system.slice/corosync.service
           └─25781 corosync

Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: votequorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QB    ] server name: quorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] adding new UDPU member {fd00:fd00:fd00:2000:f816:3eff:feeb:3100}
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [TOTEM ] A new membership (fd00:fd00:fd00:2000:f816:3eff:feeb:3100:4) was formed. Members joined: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [QUORUM] Members[1]: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]:  [MAIN  ] Completed service synchronization, ready to provide service.
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25774]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Jan 19 05:30:34 overcloud-controller-0.localdomain systemd[1]: Started Corosync Cluster Engine.
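
A quick way to confirm from such a log that the totem members really are IPv6 addresses is to extract them from the "adding new UDPU member" lines. The sketch below is illustrative only; the function name udpu_member_versions is hypothetical and the pattern is based solely on the log format shown in this comment.

```python
import ipaddress
import re

# Matches lines such as: [TOTEM ] adding new UDPU member {fd00::1}
UDPU_MEMBER = re.compile(r"adding new UDPU member \{([^}]+)\}")


def udpu_member_versions(log_text):
    """Map each UDPU member address found in the log to its IP version."""
    return {
        addr: ipaddress.ip_address(addr).version
        for addr in UDPU_MEMBER.findall(log_text)
    }
```

For the log above, every value in the returned mapping should be 6.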

Comment 8 errata-xmlrpc 2016-02-18 16:48:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html
