Description of problem:
Corosync fails to start in an ipv6 deployment with the following error:

[MAIN ] parse error in config: No interfaces defined
[MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1278.

Version-Release number of selected component (if applicable):
I'm doing the test following the instructions in: https://etherpad.openstack.org/p/tripleo-ipv6-support and enabling pacemaker by passing an additional $THT/environments/puppet-pacemaker.yaml environment file

How reproducible:
100%

Steps to Reproduce:
1. Deploy ipv6 enabled overcloud with pacemaker

Actual results:
Deployment eventually fails.

Expected results:
Deployment completes successfully.

Additional info:
This is the corosync.conf:

totem {
    version: 2
    secauth: off
    cluster_name: tripleo_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: overcloud-controller-0
        nodeid: 1
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

overcloud-controller-0 resolves to an ipv6 address:

[root@overcloud-controller-0 ~]# ping6 -n -c1 overcloud-controller-0
PING overcloud-controller-0(fd00:fd00:fd00:2000:f816:3eff:fe45:bec3) 56 data bytes
64 bytes from fd00:fd00:fd00:2000:f816:3eff:fe45:bec3: icmp_seq=1 ttl=64 time=0.032 ms

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2016-01-12 14:46:01 UTC; 57min ago
  Process: 1004 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 1255 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
 Main PID: 814 (code=exited, status=0/SUCCESS)

Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Starting Corosync Cluster Engine...
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]: [MAIN ] parse error in config: No interfaces defined
Jan 12 14:46:01 overcloud-controller-0 corosync[1261]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1278.
Jan 12 14:46:01 overcloud-controller-0 corosync[1255]: Starting Corosync Cluster Engine (corosync): [FAILED]
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service: control process exited, code=exited status=1
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Failed to start Corosync Cluster Engine.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: Unit corosync.service entered failed state.
Jan 12 14:46:01 overcloud-controller-0 systemd[1]: corosync.service failed.
I think TripleO Heat Templates is missing some options needed to enable Corosync on the overcloud:
https://github.com/redhat-openstack/puppet-pacemaker/blob/master/manifests/corosync.pp#L25-L27

Looking at THT, cluster_setup_extras is currently empty, which could be why Corosync is configured for IPv4 by default.
Indeed, when using IPv6, cluster_setup_extras must include the --ipv6 option. For instance:

-------------
class {"pacemaker::corosync":
  cluster_name         => "cluster_test",
  cluster_members      => "one.pcs.tst two.pcs.tst three.pcs.tst",
  cluster_setup_extras => { '--ipv6' => '' },
}
--------------

With the above, the cluster starts properly. The option must be added to the deployment parameters.
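As an illustration only, here is a minimal Python sketch of how deployment tooling might decide whether the --ipv6 entry is needed, based on the addresses the cluster members resolve to. The helper name cluster_setup_extras and the rule that all members must share one address family are assumptions made for this example; they are not part of puppet-pacemaker or THT.

```python
import ipaddress


def cluster_setup_extras(member_addresses):
    """Hypothetical helper: build the extras mapping for pacemaker::corosync.

    Returns {'--ipv6': ''} when every member address is IPv6, and an empty
    mapping when every member address is IPv4.
    """
    # Collect the IP version (4 or 6) of each member address.
    versions = {ipaddress.ip_address(addr).version for addr in member_addresses}
    if not versions:
        raise ValueError("no member addresses given")
    if len(versions) > 1:
        # Assumption for this sketch: a cluster uses a single address family.
        raise ValueError("members mix IPv4 and IPv6 addresses")
    return {"--ipv6": ""} if versions == {6} else {}


if __name__ == "__main__":
    # Address taken from the ping6 output in this report.
    print(cluster_setup_extras(["fd00:fd00:fd00:2000:f816:3eff:fe45:bec3"]))
```

The returned mapping has the same shape as the cluster_setup_extras hash in the Puppet example, so it could be serialized into the deployment parameters.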
openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch

[root@overcloud-controller-0 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-01-19 05:30:34 EST; 22min ago
 Main PID: 25781 (corosync)
   CGroup: /system.slice/corosync.service
           └─25781 corosync

Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [QB ] server name: votequorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [QB ] server name: quorum
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [TOTEM ] adding new UDPU member {fd00:fd00:fd00:2000:f816:3eff:feeb:3100}
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [TOTEM ] A new membership (fd00:fd00:fd00:2000:f816:3eff:feeb:3100:4) was formed. Members joined: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [QUORUM] Members[1]: 1
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25781]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 19 05:30:34 overcloud-controller-0.localdomain corosync[25774]: Starting Corosync Cluster Engine (corosync): [ OK ]
Jan 19 05:30:34 overcloud-controller-0.localdomain systemd[1]: Started Corosync Cluster Engine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html