Bug 1298671 - Galera fails to start in ipv6 environment
Summary: Galera fails to start in ipv6 environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: y3
Target Release: 7.0 (Kilo)
Assignee: Giulio Fidente
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-01-14 17:33 UTC by Marius Cornea
Modified: 2016-04-18 07:14 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-99.el7ost
Doc Type: Bug Fix
Doc Text:
In an IPv6-based Overcloud, Galera failed to start due to issues with using an IPv6 address in its configuration. This fix configures the 'bind-address' parameter to use the hostname, which all nodes should have in their '/etc/hosts' file. Galera now starts successfully in IPv6 Overclouds.
Clone Of:
Environment:
Last Closed: 2016-02-18 16:49:56 UTC
Target Upstream Version:
Embargoed:


Attachments
mysqld.log (1.43 KB, text/plain), 2016-01-14 17:33 UTC, Marius Cornea
galera.cnf (1.43 KB, text/plain), 2016-01-14 17:33 UTC, Marius Cornea
/var/log/messages (519.54 KB, text/plain), 2016-01-14 17:34 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 268267 0 None MERGED Bind Galera on a hostname for compat with IPv6 addresses 2020-09-10 00:44:10 UTC
Red Hat Product Errata RHBA-2016:0264 0 normal SHIPPED_LIVE Red Hat Enterprise Linux OSP 7 director Bug Fix Advisory 2016-02-18 21:41:29 UTC

Description Marius Cornea 2016-01-14 17:33:31 UTC
Created attachment 1114889 [details]
mysqld.log

Description of problem:
Galera fails to start in ipv6 environment with error:
160114 17:25:21 [ERROR] WSREP: Can't parse port number from '4af4': 22 (Invalid argument)
	 at galerautils/src/gu_uri.cpp:parse_authority():69
160114 17:25:21 [ERROR] WSREP: wsrep::init() failed: 7, must shutdown
160114 17:25:21 [ERROR] Aborting


Version-Release number of selected component (if applicable):
I'm doing the test following the instructions in:
https://etherpad.openstack.org/p/tripleo-ipv6-support
and enabling pacemaker by passing an additional $THT/environments/puppet-pacemaker.yaml environment file


How reproducible:
100%

Steps to Reproduce:
1. Apply workarounds for BZ#1295986, BZ#1297850 and BZ#1298506
2. Deploy ipv6 enabled overcloud

Actual results:
Galera resource fails to start:

Master/Slave Set: galera-master [galera]
     galera	(ocf::heartbeat:galera):	FAILED Master overcloud-controller-0 (unmanaged)

Failed Actions:
* galera_promote_0 on overcloud-controller-0 'unknown error' (1): call=45, status=complete, exitreason='MySQL server failed to start (pid=8247) (rc=0), please check your installation',
    last-rc-change='Thu Jan 14 17:25:18 2016', queued=0ms, exec=4365ms


Expected results:
Galera resource starts

Additional info:
Attaching galera.cnf, mysqld.log and /var/log/messages.

Comment 1 Marius Cornea 2016-01-14 17:33:49 UTC
Created attachment 1114890 [details]
galera.cnf

Comment 2 Marius Cornea 2016-01-14 17:34:24 UTC
Created attachment 1114891 [details]
/var/log/messages

Comment 3 Marius Cornea 2016-01-14 18:49:50 UTC
This could be related to https://mariadb.atlassian.net/browse/MDEV-8034

I'm not sure what the cause is, but I was able to get it running by setting the bind address to the hostname in galera.cnf:

bind-address = overcloud-controller-0

Comment 4 Gilles Dubreuil 2016-01-15 06:02:56 UTC
Using a hostname seems the best workaround, provided there is DNS resolution (or a correct /etc/hosts file) for IPv6.

There seems to be an issue with the IPv6 address.
IPv6 addresses should always be enclosed in brackets wherever a port number could be appended, whether or not one is actually given (in which case a default is assumed).
So would it be possible to test your environment with an [::ipv6address] value?

Also, could you please provide all the parameters used for the installation?
That way we can identify what value the TripleO Heat Templates passed to the Galera Puppet module.
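
Editorial sketch of the bracket convention mentioned in this comment: when a host is combined with a port in a URI-style value, an IPv6 literal is wrapped in brackets so that its colons cannot be mistaken for the port separator. The helper names `host_for_uri` and `tcp_uri` are hypothetical and are not part of Galera, MariaDB, or the TripleO templates. Whether a given option accepts the bracketed form depends on the option itself.

```python
import ipaddress


def host_for_uri(host: str) -> str:
    """Wrap an IPv6 literal in brackets; leave hostnames and IPv4 unchanged."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        return host
    return f"[{host}]" if addr.version == 6 else host


def tcp_uri(host: str, port: int) -> str:
    """Build a tcp:// URI with the host bracketed when it is an IPv6 literal."""
    return f"tcp://{host_for_uri(host)}:{port}"


if __name__ == "__main__":
    print(tcp_uri("::", 4567))
    print(tcp_uri("overcloud-controller-0", 4567))
```

A hostname passes through untouched, which is one reason binding on a hostname sidesteps the ambiguity altogether.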

Comment 5 Marius Cornea 2016-01-15 08:17:44 UTC
(In reply to Gilles Dubreuil from comment #4)
> Using a hostname seems the best workaround, provided there is DNS
> resolution (or a correct /etc/hosts file) for IPv6.
> 
> There seems to be an issue with the IPv6 address.
> IPv6 addresses should always be enclosed in brackets wherever a port number
> could be appended, whether or not one is actually given.
> So would it be possible to test your environment with an [::ipv6address]
> value?

Setting the following in galera.cnf
bind-address = [fd00:fd00:fd00:2000:f816:3eff:fe94:c469]

fails with:
[ERROR] Can't create IP socket: Connection timed out
[ERROR] Aborting

> Also, could you please provide all the parameters used for the installation?
> That way we can identify what value the TripleO Heat Templates passed to
> the Galera Puppet module.

Here is the mysql_bind_host hiera value:
mysql_bind_host: fd00:fd00:fd00:2000:f816:3eff:fe94:c469

This is the deploy command:

export THT=/home/stack/templates/tripleo-heat-templates #checkout from https://review.openstack.org/#/c/235423/ 
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-isolation-v6.yaml \
-e $THT/environments/net-single-nic-with-vlans.yaml \
-e $THT/environments/puppet-pacemaker.yaml \
-e /home/stack/templates/network-environment-v6.yaml \
--control-scale 1 \
--compute-scale 1 \
--neutron-network-type vxlan \
--neutron-tunnel-types vxlan \
--libvirt-type qemu 

Contents of network-environment-v6.yaml:
parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: ctlplane # changed from storage_mgmt for ipv6 testing
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: ctlplane # allows undercloud to config endpoints
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: ctlplane # changed from storage_mgmt for ipv6 testing
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage
  NeutronMetadataProxySharedSecret: marius

parameter_defaults:
  ControlPlaneSubnetCidr: "24"
  EC2MetadataIp: 192.0.2.1
  ControlPlaneDefaultRoute: 192.0.2.1
  ExternalInterfaceDefaultRoute: 2001:db8:fd00:1000:dead:beef:cafe:f00
  #DnsServers: ["2001:db8:fd00:1000:dead:beef:cafe:f00", "2001:db8:fd00:1000:dead:beef:cafe:f01"]

Comment 6 Hugh Brock 2016-01-15 09:18:10 UTC
Giulio, can you check on this? Dan thinks it may already be fixed.

Comment 7 Marius Cornea 2016-01-15 10:04:30 UTC
FWIW the issue was reported here for Galera:
https://bugs.launchpad.net/galera/+bug/1130595

Comment 8 Giulio Fidente 2016-01-15 16:01:17 UTC
Under investigation; by commenting out:

  bind-address

and adding:

  wsrep_provider_options = "gmcast.listen_addr=tcp://[::]:4567;"

where [::] is replaced by the desired binding address, we get galera to bind on the ipv6 address.

If bind-address is also set to an ipv6 address, though, the problem remains as described in the report:

  160115 15:48:45 [ERROR] WSREP: Can't parse port number from '8d64': 22 (Invalid argument) at galerautils/src/gu_uri.cpp:parse_authority():69
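
Editorial sketch of why an unbracketed IPv6 address can produce a "Can't parse port number" error like the one above. It assumes a parser that splits the authority on the last colon. That is an illustration of the failure mode, not the actual logic in galerautils/src/gu_uri.cpp. Both function names are hypothetical, and the sample address is the mysql_bind_host value quoted earlier in this report.

```python
def naive_split_authority(authority: str):
    """Split 'host:port' on the last colon, with no awareness of IPv6."""
    host, sep, port = authority.rpartition(":")
    if not sep:
        return authority, None
    # For a bare IPv6 literal the "port" is the last hex group, e.g. 'c469',
    # and int() raises ValueError because it is not a decimal number.
    return host, int(port)


def bracket_aware_split(authority: str):
    """Split host and port, honouring [ipv6]:port and bare IPv6 literals."""
    if authority.startswith("["):
        end = authority.index("]")
        host = authority[1:end]
        rest = authority[end + 1:]
        port = int(rest[1:]) if rest.startswith(":") else None
        return host, port
    if authority.count(":") > 1:
        # More than one colon without brackets: a bare IPv6 literal, no port.
        return authority, None
    host, sep, port = authority.rpartition(":")
    if not sep:
        return authority, None
    return host, int(port)


if __name__ == "__main__":
    addr = "fd00:fd00:fd00:2000:f816:3eff:fe94:c469"
    try:
        naive_split_authority(addr)
    except ValueError as exc:
        print("naive parser failed:", exc)
    print(bracket_aware_split("[::]:4567"))
```

Under this assumption the last 16-bit group of the address is what ends up reported as the unparseable port, which matches the shape of the '4af4' and '8d64' tokens in the logged errors.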

Comment 9 Damien Ciabrini 2016-01-15 16:49:19 UTC
When mysql tries to bind to an ipv6 address other than ::, it seems to fail here:
 https://github.com/atcurtis/mariadb/blob/5.5/sql/mysqld.cc#L2198 or
 https://github.com/atcurtis/mariadb/blob/5.5/sql/mysqld.cc#L2164

Investigating...

Comment 10 Javier Peña 2016-01-16 10:01:02 UTC
I've managed to get Galera to work in an IPv6 environment by configuring it manually as follows:

- In /etc/my.cnf.d/galera.cnf, set:

bind_address = <hostname>
wsrep_provider_options="gmcast.listen_addr=tcp://[::]:4567;"

Where <hostname> is an entry in /etc/hosts pointing to the server IPv6 address

- Then, in the pacemaker resource, set:

wsrep_cluster_address=gcomm://hacontroller1,hacontroller2,hacontroller3

Where hacontroller1, hacontroller2 and hacontroller3 are the /etc/hosts entries (or DNS) pointing to the IPv6 addresses.

With those, mysqld was listening on an IPv6 address, and wsrep communication succeeded.
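
Editorial sketch of a pre-flight check for the setup described in this comment: before binding on a hostname, it can help to confirm that the name really resolves to an IPv6 address through /etc/hosts or DNS. The function name `ipv6_addresses` is hypothetical. It relies only on the standard library's `socket.getaddrinfo`, and the hostnames in the example are the ones used in this thread.

```python
import socket


def ipv6_addresses(hostname: str) -> list:
    """Return the sorted IPv6 addresses a name resolves to, or [] if none."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, socket.AF_INET6, socket.SOCK_STREAM
        )
    except socket.gaierror:
        # The name does not resolve to any IPv6 address.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("hacontroller1", "hacontroller2", "hacontroller3"):
        found = ipv6_addresses(name)
        print(name, "->", found if found else "no IPv6 address found")
```

An empty result for any cluster member suggests its /etc/hosts or DNS entry needs fixing before the hostname is used for bind_address or wsrep_cluster_address.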

Comment 12 Marius Cornea 2016-01-19 10:56:57 UTC
openstack-tripleo-heat-templates-0.8.6-106.el7ost.noarch

[root@overcloud-controller-0 ~]# pcs status | grep -A 1 galera
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 ]

Comment 14 errata-xmlrpc 2016-02-18 16:49:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0264.html

