Bug 1378008 - Moving the haproxy galera proxy VIP results in nova controller services being down until the SQL timeout is reached
Summary: Moving the haproxy galera proxy VIP results in nova controller services being down ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: mariadb-galera
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 7.0 (Kilo)
Assignee: Damien Ciabrini
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-21 09:50 UTC by Martin Schuppert
Modified: 2022-08-09 14:17 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-20 08:29:07 UTC
Target Upstream Version:
Embargoed:


Links:
* GitHub PyMySQL pull 529 (closed): Allow connecting to the DB from a specific network interface (last updated 2020-07-15 05:48:36 UTC)
* Launchpad bug 1643487 (last updated 2017-02-02 09:44:33 UTC)
* Red Hat Issue Tracker OSP-8522 (last updated 2022-08-09 14:17:37 UTC)

Description Martin Schuppert 2016-09-21 09:50:11 UTC
Description of problem:
In an HA controller setup (deployed using director; tested on OSP7 and OSP8), when the VIP used by the haproxy galera proxy is moved, the nova services on the controller where the IP was previously located go into the down state until a SQL timeout is reached.

Version-Release number of selected component (if applicable):
* Reported on OSP7, but also reproduced on OSP8 using python-nova-12.0.3-1.el7ost.noarch

How reproducible:
always

Steps to Reproduce:
1. move VIP to new controller
$ pcs resource move ip-172.16.2.5
2. check 'nova service-list'; after the configured timeout, nova-scheduler, nova-conductor and nova-consoleauth go into the down state

Actual results:
* nova service in down state since 2016-09-21 09:26:54 
* nova-conductor service up again at 2016-09-21 09:44:33

* From nova conductor log with sqlalchemy debug enabled:
# egrep "MySQL|outlasted" /var/log/nova/nova-conductor.log
2016-09-21 09:24:38.258 15949 DEBUG oslo_db.sqlalchemy.engines [req-3f89cecb-38bf-4f32-bae4-8214756d5a89 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py:256
2016-09-21 09:27:11.463 15948 INFO sqlalchemy.pool.QueuePool [req-dd51f26e-83de-4cc2-91af-ec33975e037d - - - - -] Invalidate connection <pymysql.connections.Connection object at 0x625b2d0> (reason: OperationalError:(2006, "MySQL server has gone away (error(110, 'Connection timed out'))"))
2016-09-21 09:44:33.737 15946 INFO sqlalchemy.pool.QueuePool [-] Invalidate connection <pymysql.connections.Connection object at 0x5c43d50> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query ([Errno 110] Connection timed out)'))
2016-09-21 09:44:33.813 15946 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1039.21 sec
2016-09-21 09:44:34.761 15949 INFO sqlalchemy.pool.QueuePool [-] Invalidate connection <pymysql.connections.Connection object at 0x5c43d50> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query ([Errno 110] Connection timed out)'))
2016-09-21 09:44:34.839 15949 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1040.28 sec
2016-09-21 09:44:38.857 15948 INFO sqlalchemy.pool.QueuePool [-] Invalidate connection <pymysql.connections.Connection object at 0x6240dd0> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query ([Errno 110] Connection timed out)'))
2016-09-21 09:44:39.008 15948 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1044.49 sec
2016-09-21 09:44:42.953 15947 INFO sqlalchemy.pool.QueuePool [-] Invalidate connection <pymysql.connections.Connection object at 0x5c43d50> (reason: OperationalError:(2013, 'Lost connection to MySQL server during query ([Errno 110] Connection timed out)'))
2016-09-21 09:44:43.049 15947 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1048.53 sec

Expected results:
Controller services do not go into the down state when the VIP used by galera switches to another node.

Additional info:
* Setting pool_timeout in nova.conf seems to have a positive impact on nova-conductor. When it is set to e.g. 30s, we no longer see nova-conductor reported as down (see the sketch below).
* pool_timeout does not seem to have an impact on nova-scheduler.
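
For illustration only, a rough sketch of what that option maps to (assuming the usual oslo.db behavior of passing [database]/pool_timeout through to SQLAlchemy's create_engine(); the URI and credentials are made up):

from sqlalchemy import create_engine

# pool_timeout bounds how long a checkout from the connection pool may wait
# before raising an error; the report above suggests a short value (30s) helps
# nova-conductor recover instead of hanging until the TCP-level timeout expires.
engine = create_engine(
    "mysql+pymysql://nova:secret@172.16.2.5/nova",
    pool_timeout=30,   # seconds, mirrors pool_timeout = 30 in nova.conf [database]
)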

Comment 2 Martin Schuppert 2016-09-21 09:57:44 UTC
Forgot to mention: when we configure nova to not use the VIP for the haproxy galera proxy and instead connect directly to one of the DB nodes, we do not see this behavior when moving the VIP.

For reference, this is the haproxy galera proxy config:

global
  daemon
  group  haproxy
  log  /dev/log local0
  maxconn  20480
  pidfile  /var/run/haproxy.pid
  ssl-default-bind-ciphers  !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!DES
  ssl-default-bind-options  no-sslv3
  user  haproxy

defaults
  log  global
  maxconn  4096
  mode  tcp
  retries  3
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

...

listen mysql
  bind 172.16.2.5:3306 transparent  
  option tcpka
  option httpchk
  stick on dst
  stick-table type ip size 1000
  timeout client 90m
  timeout server 90m
  server overcloud-controller-0 172.16.2.6:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-1 172.16.2.7:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2
  server overcloud-controller-2 172.16.2.8:3306 backup check fall 5 inter 2000 on-marked-down shutdown-sessions port 9200 rise 2

Comment 3 Diana Clarke 2016-09-22 19:40:32 UTC
I don't have enough happy-path experience using haproxy/galera/pacemaker/tripleo to help debug this much, but I wonder if auditing the differences in the recommended config vs the actual config might help.

- How can I verify my haproxy.cfg is correctly configured to load balance openstack services?
https://access.redhat.com/solutions/1599813

Here's a diff between the recommended haproxy config and the config pasted above. Do any of those differences jump out as problematic to anyone else with more experience?

  http://paste.openstack.org/show/582652/


global
  maxconn  20480
  ssl-default-bind-ciphers  !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!DES
  ssl-default-bind-options  no-sslv3

defaults
  maxconn  4096

listen mysql
  bind 172.16.2.5:3306 transparent  
  option tcpka

Comment 4 Martin Schuppert 2016-09-23 08:42:52 UTC
(In reply to Diana Clarke from comment #3)
> I don't have enough happy-path experience using
> haproxy/galera/pacemaker/tripleo to help debug this much, but I wonder if
> auditing the differences in the recommended config vs the actual config
> might help.
> 
> - How can I verify my haproxy.cfg is correctly configured to load balance
> openstack services?
> https://access.redhat.com/solutions/1599813
> 
> Here's a diff between the recommended haproxy config and the config pasted
> above. Do any of those differences jump out as problematic to anyone else
> with more experience?
> 
>   http://paste.openstack.org/show/582652/
> 
> 
> global
>   maxconn  20480
>   ssl-default-bind-ciphers 
> !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!
> DES
>   ssl-default-bind-options  no-sslv3
> 
> defaults
>   maxconn  4096
> 
> listen mysql
>   bind 172.16.2.5:3306 transparent  
>   option tcpka

Thanks for the review. It is possible to reproduce the issue on an empty, idle overcloud, so connection limits should not matter here.
I have verified that the tcpka and transparent options make no difference in the behavior.

Comment 10 Ryan O'Hara 2016-09-23 18:08:45 UTC
(In reply to Martin Schuppert from comment #4)
> (In reply to Diana Clarke from comment #3)
> > I don't have enough happy-path experience using
> > haproxy/galera/pacemaker/tripleo to help debug this much, but I wonder if
> > auditing the differences in the recommended config vs the actual config
> > might help.
> > 
> > - How can I verify my haproxy.cfg is correctly configured to load balance
> > openstack services?
> > https://access.redhat.com/solutions/1599813
> > 
> > Here's a diff between the recommended haproxy config and the config pasted
> > above. Do any of those differences jump out as problematic to anyone else
> > with more experience?
> > 
> >   http://paste.openstack.org/show/582652/
> > 
> > 
> > global
> >   maxconn  20480
> >   ssl-default-bind-ciphers 
> > !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!
> > DES
> >   ssl-default-bind-options  no-sslv3
> > 
> > defaults
> >   maxconn  4096
> > 
> > listen mysql
> >   bind 172.16.2.5:3306 transparent  
> >   option tcpka
> 
> Thanks for the review. It is possible to reproduce the issue on an empty
> idle overcloud, so connection limits should not matter here.
> I have verified the tcpka and transparent option makes no difference on the
> behavior.

I agree that 'tcpka' probably has no impact, but 'transparent' allows haproxy to bind to an IP address (VIP) that does not exist on the system. Best to leave that intact.

Comment 11 Martin Schuppert 2016-09-24 12:25:12 UTC
(In reply to Ryan O'Hara from comment #10)
> (In reply to Martin Schuppert from comment #4)
> > (In reply to Diana Clarke from comment #3)
> > > I don't have enough happy-path experience using
> > > haproxy/galera/pacemaker/tripleo to help debug this much, but I wonder if
> > > auditing the differences in the recommended config vs the actual config
> > > might help.
> > > 
> > > - How can I verify my haproxy.cfg is correctly configured to load balance
> > > openstack services?
> > > https://access.redhat.com/solutions/1599813
> > > 
> > > Here's a diff between the recommended haproxy config and the config pasted
> > > above. Do any of those differences jump out as problematic to anyone else
> > > with more experience?
> > > 
> > >   http://paste.openstack.org/show/582652/
> > > 
> > > 
> > > global
> > >   maxconn  20480
> > >   ssl-default-bind-ciphers 
> > > !SSLv2:kEECDH:kRSA:kEDH:kPSK:+3DES:!aNULL:!eNULL:!MD5:!EXP:!RC4:!SEED:!IDEA:!
> > > DES
> > >   ssl-default-bind-options  no-sslv3
> > > 
> > > defaults
> > >   maxconn  4096
> > > 
> > > listen mysql
> > >   bind 172.16.2.5:3306 transparent  
> > >   option tcpka
> > 
> > Thanks for the review. It is possible to reproduce the issue on an empty
> > idle overcloud, so connection limits should not matter here.
> > I have verified the tcpka and transparent option makes no difference on the
> > behavior.
> 
> I agree that 'tcpka' probably has no impact, but 'transparent' allows
> haproxy to bind to an IP address (VIP) that does not exist on the system.
> Best to leave that intact.

yes, makes sense, but no difference in this case.

Comment 21 Damien Ciabrini 2016-09-30 16:59:32 UTC
I am able to reproduce the failure reported by "nova service-list" on OSP8 by moving the VIP away from a node.

Before the move, haproxy is up on all nodes, with every instance listening on the mysql VIP.

In my test, at 15:55 I'm moving the VIP away from ctrl1 to ctrl0:
[root@overcloud-controller-1 heat-admin]# pcs resource move ip-172.16.2.5
Warning: Creating location constraint cli-ban-ip-172.16.2.5-on-overcloud-controller-1 with a score of -INFINITY for resource ip-172.16.2.5 on node overcloud-controller-1.
This will prevent ip-172.16.2.5 from running on overcloud-controller-1 until the constraint is removed. This will be the case even if overcloud-controller-1 is the last node in the cluster.

A few seconds later, various nova services are seen down with "nova service-list"

[root@overcloud-controller-1 heat-admin]# nova service-list
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host                               | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+
| 3  | nova-scheduler   | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-09-30T16:00:34.000000 | -               |
| 6  | nova-scheduler   | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-09-30T16:00:32.000000 | -               |
| 9  | nova-scheduler   | overcloud-controller-1.localdomain | internal | enabled | down  | 2016-09-30T15:55:06.000000 | -               |
| 12 | nova-consoleauth | overcloud-controller-1.localdomain | internal | enabled | down  | 2016-09-30T15:55:03.000000 | -               |
| 15 | nova-consoleauth | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-09-30T16:00:33.000000 | -               |
| 18 | nova-consoleauth | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-09-30T16:00:37.000000 | -               |
| 21 | nova-conductor   | overcloud-controller-1.localdomain | internal | enabled | down  | 2016-09-30T15:55:09.000000 | -               |
| 33 | nova-conductor   | overcloud-controller-2.localdomain | internal | enabled | up    | 2016-09-30T16:00:32.000000 | -               |
| 36 | nova-conductor   | overcloud-controller-0.localdomain | internal | enabled | up    | 2016-09-30T16:00:40.000000 | -               |
| 39 | nova-compute     | overcloud-compute-1.localdomain    | nova     | enabled | up    | 2016-09-30T16:00:35.000000 | -               |
| 42 | nova-compute     | overcloud-compute-0.localdomain    | nova     | enabled | up    | 2016-09-30T16:00:40.000000 | -               |
+----+------------------+------------------------------------+----------+---------+-------+----------------------------+-----------------+

Oddly, the instances of nova service which are reported as "down" are those located on the controller which originally hosted the VIP (ctrl1).

Nova services located on other nodes don't seem to be impacted by the VIP moves, so at this stage it doesn't seem like HAProxy is causing the failure of nova services.

We have captured tcpdump on both ctrl1 and ctrl0 to analyze traffic to the mysql VIP and see whether the failure could be related to TCP keepalive packets on the nova service side, or whether anything else is involved. Analysis is still ongoing.

Comment 25 Damien Ciabrini 2016-10-03 16:48:07 UTC
Update to comment #21,

So the failure of the nova services with long-running DB connections (consoleauth, scheduler, conductor) comes from the way these services initially connect to the VIP.

In the example from comment #21 (where the VIP is located on ctrl1), when a service is started on ctrl1, a socket is created and connects to the mysql VIP.

Since the socket has a default source address of *, the kernel selects the source address with the most direct path to the destination address (i.e. the VIP). On ctrl1, this results in a connected socket whose source address is bound to the VIP.

When the failover occurs, the VIP disappears from ctrl1. Even once a keepalive probe eventually detects that, the kernel cannot send a RST packet to close the client side of the connection, because the VIP is already gone.
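
For illustration, a minimal sketch of the source-address selection described above (addresses taken from the haproxy config in comment #2; the snippet is only meant to show the kernel behavior, it is not part of any fix):

import socket

# Without an explicit bind(), the kernel picks the source address with the most
# direct route to 172.16.2.5 -- on the node currently hosting the VIP, that is
# the VIP itself. Binding first pins the source to the node's own address.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("172.16.2.6", 0))        # the node's own (non-VIP) address, any local port
s.connect(("172.16.2.5", 3306))  # the mysql VIP fronted by haproxy
print(s.getsockname())           # the source stays 172.16.2.6 even if the VIP moves away
s.close()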

Comment 29 Irina Petrova 2016-10-12 11:21:14 UTC
Guys, since we have some clarity on what's causing this peculiar behavior, do we know what's needed in order to get it fixed?

If I'm getting this correctly, it looks like an underlying issue that creeps into all versions of OpenStack?

Comment 30 Damien Ciabrini 2016-10-14 16:16:55 UTC
Irina,

When the DB VIP fails over from one node to another, the OpenStack services with long-running connections to the DB automatically recreate a new connection the next time they need to access the DB.

The failover failure that is impacting the OpenStack services colocated with the VIP (same host) is due to the kernel's inability to notify the clients of the socket disconnection, because both the source and the destination IP of the socket are bound to the VIP.

So in order to fix this failure, we need either other means of detecting the socket connection failure, or a means of preventing the socket from binding its source IP to the VIP when the service and the VIP are colocated. We're currently evaluating the best options to resolve that issue.

As you said, this is more of a network concern, so it impacts all versions of OpenStack.

Comment 36 Damien Ciabrini 2016-11-15 17:28:07 UTC
It appears the best way to solve this issue for both IPv4 and IPv6 is to configure OpenStack clients to connect to the DB from a specific IP address, for example the one from the control plane network.

In order to achieve that, we need at least to:
  1) add support in PyMySQL upstream for binding to a specific source
     IP before connect()-ing to the DB
  2) update tripleo packages to generate the proper configuration flags
     for Openstack services.

For 1), I just created a pull request upstream [1]. Once it is discussed and approved, we will design 2) accordingly.

[1] https://github.com/PyMySQL/PyMySQL/pull/529
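
For reference, a rough sketch of what the client side would look like once 1) is available (the bind_address keyword name follows the upstream pull request and may change during review; credentials and addresses are illustrative):

import pymysql

# Connect to the DB through the mysql VIP, but force the client socket to use
# the node's own control plane address as its source, so the connection is not
# left half-open when the VIP moves to another controller.
conn = pymysql.connect(
    host="172.16.2.5",            # mysql VIP
    user="nova",                  # illustrative credentials
    password="secret",
    database="nova",
    bind_address="172.16.2.6",    # source address, keyword proposed in the PR
)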

Comment 37 Michael Bayer 2016-11-16 17:38:52 UTC
So, one wrinkle is that oslo.db has an opinion about the arguments that can pass through to create_engine().  I think we might have to change oslo.db to support this new argument (SQLAlchemy already allows pass-through options for the MySQL DBAPIs).

Comment 38 Michael Bayer 2016-11-16 17:51:42 UTC
Oh wait, I think I'm mistaken. We'd send this as a query-string variable in the URL and oslo will let it pass through:

   mysql+pymysql://user:pass@hostname/dbname?host=<hostname>
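
A quick sketch of that pass-through mechanism (the bind_address key here assumes the PyMySQL keyword from comment #36; at this point the exact option name is still open):

from sqlalchemy import create_engine, text

# Query-string arguments on the URL are forwarded by the SQLAlchemy MySQL
# dialects to the DBAPI connect() call, so a client bind address can be
# expressed directly in the service's connection string.
engine = create_engine(
    "mysql+pymysql://nova:secret@172.16.2.5/nova"
    "?bind_address=172.16.2.6"    # forwarded to pymysql.connect(bind_address=...)
)
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))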

Comment 39 Damien Ciabrini 2017-02-02 09:44:33 UTC
OK, fix merged in all upstream branches, I'll now prepare the backport for downstream.

Comment 40 Damien Ciabrini 2017-02-09 08:57:03 UTC
As a heads up, unfortunately the fix merged upstream conflicts with Nova's cells_v2 feature in Ocata, so it has been reverted [1].

We will shortly propose another version of the patch, which still pursues the strategy from comment #36, but is deployed differently in tripleo.

[1] https://review.openstack.org/430183

Comment 41 Damien Ciabrini 2017-03-08 13:56:53 UTC
The new fix has been merged upstream in master [1] and Ocata [2].

It now needs to be adapted for Mitaka and Liberty. Once done, we'll focus on backporting those patches.


[1] https://review.openstack.org/431424
[2] https://review.openstack.org/436144

Comment 42 Damien Ciabrini 2018-07-20 08:29:07 UTC
So this bug is fixed starting with OSP11.

The fix is a fairly big change in that it introduces a new mysql config file and changes the DB URI in all OpenStack services' configs.

While we could backport it down to OSP10, we believe that this is a bit risky as it could impact FFU from OSP10 -> OSP13.

Consequently, I'm closing it now; feel free to reopen if you disagree with the reasoning.

