Bug 1104957

Summary: unable to turn off max_connect_errors
Product: Red Hat Enterprise Linux 7
Component: mariadb
Version: 7.0
Status: CLOSED NOTABUG
Severity: high
Priority: high
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Reporter: Fabio Massimo Di Nitto <fdinitto>
Assignee: Honza Horak <hhorak>
QA Contact: qe-baseos-daemons
CC: databases-maint, fdinitto
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-11 11:48:18 UTC
Bug Blocks: 1083890

Description Fabio Massimo Di Nitto 2014-06-05 06:03:38 UTC
Description of problem:

When deploying mariadb in a high-availability, load-balanced environment, all load balancer (LB) hosts perform regular checks to verify that mariadb is listening on a given host:port.

The connection check simply establishes that the network socket is listening; it does not perform a full mysql login/check.
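
A minimal sketch of the kind of check described above, assuming it amounts to a bare TCP connect followed by an immediate close. The function name and the timeout are illustrative and are not taken from any load balancer's code.

```python
import socket


def tcp_port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    The socket is closed right away, without reading the server greeting or
    sending a login, so the database only sees an aborted connection attempt.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False
```

A load balancer would call something like `tcp_port_is_listening("mariadbserver", 3306)` once per check interval; `mariadbserver` is a placeholder host name.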

mariadb logs those as incomplete connections, which sooner or later results in the LB host being banned from connecting.

Even raising max_connect_errors to a very large number does not help, since it's entirely possible that no real connection will arrive from a given LB host during that time frame.

This clearly creates a runtime/operational problem in a failover event, when a new LB host takes over dispatching connections to the db.

I can't find a way to disable this behaviour and it is critical for OpenStack deployments.

We can potentially work around it, but I'd rather see a real fix.

Comment 2 Honza Horak 2014-06-06 14:26:17 UTC
Well, upstream works around it as follows (http://lists.mysql.com/mysql/161223):
To disable for practical purposes, set it to 2^32-1 = 4294967295. On top, once a day run FLUSH HOSTS

However, I'll ask mariadb upstream whether they would accept a solution based on a new option.

I've tried to simulate the described behaviour by misusing the socket in various ways, but I still cannot reproduce it. Could you please provide a reproducer or the code snippet you use for the regular checks?

Comment 3 Fabio Massimo Di Nitto 2014-06-06 14:32:08 UTC
(In reply to Honza Horak from comment #2)
> Well, the way how upstream workarounds it is the following
> (http://lists.mysql.com/mysql/161223):
> To disable for practical purposes, set it to 2^32-1 = 4294967295. On top,
> once a 
> day run FLUSH HOSTS

Right, I set it to 10000000 or something. I am having issues setting up a custom cron job just for FLUSH HOSTS, though.

Would you consider shipping a cron job, disabled by default, that checks /etc/sysconfig/mariadb for CLEAN_HOSTS=yes and sets proper values automatically?
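
A rough sketch of what such an opt-in job could look like, written as a script that cron would invoke. `CLEAN_HOSTS` is the hypothetical switch proposed above, not an option the mariadb package ships, and the function names are made up for illustration.

```python
import subprocess


def clean_hosts_enabled(sysconfig_text: str) -> bool:
    """Return True if the text contains an active CLEAN_HOSTS=yes line."""
    enabled = False
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "CLEAN_HOSTS":
            # As in a shell-style file, the last assignment wins.
            enabled = value.strip().strip("\"'").lower() == "yes"
    return enabled


def flush_hosts_if_enabled(path: str = "/etc/sysconfig/mariadb") -> bool:
    """Run FLUSH HOSTS via the mysql client when CLEAN_HOSTS=yes is set."""
    try:
        with open(path) as fh:
            text = fh.read()
    except FileNotFoundError:
        return False
    if not clean_hosts_enabled(text):
        return False
    # Assumes the invoking user can log in without a prompt (e.g. ~/.my.cnf).
    subprocess.run(["mysql", "-e", "FLUSH HOSTS"], check=True)
    return True
```

The job stays inert until an administrator adds `CLEAN_HOSTS=yes` to the sysconfig file, which is the "disabled by default" behaviour asked for.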

> 
> However, I'll try to ask mariadb upstream, if they would accept a new-option
> solution.
> 
> I've tried to simulate the described behaviour by random not-proper work
> with the socket, but I'm still not able to achieve this. Can you, please,
> provide some reproducer or a code snippet which you use for performing the
> regular checks?

I am using HAProxy to check the mariadb network socket every second.

You can take a look at the HAProxy config here:

http://rhel-ha.etherpad.corp.redhat.com/RHOS-RHEL-HA-how-to-mrgcloud-rhos5-on-rhel7-lb

and mariadb:

http://rhel-ha.etherpad.corp.redhat.com/RHOS-RHEL-HA-how-to-mrgcloud-rhos5-on-rhel7-db

Don't get too scared by the cluster setup :) the db is running on only one machine.

The LB nodes poll the node running mariadb every second.

If at all possible, I'd prefer not to rely on timer changes, since they don't solve the problem permanently but just widen the window.

Comment 4 Honza Horak 2014-06-06 14:40:47 UTC
(In reply to Fabio Massimo Di Nitto from comment #3)
> Would you consider shipping a "disabled" cron job by default that checks for
> /etc/sysconfig/mariadb for CLEAN_HOSTS=yes and set proper values
> automatically?

This seems like over-engineering to me. I'd rather either add a new option or, better, change the behaviour so that the error check is skipped when max_connect_errors is set to 0.

> If possible at all i´d prefer not to rely on timer changes since they don´t
> solve the problem permanently, but just expand the window.

Understood. I'll take a closer look and let you know. I guess that, since there is a workaround (not very nice, but it should work), this does not block you completely, right?

Comment 5 Fabio Massimo Di Nitto 2014-06-06 14:43:50 UTC
(In reply to Honza Horak from comment #4)
> This seems to me like too big over-engineering. I'd be more willing to
> either add a new option or better to change the behavior only to not perform
> the error check in case the max_connect_errors is set to 0.

Good point :) whatever works for you as long as we achieve the goal.

> Understood. I'll take a look at it more closely and will let you know. I
> guess, since there is a workaround (not very nice but should work), this is
> not something that would block you totally, right?

It can potentially block the OpenStack 5 release because it requires HAProxy to monitor the db. I'll check with the OSP guys whether we can deploy a workaround while we look into a fix.

Thanks!

Comment 6 Honza Horak 2014-06-06 14:51:48 UTC
(In reply to Fabio Massimo Di Nitto from comment #5)
> It can potentially block Openstack 5 release because it require ha-proxy to
> monitor the db. I´ll check with the OSP guys if we can deploy a workaround
> while we look into a fix.

Well, setting the maximum possible value (4294967295 on 32-bit; 64-bit architectures should accept even more) means the window would be more than a year (about 497 days) even with a constant 100 failed connections per second. That should not be a blocker imho, but I may be missing some consequences. I will definitely try to solve it properly, though.
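
The estimate above can be checked with a few lines of arithmetic. This sketch assumes the counter for a host only ever grows, i.e. no successful connection and no FLUSH HOSTS resets it in the meantime; the function name is illustrative.

```python
MAX_CONNECT_ERRORS = 2**32 - 1  # 4294967295, the value quoted in this thread
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_blocked(errors_per_second: float,
                       limit: int = MAX_CONNECT_ERRORS) -> float:
    """Days of uninterrupted failed connections needed to reach the limit."""
    return limit / errors_per_second / SECONDS_PER_DAY


# 100 failed connections per second from one host, in days:
print(round(days_until_blocked(100)))      # -> 497
# One probe per second from one host (the polling rate reported above), in years:
print(round(days_until_blocked(1) / 365))  # -> 136
```

At the one-per-second polling rate reported for this setup, the window for a single LB host would be on the order of a century rather than a year.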

Comment 7 Honza Horak 2014-06-09 15:47:44 UTC
Steps to reproduce blocking:
1. set max_connect_errors=2 in /etc/my.cnf
2. configure mariadb so that it accepts connections from a different machine
3. on another machine run `for i in {1..3} ; do echo "" | telnet mariadbserver 3306 ; done`
4. on that same machine run `mysql -h mariadbserver`
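
If no mysql client is available on the probing machine, the outcome of step 4 can also be observed by looking at the first packet the server sends. The sketch below assumes the standard MySQL wire format (3-byte little-endian payload length, 1-byte sequence number, then a payload starting with 0x0a for a protocol-10 greeting or 0xff for an error) and that a blocked host receives error 1129; the function names are illustrative.

```python
import socket
import struct

ER_HOST_IS_BLOCKED = 1129  # "Host '...' is blocked because of many connection errors"


def classify_first_packet(packet: bytes):
    """Classify the first packet a MySQL/MariaDB server sends after connect.

    Returns ("greeting", None, None) for a normal handshake,
    ("error", code, message) when the server refuses the client outright,
    and ("unknown", None, None) for anything else.
    """
    if len(packet) < 5:
        return ("unknown", None, None)
    payload_len = int.from_bytes(packet[0:3], "little")
    payload = packet[4:4 + payload_len]  # byte 3 is the sequence number
    if not payload:
        return ("unknown", None, None)
    if payload[0] == 0xFF and len(payload) >= 3:
        (code,) = struct.unpack_from("<H", payload, 1)
        message = payload[3:]
        if message[:1] == b"#":  # optional SQLSTATE marker plus 5 characters
            message = message[6:]
        return ("error", code, message.decode("utf-8", "replace"))
    if payload[0] == 0x0A:  # protocol version 10 handshake
        return ("greeting", None, None)
    return ("unknown", None, None)


def read_first_packet(host: str, port: int = 3306, timeout: float = 3.0) -> bytes:
    """Connect, read whatever the server sends first, and disconnect."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.recv(4096)
```

After step 3, `classify_first_packet(read_first_packet("mariadbserver"))` would be expected to return an error with code `ER_HOST_IS_BLOCKED`. Note that on a host that is not yet blocked, this probe is itself one more aborted handshake.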

Comment 8 Honza Horak 2014-06-09 15:49:23 UTC
Upstream contacted:
https://lists.launchpad.net/maria-developers/msg07355.html

Comment 9 Honza Horak 2014-06-11 07:30:40 UTC
There was quite an interesting comment on the upstream mailing list -- "using '--skip-name-resolve' should bypass the max_connect_errors mechanism altogether."

Fabio, could you please let me know whether this would be usable in your use case?

Comment 10 Fabio Massimo Di Nitto 2014-06-11 08:38:40 UTC
(In reply to Honza Horak from comment #9)
> There was quite interesting comment on the upstream mailing list -- "using
> '--skip-name-resolve' should bypass the max_connect_errors mechanism
> altogether."
> 
> Fabio, could you, please, provide your feedback, if this would be usable in
> your use case?

Hi Honza,

as long as it's a configuration option I see no problem with it. Do you know if it can be added to my.cnf or does it need to be on the command line? Either should work, but I generally prefer to keep it all in the .cnf file if possible (for consistency, and to avoid having to remember what comes from where).

Thanks a lot for all your help btw. It's very much appreciated.

Fabio

Comment 11 Honza Horak 2014-06-11 11:08:38 UTC
(In reply to Fabio Massimo Di Nitto from comment #10)
> as long as it's a configuration option I see no problem with it. Do you know
> if it can be added to my.cnf or does it need to be on the command line? Either
> should work, but I generally prefer to keep it all in the .cnf file if
> possible (for consistency, and to avoid having to remember what comes from
> where).

It can be used either way; as a configuration option, just add:
  skip-name-resolve=1
to the [mysqld] section of your my.cnf and you should be set.

Doc for this option is here:
http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_skip-name-resolve

(In reply to Fabio Massimo Di Nitto from comment #0)
> The connection check will simply establish that the network socket is
> listening but will not do a full / complete mysql login/check.

Well, I'd like to go back to the original issue, because the way you check whether a server is up does not seem correct. Actually, we do something similar in the SysV init script/systemd unit file after the daemon is started, so that the script returns no sooner than the daemon is really able to accept connections.

The way we do it is to run `mysqladmin ping`, and we treat either success or a failure with an 'Access denied for user' error as a sign that the server *is* ready:
http://pkgs.fedoraproject.org/cgit/mariadb.git/tree/mariadb-wait-ready#n39

You may consider using something similar to check the liveness of a server instead of the current approach.
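
A minimal sketch of that rule, not a copy of the linked mariadb-wait-ready script: the server counts as ready when `mysqladmin ping` succeeds or when it fails with "Access denied for user". The function names are illustrative, and it assumes `mysqladmin` is on the PATH of the checking host.

```python
import subprocess


def ping_means_ready(returncode: int, output: str) -> bool:
    """Interpret a `mysqladmin ping` result the way described above.

    Success, or a failure whose message contains "Access denied for user",
    both show that the server completed a handshake and answered.
    """
    return returncode == 0 or "Access denied for user" in output


def server_is_ready(host: str, port: int = 3306) -> bool:
    """Run `mysqladmin ping` against host:port and interpret the result."""
    result = subprocess.run(
        ["mysqladmin", "--host=" + host, "--port=" + str(port), "ping"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # error text goes to stderr; merge it
        universal_newlines=True,
    )
    return ping_means_ready(result.returncode, result.stdout)
```

Keeping the interpretation in `ping_means_ready` separate from the subprocess call makes the rule easy to test without a running server.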

Also, if you find any of the solutions above good enough, please close this request.

Comment 12 Fabio Massimo Di Nitto 2014-06-11 11:48:18 UTC
(In reply to Honza Horak from comment #11)
> Well, I'd like to go back to the original issue, because the way you check
> if a server is up does not seem correct.

This is not something I decided or implemented :) I am a mortal user of the whole thing ;)

> Actually, we do something similar
> in SysV init script/systemd unit file after daemon is started, so the script
> returns no sooner than the daemon is really able to accept connections.
> 
> The way how we do it is to run `mysqladmin ping` and then we take either
> success or failure with 'Access denied for user' error as a sign that the
> server *is* ready:
> http://pkgs.fedoraproject.org/cgit/mariadb.git/tree/mariadb-wait-ready#n39
> 
> You may consider using something similar to check vitality of a server,
> instead of current approach.
> 
> Also, in case you find any of the solutions above is good enough for you,
> please, close this request.

It's not an option to change the way we check, but it's not a problem either, because clients will happily reconnect to the server when it dies.

I am closing this bug, but it might be worth making it a documentation note for the future.

Comment 13 Honza Horak 2014-06-13 08:27:49 UTC
(In reply to Fabio Massimo Di Nitto from comment #12)
> It's not an option to change the way we check, but it's not a problem either
> because clients would happily reconnected to the server when it dies.
> 
> I am closing this bug, but it might be worth making it a documentation note
> for the future.

So, which way did you choose in the end? Setting skip-name-resolve?

Comment 14 Fabio Massimo Di Nitto 2014-06-13 08:33:01 UTC
(In reply to Honza Horak from comment #13)
> So, which way have you chosen to go with in the end? Setting the
> skip-name-resolve?

[mysqld]
skip-name-resolve=1

Yes, even though it means the entries in the user table need to be adjusted; it's the fastest solution we have at the moment.

I would still like to see "max_connect_errors = 0" supported, so that we can retain the user ACLs via host entries.
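
For the user-table adjustment mentioned above, a small heuristic can help spot the entries that need attention. It assumes that with skip-name-resolve the server matches clients only by IP address (or `localhost`), so Host values written as DNS names stop matching. The function name is illustrative and the check is a rough filter, not a validator.

```python
def host_needs_adjustment(host: str) -> bool:
    """Heuristically flag user-table Host values that rely on DNS names."""
    value = host.strip().lower()
    if value in ("", "localhost", "%"):
        return False
    if ":" in value:  # looks like an IPv6 address or pattern
        return False
    # IPv4 addresses, wildcard patterns and address/netmask pairs use only
    # digits, dots, '%', '_' and '/'. Anything else is treated as a name.
    return not all(ch in "0123456789.%_/" for ch in value)


hosts = ["localhost", "%", "192.168.1.%", "lb1.example.com", "%.example.com"]
print([h for h in hosts if host_needs_adjustment(h)])
# -> ['lb1.example.com', '%.example.com']
```

The Host values themselves can be listed with `SELECT User, Host FROM mysql.user;`.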