Bug 1041084 - [RFE][nova]: Automatic recovery from transient db connection failures
Summary: [RFE][nova]: Automatic recovery from transient db connection failures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: rc
Target Release: 5.0 (RHEL 7)
Assignee: RHOS Maint
QA Contact: Ami Jeain
URL: https://blueprints.launchpad.net/nova...
Whiteboard: upstream_milestone_icehouse-rc1 upstr...
Depends On:
Blocks:
 
Reported: 2013-12-12 13:35 UTC by RHOS Integration
Modified: 2019-09-09 14:46 UTC

Fixed In Version: openstack-nova-2014.1-3.el7ost
Doc Type: Enhancement
Doc Text:
Transient database-connection failures are now recovered automatically. A variety of circumstances can cause a transient failure in the database connection (for example, a restart or upgrade of the database, migration of a VIP between an HA pair, or a network failure). Compute now handles these "db-has-gone-away" errors by automatically reconnecting and retrying the last operation, so that the caller can continue whatever operation was in progress. Users no longer have to abort long-running operations (such as 'nova boot' or 'glance image-create') because of a momentary interruption in database connectivity.
Clone Of:
Environment:
Last Closed: 2014-07-08 15:27:16 UTC
Target Upstream Version:




Links
System: Red Hat Product Errata
ID: RHEA-2014:0853
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise Linux OpenStack Platform Enhancement - Compute
Last Updated: 2014-07-08 19:22:38 UTC

Description RHOS Integration 2013-12-12 13:35:56 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/nova/+spec/db-reconnect.

Description:

There are a variety of circumstances which can cause a transient failure in database connections, for example: restart / upgrade of the database, migration of a VIP between an HA pair, or just a network failure. Nova (and all projects connecting to a database) would benefit from the db/api catching these "db-has-gone-away" errors and automatically reconnecting and retrying the last operation, in such a way that the caller is able to continue whatever operation was in progress. It is not necessary to abort long-running operations (such as nova boot or glance image-create) just because of a momentary interruption in db connectivity.

A (slightly brute-force) patch was previously proposed: https://review.openstack.org/#/c/10797/. To enable retries safely, more work is probably going to be required.
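
For illustration only, the reconnect-and-retry behaviour described above could be sketched roughly as follows. This is not the code from the referenced patch or from the shipped fix. The names DBConnectionError, retry_on_disconnect, FlakyDB and instance_get are all hypothetical, and a real implementation would catch the database driver's own disconnect error instead of the stand-in exception used here.

```python
import functools
import time


class DBConnectionError(Exception):
    """Hypothetical stand-in for a driver's "db-has-gone-away" error."""


def retry_on_disconnect(max_retries=3, delay=0.1, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped call when the connection drops (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    if attempt == max_retries:
                        raise  # retries exhausted: treat the failure as permanent
                    sleep(wait)      # give the database time to come back
                    wait *= backoff  # wait longer before the next attempt
        return wrapper
    return decorator


class FlakyDB:
    """Toy backend that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def get_instance(self, uuid):
        self.calls += 1
        if self.calls <= self.failures:
            raise DBConnectionError("server has gone away")
        return {"uuid": uuid, "state": "active"}


db = FlakyDB(failures=2)


@retry_on_disconnect(max_retries=3, delay=0.01)
def instance_get(uuid):
    # The caller never sees the two transient failures.
    return db.get_instance(uuid)


result = instance_get("abc")
```

The sketch retries blindly, which is only safe when the wrapped operation can be repeated without side effects. A write that was partially applied before the connection dropped is the kind of case for which the description expects more work to be required.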

Specification URL (additional information):

None

Comment 8 errata-xmlrpc 2014-07-08 15:27:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0853.html

