Bug 1042964

Summary: [RFE][oslo]: Automatic recovery from transient db connection failures
Product: Red Hat OpenStack
Reporter: RHOS Integration <rhos-integ>
Component: RFEs
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED UPSTREAM
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: unspecified
CC: markmc, yeylon
Target Milestone: Upstream M3
Keywords: FutureFeature
Target Release: 5.0 (RHEL 7)   
Hardware: Unspecified   
OS: Unspecified   
URL: https://blueprints.launchpad.net/oslo/+spec/db-reconnect
Whiteboard: upstream_milestone_icehouse-3 upstream_status_implemented upstream_definition_approved
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-27 13:49:43 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description RHOS Integration 2013-12-13 16:48:35 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/oslo/+spec/db-reconnect.

Description:

A variety of circumstances can cause a transient failure in database connections, for example:
- a restart or upgrade of the database,
- migration of a VIP between the nodes of an HA pair,
- a plain network failure,
- and so on.

All projects that connect to a database would benefit from the db/api catching these "db-has-gone-away" errors, automatically reconnecting, and retrying the last operation, so that the caller can continue whatever operation was in progress.
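
As an illustration only, the catch / reconnect / retry behaviour described above could be sketched as a decorator around db/api calls. Everything named here (DBConnectionError, retry_on_disconnect, FlakyBackend, max_retries, retry_interval) is hypothetical and is not taken from the oslo blueprint or from the Nova patch; a real implementation would have to map driver-specific "gone away" errors onto such an exception.

```python
import functools
import time


class DBConnectionError(Exception):
    """Hypothetical marker for a 'db-has-gone-away' style failure."""


def retry_on_disconnect(reconnect, max_retries=3, retry_interval=0.1):
    """Retry the wrapped call after reconnecting, at most max_retries times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    attempt += 1
                    if attempt > max_retries:
                        raise  # outage is not transient: surface it to the caller
                    time.sleep(retry_interval)
                    reconnect()
        return wrapper
    return decorator


class FlakyBackend:
    """Stand-in for a database that fails a fixed number of times (assumption)."""

    def __init__(self, failures):
        self.failures_left = failures
        self.reconnects = 0

    def reconnect(self):
        self.reconnects += 1

    def query(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise DBConnectionError("connection lost")
        return "row"


backend = FlakyBackend(failures=2)


@retry_on_disconnect(backend.reconnect, max_retries=3, retry_interval=0)
def instance_get():
    return backend.query()


result = instance_get()  # succeeds on the third attempt, after two reconnects
```

Blindly re-running the last operation is only safe when that operation is idempotent or when the failure happened before any part of a transaction was committed, so a real db/api would probably need to retry at the transaction boundary rather than per statement.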

There is no need to abort long-running operations (such as nova boot or glance image-create) just because of a momentary interruption in db connectivity.

A (slightly brute-force) patch was previously proposed to Nova: https://review.openstack.org/#/c/10797/

This blueprint is similar to the Nova blueprint proposed by Devananda van der Veen. See https://blueprints.launchpad.net/nova/+spec/db-reconnect

Specification URL (additional information):

None

Comment 2 Stephen Gordon 2014-02-06 14:08:45 UTC
Updating based on BP milestone