Bug 1412012 - HPE [RFE] [Fault Management] Low latency fault detection and notification on failure
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-aodh
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ga
Target Release: ---
Assignee: Aaron Smith
QA Contact: Aaron Smith
URL:
Whiteboard:
Depends On:
Blocks: 1476900 1521118
Reported: 2017-01-11 00:50 UTC by hrushi
Modified: 2019-05-17 16:58 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-17 16:58:44 UTC
Target Upstream Version:



Description hrushi 2017-01-11 00:50:26 UTC
Description of problem:

NFV use cases that deal with the availability and manageability of VNFs need immediate notification from the VIM when virtualized resources become unavailable, so that recovery of the VNFs running on them can be processed.

A combination of workload high availability with a "notify only" option, using notification as a service, is a critical ask from most telco service providers. After spawning a VNF, the VNF Manager subscribes to the alarm engine so that it is notified immediately if any virtualized resource failure impacts the VNF. On notification, the VNF Manager can take appropriate action. A virtualization failure, in this scenario, means a compute node failure. A few implementations use ceilometer-aodh as the alarming service, with remote pacemaker clustering for compute node failure detection.

This requirement suggests productizing this implementation through OSPd.
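
The subscribe-and-notify flow described above can be illustrated with a minimal Python sketch of the VNF Manager side: given the body of an alarm notification, decide which VNFs need recovery. Everything here is an assumption made for illustration, not the alarm engine's confirmed interface: the JSON field names ("current", "reason_data", and especially the "host" key), the host names, and the VNFS_BY_HOST mapping are all hypothetical.

```python
import json

# Hypothetical bookkeeping: which VNFs the VNF Manager spawned on which compute host.
VNFS_BY_HOST = {
    "compute-0": ["vnf-a", "vnf-b"],
    "compute-1": ["vnf-c"],
}


def affected_vnfs(notification_body, vnfs_by_host):
    """Return the VNFs to recover for one alarm notification, or [] if none.

    Assumed payload shape (not confirmed against any alarm engine):
      {"current": "<alarm state>", "reason_data": {"host": "<compute host>"}}
    """
    payload = json.loads(notification_body)

    # Act only when the alarm has transitioned into the "alarm" state.
    if payload.get("current") != "alarm":
        return []

    # "host" is a hypothetical key naming the failed compute node.
    reason_data = payload.get("reason_data") or {}
    host = reason_data.get("host")

    # Unknown or missing hosts affect no VNFs.
    return list(vnfs_by_host.get(host, []))
```

In a "notify only" deployment the returned list would be handed to whatever recovery logic the VNF Manager implements; the sketch deliberately stops at identifying the impacted VNFs.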

Comment 1 hrushi 2017-01-11 00:52:03 UTC
Additional Info:
https://wiki.opnfv.org/display/doctor/Doctor+Home

Comment 2 Franck Baudin 2017-01-20 14:06:00 UTC
We do have an on-going activity in OPNFV doctor and we are working on the end to end solution. But nothing will be ready for RHOSP12. I will let the experts comment.

Comment 3 Franck Baudin 2017-01-27 14:05:46 UTC
This is heavily under development upstream in OPNFV and won't be ready for product inclusion before it is ready upstream. This means it is out of scope for RHOSP12 and, I believe, RHOSP13. Let's reassess when upstream is ready.

Comment 4 Franck Baudin 2017-01-27 14:11:10 UTC
Let's flag it for RHOSP13, as this is the furthest we can postpone it, and reassess in 4 months when scoping RHOSP13.

Comment 5 Aaron Smith 2017-01-30 13:12:49 UTC
(In reply to Franck Baudin from comment #2)
> We do have an on-going activity in OPNFV doctor and we are working on the
> end to end solution. But nothing will be ready for RHOSP12. I will let the
> experts comment.

We have been monitoring the Doctor and Barometer projects closely. Our focus is on the monitoring and notification aspect of an NFV HA solution. We hope to work with the Doctor and Barometer projects to define a monitoring and notification framework that operates within the performance constraints typical in NFV: failure detection + notification + repair < 50 ms. Failure detection will focus on NIC interface, kernel, and VM failures on a node. For node failure, discriminating between switch and node failure will be a central theme.
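
The 50 ms constraint in this comment is an end-to-end budget across three stages. A minimal Python sketch of checking measured stage latencies against it follows; the function name and the sample figures in the usage note are hypothetical, and only the 50 ms total comes from the comment.

```python
BUDGET_MS = 50.0  # from the comment: detection + notification + repair < 50 ms


def within_budget(detection_ms, notification_ms, repair_ms, budget_ms=BUDGET_MS):
    """Return (ok, headroom_ms) for one failure-handling cycle.

    ok is True only when the summed latency is strictly below the budget;
    headroom_ms is the remaining margin (zero or negative when the budget is not met).
    """
    total_ms = detection_ms + notification_ms + repair_ms
    return total_ms < budget_ms, budget_ms - total_ms
```

For example, hypothetical stage latencies of 10 ms, 5 ms, and 30 ms sum to 45 ms, which leaves 5 ms of headroom, while 20 ms, 10 ms, and 20 ms sum to exactly 50 ms and fail the strict inequality.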

Comment 6 Franck Baudin 2017-07-05 13:47:04 UTC
Won't be included in RHOSP13. This is covered by the Red Hat SLA monitoring initiative, not Telemetry.

