Description of problem:
NFV use cases that deal with the availability and manageability of VNFs need immediate notification from the VIM when virtualized resources become unavailable, so that recovery of the VNFs running on them can be processed.
Combining workload high availability with a "notify only" option, using notification as a service, is a critical ask from most telco service providers. After spawning a VNF, the VNF Manager subscribes to the alarm engine so that it is notified immediately if any virtualized resource failure impacts the VNF. On notification, the VNF Manager can take appropriate action. In this scenario, a virtualization failure means a compute node failure. A few implementations use ceilometer-aodh as the alarming service, with remote Pacemaker clustering for compute node failure detection.
This requirement suggests productizing this implementation through OSPd.
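For illustration only, the subscribe-and-notify flow described above can be sketched in a few lines of Python. This is a minimal in-memory model, not the Aodh or Pacemaker API: the names AlarmEngine, VnfManager, subscribe, and report_host_down are hypothetical, and the failure detector is assumed to call report_host_down when it decides a compute node is gone.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical and chosen for illustration;
# they do not correspond to the Aodh, Ceilometer, or Pacemaker APIs.
Callback = Callable[[str, str], None]  # (vnf_id, failed_host) -> None


class AlarmEngine:
    """Toy stand-in for an alarming service that maps hosts to subscribers."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Tuple[str, Callback]]] = defaultdict(list)

    def subscribe(self, host: str, vnf_id: str, callback: Callback) -> None:
        # The VNF Manager registers interest in the host that carries its VNF.
        self._subs[host].append((vnf_id, callback))

    def report_host_down(self, host: str) -> int:
        # Assumed to be called by the failure detector.
        # "Notify only": the engine informs subscribers and takes no recovery action.
        notified = 0
        for vnf_id, callback in self._subs.get(host, []):
            callback(vnf_id, host)
            notified += 1
        return notified


class VnfManager:
    """Toy VNF Manager that records which VNFs need recovery."""

    def __init__(self, engine: AlarmEngine) -> None:
        self.engine = engine
        self.pending_recovery: List[Tuple[str, str]] = []

    def spawn(self, vnf_id: str, host: str) -> None:
        # After spawning, subscribe so that a host failure is reported back.
        self.engine.subscribe(host, vnf_id, self.on_failure)

    def on_failure(self, vnf_id: str, host: str) -> None:
        # The appropriate action is left to the VNF Manager; here it is only queued.
        self.pending_recovery.append((vnf_id, host))
```

In this sketch the engine never recovers anything itself; it only tells the subscribed VNF Manager which VNF was affected, which matches the "notify only" choice in the description.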
We do have an on-going activity in OPNFV doctor and we are working on the end to end solution. But nothing will be ready for RHOSP12. I will let the experts comment.
This is heavily under development upstream in OPNFV and won't be ready for product inclusion until it is ready upstream. This means it is out of scope for RHOSP12 and, I believe, RHOSP13. Let's reassess when upstream is ready.
Let's flag it for RHOSP13, as this is the furthest we can postpone it, and reassess in 4 months when scoping RHOSP13.
(In reply to Franck Baudin from comment #2)
> We do have an on-going activity in OPNFV doctor and we are working on the
> end to end solution. But nothing will be ready for RHOSP12. I will let the
> experts comment.
We have been monitoring the Doctor and Barometer projects closely. Our focus is on the monitoring and notification aspect of an NFV HA solution. We hope to work with the Doctor and Barometer projects to define a monitoring and notification framework that operates within the performance constraints typical in NFV: failure detection + notification + repair < 50 ms. Failure detection will focus on NIC interface, kernel, and VM failures on a node. For node failure, discriminating switch failure from node failure will be a central theme.
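As a rough sketch of the constraint above, the three phases can be summed and compared against the 50 ms budget. The function name within_budget and the per-phase timings in the usage note are hypothetical; only the 50 ms total comes from this comment.

```python
from typing import Tuple

BUDGET_MS = 50.0  # from the comment: detection + notification + repair < 50 ms


def within_budget(detect_ms: float, notify_ms: float, repair_ms: float,
                  budget_ms: float = BUDGET_MS) -> Tuple[bool, float]:
    """Return (meets_budget, remaining_ms) for one failure-handling cycle."""
    total = detect_ms + notify_ms + repair_ms
    # Strict inequality, matching "< 50 ms" in the stated constraint.
    return total < budget_ms, budget_ms - total
```

With made-up timings of 10 ms detection, 5 ms notification, and 30 ms repair, within_budget(10, 5, 30) reports that the budget is met with 5 ms to spare; a cycle that totals exactly 50 ms does not meet it.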
Won't be included in RHOSP13; this is covered by the Red Hat SLA monitoring initiative, not Telemetry.