Bug 1877478 - [RFE] collect network metrics in DWH ( rx and tx drop )
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-dwh
Version: 4.4.1
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.4.7
: 4.4.7
Assignee: Aviv Litman
QA Contact: Pavel Novotny
URL:
Whiteboard:
Depends On:
Blocks: 1789090 1937714 1938964
Reported: 2020-09-09 17:43 UTC by Andrea Perotti
Modified: 2022-08-23 10:08 UTC

Fixed In Version: ovirt-engine-4.4.7 ovirt-engine-dwh-4.4.7.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-22 15:12:18 UTC
oVirt Team: Metrics
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1938964 | 1 | medium | CLOSED | [Docs] Add rx and tx drop to data warehouse guide | 2021-06-24 13:53:20 UTC
Red Hat Knowledge Base (Solution) | 6087171 | 0 | None | None | None | 2021-05-31 14:22:26 UTC
Red Hat Product Errata | RHSA-2021:2865 | 0 | None | None | None | 2021-07-22 15:12:56 UTC
oVirt gerrit | 114187 | 0 | master | MERGED | database: add network metrics to dwh | 2021-05-21 06:18:38 UTC
oVirt gerrit | 114188 | 0 | master | MERGED | Add network metrics to dwh | 2021-05-27 07:27:45 UTC

Internal Links: 1938964

Description Andrea Perotti 2020-09-09 17:43:35 UTC
1. What is the nature and description of the request?  

As RHV is a complete virtualization solution, it is expected to be able to monitor the status of the virtualization infrastructure and its components, especially the hypervisors, in great detail, to help virtualization admins maintain the infrastructure and react properly to misbehaviour.

This includes the status of the network and of the network ports of each hypervisor, as has already been started with the "network usage" alert.
Any misbehaviour or hint of malfunction should be reported to RHV Manager, which should collect historical data for those metrics and eventually raise an alert when a threshold is exceeded (number of packets lost, TCP ACK/NAK/retransmits).
This data should also be exposed via the API, so that third-party virtualization monitoring tools can benefit from it (like IBM Tivoli Monitoring for Virtual Environments [0]).

2. Why would you need this? (List the business requirements here)

In case of a hardware fault, intermittent issues, or network misconfiguration, RHV admins should have the information they need to spot anomalies and focus their troubleshooting efforts.

3. How would you like to achieve this? (List the functional requirements here)  

a. Display more detailed network statistics from the RHV Hosts directly on the RHV Portal
b. Getting alerts from RHV Manager in case of critical network misbehaviour
c. Retrieving the same data via the RHV Manager API, so that external monitoring tools can collect it


4. For each functional requirement listed, specify how you can test to confirm the requirement is successfully implemented.   

a) Simulate constant packet loss/retransmission (e.g. via netem [1]) and ensure that the data is collected by RHV Manager and the values are saved into DWH.
b) For the case in a), show a visible alert in the GUI and ensure it is possible to instruct ovirt-notifier to send an email about that event.
c) Make sure the instantaneous packet loss/retransmission data is exposed via the API as well, like hypervisor CPU or memory.
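
For test a), one way to cross-check the values that reach DWH is to read the kernel's own drop counters on the host. The following is only a sketch, assuming the standard Linux /proc/net/dev layout (drop is the 4th receive field and the 4th transmit field); the function name and the sample data are hypothetical and not part of RHV.

```python
def parse_drop_counters(proc_net_dev_text):
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev-style text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines and blank lines carry no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16 or not fields[0].isdigit():
            continue  # not a counter row in the expected 16-field layout
        # fields[0:8] are receive counters, fields[8:16] are transmit counters
        counters[name.strip()] = (int(fields[3]), int(fields[11]))
    return counters


# Hypothetical sample; on a real host, pass open("/proc/net/dev").read()
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 7 0 0 0 0 4000 40 0 3 0 0 0 0
"""

drops = parse_drop_counters(SAMPLE)
```

Comparing these counters before and after a netem run gives an independent figure to hold against what the Manager reports.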

5. Do you have any specific timeline dependencies ?  

As soon as possible: this is a big deficiency, and addressing it would greatly benefit all RHV users.

6. Would you be able to assist in testing this functionality if implemented?

yes
 

[0] https://www.ibm.com/support/pages/ibm-tivoli-monitoring-virtual-environment-linux-kernel-based-virtual-machines-agent
[1] https://wiki.linuxfoundation.org/networking/netem#emulating_wide_area_network_delays

Comment 2 Sandro Bonazzola 2020-09-10 08:00:19 UTC
If the request is to get network failure notifications in the engine you may be interested in https://www.ovirt.org/develop/release-management/features/gluster/nagios-integration.html

Comment 3 Dominik Holler 2020-10-29 16:08:59 UTC
> As RHV is a complete virtualization solution, it is expected to be able to monitor the status of the virtualization infrastructure and its components, especially the hypervisors, in great detail, to help virtualization admins maintain the infrastructure and react properly to misbehaviour.

Why is it required to use RHV, instead of dedicated monitoring software?
Even if we added full-featured network monitoring to RHV, this would not address monitoring of other areas.

Comment 6 Martin Perina 2021-02-04 13:31:24 UTC
It's better to use dedicated monitoring software for advanced network monitoring; it doesn't make sense to reimplement that functionality in RHV.

Comment 27 Pavel Novotny 2021-07-08 16:02:07 UTC
Verified in 
ovirt-engine-4.4.7.6-0.11.el8ev.noarch
ovirt-engine-dwh-4.4.7.3-1.el8ev.noarch

The Rx/Tx dropped packets statistics are now collected and displayed in several dashboards in Grafana. See bug 1937714 comment 6 for details.
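
When comparing dashboard values against host counters, note that kernel drop counters are cumulative, so a per-interval figure has to be derived from two readings. A minimal sketch of that arithmetic follows; how DWH itself stores or aggregates the values is not described in this bug, and the function name is hypothetical.

```python
def drop_delta(previous, current):
    """Packets dropped between two readings of a cumulative counter."""
    if current < previous:
        # Assumption: a lower reading means the counter was reset
        # (e.g. host reboot), so count from zero.
        return current
    return current - previous


# Hypothetical readings taken one sampling interval apart
dropped = drop_delta(10, 25)
```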

Comment 31 errata-xmlrpc 2021-07-22 15:12:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) security update [ovirt-4.4.7]), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2865

