1. What is the nature and description of the request?
Because RHV is a complete virtualization solution, it is expected to monitor the status of the virtualization infrastructure and its components, especially the hypervisors, in great detail, so that virtualization admins can maintain the infrastructure and react properly to misbehaviour. This includes the status of the network and of the network ports of each hypervisor, as has already been started with the "network usage" alert.
Any misbehaviour or hint of malfunction should be reported to RHV Manager, which should collect historical data for those metrics and raise an alert when a threshold is exceeded (# of packets lost, TCP ack/nak/re-transmits).
This data should be exposed via the API as well, so that 3rd-party virtualization monitoring tools can benefit from it (like IBM Tivoli Monitoring for Virtual Environments [0]).

2. Why would you need this? (List the business requirements here)
In case of a hardware fault, intermittent issues, or network misconfiguration, RHV admins should have the information they need to spot anomalies and focus their troubleshooting efforts.

3. How would you like to achieve this? (List the functional requirements here)
a. Display more detailed network statistics from the RHV hosts directly in the RHV Portal
b. Get alerts from RHV Manager in case of critical network misbehaviour
c. Retrieve the same data via the RHV Manager API, so that external monitoring tools can collect it

4. For each functional requirement listed, specify how you can test to confirm the requirement is successfully implemented.
a) Simulate constant packet loss/retransmission (e.g. via netem [1]) and ensure that the data is collected by RHV Manager and the values are saved into DWH
b) In the case of a), make it visible as an alert in the GUI and ensure it is possible to instruct ovirt-notifier to send email about that event
c) Make sure the instantaneous # of packet loss/retransmission data is exposed via the API as well, like hypervisor CPU or memory

5. Do you have any specific timeline dependencies?
As soon as possible: this is a big deficiency, and fixing it would greatly benefit all RHV users.

6. Would you be able to assist in testing this functionality if implemented?
Yes

[0] https://www.ibm.com/support/pages/ibm-tivoli-monitoring-virtual-environment-linux-kernel-based-virtual-machines-agent
[1] https://wiki.linuxfoundation.org/networking/netem#emulating_wide_area_network_delays
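
To make the threshold idea in this request concrete, here is a minimal sketch of how a drop-ratio check could work on a single host. It is not how RHV or VDSM collects statistics; it only assumes the standard Linux /proc/net/dev column layout, and the function names and the default threshold are hypothetical. The ratio treats dropped packets as separate from the packet counter, which is an approximation that can vary by driver.

```python
# Illustrative only: parse_net_dev and drop_alerts are hypothetical helpers,
# not part of RHV, VDSM, or ovirt-engine.

def parse_net_dev(text):
    """Parse /proc/net/dev content into per-interface packet and drop counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:  # 8 receive columns followed by 8 transmit columns
            continue
        stats[name.strip()] = {
            "rx_packets": fields[1], "rx_drop": fields[3],
            "tx_packets": fields[9], "tx_drop": fields[11],
        }
    return stats


def drop_alerts(before, after, threshold=0.01):
    """Return {(iface, direction): ratio} where the drop ratio between two
    samples exceeds the threshold (assumed value, tune as needed)."""
    alerts = {}
    for iface, new in after.items():
        old = before.get(iface)
        if old is None:
            continue  # interface appeared between samples
        for direction in ("rx", "tx"):
            dropped = new[f"{direction}_drop"] - old[f"{direction}_drop"]
            passed = new[f"{direction}_packets"] - old[f"{direction}_packets"]
            if dropped < 0 or passed < 0:
                continue  # counters were reset or wrapped; skip this interval
            total = dropped + passed
            if total > 0 and dropped / total > threshold:
                alerts[(iface, direction)] = dropped / total
    return alerts
```

On a host, the two samples would come from reading /proc/net/dev twice with a delay in between, for example while a netem loss rule is active on the test path.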
If the request is to get network failure notifications in the engine, you may be interested in https://www.ovirt.org/develop/release-management/features/gluster/nagios-integration.html
> Being RHV a complete virtualization solution, is it expected to be able to control the status of the virtualization infra and of its components, especially hypervisor, in great detail, in order to assist virt. admins in maintaining the infra and properly react in case of misbehaviour.

Why is it required to use RHV for this, instead of dedicated monitoring software? Even if we added full-featured network monitoring to RHV, that would not address monitoring of other areas.
It's better to use dedicated monitoring software for advanced network monitoring; it doesn't make sense to reimplement that functionality in RHV.
Verified in:
ovirt-engine-4.4.7.6-0.11.el8ev.noarch
ovirt-engine-dwh-4.4.7.3-1.el8ev.noarch

The Rx/Tx dropped packets statistics are now collected and displayed in several dashboards in Grafana. See bug 1937714 comment 6 for details.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) security update [ovirt-4.4.7]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2865