Bug 1366242

Summary: Network utilization is not calculated properly
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Daniel Horák <dahorak>
Component: core Assignee: anmol babu <anbabu>
core sub component: monitoring QA Contact: Daniel Horák <dahorak>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: anbabu, asriram, kdreyer, mkudlej, nthomas, rghatvis, shtripat, vsarmila
Version: 2   
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhscon-agent-0.0.19-1.el7scon.noarch Doc Type: Bug Fix
Doc Text:
A full-duplex channel is available for communication in both directions simultaneously, and hence the effective bandwidth is twice the actual bandwidth. This was not accounted for, and as a consequence, the host network utilization exceeded 100% if all or some of its full-duplex channels were nearly or fully utilized. The underlying source code has been modified, and the network utilization is now calculated properly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-19 15:21:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1346350, 1353450, 1357777    
Attachments:
Description Flags
Network utilization charts shows nearly 200%. none

Description Daniel Horák 2016-08-11 11:15:29 UTC
Created attachment 1190029
Network utilization charts shows nearly 200%.

Description of problem:
  Network utilization is calculated as the sum of the actual speeds on each network card in both directions (rx and tx), divided by the sum of the maximum speeds of the connected networks (which is used as the 100% reference value).
  This leads to strange behavior: when the network is utilized in both directions, the utilization can exceed 100%, in my case reaching nearly 200%.
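
  The calculation described above can be sketched roughly as follows. This is only an illustration and not the actual rhscon-agent code: the function name, the dictionary keys, and the sample numbers are all assumptions chosen to reproduce a value close to the one on the screenshot.

```python
def naive_utilization_percent(interfaces):
    """Utilization as described in this report:
    (sum of rx + tx over all NICs) / (sum of link speeds), in percent."""
    capacity = sum(nic["speed_mbps"] for nic in interfaces)
    used = sum(nic["rx_mbps"] + nic["tx_mbps"] for nic in interfaces)
    return 100.0 * used / capacity


# Hypothetical sample: two 1 Gbit/s full-duplex links with both directions
# nearly saturated, similar to what a bidirectional iperf run might produce.
interfaces = [
    {"name": "cluster", "speed_mbps": 1000, "rx_mbps": 995, "tx_mbps": 995},
    {"name": "access", "speed_mbps": 1000, "rx_mbps": 995, "tx_mbps": 995},
]

print(naive_utilization_percent(interfaces))  # 199.0
```

  Because rx and tx are both counted against a reference that includes each link speed only once, a fully loaded full-duplex link contributes up to 200%.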

Version-Release number of selected component (if applicable):
  USM Server (RHEL 7.2):
  ceph-ansible-1.0.5-32.el7scon.noarch
  ceph-installer-1.0.14-1.el7scon.noarch
  libcollection-0.6.2-25.el7.x86_64
  rhscon-ceph-0.0.40-1.el7scon.x86_64
  rhscon-core-0.0.41-1.el7scon.x86_64
  rhscon-core-selinux-0.0.41-1.el7scon.noarch
  rhscon-ui-0.0.52-1.el7scon.noarch

  Ceph node (RHEL 7.2):
  calamari-server-1.4.8-1.el7cp.x86_64
  ceph-base-10.2.2-36.el7cp.x86_64
  ceph-common-10.2.2-36.el7cp.x86_64
  ceph-mon-10.2.2-36.el7cp.x86_64
  ceph-osd-10.2.2-36.el7cp.x86_64
  ceph-selinux-10.2.2-36.el7cp.x86_64
  collectd-ping-5.5.1-1.1.el7.x86_64
  collectd-5.5.1-1.1.el7.x86_64
  libcephfs1-10.2.2-36.el7cp.x86_64
  libcollection-0.6.2-25.el7.x86_64
  python-cephfs-10.2.2-36.el7cp.x86_64
  rhscon-agent-0.0.18-1.el7scon.noarch
  rhscon-core-selinux-0.0.41-1.el7scon.noarch

How reproducible:
  100%

Steps to Reproduce:
1. Prepare USM - Ceph cluster on real HW servers.
2. Utilize network between the hosts:
  On one server run iperf server:
    # iperf -s
  On second server run iperf as client for all available networks:
    # iperf -c CLUSTER_IP --dualtest --time 3600
    # iperf -c ACCESS_IP --dualtest --time 3600
  (CLUSTER_IP and ACCESS_IP are IP addresses or hostnames of the first server on different interfaces)
3. Wait for a while and check Host dashboard -> Network utilization charts.

Actual results:
  In my case (as you can see in the attached screenshot) the network utilization is shown as 199%.

Expected results:
  The network utilization shouldn't be higher than 100%.

Additional info:
  The easiest solution for this issue seems to be either to use the average of the rx and tx speeds, or to count the maximum speeds of all the connected networks in both directions for the "reference" (100%) value.
  But I'm not sure that either of these solutions is the best way to show the utilization: when I utilize just one direction, it will show 50%, but in reality the link is 100% utilized in that direction and I cannot utilize it any further (in that direction).
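
  A minimal sketch of the two options and of the one-direction concern, for a single link. The function names and numbers are hypothetical, and this is not necessarily what the fix implements. The per_direction helper is not one of the proposals above; it is included only to show the per-direction figures for comparison.

```python
def average_of_directions(rx, tx, speed):
    """Option 1: average the rx and tx speeds, reference is the link speed."""
    return 100.0 * ((rx + tx) / 2.0) / speed


def doubled_reference(rx, tx, speed):
    """Option 2: sum rx and tx, reference counts the link speed in both directions."""
    return 100.0 * (rx + tx) / (2.0 * speed)


def per_direction(rx, tx, speed):
    """Illustration only (not proposed in this report): one value per direction."""
    return 100.0 * rx / speed, 100.0 * tx / speed


# Both directions nearly saturated on a 1 Gbit/s link (values in Mbit/s).
print(average_of_directions(995, 995, 1000))  # 99.5
print(doubled_reference(995, 995, 1000))      # 99.5

# Only one direction saturated: both options report 50%,
# although that direction cannot carry any more traffic.
print(average_of_directions(1000, 0, 1000))   # 50.0
print(doubled_reference(1000, 0, 1000))       # 50.0
print(per_direction(1000, 0, 1000))           # (100.0, 0.0)
```

  For a single link the two options are arithmetically the same, so they share the same drawback in the one-direction case.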

  Also the behavior of the circular chart is strange for values higher than 100%.

Comment 2 anmol babu 2016-08-16 10:33:03 UTC
Looks good to me

Comment 5 Daniel Horák 2016-10-01 07:51:41 UTC
Tested and verified on:
  Red Hat Enterprise Linux Server release 7.3 (Maipo)

USM Server:
  ceph-ansible-1.0.5-34.el7scon.noarch
  ceph-installer-1.0.15-2.el7scon.noarch
  graphite-web-0.9.12-8.1.el7.noarch
  graphite2-1.3.6-1.el7_2.x86_64
  libcollection-0.6.2-27.el7.x86_64
  rhscon-ceph-0.0.43-1.el7scon.x86_64
  rhscon-core-0.0.45-1.el7scon.x86_64
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  rhscon-ui-0.0.59-1.el7scon.noarch
  salt-selinux-0.0.43-1.el7scon.noarch
  selinux-policy-3.13.1-102.el7.noarch
  selinux-policy-targeted-3.13.1-102.el7.noarch

Ceph OSD/MON node:
  calamari-server-1.4.8-1.el7cp.x86_64
  ceph-base-10.2.2-41.el7cp.x86_64
  ceph-common-10.2.2-41.el7cp.x86_64
  ceph-mon-10.2.2-41.el7cp.x86_64
  ceph-osd-10.2.2-41.el7cp.x86_64
  ceph-selinux-10.2.2-41.el7cp.x86_64
  collectd-ping-5.5.1-1.1.el7.x86_64
  collectd-5.5.1-1.1.el7.x86_64
  graphite2-1.3.6-1.el7_2.x86_64
  libcollection-0.6.2-27.el7.x86_64
  python-cephfs-10.2.2-41.el7cp.x86_64
  rhscon-agent-0.0.19-1.el7scon.noarch
  rhscon-core-selinux-0.0.45-1.el7scon.noarch
  salt-selinux-0.0.45-1.el7scon.noarch
  selinux-policy-targeted-3.13.1-102.el7.noarch
  selinux-policy-3.13.1-102.el7.noarch

SELinux in enforcing mode.

Tested only with full duplex interfaces.
It shows correct values in the charts according to the algorithm described in comment 3 (for a full-duplex interface).

>> VERIFIED

Comment 8 anmol babu 2016-10-17 06:41:43 UTC
I have made some minor changes...

Comment 10 anmol babu 2016-10-18 09:25:13 UTC
Looks good to me

Comment 11 errata-xmlrpc 2016-10-19 15:21:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2082