Bug 1372294 - rx_dropped counter increments on a host once it becomes one of the hosts in a hosted-engine environment.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.0.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: meital avital
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-01 10:59 UTC by Nikolai Sednev
Modified: 2016-11-29 00:16 UTC (History)
8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-23 09:16:37 UTC
oVirt Team: Network
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
virtual media disconnected (498.15 KB, image/png)
2016-09-01 11:16 UTC, Nikolai Sednev

Description Nikolai Sednev 2016-09-01 10:59:05 UTC
Description of problem:
The rx_dropped counter increments on a host once it becomes one of the hosts in a hosted-engine environment.
I see the rx_dropped counter being incremented on hosts that became part of hosted-engine setups.
I installed clean RHEL 7.2 on a pair of hosts, then checked the rx_dropped counters on their 10GbE interfaces; both were 0. Running iperf between the hosts left the counters at 0. I then deployed hosted-engine on one of the hosts and saw its rx_dropped counter start incrementing.
I ran iperf again between the hosted-engine host and the remaining clean el7.2 host and saw that only the hosted-engine host's counter was incremented.
I then added the second clean el7.2 host as an additional hosted-engine host, and its rx_dropped counter began incrementing as well.

[root@alma03 ~]# ethtool -S enp5s0f0 | grep rx_dropped
     rx_dropped: 0
[root@alma03 ~]# ethtool -S enp5s0f0 | grep rx_dropped
     rx_dropped: 0
[root@alma03 ~]# vi /etc/iscsi/initiatorname.iscsi 
[root@alma03 ~]# ethtool -S enp5s0f0 | grep rx_dropped
     rx_dropped: 9
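The transcript above reads the NIC-level counter from `ethtool -S` output by grepping for the field name. As a minimal sketch of that parsing step (the helper name and the sample statistics below are illustrative, not taken from the bug), the counter value can be extracted like this:

```shell
#!/bin/sh
# Extract the rx_dropped value from captured `ethtool -S` output.
# rx_dropped_of and the sample text are hypothetical, for illustration only.

rx_dropped_of() {
    # $1: full `ethtool -S <ifname>` output; prints the rx_dropped value
    printf '%s\n' "$1" | awk -F': ' '/^[[:space:]]*rx_dropped:/ {print $2; exit}'
}

sample='NIC statistics:
     rx_packets: 123456
     rx_dropped: 9'

rx_dropped_of "$sample"   # prints: 9
```

Parsing once into a variable makes it easy to compare readings before and after a test run instead of eyeballing repeated `grep` output.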


Version-Release number of selected component (if applicable):
Hosts:
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
ovirt-hosted-engine-setup-2.0.2-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.5.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.5.2-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-ha-2.0.3-1.el7ev.noarch
vdsm-4.18.12-1.el7ev.x86_64
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-daemon-0.3.0-0.el7ev.noarch
mom-0.5.5-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
ovirt-engine-lib-4.0.4-0.1.el7ev.noarch
ovirt-host-deploy-java-1.5.2-1.el7ev.noarch
ovirt-engine-webadmin-portal-debuginfo-4.0.4-0.1.el7ev.noarch
ovirt-engine-restapi-4.0.4-0.1.el7ev.noarch
ovirt-engine-sdk-python-3.6.8.0-1.el7ev.noarch
ovirt-engine-dashboard-1.0.3-1.el7ev.x86_64
ovirt-engine-setup-plugin-ovirt-engine-4.0.4-0.1.el7ev.noarch
ovirt-iso-uploader-4.0.1-1.el7ev.noarch
ovirt-engine-userportal-4.0.4-0.1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-proxy-1.0.4-1.el7ev.noarch
ovirt-engine-setup-base-4.0.4-0.1.el7ev.noarch
ovirt-engine-setup-plugin-websocket-proxy-4.0.4-0.1.el7ev.noarch
ovirt-engine-setup-4.0.4-0.1.el7ev.noarch
ovirt-image-uploader-4.0.1-1.el7ev.noarch
ovirt-engine-dbscripts-4.0.4-0.1.el7ev.noarch
ovirt-engine-dwh-setup-4.0.2-1.el7ev.noarch
ovirt-engine-4.0.4-0.1.el7ev.noarch
ovirt-engine-dwh-4.0.2-1.el7ev.noarch
ovirt-imageio-proxy-setup-0.3.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-4.0.4-0.1.el7ev.noarch
ovirt-host-deploy-1.5.2-1.el7ev.noarch
ovirt-engine-vmconsole-proxy-helper-4.0.4-0.1.el7ev.noarch
ovirt-engine-userportal-debuginfo-4.0.4-0.1.el7ev.noarch
ovirt-engine-tools-backup-4.0.4-0.1.el7ev.noarch
ovirt-engine-webadmin-portal-4.0.4-0.1.el7ev.noarch
ovirt-engine-tools-4.0.4-0.1.el7ev.noarch
ovirt-engine-cli-3.6.8.1-1.el7ev.noarch
ovirt-engine-setup-plugin-vmconsole-proxy-helper-4.0.4-0.1.el7ev.noarch
ovirt-log-collector-4.0.1-1.el7ev.noarch
python-ovirt-engine-sdk4-4.0.0-1.el7ev.x86_64
ovirt-engine-websocket-proxy-4.0.4-0.1.el7ev.noarch
ovirt-engine-extensions-api-impl-4.0.4-0.1.el7ev.noarch
ovirt-engine-backend-4.0.4-0.1.el7ev.noarch
ovirt-imageio-proxy-0.3.0-0.el7ev.noarch
ovirt-engine-extension-aaa-jdbc-1.1.0-1.el7ev.noarch
rhev-guest-tools-iso-4.0-5.el7ev.noarch
rhev-hypervisor7-7.2-20160209.2.bz1288237.el6ev.noarch
rhevm-setup-plugins-4.0.0.2-1.el7ev.noarch
rhev-release-4.0.4-1-001.noarch
rhevm-doc-4.0.0-3.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhev-release-4.0.2-9-001.noarch
rhevm-4.0.4-0.1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhev-release-4.0.3-1-001.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhev-release-4.0.1-2-001.noarch
Linux version 3.10.0-327.36.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Wed Aug 17 03:02:37 EDT 2016
Linux 3.10.0-327.36.1.el7.x86_64 #1 SMP Wed Aug 17 03:02:37 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Reprovision 2 clean el7.2 hosts.
2. Run iperf between the two hosts.
3. Check "ethtool -S <ifname> | grep rx_dropped" on both hosts.
4. rx_dropped should be zero on both hosts.
5. Install and deploy hosted-engine on the first host, using an iSCSI storage domain for both the HE VM and the data storage domain.
6. Check "ethtool -S <ifname> | grep rx_dropped" again on the hosted-engine host; the counter value starts incrementing.
7. Run iperf between the hosted-engine host and the remaining clean el7.2 host.
8. Check "ethtool -S <ifname> | grep rx_dropped" again on both hosts; the counter increments continuously on the hosted-engine host, while it stays at zero on the clean el7.2 host.
9. Add the remaining clean el7.2 host as an additional hosted-engine host and check "ethtool -S <ifname> | grep rx_dropped" on it.
10. On the host added in step 9, the rx_dropped counter now starts counting dropped packets.
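The check in steps 3, 6 and 8 can be sketched as a before/after comparison. This reads the sysfs counter, which on many drivers tracks what `ethtool -S <ifname> | grep rx_dropped` reports (though some drivers keep separate NIC-level counters, so treat this as an approximation); the interface name is an assumption taken from the transcript above:

```shell
#!/bin/sh
# Sketch: measure how much rx_dropped grows across a test run.
# IFNAME is assumed; the sysfs counter may differ from the driver's
# ethtool counter on some NICs.

rx_dropped_delta() {
    # $1: first reading, $2: second reading; prints the increase
    echo $(( $2 - $1 ))
}

IFNAME=enp5s0f0
before=$(cat /sys/class/net/"$IFNAME"/statistics/rx_dropped 2>/dev/null || echo 0)
# ... run iperf between the hosts here ...
after=$(cat /sys/class/net/"$IFNAME"/statistics/rx_dropped 2>/dev/null || echo 0)
echo "rx_dropped increased by $(rx_dropped_delta "$before" "$after")"
```

A nonzero delta on only the hosted-engine host would reproduce the reported behavior.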

Actual results:
The rx_dropped counter shows that there is packet loss on hosted-engine hosts.

Expected results:
rx_dropped should always remain zero, regardless of whether the host is a hosted-engine host or a regular bare-metal host without virtualization on it.

Additional info:

Comment 1 Nikolai Sednev 2016-09-01 11:14:45 UTC
To give a more realistic scenario in which this bug causes trouble for our customers: I'll provide a screenshot of me being disconnected from a virtual media session that I was running from a VM to a bare-metal server. I was trying to install an OS on the bare-metal host from my VM and got disconnected, because the bursts of rx_dropped packets affect real-time applications such as virtual-media OS installations.

Comment 2 Nikolai Sednev 2016-09-01 11:16:09 UTC
Created attachment 1196739 [details]
virtual media disconnected

Comment 20 Yaniv Kaul 2016-09-07 12:19:46 UTC
It's clear it's not a RHEV/oVirt problem. Please move it away from it.

Comment 21 Nikolai Sednev 2016-09-08 05:42:50 UTC
(In reply to Yaniv Kaul from comment #20)
> It's clear it's not a RHEV/oVirt problem. Please move it away from it.

It's not clear to me why it's not a RHEVM problem, since the host works just fine until I make it part of the virtualization environment.

Comment 32 Dan Kenigsberg 2016-11-23 09:16:37 UTC
Nikolai, apparently rx_dropped has nothing to do with the communication failure that you report, so we are closing this bug.

However, if your problem persists, someone needs to look into it more deeply. Please open a new bug (on kernel networking) once you can provide more exact information about the communication failure.

I hope Eric can help solve the new bug once it is open.

Comment 33 Nikolai Sednev 2016-11-23 10:02:51 UTC
Cloned to https://bugzilla.redhat.com/show_bug.cgi?id=1397742

