Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1157224

Summary:

vdsm sometimes reports an invalid NIC speed of 2**32-1

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

GenadiC <gcheresh>

Component:

vdsm

Assignee:

Petr Horáček <phoracek>

Status:

CLOSED CURRENTRELEASE

QA Contact:

GenadiC <gcheresh>

Severity:

urgent

Docs Contact:

Priority:

medium

Version:

3.5.0

CC:

aberezin, bazulay, danken, ecohen, gcheresh, gklein, iheim, lpeer, lsurette, lvernia, myakove, oourfali, rbalakri, Rhev-m-bugs, yeylon

Target Milestone:

---

Target Release:

3.5.0

Hardware:

x86_64

OS:

Linux

Whiteboard:

network

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-02-16 13:40:26 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Network

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
engine log	none
engine and vdsm logs	none

Description GenadiC 2014-10-26 10:36:02 UTC

Created attachment 950768
engine log

Description of problem:
Moving the host between different DCs/clusters results in an "Incorrect vdsm version for cluster" error.
It happened when running the automation tests (both locally and on Jenkins).

Version-Release number of selected component (if applicable):


How reproducible:
Only by automation at this point

Steps to Reproduce:
1. Run Network Label automation cases 15 or 16.
2. Alternatively, move the host between DCs/clusters of different versions while there are labels on the host interface.

Actual results:
ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-96) Failure to refresh Vds rose09.qa.lab.tlv.redhat.com runtime info. Incorrect vdsm version for cluster Global_Cluster0: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer


Expected results:
Moving the host between supported clusters should work.

Additional info:

Comment 1 Lior Vernia 2014-10-26 19:14:08 UTC

How is this an automation blocker? It's an automated test that fails; it shouldn't block any other automated test.

Comment 2 Lior Vernia 2014-10-26 20:36:10 UTC

It's quite strange that this only happened after some moving around of the host, and not always. However, what happens seems possible only if vdsm started returning the NIC speed as a long as part of getVdsCaps (at least in some scenario). Dan, could this be the case? Possibly related to jsonrpc?

Comment 3 Dan Kenigsberg 2014-10-26 21:16:42 UTC

Sorry Lior, I don't understand your getVdsCaps hypothesis (or the bug). Could you elaborate?

Which Vdsm version is it? Which cluster level is involved?

Comment 4 Lior Vernia 2014-10-26 21:47:38 UTC

The ClassCastException in the engine log appears to refer to code that casts the NIC speed, extracted from the getVdsCaps dictionary, to an Integer (apparently it's deserialized as a Long).
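For illustration, a minimal Python sketch of the suspected mechanism, under the assumption that the engine's JSON-RPC client boxes untyped JSON integers into the narrowest Java type that fits (Jackson's default for untyped numbers; whether the engine relies on exactly this is an assumption). `java_box_type` is a hypothetical helper that only mimics that rule: ordinary speeds fit in an Integer, but anything above 2**31-1 becomes a Long, and a later cast to Integer then throws ClassCastException.

```python
# Java primitive ranges (fixed by the Java language specification).
JAVA_INT_MIN, JAVA_INT_MAX = -2**31, 2**31 - 1
JAVA_LONG_MIN, JAVA_LONG_MAX = -2**63, 2**63 - 1


def java_box_type(value: int) -> str:
    """Return the wrapper class a narrowest-fit JSON deserializer would pick.

    Hypothetical helper: it models the assumed boxing rule, not engine code.
    """
    if JAVA_INT_MIN <= value <= JAVA_INT_MAX:
        return "Integer"
    if JAVA_LONG_MIN <= value <= JAVA_LONG_MAX:
        return "Long"
    return "BigInteger"


for speed in (1000, 10000, 2**32 - 1):
    print(speed, java_box_type(speed))
# 1000 Integer
# 10000 Integer
# 4294967295 Long
```

Under this model the engine only breaks for speeds that do not fit a signed 32-bit int, which would explain why normal hosts are unaffected.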

Comment 5 GenadiC 2014-10-27 07:29:01 UTC

VDSM version - vdsm-4.16.7.1-1.el7.x86_64
Engine version 3.5.0-0.17.beta.el6ev
I was trying to move the host from a 3.5 cluster to another 3.5 cluster.

Comment 6 Gil Klein 2014-10-27 13:43:59 UTC

OK, I will remove the flag from this one.

Comment 7 Lior Vernia 2014-10-27 13:46:25 UTC

This doesn't sound like a high priority, given how hard it is to reproduce.

Comment 8 Lior Vernia 2014-10-27 15:39:22 UTC

Genadi, could you please run the tests using xmlrpc to communicate with the host and let us know if that works?

Comment 9 GenadiC 2014-10-28 15:02:05 UTC

Lior, indeed it works without any problem with xmlrpc.
I tested it twice and it worked both times; before that it failed every time.

Comment 10 Lior Vernia 2014-10-28 15:16:51 UTC

Great, so it's either a general issue with how the engine now deserializes numbers passed from vdsm, or a specific issue with the interface speed that vdsm returns in getVdsCaps (over jsonrpc). Still waiting to hear from Dan about the latter hypothesis.

Comment 11 Lior Vernia 2014-10-28 15:17:32 UTC

...and from Oved about the former :)

Comment 12 Oved Ourfali 2014-10-29 09:42:51 UTC

It seems like a specific issue with this one. For some reason the exception is caught in VdsUpdateRunTimeInfo and results in the "Incorrect vdsm version" error, although it has nothing to do with the vdsm version. I don't think it is a "generic" issue, but I guess Dan can respond to both hypotheses.

Comment 13 Dan Kenigsberg 2014-10-29 09:58:51 UTC

Could you provide vdsm.log, particularly the response to getCapabilities after the host is added to the new cluster?

Comment 14 GenadiC 2014-10-29 11:23:56 UTC

Created attachment 951753
engine and vdsm logs

Comment 15 Dan Kenigsberg 2014-10-30 21:16:32 UTC

Eureka: on one occasion, Vdsm returned speed=2**32-1 for some reason. XML-RPC cannot carry this as a number (its integers are limited to signed 32-bit values), so we would have seen a vdsm-side exception in that case. jsonrpc lets it through, and it explodes within Engine.

Thread-160::DEBUG::2014-10-29 07:13:32,809::__init__::498::jsonrpc.JsonRpcServer::(_serveRequest) Return 'Host.getCapabilities' in bridge with .... 'eno2': {'addr': '', 'cfg': {'DEVICE': 'eno2', 'HWADDR': 'd4:ae:52:b9:c0:c6', 'ONBOOT': 'yes', 'NM_CONTROLLED': 'no', 'MTU': '1500'}, 'ipv6addrs': [], 'mtu': '1500', 'netmask': '', 'ipv4addrs': [], 'hwaddr': 'd4:ae:52:b9:c0:c6', 'speed': 4294967295}

Can you attach connectivity.log? I wonder what shows up there.
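The transport difference described in this comment can be checked with the Python standard library. A minimal sketch, assuming Python 3's `xmlrpc.client` and `json` modules stand in for the marshallers vdsm used at the time (vdsm then ran on Python 2's `xmlrpclib`, assumed here to enforce the same 32-bit limit): the XML-RPC marshaller refuses 4294967295 with OverflowError, while JSON serializes it unchanged. `xmlrpc_can_encode` is a hypothetical helper.

```python
import json
import xmlrpc.client

SPEED = 2**32 - 1  # the value seen for eno2 in the getCapabilities response


def xmlrpc_can_encode(value: int) -> bool:
    """True if the stdlib XML-RPC marshaller accepts the integer."""
    try:
        xmlrpc.client.dumps((value,))  # params must be a tuple
    except OverflowError:  # raised for ints outside MININT..MAXINT
        return False
    return True


print("XML-RPC MAXINT:", xmlrpc.client.MAXINT)       # 2147483647
print("XML-RPC accepts 1000:", xmlrpc_can_encode(1000))    # True
print("XML-RPC accepts 2**32-1:", xmlrpc_can_encode(SPEED))  # False
print("JSON:", json.dumps({"speed": SPEED}))         # {"speed": 4294967295}
```

With XML-RPC the bad value fails loudly on the vdsm side; with JSON it reaches the engine intact and only fails there.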

Comment 16 Dan Kenigsberg 2014-10-30 21:34:10 UTC

http://gerrit.ovirt.org/4320 introduced this bug: if 2**32-1 is read from /sys/class/net/%s/speed, it is passed through as is, since it is greater than 0 (!)
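The failure mode here is a validity check that accepts any positive value. A minimal sketch of a reader that also rejects all-ones sentinels, assuming that 2**16-1 and 2**32-1 (the unsigned forms of -1) mean "speed unknown" when read from sysfs, and that 0 is an acceptable "unknown" result for the caller. `parse_nic_speed` and `read_nic_speed` are hypothetical helpers, not vdsm's actual code or the fix that was merged.

```python
import os

# Assumed "unknown speed" sentinels: (u16)-1 and (u32)-1 as seen from sysfs.
UNKNOWN_SPEEDS = {2**16 - 1, 2**32 - 1}


def parse_nic_speed(raw: str) -> int:
    """Convert the text of /sys/class/net/<nic>/speed to Mb/s; 0 means unknown."""
    try:
        speed = int(raw.strip())
    except ValueError:
        return 0
    # Both conditions must hold: checking only `speed > 0` would let
    # 4294967295 through, which is the bug described in this comment.
    if speed > 0 and speed not in UNKNOWN_SPEEDS:
        return speed
    return 0


def read_nic_speed(nic: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read and sanitize a NIC's speed; 0 if the file cannot be read."""
    path = os.path.join(sysfs_root, nic, "speed")
    try:
        with open(path) as f:
            return parse_nic_speed(f.read())
    except OSError:
        # A missing NIC, or a link that is down (often EINVAL), ends up here.
        return 0
```

For example, `parse_nic_speed("1000\n")` returns 1000, while `parse_nic_speed("4294967295\n")` returns 0, so the engine would never receive a speed that overflows a Java int.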

Comment 17 GenadiC 2014-12-10 11:05:04 UTC

Verified in vt13.1