|Summary:||Vdsm reports wrong NIC state, Error while sampling stats|
|Product:||Red Hat Enterprise Virtualization Manager||Reporter:||Michael Burman <mburman>|
|Component:||vdsm||Assignee:||Dan Kenigsberg <danken>|
|Status:||CLOSED ERRATA||QA Contact:||Michael Burman <mburman>|
|Version:||3.5.0||CC:||bazulay, danken, gklein, lpeer, lsurette, myakove, nyechiel, ybronhei, yeylon, ykaul|
|Fixed In Version:||vdsm-4.17.0-632.git19a83a2.el7.x86_64||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2016-03-09 19:27:50 UTC||Type:||Bug|
|oVirt Team:||Network||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Michael Burman 2014-12-14 08:03:59 UTC
Created attachment 968363 [details]
vdsm-error while sampling

Description of problem:
Vdsm reports wrong NIC state, Error while sampling stats.
After configuring ethtool on a host NIC (eth2) via GUI, eth2 reported as down, even after 'refresh capabilities'.

- kernel reports NIC is up:
  ip a| grep eth2
  eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
- in the event log eth2 was reported as down in 15:46
  'Interface eth2 on host orange-vdsc.qa.lab.tlv.redhat.com, changed state to down'
- connectivity.log report eth2 and eth2.164 as down
- vdsStats report eth2 and eth2.164 as down, when there is no vlan actually attached to NIC any more.
- vdsCaps report eth2 without vlan
- In setupNetworks there is no network attached to eth2 NIC

It seems that we have a race when an interface disappears while sampling its statistics.

Thread-12::ERROR::2014-12-10 15:48:11,842::sampling::534::vds:run) Error while sampling stats
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 516, in run
    sample = self.sample()
  File "/usr/share/vdsm/virt/sampling.py", line 506, in sample
    hs = HostSample(self._pid)
  File "/usr/share/vdsm/virt/sampling.py", line 261, in __init__
    (link.name, InterfaceSample(link)) for link in getLinks())
  File "/usr/share/vdsm/virt/sampling.py", line 261, in <genexpr>
    (link.name, InterfaceSample(link)) for link in getLinks())
  File "/usr/share/vdsm/virt/sampling.py", line 112, in __init__
    self.speed = _getLinkSpeed(link)
  File "/usr/share/vdsm/virt/sampling.py", line 690, in _getLinkSpeed
    speed = netinfo.vlanSpeed(dev.name)
  File "/usr/lib/python2.6/site-packages/vdsm/netinfo.py", line 224, in vlanSpeed
    vlanDevName = getVlanDevice(vlanName)
  File "/usr/lib/python2.6/site-packages/vdsm/netinfo.py", line 756, in getVlanDevice
    vlanLink = getLink(vlan)
  File "/usr/lib/python2.6/site-packages/vdsm/ipwrapper.py", line 300, in getLink
    return Link.fromDict(netlink.get_link(dev))
  File "/usr/lib/python2.6/site-packages/vdsm/netlink.py", line 66, in get_link
    name)
IOError: [Errno 19] eth2.164 is not present in the system

Version-Release number of selected component (if applicable):
3.5.0-0.23.beta.el6ev
vdsm-220.127.116.11-2.el6ev.x86_64

Relevant host - orange-vdsc.qa.lab.tlv.redhat.com
Upgrade engine - 10.35.161.37
Relevant time: 2014-12-10 15:46:11
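The race in the traceback above (a device is listed, then removed before its statistics are read) can be illustrated with a small, self-contained sketch. This is not the vdsm code and not the actual fix; `get_link_speed` and `sample_interfaces` are hypothetical names, and the "system" is simulated with a plain dict. It only shows one possible way to tolerate an ENODEV (errno 19) error for a single device without aborting the whole sample.

```python
import errno


def get_link_speed(name, present):
    """Hypothetical stand-in for a per-device speed lookup.

    Raises IOError(ENODEV) when the device is gone, mirroring the
    error seen in the traceback.
    """
    if name not in present:
        raise IOError(errno.ENODEV, '%s is not present in the system' % name)
    return present[name]


def sample_interfaces(listed, present):
    """Sample every listed device, skipping any that vanished meanwhile."""
    samples = {}
    for name in listed:
        try:
            samples[name] = get_link_speed(name, present)
        except IOError as e:
            if e.errno != errno.ENODEV:
                raise  # unrelated failure: do not hide it
            # Device disappeared between listing and sampling: skip it.
    return samples


# eth2.164 was listed, but removed before its speed could be read.
listed = ['eth0', 'eth2', 'eth2.164']
present = {'eth0': 1000, 'eth2': 1000}
result = sample_interfaces(listed, present)
```

With this approach a vanished VLAN device is simply left out of the sample, while any other I/O error still propagates to the caller.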
Comment 1 Lior Vernia 2014-12-14 12:55:41 UTC
Marking this for 3.5.z as we don't know how common this race is, and it can be quite annoying for users to encounter it. Based on Ido's input I understand this bug was introduced in 3.5, so no need to backport further. Dan, feel free to override me :)
Comment 2 Eyal Edri 2015-02-25 08:45:35 UTC
3.5.1 is already full of bugs (over 80), and since none of these bugs was marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.
Comment 3 Dan Kenigsberg 2015-02-25 09:26:07 UTC
The code has already been merged to the stable branch, and would be part of rhev-3.5.1. It solves a rare race, and has been tested not to cause regressions elsewhere. It does not need a specific z-stream QE.
Comment 4 Michael Burman 2015-04-21 05:40:25 UTC
Dan, on which version should this bug be tested? 3.6? Does vdsm-4.17.0-632.git19a83a2.el7.x86_64 include this fix? Thanks,
Comment 5 Dan Kenigsberg 2015-04-21 09:11:05 UTC
To find where this was fixed in the master branch, take note of the fixing patch https://gerrit.ovirt.org/#/c/36138/. `git log --grep 36138 19a83a2` shows that it is indeed included in your 19a83a2 build.
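The check described above can be prepared from a build string with a short helper. The sketch below is illustrative only: `git_hash_from_nvr` and `grep_command` are hypothetical names, and it assumes the build string embeds the abbreviated commit as `.git<hash>.`, as in the build named in this bug. It only builds the argument list; running it requires a checkout of the vdsm repository.

```python
import re


def git_hash_from_nvr(nvr):
    """Return the commit hash embedded as '.git<hash>.', or None."""
    match = re.search(r'\.git([0-9a-f]{7,40})\.', nvr)
    return match.group(1) if match else None


def grep_command(change_number, nvr):
    """Build the `git log --grep` argument list for a given build."""
    commit = git_hash_from_nvr(nvr)
    if commit is None:
        raise ValueError('no git hash found in %r' % nvr)
    return ['git', 'log', '--grep', str(change_number), commit]


cmd = grep_command(36138, 'vdsm-4.17.0-632.git19a83a2.el7.x86_64')
```

The resulting list can be passed to `subprocess.check_output(cmd, cwd=...)` inside a repository clone. Note that `--grep` matches any commit message containing the number, so the output should still be read rather than only checked for being non-empty.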
Comment 6 Michael Burman 2015-04-21 14:59:19 UTC
Dan, I need the exact QA build version to test this. Thanks. "Fixed In Version" must be provided when moving bugs to ON_QA. If we have a build for QA, then "Fixed In Version" must be set. We are not testing from nightly master any more.
Comment 7 Dan Kenigsberg 2015-04-21 16:27:37 UTC
As I said, vdsm-4.17.0-632.git19a83a2.el7.x86_64 includes the patch. I also explained how you can verify this yourself in the future.
Comment 8 Michael Burman 2015-04-22 08:02:51 UTC
Thank you Dan, I know I can verify this myself, but it shouldn't be this way. This information must be set when moving bugs to ON_QA, especially when there is a QA build. Verified on - 3.6.0-0.0.master.20150412172306.git55ba764.el6 with vdsm-4.17.0-632.git19a83a2.el7.x86_64
Comment 12 errata-xmlrpc 2016-03-09 19:27:50 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html