Bug 1409834 - Exception on VDSM after host becomes non-responsive
Summary: Exception on VDSM after host becomes non-responsive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.0-beta
Target Release: 4.19.2
Assignee: Milan Zamazal
QA Contact: sefi litmanovich
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-03 15:08 UTC by Arik
Modified: 2017-02-01 14:34 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:34:12 UTC
oVirt Team: Virt
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: planning_ack+
tjelinek: devel_ack+
mavital: testing_ack+


Attachments
vdsm log (15.19 MB, text/plain)
2017-01-03 15:08 UTC, Arik


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 69555 | 0 | master | MERGED | virt: Don't crash in Vm.getIoTunePolicy when ioTune is unavailable | 2017-01-15 21:54:24 UTC
oVirt gerrit | 70487 | 0 | ovirt-4.1 | MERGED | virt: Don't crash in Vm.getIoTunePolicy when ioTune is unavailable | 2017-01-17 11:12:58 UTC

Description Arik 2017-01-03 15:08:11 UTC
Created attachment 1236910
vdsm log

Description of problem:
After unplugging the cable from the host, waiting for the host to become non-responsive, and then plugging the cable back in, VDSM is down with an error.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
vdsm jsonrpc.JsonRpcServer ERROR Internal server error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request
    res = method(**params)
  File "/usr/lib/python2.7/site-packages/vdsm/rpc/Bridge.py", line 202, in _dynamicMethod
    result = fn(*methodArgs)
  File "/usr/share/vdsm/API.py", line 1410, in getAllVmIoTunePolicies
    io_tune_policies_dict = self._cif.getAllVmIoTunePolicies()
  File "/usr/share/vdsm/clientIF.py", line 447, in getAllVmIoTunePolicies
    vm_io_tune_policies[v.id] = {'policy': v.getIoTunePolicy(),
  File "/usr/share/vdsm/virt/vm.py", line 2730, in getIoTunePolicy
    io_tune = vmxml.find_first(qos, "ioTune", None)
  File "/usr/share/vdsm/virt/vmxml.py", line 110, in find_first
    return next(find_all(element, tag))
  File "/usr/share/vdsm/virt/vmxml.py", line 89, in find_all
    if tag(element) == tag_:
  File "/usr/share/vdsm/virt/vmxml.py", line 148, in tag
    return element.tag
AttributeError: 'NoneType' object has no attribute 'tag'

Expected results:
VDSM is up

Additional info:

Comment 1 Dan Kenigsberg 2017-01-03 15:17:59 UTC
which precise vdsm.rpm is this?

Comment 2 Arik 2017-01-03 20:18:47 UTC
(In reply to Dan Kenigsberg from comment #1)
> which precise vdsm.rpm is this?

Version     : 4.20.0
Release     : 38.git59c645a.fc24
From repo   : ovirt-master-snapshot

Comment 3 Moran Goldboim 2017-01-04 09:46:43 UTC
I assume you meant network cable?
I wasn't sure about the impact here. Is vdsm down afterwards and not coming up?

thanks.

Comment 4 Arik 2017-01-04 09:48:46 UTC
(In reply to Moran Goldboim from comment #3)
> I assume you meant network cable?

Yes :) 

> I wasn't sure about the impact here. Is vdsm down afterwards and not coming up?

Right, it doesn't come up

Comment 5 Milan Zamazal 2017-01-04 10:07:53 UTC
I can't reproduce the problem (my host's network is only virtual after all) but I understand it can happen under certain circumstances and I believe the posted patch fixes it.
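
For readers following the traceback in the description, here is a minimal sketch of the failure mode. It is not the VDSM source and not the merged patch: `tag`, `find_all` and `find_first` are simplified stand-ins written to mirror the call chain in the traceback, and `get_io_tune_policy` is a hypothetical guarded lookup. The assumption is that the QoS element handed to `find_first` was `None` at the time of the call, which is what the `AttributeError` suggests.

```python
import xml.etree.ElementTree as ET

_UNSPECIFIED = object()


def tag(element):
    # Mirrors the last frame of the traceback: fails if element is None.
    return element.tag


def find_all(element, tag_):
    # Depth-first search for elements named tag_ (simplified stand-in).
    if tag(element) == tag_:
        yield element
    for child in element:
        for found in find_all(child, tag_):
            yield found


def find_first(element, tag_, default=_UNSPECIFIED):
    # The default only covers "tag not found"; it does not cover a
    # missing (None) root element, which is the case in the traceback.
    try:
        return next(find_all(element, tag_))
    except StopIteration:
        if default is _UNSPECIFIED:
            raise LookupError("no element %r found" % (tag_,))
        return default


def get_io_tune_policy(qos):
    # Hypothetical guard: treat missing QoS metadata as "no policy".
    if qos is None:
        return []
    io_tune = find_first(qos, "ioTune", None)
    if io_tune is None:
        return []
    return [child.tag for child in io_tune]
```

With these stand-ins, `find_first(None, "ioTune", None)` raises the same `AttributeError` as in the description, while `get_io_tune_policy(None)` returns an empty list. Whether an empty result is the right fallback for the real method is a design decision for the actual patch.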

Comment 6 Milan Zamazal 2017-01-04 16:14:09 UTC
Arik, could you please verify whether the patch (http://gerrit.ovirt.org/69555) fixes the problem?

Comment 7 Arik 2017-01-12 22:29:59 UTC
(In reply to Milan Zamazal from comment #6)
> Arik, could you please verify whether the patch
> (http://gerrit.ovirt.org/69555) fixes the problem?

Yes, done.

Comment 8 sefi litmanovich 2017-01-26 11:20:29 UTC
Verified with rhevm-4.1.0.2-0.2.el7 and host: vdsm-4.19.2-2.el7ev.x86_64.
The host is nested and I unplugged the virtual cable. I hope this is enough, and I don't see a reason why the behaviour should be different.
After plugging the network back in, the host returned to state 'up' and the error from the description doesn't appear.

