Bug 1018364

Summary: can't contact vdsm after storage issues
Product: Red Hat Enterprise Virtualization Manager
Reporter: Amador Pahim <asegundo>
Component: vdsm
Assignee: Sergey Gotliv <sgotliv>
Status: CLOSED CURRENTRELEASE
QA Contact: Aharon Canan <acanan>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: amureini, asegundo, bazulay, fromani, iheim, lpeer, scohen, sgotliv, tnisan, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.5.0
Hardware: All
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-04-07 07:20:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Amador Pahim 2013-10-11 18:55:39 UTC
Description of problem:
After some errors contacting storage, the hypervisor became Unresponsive: RHEV Manager could not contact it until the vdsm service was restarted.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130709.0.el6_4)
vdsm-4.10.2-23.0.el6ev.x86_64

How reproducible:
Not reproduced; this seems to be a one-time issue.
Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=871355 (although no defunct processes were created).

Additional info:

Relevant logs at the issue time:

Thread-641739::ERROR::2013-09-18 21:51:32,875::domainMonitor::225::Storage.DomainMonitorThread::(_monitorDomain) Error while collecting domain e310a1af-5aa2-4371-b3c6-dbf36d6cbc50 monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 201, in _monitorDomain
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 127, in findDomain
  File "/usr/share/vdsm/storage/nfsSD.py", line 117, in findDomainPath
StorageDomainDoesNotExist: Storage domain does not exist: (u'e310a1af-5aa2-4371-b3c6-dbf36d6cbc50',)

...

BindingXMLRPC::ERROR::2013-09-18 21:54:35,915::BindingXMLRPC::72::vds::(threaded_start) xml-rpc handler exception
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 68, in threaded_start
  File "/usr/lib64/python2.6/SocketServer.py", line 268, in handle_request
  File "/usr/lib64/python2.6/SocketServer.py", line 278, in _handle_request_noblock
  File "/usr/lib64/python2.6/SocketServer.py", line 446, in get_request
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 116, in accept
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 167, in accept
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Connection.py", line 156, in accept_ssl
SSLError: (110, 'Connection timed out')
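In the traceback above, M2Crypto's `accept_ssl` (the TLS handshake) is called from `SocketServer.get_request`, which runs on the loop thread that accepts every XML-RPC connection, and `(110, 'Connection timed out')` is errno ETIMEDOUT on Linux. So the handshake waited on a peer until it timed out. This report does not establish that this wait caused the Unresponsive state. The sketch below only illustrates, under that assumption, why a handshake on the accepting thread lets one silent peer delay every later client. It uses plain sockets with a hypothetical `HELLO` greeting standing in for TLS. `start_server`, `handshake`, `time_good_client` and the 2-second timeout are illustrative, not vdsm code.

```python
import socket
import threading
import time

# Hypothetical stand-in for the TLS handshake: the server expects b"HELLO" first.
HANDSHAKE_TIMEOUT = 2.0  # seconds; assumed value for illustration


def handshake(conn):
    """Wait, bounded by HANDSHAKE_TIMEOUT, for the client greeting."""
    conn.settimeout(HANDSHAKE_TIMEOUT)
    try:
        # Simplification: assumes the 5-byte greeting arrives in one recv().
        return conn.recv(5) == b"HELLO"
    except socket.timeout:
        return False


def serve_one(conn):
    """Run the handshake, then reply b"OK" to a well-behaved client."""
    with conn:
        if handshake(conn):
            conn.sendall(b"OK")


def start_server(handshake_in_accept_loop):
    """Start a localhost server in a daemon thread and return its port.

    True: handshake on the accepting thread (as in the traceback).
    False: hand each accepted socket to a worker thread before any blocking I/O.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)

    def loop():
        while True:
            conn, _ = listener.accept()
            if handshake_in_accept_loop:
                serve_one(conn)  # a silent client blocks every later accept()
            else:
                threading.Thread(target=serve_one, args=(conn,), daemon=True).start()

    threading.Thread(target=loop, daemon=True).start()
    return listener.getsockname()[1]


def time_good_client(port):
    """Connect a silent client, then a good one; return (reply, seconds waited)."""
    stalled = socket.create_connection(("127.0.0.1", port))  # never sends anything
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port)) as good:
        good.sendall(b"HELLO")
        reply = good.recv(2)
    elapsed = time.monotonic() - start
    stalled.close()
    return reply, elapsed
```

For example, `time_good_client(start_server(True))` makes the good client wait roughly `HANDSHAKE_TIMEOUT`, while `time_good_client(start_server(False))` serves it almost immediately.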

Comment 3 Ayal Baron 2013-12-18 09:22:19 UTC
Sergey, any update on this one?