Bug 785749

Summary: [ovirt] [vdsm] deadlock after prepareForShutdown when connection to storage is blocked with VMs running
Product: [Retired] oVirt
Reporter: Haim <hateya>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Version: unspecified
CC: abaron, acathrow, amureini, bazulay, iheim, mgoldboi, michal.skrivanek, yeylon, ykaul
Target Release: 3.3.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Doc Type: Bug Fix
oVirt Team: Storage
Last Closed: 2013-04-07 19:54:03 UTC
Attachments: vdsm logs

Description Haim 2012-01-30 14:27:13 UTC
Description of problem:

- an HSM host is running some VMs
- the connection from the host to all storage domains is blocked
- at some point, vdsm logs that the connection to libvirt is broken and calls prepareForShutdown
- prepareForShutdown is called several times, from multiple threads:

Thread-153::ERROR::2012-01-27 16:36:28,334::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-137::ERROR::2012-01-27 16:36:28,336::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-48::ERROR::2012-01-27 16:36:28,335::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-23::ERROR::2012-01-27 16:36:28,337::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-169::ERROR::2012-01-27 16:36:28,338::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-84::ERROR::2012-01-27 16:36:28,339::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-137::DEBUG::2012-01-27 16:36:28,340::clientIF::178::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-153::DEBUG::2012-01-27 16:36:28,340::task::588::TaskManager.Task::(_updateState) Task=`938d8853-798e-4d46-9a50-12155adc7181`::moving from state init -> state preparing
Thread-48::DEBUG::2012-01-27 16:36:28,341::clientIF::178::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-72::ERROR::2012-01-27 16:36:28,342::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-100::ERROR::2012-01-27 16:36:28,343::libvirtconnection::89::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-169::DEBUG::2012-01-27 16:36:28,343::clientIF::178::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-84::DEBUG::2012-01-27 16:36:28,344::clientIF::178::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
Thread-137::ERROR::2012-01-27 16:36:28,345::utils::393::vm.Vm::(collect) vmId=`84066f51-2b3d-4afb-b16f-7d576f2d6235`::Stats function failed: <AdvancedStatsFunction _highWrite at 0x17fbf60>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 389, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 266, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 79, in _highWrite
    self._vm._dom.blockInfo(vmDrive.path, 0)
  File "/usr/share/vdsm/libvirtvm.py", line 483, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 79, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1465, in blockInfo
    if ret is None: raise libvirtError ('virDomainGetBlockInfo() failed', dom=self)
libvirtError: End of file while reading data: Input/output error
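
The wrapper frames in the traceback are vdsm's libvirtconnection layer, which treats connection-level libvirt failures as fatal to the whole daemon. A minimal sketch of that pattern (the error codes checked and the prepare_for_shutdown callback are illustrative assumptions, not vdsm's actual code):

import functools
import logging

import libvirt

def wrap_libvirt_call(f, prepare_for_shutdown, log=logging.getLogger('vds')):
    # Sketch of vdsm's libvirtconnection wrapper: ordinary libvirtErrors
    # propagate to the caller, but errors suggesting the libvirtd
    # connection itself is gone also trigger a daemon-wide shutdown.
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except libvirt.libvirtError as e:
            fatal = (libvirt.VIR_ERR_SYSTEM_ERROR,
                     libvirt.VIR_ERR_INTERNAL_ERROR,
                     libvirt.VIR_ERR_NO_CONNECT)
            if e.get_error_code() in fatal:
                log.error('connection to libvirt broken. taking vdsm down.')
                prepare_for_shutdown()
            raise
    return wrapper

Every VM stats thread goes through such a wrapper, which is why the same ERROR line appears once per thread in the log above.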


- Then we see the HSM_MailMonitor thread being requested to stop:

Thread-210::INFO::2012-01-27 16:36:29,108::storage_mailbox::444::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-210::INFO::2012-01-27 16:36:29,109::storage_mailbox::340::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['dd', 'of=/rhev/data-center/61751252-99fe-45bc-bf24-dbf248172475/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-210::DEBUG::2012-01-27 16:36:29,109::storage_mailbox::344::Storage.Misc.excCmd::(_sendMail) 'dd of=/rhev/data-center/61751252-99fe-45bc-bf24-dbf248172475/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
Thread-196::ERROR::2012-01-27 16:37:29,796::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `b466b318-6a31-4670-bcaa-cd9ccf5cf1de` monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 105, in _monitorDomain
    nextStatus.readDelay = domain.getReadDelay()
  File "/usr/share/vdsm/storage/blockSD.py", line 423, in getReadDelay
    f.read(4096)
  File "/usr/share/vdsm/storage/fileUtils.py", line 287, in read
    raise OSError(err, msg)
OSError: [Errno 5] Input/output error
Thread-196::DEBUG::2012-01-27 16:37:29,817::domainMonitor::130::Storage.DomainMonitor::(_monitorDomain) Domain `b466b318-6a31-4670-bcaa-cd9ccf5cf1de` changed its status to Invalid
Thread-201::ERROR::2012-01-27 16:37:30,810::domainMonitor::120::Storage.DomainMonitor::(_monitorDomain) Error while collecting domain `3f5edaa4-699b-4b62-bd34-3f6b04a29070` monitoring information
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 105, in _monitorDomain
    nextStatus.readDelay = domain.getReadDelay()
  File "/usr/share/vdsm/storage/blockSD.py", line 423, in getReadDelay
    f.read(4096)
  File "/usr/share/vdsm/storage/fileUtils.py", line 287, in read
    raise OSError(err, msg)
OSError: [Errno 5] Input/output error
Thread-201::DEBUG::2012-01-27 16:37:30,811::domainMonitor::130::Storage.DomainMonitor::(_monitorDomain) Domain `3f5edaa4-699b-4b62-bd34-3f6b04a29070` changed its status to Invalid
Thread-210::DEBUG::2012-01-27 16:37:30,911::storage_mailbox::344::Storage.Misc.excCmd::(_sendMail) FAILED: <err> = "dd: writing `/rhev/data-center/61751252-99fe-45bc-bf24-dbf248172475/mastersd/dom_md/inbox': Input/output error\n1+0 records in\n0+0 records out\n0 bytes (0 B) copied, 61.756 s, 0.0 kB/s\n"; <rc> = 1
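
The monitor failures above come from getReadDelay, which times a small direct-I/O read from the domain metadata and marks the domain Invalid when it fails. A minimal sketch of such a probe (the mmap buffer is only there to satisfy O_DIRECT's alignment requirement; vdsm's fileUtils does its own buffer handling):

import mmap
import os
import time

def get_read_delay(path, block_size=4096):
    # O_DIRECT bypasses the page cache so the read exercises the storage
    # itself; with the connection blocked it fails with EIO (the OSError
    # in the log above) or blocks until the lower layers time out.
    buf = mmap.mmap(-1, block_size)  # anonymous mappings are page-aligned
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        start = time.time()
        os.readv(fd, [buf])
        return time.time() - start
    finally:
        os.close(fd)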

Comment 1 Haim 2012-01-30 14:31:53 UTC
Created attachment 558342 [details]
vdsm logs

Comment 2 Michal Skrivanek 2012-10-01 08:08:53 UTC
Does it actually cause any issue? The "cannot run prepareForShutdown concurrently" message is a simple protection measure; as long as the first thread shuts vdsm down successfully, the other threads are insignificant.
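
For reference, the guard described here amounts to a non-blocking lock around the shutdown path. A minimal sketch of that pattern (names and structure are illustrative, not taken from clientIF.py):

import threading

class ClientIF(object):
    def __init__(self, log):
        self.log = log
        self._shutdown_lock = threading.Lock()

    def prepareForShutdown(self):
        # Non-blocking acquire: the first caller wins and performs the
        # teardown; every later caller logs the DEBUG line seen in the
        # log above and returns immediately.
        if not self._shutdown_lock.acquire(False):
            self.log.debug('cannot run prepareForShutdown concurrently')
            return
        self._doShutdown()

    def _doShutdown(self):
        # Hypothetical placeholder for the real teardown: stopping
        # monitor threads, closing connections, exiting the process.
        pass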

Comment 3 Haim 2012-10-04 06:22:53 UTC
(In reply to comment #2)
> Does it actually cause any issue? The "cannot run prepareForShutdown
> concurrently" message is a simple protection measure; as long as the first
> thread shuts vdsm down successfully, the other threads are insignificant.

Yes, a deadlock: vdsm does not get restarted, and the host moves to non-responsive (getVdsCaps is not answered).
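
One plausible shape for the hang, stated as an assumption rather than a confirmed root cause: the thread that wins prepareForShutdown waits on worker threads that are themselves blocked in I/O against the unreachable storage (like the dd and f.read(4096) calls above), so the teardown never completes and vdsm stays up without answering requests. A self-contained demonstration of that shape, using a pipe read to stand in for blocked storage I/O:

import os
import threading

def blocked_worker(fd):
    # Stands in for a thread stuck reading from blocked storage.
    os.read(fd, 512)

r, w = os.pipe()  # nothing is written to w, so the read blocks forever
t = threading.Thread(target=blocked_worker, args=(r,))
t.start()

# A shutdown path that joins its workers hangs at this point; with a
# timeout we can observe that the worker is still stuck.
t.join(timeout=2)
print('worker still alive after join timeout:', t.is_alive())

os.write(w, b'x' * 512)  # unblock the worker so the demo exits cleanly
t.join()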

Comment 4 Itamar Heim 2013-04-07 19:54:03 UTC
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.