Bug 745722

Summary: Lose connection to hypervisor
Product: Red Hat Enterprise Linux 6
Reporter: Olivier Renault <orenault>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED DUPLICATE
QA Contact: yeylon <yeylon>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.2
CC: abaron, bazulay, iheim, srevivo, ykaul
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-13 11:53:00 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---

Attachments:
sosreport from the hypervisor (flags: none)

Description Olivier Renault 2011-10-13 08:12:59 UTC
Description of problem:
I am losing connection to the hypervisor from time to time.

Version-Release number of selected component (if applicable):


How reproducible:
I have seen this behaviour a few times.

Steps to Reproduce:
1. Start a VM that boots over PXE and interrupt it during the boot process
2. Switch off the VM from the GUI using the stop button
3. After a while the host becomes unresponsive
  
Actual results:
The VM does not stop and the hypervisor loses communication with the manager.

Expected results:
The VM stops and we do not lose communication with the manager.

Additional info:
In RHEV-M, the VM is displayed as up.
On the hypervisor the VM is still running.
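
For reference, a minimal sketch of how the domain state can be checked directly against libvirt on the hypervisor, independent of what RHEV-M shows; it assumes the local qemu:///system URI and uses the VM UUID that appears in the logs below:

import libvirt

# Open a read-only connection to the local libvirtd.
conn = libvirt.openReadOnly('qemu:///system')

# UUID of the "test" VM as it appears in vdsm.log and libvirtd.log below.
dom = conn.lookupByUUIDString('728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe')

# dom.info() returns (state, maxMem, memory, nrVirtCpu, cpuTime).
state = dom.info()[0]
print('running' if state == libvirt.VIR_DOMAIN_RUNNING else 'state=%d' % state)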

If I look in vdsm.log, I can see the following:
Thread-15::DEBUG::2011-10-12 21:06:39,723::taskManager::69::TaskManager::(prepareForShutdown) Request to stop all tasks
Thread-15::DEBUG::2011-10-12 21:06:40,915::task::492::TaskManager.Task::(_debug) Task b7ee3a22-716a-4fe7-a745-c9502f7e112b: finished: None
Thread-15::DEBUG::2011-10-12 21:06:40,915::task::492::TaskManager.Task::(_debug) Task b7ee3a22-716a-4fe7-a745-c9502f7e112b: moving from state preparing -> state finished
Thread-15::DEBUG::2011-10-12 21:06:40,916::resourceManager::786::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-15::DEBUG::2011-10-12 21:06:40,916::resourceManager::821::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-15::DEBUG::2011-10-12 21:06:40,916::task::492::TaskManager.Task::(_debug) Task b7ee3a22-716a-4fe7-a745-c9502f7e112b: ref 0 aborting False
Thread-15::INFO::2011-10-12 21:06:40,917::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: prepareForShutdown, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-15::ERROR::2011-10-12 21:06:40,917::utils::369::vm.Vm::(collect) vmId=`728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe`::Stats function failed: <AdvancedStatsFunction _sampleCpu at 0x1a2eb20>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 366, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 250, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libviNrtvm.py", line 85, in _sampleCpu
    state, maxMem, memory, nrVirtCpu, cpuTime = self._vm._dom.info()
  File "/usr/share/vdsm/libvirtvm.py", line 327, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1406, in info
    if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirtError: Cannot write data: Broken pipe
Thread-15::ERROR::2011-10-12 21:06:40,919::libvirtconnection::73::vds::(wrapper) connection to libvirt broken. taking vdsm down.
Thread-15::DEBUG::2011-10-12 21:06:40,920::clientIF::145::vds::(prepareForShutdown) cannot run prepareForShutdown twice
Thread-15::ERROR::2011-10-12 21:06:40,920::utils::369::vm.Vm::(collect) vmId=`728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe`::Stats function failed: <AdvancedStatsFunction _sampleNet at 0x1a2ec00>
Traceback (most recent call last):
  File "/usr/share/vdsm/utils.py", line 366, in collect
    statsFunction()
  File "/usr/share/vdsm/utils.py", line 250, in __call__
    retValue = self._function(*args, **kwargs)
  File "/usr/share/vdsm/libvirtvm.py", line 101, in _sampleNet
    netSamples[vmIface] = self._vm._dom.interfaceStats(vmIface)
  File "/usr/share/vdsm/libvirtvm.py", line 327, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 63, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1412, in interfaceStats
    if ret is None: raise libvirtError ('virDomainInterfaceStats() failed', dom=self)
libvirtError: Cannot write data: Broken pipe
Thread-15::DEBUG::2011-10-12 21:06:40,921::utils::345::vm.Vm::(run) vmId=`728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe`::Stats thread finished
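
The tracebacks above all pass through the wrapper in libvirtconnection.py, which decides per call whether a libvirtError means the connection to libvirtd itself is gone and, if so, takes vdsm down. The following is only a rough, hypothetical sketch of that pattern, not the actual vdsm code:

import libvirt

def wrap(func, on_broken_connection):
    # Wrap a libvirt call so that errors indicating a dead libvirtd
    # connection trigger a shutdown callback instead of being ignored.
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except libvirt.libvirtError as e:
            # "Cannot write data: Broken pipe" above is the socket to
            # libvirtd failing; assume system/internal errors mean the
            # connection is unusable and keeping vdsm up is pointless.
            if e.get_error_code() in (libvirt.VIR_ERR_SYSTEM_ERROR,
                                      libvirt.VIR_ERR_INTERNAL_ERROR):
                on_broken_connection()
            raise
    return wrapper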

In libvirtd.log, I can see the following:
21:06:21.708: 30721: debug : virDomainInterfaceStats:6445 : dom=0x7f8e8c001de0, (VM: name=test, uuid=728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe), path=vnet0, stats=0x7f8e9e071330, size=64
21:06:26.710: 30721: debug : virDomainInterfaceStats:6445 : dom=0x7f8e8c001de0, (VM: name=test, uuid=728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe), path=vnet0, stats=0x7f8e9e071330, size=64
21:06:31.712: 30721: debug : virDomainInterfaceStats:6445 : dom=0x7f8e8c001de0, (VM: name=test, uuid=728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe), path=vnet0, stats=0x7f8e9e071330, size=64
21:06:36.714: 30721: debug : virDomainGetInfo:3636 : dom=0x7f8e8c001de0, (VM: name=test, uuid=728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe), info=0x7f8e9e071350
21:06:36.714: 30721: error : virNetSocketWriteWire:938 : Cannot write data: Broken pipe
21:06:40.919: 30721: debug : virDomainInterfaceStats:6445 : dom=0x7f8e8c001de0, (VM: name=test, uuid=728fb1e2-d2e9-40bc-8dea-0ff5ca31b3fe), path=vnet0, stats=0x7f8e9e071330, size=64
21:06:40.919: 30721: error : virNetSocketWriteWire:938 : Cannot write data: Broken pipe

In /var/log/messages, we can see that libvirtd is being respawned.
Oct 12 21:06:33 rh3 init: libvirtd main process (30609) killed by SEGV signal
Oct 12 21:06:33 rh3 init: libvirtd main process ended, respawning
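
After such a respawn, the old client socket held by vdsm is dead, which is why every subsequent call fails with "Broken pipe". A sketch of a simple liveness check, assuming the installed libvirt-python provides isAlive() (present in newer 0.9.x builds):

import libvirt

def connection_is_alive(conn):
    # Returns False once libvirtd has crashed and respawned underneath an
    # existing connection; any libvirtError here means the socket is gone.
    try:
        return bool(conn.isAlive())
    except libvirt.libvirtError:
        return False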

If I restart the vdsmd service, the system reconnects to RHEV-M, the GUI displays my VM as running, and I can use the GUI to kill it.

If I check, both libvirtd and vdsm are running.

Comment 3 Olivier Renault 2011-10-13 08:26:17 UTC
Created attachment 527883 [details]
sosreport from the hypervisor

Comment 4 Olivier Renault 2011-10-13 08:29:20 UTC
Hi Yaniv,

Please find the sosreport as well as the versions of libvirt and vdsm.

libvirt-0.9.4-12
vdsm-4.9-96.1

Regards,
Olivier

Comment 5 Dan Kenigsberg 2011-10-13 11:53:00 UTC
libvirt-0.9.4-12? It is known to be broken. Do not use it. I bet you'd find its cores stacked in /var/log/core.

vdsm.log has the following, which makes me suspect you are hitting Bug 744755.

Thread-33::INFO::2011-10-12 10:51:36,287::storage_mailbox::434::Storage.MailBox.HsmMailMonitor::(run) HSM_MailboxMonitor - Incoming mail monitoring thread stopped, clearing outgoing mail
Thread-33::INFO::2011-10-12 10:51:36,287::storage_mailbox::330::Storage.MailBox.HsmMailMonitor::(_sendMail) HSM_MailMonitor sending mail to SPM - ['dd', 'of=/rhev/data-center/9a8843e8-eff2-11e0-b600-0017a4779c0c/mastersd/dom_md/inbox', 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=512', 'seek=8']
Thread-33::DEBUG::2011-10-12 10:51:36,288::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) 'dd of=/rhev/data-center/9a8843e8-eff2-11e0-b600-0017a4779c0c/mastersd/dom_md/inbox iflag=fullblock oflag=direct conv=notrunc bs=512 seek=8' (cwd None)
Thread-33::DEBUG::2011-10-12 10:51:36,299::storage_mailbox::334::Storage.Misc.excCmd::(_sendMail) SUCCESS: <err> = '8+0 records in\n8+0 records out\n4096 bytes (4.1 kB) copied, 0.00145234 s, 2.8 MB/s\n'; <rc> = 0
MainThread::INFO::2011-10-12 10:51:37,079::vdsm::76::vds::(run) VDSM main thread ended. Waiting for 27 other threads...
MainThread::INFO::2011-10-12 10:51:37,084::vdsm::79::vds::(run) <_MainThread(MainThread, started 140293428688640)>

*** This bug has been marked as a duplicate of bug 744755 ***