
Bug 629625

Summary: [vdsm] [libvirt] vdsm loses connection with 'kvm' process during concurrent migration
Product: Red Hat Enterprise Linux 6
Reporter: Haim <hateya>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE
QA Contact: Haim <hateya>
Severity: high
Docs Contact:
Priority: low
Version: 6.1
CC: bazulay, hateya, iheim, mgoldboi, Rhev-m-bugs, yeylon, ykaul
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-12-21 14:17:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 659310    
Bug Blocks:    
Attachments:
vdsm log files of both pele and silver (flags: none)

Description Haim 2010-09-02 14:40:42 UTC
Description of problem:

vdsm loses its connection with the kvm process during concurrent migration of at least 2 guests per host (this is a must).
It seems like the VMs actually go down in RHEV-M, but the qemu process is live and active (also seen by virsh).
There are some disturbing errors in the log which might have caused this state:

libvirtEventLoop::ERROR::2010-09-02 17:47:59,771::libvirtvm::933::vds::Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 912, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 878, in _onLibvirtLifecycleEvent
    hooks.after_vm_start(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'
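The AttributeError above suggests the lifecycle event fires before the VM object has been bound to its libvirt domain, so `self._dom` is still `None` when the callback dereferences it. A minimal sketch of the race and a defensive guard (the `Vm` class and method names below are hypothetical, modeled on the traceback, not taken from the actual vdsm source):

```python
# Hypothetical reduction of the failure: the event callback runs while
# self._dom has not yet been assigned (e.g. on the migration destination
# before the domain lookup completes).
class Vm:
    def __init__(self):
        self._dom = None  # bound to the libvirt domain only after lookup

    def on_lifecycle_event(self, event):
        # Unguarded access would reproduce the logged error:
        #     self._dom.XMLDesc(0)
        #     AttributeError: 'NoneType' object has no attribute 'XMLDesc'
        # A defensive check sidesteps the race instead of crashing the
        # event loop thread:
        if self._dom is None:
            return "ignored event %r: domain not bound yet" % event
        return self._dom.XMLDesc(0)

vm = Vm()
result = vm.on_lifecycle_event("started")
```

With the guard in place the event is dropped (or could be re-queued) rather than killing the libvirtEventLoop handler.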

libvirtEventLoop::ERROR::2010-09-02 17:41:33,692::libvirtvm::933::vds::Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 912, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 872, in _onLibvirtLifecycleEvent
    self._onQemuDeath()
  File "/usr/share/vdsm/vm.py", line 1006, in _onQemuDeath
    "Lost connection with kvm process")
  File "/usr/share/vdsm/vm.py", line 1753, in setDownStatus
    self.saveState()
  File "/usr/share/vdsm/libvirtvm.py", line 772, in saveState
    vm.Vm.saveState(self)
  File "/usr/share/vdsm/vm.py", line 1228, in saveState
    os.rename(tmpFile, self._recoveryFile)
OSError: [Errno 2] No such file or directory
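The OSError (Errno 2) in the second traceback means the temp file or the recovery file's directory disappeared between write and rename, e.g. because a concurrent VM teardown already cleaned it up. A minimal sketch of an atomic state save with that race handled (the `save_state` helper and its signature are hypothetical, modeled on the `saveState()` frames above, not the actual vdsm code):

```python
import os
import tempfile

def save_state(recovery_file, data):
    """Atomically replace recovery_file with data.

    Writing to a temp file in the *same* directory and then renaming it
    over the target is atomic on POSIX filesystems.
    """
    dirname = os.path.dirname(recovery_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.rename(tmp_path, recovery_file)  # atomic replace on POSIX
    except OSError:
        # If the rename loses a race with cleanup, remove the leftover
        # temp file instead of leaking it, then let the caller decide.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

# Usage sketch with placeholder contents:
target = os.path.join(tempfile.mkdtemp(), "vm.recovery")
save_state(target, "status=Down")
with open(target) as f:
    saved = f.read()
```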

this bug reproduces consistently on the latest vdsm version, 4-9.14.

repro steps: 

1) make sure you have 2 hosts
2) make sure you have 4 running VMs, 2 per host
3) run concurrent migrations so that VMs running on server X are migrated to server Y, and VMs running on server Y are migrated to server X

open the log.

Comment 2 Haim 2010-09-02 15:01:29 UTC
Created attachment 442648 [details]
vdsm log files of both pele and silver

Comment 3 Haim 2010-09-02 15:03:11 UTC
note that during migration, 3 out of 4 VMs died (from vdsmd's perspective).

Comment 5 Barak 2010-11-21 16:17:18 UTC
Haim,

This probably happened due to one of the hook issues post-migration.
It was fixed a long time ago.
Please try to reproduce or close.

Comment 6 Haim 2010-11-29 07:13:56 UTC
(In reply to comment #5)
> Haim,
> 
> This probably happened due to one of the hook issues post-migration.
> It was fixed a long time ago.
> Please try to reproduce or close.

sorry Barak, due to a libvirt bug that blocks migration (a crash upon migration), I can't reproduce this bug. Once I get a proper build that includes their fix, I will try to reproduce.

Comment 8 Haim 2010-12-05 15:26:45 UTC
sorry - can't reproduce due to a deadlock in libvirt on concurrent migration. I have set the bug dependencies accordingly.

Comment 9 Itamar Heim 2010-12-19 14:48:02 UTC
please try with latest libvirt again. thanks.

Comment 10 Haim 2010-12-21 13:30:09 UTC
(In reply to comment #9)
> please try with latest libvirt again. thanks.

no repro on the latest libvirt; I guess it was solved along the way. This can either be moved to ON_QA or closed as CURRENT_RELEASE.

Comment 11 Dan Kenigsberg 2010-12-21 14:17:32 UTC
why wait? closing after verification at comment 10.