Bug 629625
| Summary: | [vdsm] [libvirt] vdsm loses connection with 'kvm' process during concurrent migration | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Haim <hateya> | ||||
| Component: | vdsm | Assignee: | Dan Kenigsberg <danken> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Haim <hateya> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 6.1 | CC: | bazulay, hateya, iheim, mgoldboi, Rhev-m-bugs, yeylon, ykaul | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | Bug Fix | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | | Environment: | | ||||
| Last Closed: | 2010-12-21 14:17:32 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 659310 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
Created attachment 442648
vdsm log files of both pele and silver
Note that during migration 3 out of 4 VMs died (from vdsmd's perspective).

Haim,

This probably happened due to one of the hook issues post-migration.
It was fixed a long time ago.
Please try to reproduce or close.

(In reply to comment #5)
> Haim,
>
> This probably happened due to one of the hooks issues post migration.
> It was fixed long time ago.
> Please try to reproduce or close.

Sorry Barak, due to a libvirt bug that blocks migration (crash upon migration), I can't reproduce this bug. Once I get a proper build from them, I will try to reproduce.

Sorry, can't reproduce due to a deadlock in libvirt on concurrent migration. Set the bug dependencies accordingly.

Please try with the latest libvirt again. Thanks.

(In reply to comment #9)
> please try with latest libvirt again. thanks.

No repro on the latest libvirt; I guess it was solved along the way. This can either move to ON_QA or be closed as CURRENTRELEASE.

Why wait? Closing after verification at comment 10.
Description of problem:
vdsm loses its connection with the kvm process during concurrent migration of at least 2 guests per host (required). The VMs appear to go down in rhevm, but the qemu processes are still alive and active (also seen by virsh). There are some disturbing errors in the log which might have caused this state:

libvirtEventLoop::ERROR::2010-09-02 17:47:59,771::libvirtvm::933::vds::Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 912, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 878, in _onLibvirtLifecycleEvent
    hooks.after_vm_start(self._dom.XMLDesc(0), self.conf)
AttributeError: 'NoneType' object has no attribute 'XMLDesc'

libvirtEventLoop::ERROR::2010-09-02 17:41:33,692::libvirtvm::933::vds::Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtvm.py", line 912, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 872, in _onLibvirtLifecycleEvent
    self._onQemuDeath()
  File "/usr/share/vdsm/vm.py", line 1006, in _onQemuDeath
    "Lost connection with kvm process")
  File "/usr/share/vdsm/vm.py", line 1753, in setDownStatus
    self.saveState()
  File "/usr/share/vdsm/libvirtvm.py", line 772, in saveState
    vm.Vm.saveState(self)
  File "/usr/share/vdsm/vm.py", line 1228, in saveState
    os.rename(tmpFile, self._recoveryFile)
OSError: [Errno 2] No such file or directory

This bug is consistent on the latest vdsm version, 4-9.14.

Repro steps:
1) make sure you have 2 hosts
2) make sure you have 4 running VMs, 2 per host
3) run concurrent migration so that the VMs running on server X are migrated to server Y, and the VMs running on server Y are migrated to server X.

open log.
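
For readers unfamiliar with the two exception types in the tracebacks, here is a minimal sketch that raises the same errors in isolation. It is not vdsm code: `FakeVm`, `describe`, `save_state` and `collect_errors` are hypothetical names, and the sketch assumes only what the tracebacks show, namely that `_dom` was `None` when `XMLDesc` was called and that the temporary file passed to `os.rename` did not exist. It makes no claim about why vdsm reached either state.

```python
import errno
import os
import tempfile


class FakeVm(object):
    """Hypothetical stand-in for a VM object; not vdsm's real class."""

    def __init__(self, recovery_dir):
        # Assumption: the domain handle has not been set (or was cleared).
        self._dom = None
        self._recovery_file = os.path.join(recovery_dir, "vm.recovery")

    def describe(self):
        # Calling a method on None raises AttributeError, as in the
        # first traceback.
        return self._dom.XMLDesc(0)

    def save_state(self, tmp_file):
        # Renaming a source file that does not exist raises OSError with
        # errno ENOENT (2), as in the second traceback.
        os.rename(tmp_file, self._recovery_file)


def collect_errors():
    """Trigger both failures and return a summary of each exception."""
    errors = []
    with tempfile.TemporaryDirectory() as directory:
        vm = FakeVm(directory)
        try:
            vm.describe()
        except AttributeError as exc:
            errors.append(("AttributeError", str(exc)))
        try:
            # This temporary file is never created, so the rename fails.
            vm.save_state(os.path.join(directory, "vm.recovery.tmp"))
        except OSError as exc:
            errors.append(("OSError", exc.errno))
    return errors


if __name__ == "__main__":
    for name, detail in collect_errors():
        print(name, detail)
```

Both failures are raised inside the libvirt event callback in the original log, so either one would interrupt the handling of that lifecycle event for the affected VM.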