| Summary: | Migration of VMs fails and those VMs became paused | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Rami Vaknin <rvaknin> |
| Component: | vdsm | Assignee: | Dan Kenigsberg <danken> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tomas Dosek <tdosek> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.1 | CC: | abaron, bazulay, danken, dnaori, iheim, ilvovsky, mgoldboi, tdosek, yeylon, ykaul |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | vdsm-4.9-51 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | vdsm-4.9-47, libvirt-0.8.7-5.el6.x86_64, qemu-kvm-0.12.1.2-2.129.el6.x86_64 |
| Last Closed: | 2011-08-19 15:14:23 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Verified on vdsm-4.9-49.

Reproduced on vdsm-4.9-50. Worked on the issue with danken, and he figured out that the following change fixes the issue:
```python
def saveState(self):
    # Nothing to persist once the VM has been destroyed.
    if self.destroyed:
        return
    from copy import deepcopy
    toSave = deepcopy(self.status())
    toSave['startTime'] = self._startTime
    if self.lastStatus != 'Down' and self._vmStats and self.guestAgent:
        toSave['username'] = self.guestAgent.guestInfo['username']
        toSave['guestIPs'] = self.guestAgent.guestInfo['guestIPs']
    else:
        toSave['username'] = ""
        toSave['guestIPs'] = ""
    if 'sysprepInf' in toSave:
        del toSave['sysprepInf']
    if 'floppy' in toSave:
        del toSave['floppy']
    for drive in toSave.get('drives', []):
        for d in self._drives:
            if drive.get('volumeID') == d.volumeID:
                drive['truesize'] = str(d.truesize)
                drive['apparentsize'] = str(d.apparentsize)
    import tempfile
    # Write the new state to a temporary file under P_VDSM_RUN and then
    # rename it over the recovery file, so the recovery file is replaced
    # atomically instead of being written in place.
    with tempfile.NamedTemporaryFile(dir=constants.P_VDSM_RUN,
                                     delete=False) as f:
        pickle.dump(toSave, f)
    try:
        os.rename(f.name, self._recoveryFile)
    except:
        self.log.error("DDDD %s:%s %s:%s", f.name, os.access(f.name, 0),
                       self._recoveryFile,
                       os.access(self._recoveryFile, 0), exc_info=True)
        raise
```
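The change above uses the standard write-to-temp-then-rename idiom: the new state is written to a temporary file created in the same directory as the target (presumably where the recovery file lives), and then renamed over it, so a reader never sees a partially written recovery file and the rename cannot fail because the temporary file sits on a different filesystem. A minimal standalone sketch of the idiom is below; it is not VDSM code, and the `save_state_atomically` helper and its paths are illustrative only.

```python
import os
import pickle
import tempfile


def save_state_atomically(state, recovery_file):
    """Persist `state` to `recovery_file` without exposing a partial file."""
    target_dir = os.path.dirname(recovery_file) or '.'
    # delete=False: the file must outlive the context manager so it can be
    # renamed into place afterwards.
    with tempfile.NamedTemporaryFile(dir=target_dir, delete=False) as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
        tmp_name = f.name
    try:
        # On POSIX, rename within one filesystem atomically replaces the target.
        os.rename(tmp_name, recovery_file)
    except OSError:
        # Clean up the temporary file if the rename could not complete.
        os.unlink(tmp_name)
        raise
```

Creating the temporary file in the destination directory (rather than in a system-wide temp dir) is what guarantees the final rename stays on one filesystem and that the parent directory of the source and destination is the same.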
vdsm-4.9-51 - migration behaves rather strangely: when a migration fails, the VMs stay on the source host and remain up (that's OK, I assume), but when trying to migrate more than 4 VMs to another host (migration policies are off), all the migrations fail, even though the second host carries no load. Log:

```
2011-03-02 10:36:20,380 ERROR [com.redhat.rhevm.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-97) Rerun vm 13f9f894-5502-4872-86ab-075382dd50a3. Called from vds 10.34.57.206
2011-03-02 10:36:20,435 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-40) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vcpu count failed, its already 0 or null
2011-03-02 10:36:20,435 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-40) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vmem size failed, its already 0 or null
2011-03-02 10:36:20,435 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-39) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vcpu count failed, its already 0 or null
2011-03-02 10:36:20,435 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-39) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vmem size failed, its already 0 or null
2011-03-02 10:36:20,437 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-41) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vcpu count failed, its already 0 or null
2011-03-02 10:36:20,437 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-41) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vmem size failed, its already 0 or null
2011-03-02 10:36:20,569 INFO [com.redhat.rhevm.bll.CommandBase] (pool-11-thread-39) CanDoAction of action MigrateVmToServer failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VDS_INVALID_SERVER_ID
2011-03-02 10:36:20,578 INFO [com.redhat.rhevm.bll.CommandBase] (pool-11-thread-40) CanDoAction of action MigrateVmToServer failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VDS_INVALID_SERVER_ID
2011-03-02 10:36:20,582 INFO [com.redhat.rhevm.bll.CommandBase] (pool-11-thread-41) CanDoAction of action MigrateVmToServer failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VDS_INVALID_SERVER_ID
2011-03-02 10:36:22,572 ERROR [com.redhat.rhevm.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-79) Rerun vm 7f79b379-306e-4f2c-8413-c7f34ea5f2a4. Called from vds 10.34.57.206
2011-03-02 10:36:22,604 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-40) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vcpu count failed, its already 0 or null
2011-03-02 10:36:22,604 INFO [com.redhat.rhevm.bll.RunVmCommandBase] (pool-11-thread-40) DecreasePendingVms::Decreasing vds 10.34.57.207 pending vmem size failed, its already 0 or null
2011-03-02 10:36:22,654 INFO [com.redhat.rhevm.bll.CommandBase] (pool-11-thread-40) CanDoAction of action MigrateVmToServer failed. Reasons:VAR__ACTION__MIGRATE,VAR__TYPE__VM,VAR__ACTION__MIGRATE,VAR__TYPE__VM,VDS_INVALID_SERVER_ID
2011-03-02 10:38:27,428 INFO [com.redhat.rhevm.bll.DbUserCacheManager] (QuartzScheduler_Worker-85) DbUserCacheManager::refreshAllUserData() - entered
2011-03-02 10:38:27,436 INFO [com.redhat.rhevm.bll.adbroker.AdBrokerCommandBase] (QuartzScheduler_Worker-85) Running AD command: com.redhat.rhevm.bll.adbroker.ADGetAdUserByUserIdListCommand
2011-03-02 10:38:27,438 INFO [com.redhat.rhevm.bll.adbroker.AdBrokerCommandBase] (QuartzScheduler_Worker-85) Running AD command: com.redhat.rhevm.bll.adbroker.ADSearchUserByQueryCommand
2011-03-02 10:41:22,069 INFO [com.redhat.rhevm.vdsbroker.VdsManager] (QuartzScheduler_Worker-33) OnVdsDuringFailureTimer of vds 10.34.57.207 entered. Time:3/2/11 10:41 AM. Attemts after2
```

Verified on vdsm-4.9-51.

Excuse me Tomas, I didn't see your comment; moving back to ON_QA until danken replies.

Verified - vdsm-4.9-51 - the scenario specified above doesn't reproduce anymore (it turned out that the trouble verifying this bug was in our environment).
Created attachment 477874 [details] VDSM Logs

Scenario:
1. Define a cluster of two RHEL6 hosts with a few running VMs (HA, non-HA and pinned VMs).
2. Mark a few of the VMs and perform migration.

A few of the VMs become paused and their migration fails. It has been reproduced 4 times so far: 3 of them were on the same host and the 4th was on another host; both hosts were defined as HA.

Here are vdsm log snippets from two reproductions with different calls to the same place:

```
Thread-9451::INFO::2011-02-09 18:15:33,842::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, args: ( sdUUID=8431f208-1bf3-4f1f-a05e-7def6d6d0239 spUUID=ade9f73e-272a-42bb-8b8c-bb3288e0637e imgUUID=d4bd49bf-1b88-4f7a-9f9f-1845db0b243b volUUID=4802a4d9-7ab8-4d0a-b014-034ed86f502e rw=True)
Thread-9451::DEBUG::2011-02-09 18:15:33,853::task::491::TaskManager.Task::(_debug) Task f7e1bd18-bffb-4511-bd7b-d9abf938ace5: moving from state init -> state preparing
Thread-9451::INFO::2011-02-09 18:15:33,855::sp::1378::Storage.StoragePool::(getDomains) Get storage pool domains: {'8431f208-1bf3-4f1f-a05e-7def6d6d0239': 'Active', '2e46da5a-2ad6-4672-a297-02b6d3713274': 'Active'}
Thread-9451::INFO::2011-02-09 18:15:33,861::blockSD::596::Storage.StorageDomain::(validate) sdUUID=8431f208-1bf3-4f1f-a05e-7def6d6d0239
Thread-9457::DEBUG::2011-02-09 18:15:33,863::guestIF::78::root::(_connect) Attempting connection to /var/lib/libvirt/qemu/channels/host6_rhel6.com.redhat.rhevm.vdsm
Thread-9457::DEBUG::2011-02-09 18:15:33,865::utils::570::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /usr/share/vdsm/prepare-vmchannel /var/lib/libvirt/qemu/channels/host6_rhel6.com.redhat.rhevm.vdsm' (cwd None)
Thread-9438::DEBUG::2011-02-09 18:15:33,865::utils::570::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /usr/share/vdsm/get-vm-pid host6_rhel6' (cwd None)
Thread-9451::DEBUG::2011-02-09 18:15:33,867::supervdsm::30::SuperVdsmProxy::(__connect) Trying to conenct to Super Vdsm
libvirtEventLoop::ERROR::2011-02-09 18:15:33,883::libvirtconnection::47::vds::(__eventCallback) Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtconnection.py", line 26, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 1020, in _onLibvirtLifecycleEvent
    self._onQemuDeath()
  File "/usr/share/vdsm/vm.py", line 999, in _onQemuDeath
    "Lost connection with kvm process")
  File "/usr/share/vdsm/vm.py", line 1766, in setDownStatus
    self.saveState()
  File "/usr/share/vdsm/libvirtvm.py", line 905, in saveState
    vm.Vm.saveState(self)
  File "/usr/share/vdsm/vm.py", line 1227, in saveState
    os.rename(tmpFile, self._recoveryFile)
OSError: [Errno 2] No such file or directory
```

```
Thread-9451::DEBUG::2011-02-09 18:15:33,835::utils::570::Storage.Misc.excCmd::(execCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-9451::DEBUG::2011-02-09 18:15:33,840::libvirtvm::944::vds.vmlog.2c1cbd82-c369-4fad-92bc-132bd40e21b5::(destroy) Total desktops after destroy of 2c1cbd82-c369-4fad-92bc-132bd40e21b5 is 10
Thread-9451::INFO::2011-02-09 18:15:33,842::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, args: ( sdUUID=8431f208-1bf3-4f1f-a05e-7def6d6d0239 spUUID=ade9f73e-272a-42bb-8b8c-bb3288e0637e imgUUID=d4bd49bf-1b88-4f7a-9f9f-1845db0b243b volUUID=4802a4d9-7ab8-4d0a-b014-034ed86f502e rw=True)
Thread-9451::DEBUG::2011-02-09 18:15:33,853::task::491::TaskManager.Task::(_debug) Task f7e1bd18-bffb-4511-bd7b-d9abf938ace5: moving from state init -> state preparing
Thread-9451::INFO::2011-02-09 18:15:33,855::sp::1378::Storage.StoragePool::(getDomains) Get storage pool domains: {'8431f208-1bf3-4f1f-a05e-7def6d6d0239': 'Active', '2e46da5a-2ad6-4672-a297-02b6d3713274': 'Active'}
Thread-9451::INFO::2011-02-09 18:15:33,861::blockSD::596::Storage.StorageDomain::(validate) sdUUID=8431f208-1bf3-4f1f-a05e-7def6d6d0239
Thread-9457::DEBUG::2011-02-09 18:15:33,863::guestIF::78::root::(_connect) Attempting connection to /var/lib/libvirt/qemu/channels/host6_rhel6.com.redhat.rhevm.vdsm
Thread-9457::DEBUG::2011-02-09 18:15:33,865::utils::570::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /usr/share/vdsm/prepare-vmchannel /var/lib/libvirt/qemu/channels/host6_rhel6.com.redhat.rhevm.vdsm' (cwd None)
Thread-9438::DEBUG::2011-02-09 18:15:33,865::utils::570::Storage.Misc.excCmd::(execCmd) '/usr/bin/sudo -n /usr/share/vdsm/get-vm-pid host6_rhel6' (cwd None)
Thread-9451::DEBUG::2011-02-09 18:15:33,867::supervdsm::30::SuperVdsmProxy::(__connect) Trying to conenct to Super Vdsm
libvirtEventLoop::ERROR::2011-02-09 18:15:33,883::libvirtconnection::47::vds::(__eventCallback) Traceback (most recent call last):
  File "/usr/share/vdsm/libvirtconnection.py", line 26, in __eventCallback
    v._onLibvirtLifecycleEvent(event, detail, None)
  File "/usr/share/vdsm/libvirtvm.py", line 1020, in _onLibvirtLifecycleEvent
    self._onQemuDeath()
  File "/usr/share/vdsm/vm.py", line 999, in _onQemuDeath
    "Lost connection with kvm process")
  File "/usr/share/vdsm/vm.py", line 1766, in setDownStatus
    self.saveState()
  File "/usr/share/vdsm/libvirtvm.py", line 905, in saveState
    vm.Vm.saveState(self)
  File "/usr/share/vdsm/vm.py", line 1227, in saveState
    os.rename(tmpFile, self._recoveryFile)
OSError: [Errno 2] No such file or directory
```
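Both snippets end with the same failure: os.rename(tmpFile, self._recoveryFile) in the pre-fix saveState() raising OSError with errno 2 (ENOENT). rename() fails that way when either the source file or the destination's parent directory no longer exists at the moment of the call, for example when the temporary file disappears between the write and the rename. The sketch below is a hypothetical, self-contained reproduction of that failure mode, not VDSM code; the file names are illustrative.

```python
import os
import tempfile

# Create a temporary state file, then remove it to mimic it vanishing
# (e.g. a racing cleanup) before the rename happens.
fd, tmp_path = tempfile.mkstemp()
os.close(fd)
os.unlink(tmp_path)

try:
    os.rename(tmp_path, os.path.join(tempfile.gettempdir(), 'recovery.pickle'))
except OSError as err:
    # Prints: [Errno 2] No such file or directory ...
    print(err)
```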