Description of problem:
Switching a host with one VM to maintenance: the migration completed, BUT the VM is down with error: Lost connection with qemu process.

Version-Release number of selected component (if applicable):
RHEVM: rhevm-3.6.0.2-0.1.el6.noarch
VDSM: vdsm-4.17.10-5.el7ev.noarch
libvirt: libvirt-1.2.17-4.el7.x86_64

How reproducible:
All the time

Steps to Reproduce:
1. Switch a host with one VM to maintenance

Actual results:
1. Host switches to maintenance
2. Migration completes
3. VM is down

Additional info:
----------------------------
From engine log:

## Migration completed and host switched to maintenance ##

2015-11-01 15:35:50,499 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-83) [11ddbfa5] Correlation ID: 14176bb3, Job ID: daff9288-65e0-4574-aaf0-e66ac3e34414, Call Stack: null, Custom Event ID: -1, Message: Migration completed (VM: golden_env_mixed_virtio_0, Source: host_mixed_1, Destination: host_mixed_2, Duration: 18 seconds, Total: 18 seconds, Actual downtime: 25ms)

2015-11-01 15:35:50,499 INFO [org.ovirt.engine.core.bll.InternalMigrateVmCommand] (ForkJoinPool-1-worker-83) [11ddbfa5] Lock freed to object 'EngineLock:{exclusiveLocks='[6e7e9891-79ed-4b8c-8dce-0e0d67db9358=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_MIGRATED$VmName golden_env_mixed_virtio_0>]', sharedLocks='null'}'

2015-11-01 15:35:50,788 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-92) [11ddbfa5] START, DestroyVDSCommand(HostName = host_mixed_1, DestroyVmVDSCommandParameters:{runAsync='true', hostId='c84dd08c-a044-4159-a2bf-32c0c615001c', vmId='6e7e9891-79ed-4b8c-8dce-0e0d67db9358', force='false', secondsToWait='0', gracefully='false', reason=''}), log id: 34e6a40f

2015-11-01 15:35:50,804 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-92) [11ddbfa5] FINISH, DestroyVDSCommand, log id: 34e6a40f

2015-11-01 15:35:50,806 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-92) [11ddbfa5] RefreshVmList VM id '6e7e9891-79ed-4b8c-8dce-0e0d67db9358' status = 'Down' on VDS 'host_mixed_1' ignoring it in the refresh until migration is done

2015-11-01 15:35:51,276 INFO [org.ovirt.engine.core.vdsbroker.HostMonitoring] (DefaultQuartzScheduler_Worker-32) [4b2c3daa] Updated vds status from 'Preparing for Maintenance' to 'Maintenance' in database, vds 'host_mixed_1'(c84dd08c-a044-4159-a2bf-32c0c615001c)

## VM down ##

2015-11-01 15:35:59,227 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (ForkJoinPool-1-worker-92) [11ddbfa5] FINISH, DestroyVDSCommand, log id: 15622138

2015-11-01 15:35:59,243 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-92) [11ddbfa5] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM golden_env_mixed_virtio_0 is down with error. Exit message: Lost connection with qemu process.
2015-11-01 15:35:59,244 INFO [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (ForkJoinPool-1-worker-92) [11ddbfa5] VM '6e7e9891-79ed-4b8c-8dce-0e0d67db9358(golden_env_mixed_virtio_0) is running in db and not running in VDS 'host_mixed_2'

---------------------------------------------------------
From VDSM log:

periodic/4::ERROR::2015-11-01 15:35:59,199::sampling::538::virt.sampling.VMBulkSampler::(__call__) vm sampling failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/sampling.py", line 526, in __call__
    bulk_stats = self._conn.getAllDomainStats(self._stats_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 5104, in getAllDomainStats
    raise libvirtError("virConnectGetAllDomainStats() failed", conn=self)
libvirtError: Unable to read from monitor: Connection reset by peer
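For context, the call that fails in the traceback is libvirt's bulk-stats API, which VDSM's periodic sampling thread uses to poll all domains in one round trip. Below is a minimal sketch of the same call pattern (not VDSM's actual code; the URI and stats flags are illustrative assumptions, requires libvirt-python >= 1.2.8):

import libvirt

def sample_all_domains(uri='qemu:///system'):
    # Read-only connection, as a monitoring loop would use.
    conn = libvirt.openReadOnly(uri)
    try:
        # Illustrative stats selection; VDSM builds its own flag set.
        stats_flags = (libvirt.VIR_DOMAIN_STATS_CPU_TOTAL |
                       libvirt.VIR_DOMAIN_STATS_BALLOON)
        # One call covers every domain; libvirt reads each VM's qemu
        # monitor, so a qemu process that died mid-poll surfaces as
        # "libvirtError: Unable to read from monitor: Connection reset by peer"
        return conn.getAllDomainStats(stats_flags)
    except libvirt.libvirtError as exc:
        # VDSM logs "vm sampling failed" at this point and retries on
        # the next sampling cycle.
        print('vm sampling failed: %s' % exc)
        return []
    finally:
        conn.close()

Note this only illustrates the failure mode seen in the VDSM log; the sampling error is a symptom, and the open question in this bug is why the qemu process went away.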
Created attachment 1088359 [details] engine_log
Created attachment 1088360 [details] hosts_logs
Whenever a qemu crash is involved, please include libvirt and qemu logs as well.
This bug is not marked for z-stream, yet the milestone is for a z-stream version; therefore the milestone has been reset. Please set the correct milestone or add the z-stream flag.
(In reply to Michal Skrivanek from comment #3)
> whenever a qemu crash is involved, please include libvirt and qemu logs as
> well

I updated the libvirt log level to debug and restarted libvirt and vdsm. The problem did not reproduce; I will update if it happens again.
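For reference, a sketch of how libvirtd debug logging is typically enabled on RHEL 7 via /etc/libvirt/libvirtd.conf (the exact output settings here are an assumption and can be narrowed with log filters per the libvirt documentation), followed by a restart of libvirtd:

log_level = 1
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

The per-VM qemu logs requested in comment #3 are normally found under /var/log/libvirt/qemu/<vm-name>.log.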
Let's see in a week or so; if it does not reproduce, we would unfortunately have to close this with insufficient data.
Reducing severity for now until we check whether we can reproduce it.
Closing as not reproducible. Israel - if you do manage to reproduce, please verify it's not a qemu bug.