Bug 1145636 - VM migration fails when starting the VM on the destination host.
Summary: VM migration fails when starting the VM on the destination host.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
: 3.5.0
Assignee: Francesco Romani
QA Contact: Ilanit Stein
URL:
Whiteboard: virt
: 1149493 1152973 1224355 (view as bug list)
Depends On:
Blocks: 1112359
 
Reported: 2014-09-23 12:02 UTC by Ilanit Stein
Modified: 2021-03-11 14:22 UTC (History)
26 users

Fixed In Version: vdsm-4.16.7.2-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1149669 (view as bug list)
Environment:
Last Closed: 2015-02-16 13:37:30 UTC
oVirt Team: ---
Target Upstream Version:
Embargoed:
istein: needinfo+
lsvaty: testing_plan_complete-


Attachments
logs (1.67 MB, application/x-bzip)
2014-09-23 12:09 UTC, Ilanit Stein
no flags Details
VDSM logs with more debug (80.00 KB, application/x-tar)
2014-10-02 08:26 UTC, Francesco Romani
no flags Details
libvirt debug logs of a reproduction of the problem (64.23 KB, application/x-gzip)
2014-10-03 13:17 UTC, Francesco Romani
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 34201 0 None MERGED core: removing CR char after encoding 2020-09-12 10:47:24 UTC
oVirt gerrit 34203 0 master ABANDONED mkimage: more robust content signature 2020-09-12 10:47:25 UTC
oVirt gerrit 34350 0 ovirt-engine-3.5.0 MERGED core: removing CR char after encoding 2020-09-12 10:47:25 UTC
oVirt gerrit 34352 0 ovirt-engine-3.5 MERGED core: removing CR char after encoding 2020-09-12 10:47:24 UTC

Description Ilanit Stein 2014-09-23 12:02:45 UTC
Description of problem:


Version-Release number of selected component (if applicable):
This test case started failing with rhevm build:
3.5.0-0.1000.36.0b20477.master.el6ev
It PASSED on the previous build:
3.5.0-0.1000.32.db64497.master.el6ev

vdsm & libvirt versions were not changed.

How reproducible:
100% - On each automatic test run on ci since 

Steps to Reproduce:
Run automatic test: http://jenkins-ci.eng.lab.tlv.redhat.com/job/rhevm_3.5_automation_coretools_two_hosts_restapi_vms_nfs_rest_factory/
Test flow:
1. Two "UP" hosts. Put the host running a VM into maintenance. The VM migrates to the other host "automatically".
2. After the migration completes, activate the host back and run a VM migration.
=> The VM migration fails when starting the VM on the dst host.

Expected results:
VM migration should succeed.

Additional info:
Failure in vdsm.log (failure to start the VM on the dst host):

Thread-227::ERROR::2014-09-22 12:42:37,170::vm::2336::vm.Vm::(_startUnderlyingVm) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2276, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3311, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/virt/vm.py", line 2081, in buildConfDevices
    self.normalizeDrivesIndices(devices[DISK_DEVICES])
  File "/usr/share/vdsm/virt/vm.py", line 2244, in normalizeDrivesIndices
    if drv['iface'] not in self._usedIndices:
KeyError: 'iface'
Thread-227::DEBUG::2014-09-22 12:42:37,174::vm::2848::vm.Vm::(setDownStatus) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Changed state to Down: 'iface' (code=1)
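For illustration, a minimal sketch (an assumption-laden simplification, not the real normalizeDrivesIndices from vdsm) of why a drive entry that arrives without an 'iface' key aborts the VM start on the destination:

# Minimal sketch, not vdsm code: index normalization assumes every drive
# dict carries an 'iface' key; the "relic" device received on the
# destination does not, so the lookup raises KeyError and the start fails.
def normalize_drive_indices(drives):
    used_indices = {'ide': [], 'virtio': [], 'fdc': []}
    for drv in drives:
        if drv['iface'] not in used_indices:   # KeyError when 'iface' is missing
            used_indices[drv['iface']] = []
        if 'index' in drv:
            used_indices[drv['iface']].append(int(drv['index']))
    return used_indices

drives = [
    {'iface': 'virtio', 'index': '0', 'name': 'vda', 'device': 'disk'},
    {'device': 'file', 'alias': 'ide0-1-0', 'type': 'disk'},  # no 'iface' key
]
normalize_drive_indices(drives)   # raises KeyError: 'iface'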

Comment 1 Ilanit Stein 2014-09-23 12:09:01 UTC
Created attachment 940410 [details]
logs

Comment 2 Ilanit Stein 2014-09-23 12:10:02 UTC
engine.log part:

2014-09-22 12:42:36,925 INFO  [org.ovirt.engine.core.bll.MigrateVmCommand] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] Running command: MigrateVmCommand internal: false. Entities affected :  ID: ec964bd3-8c65-49b4-b9a4-5022fc628d54 Type: VMAction group MIGRATE_VM with role type USER
2014-09-22 12:42:36,987 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] START, MigrateVDSCommand(HostName = cinteg09.ci.lab.tlv.redhat.com, HostId = 3451c38a-e13c-48c1-adeb-42330002f1bb, vmId=ec964bd3-8c65-49b4-b9a4-5022fc628d54, srcHost=10.35.148.56, dstVdsId=b7e23fa1-0006-4712-85e5-6d950af15d2b, dstHost=10.35.148.50:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 73caa4bb
2014-09-22 12:42:36,990 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] START, MigrateBrokerVDSCommand(HostName = cinteg09.ci.lab.tlv.redhat.com, HostId = 3451c38a-e13c-48c1-adeb-42330002f1bb, vmId=ec964bd3-8c65-49b4-b9a4-5022fc628d54, srcHost=10.35.148.56, dstVdsId=b7e23fa1-0006-4712-85e5-6d950af15d2b, dstHost=10.35.148.50:54321, migrationMethod=ONLINE, tunnelMigration=false, migrationDowntime=0), log id: 1a24f50e
2014-09-22 12:42:37,000 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateBrokerVDSCommand] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] FINISH, MigrateBrokerVDSCommand, log id: 1a24f50e
2014-09-22 12:42:37,005 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] FINISH, MigrateVDSCommand, return: MigratingFrom, log id: 73caa4bb
2014-09-22 12:42:37,011 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) [6b1314d1-9e4a-4d80_migrateVm_syncAction] Correlation ID: 6b1314d1-9e4a-4d80_migrateVm_syncAction, Job ID: 2faaaf15-34fb-4380-8518-8503e08be4e7, Call Stack: null, Custom Event ID: -1, Message: Migration started (VM: migration_vm, Source: cinteg09.ci.lab.tlv.redhat.com, Destination: cinteg06.ci.lab.tlv.redhat.com, User: admin).
2014-09-22 12:42:38,426 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-26) START, DestroyVDSCommand(HostName = cinteg06.ci.lab.tlv.redhat.com, HostId = b7e23fa1-0006-4712-85e5-6d950af15d2b, vmId=ec964bd3-8c65-49b4-b9a4-5022fc628d54, force=false, secondsToWait=0, gracefully=false, reason=), log id: 17a017da
2014-09-22 12:42:38,450 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-26) FINISH, DestroyVDSCommand, log id: 17a017da
2014-09-22 12:42:38,454 INFO  [org.ovirt.engine.core.bll.MigrateVmCommand] (DefaultQuartzScheduler_Worker-26) Lock freed to object EngineLock [exclusiveLocks= key: ec964bd3-8c65-49b4-b9a4-5022fc628d54 value: VM
, sharedLocks= ]
2014-09-22 12:42:38,457 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-26) RefreshVmList vm id ec964bd3-8c65-49b4-b9a4-5022fc628d54 status = Down on vds cinteg06.ci.lab.tlv.redhat.com ignoring it in the refresh until migration is done
2014-09-22 12:42:42,258 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-33) Setting new tasks map. The map contains now 0 tasks
2014-09-22 12:42:42,259 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler_Worker-33) Cleared all tasks of pool e3b03197-b119-451d-b039-61a56e058eaa.
2014-09-22 12:43:13,724 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-28) VM migration_vm ec964bd3-8c65-49b4-b9a4-5022fc628d54 moved from MigratingFrom --> Up
2014-09-22 12:43:13,724 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-28) Adding VM ec964bd3-8c65-49b4-b9a4-5022fc628d54 to re-run list
2014-09-22 12:43:13,733 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-28) Rerun vm ec964bd3-8c65-49b4-b9a4-5022fc628d54. Called from vds cinteg09.ci.lab.tlv.redhat.com
2014-09-22 12:43:18,957 INFO  [org.ovirt.engine.core.bll.RemoveDiskCommand] (ajp-/127.0.0.1:8702-11) [a42ba139-8396-401c_removeDisk_delete] Lock Acquired to object EngineLock [exclusiveLocks= key: bc593676-8901-4114-839f-895f69264150 value: DISK
, sharedLocks= key: ec964bd3-8c65-49b4-b9a4-5022fc628d54 value: VM
]
2014-09-22 12:43:18,959 WARN  [org.ovirt.engine.core.bll.RemoveDiskCommand] (ajp-/127.0.0.1:8702-11) [a42ba139-8396-401c_removeDisk_delete] CanDoAction of action RemoveDisk failed. Reasons:VAR__ACTION__REMOVE,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_VM_IS_NOT_DOWN
2014-09-22 12:43:18,959 INFO  [org.ovirt.engine.core.bll.RemoveDiskCommand] (ajp-/127.0.0.1:8702-11) [a42ba139-8396-401c_removeDisk_delete] Lock freed to object EngineLock [exclusiveLocks= key: bc593676-8901-4114-839f-895f69264150 value: DISK
, sharedLocks= key: ec964bd3-8c65-49b4-b9a4-5022fc628d54 value: VM
]
2014-09-22 12:43:18,966 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-11) Operation Failed: [Cannot remove Virtual Machine Disk. At least one of the VMs is not down.]

Comment 3 Omer Frenkel 2014-09-24 06:46:07 UTC
Missing libvirt.log; the attachment has only libvirtd.log.
Please note that the new libvirt has new logging, see https://bugzilla.redhat.com/show_bug.cgi?id=1141763#c15

Comment 4 Francesco Romani 2014-09-29 13:39:30 UTC
Narrowed down. The destination VDSM receives this strange device, which it doesn't know how to handle:

{'device': 'file', 'alias': 'ide0-1-0', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'},'deviceType': 'disk', 'type': 'disk'}

Comment 5 Francesco Romani 2014-09-29 13:47:41 UTC
Actually the whole block device part looks suspicious.
This is what the destination VDSM receives and cannot handle properly:

    {'index': '3', 'iface': 'ide', 'name': 'hdd', 'alias': 'ide0-1-1', 'specParams': {'vmPayload': {'volId': 'config-2', 'file': {'openstack/latest/meta_data.json': 'ewogICJsYXVuY2hfaW5kZXgiIDogIjAiLAogICJhdmFpbGFiaWxpdHlfem9uZSIgOiAibm92YSIs\nCiAgInV1aWQiIDogIjA2OTRiMGQxLTI4YmUtNGZkZC04MzQwLTk5ZDZjNDFlYThjNSIsCiAgIm1l\ndGEiIDogewogICAgImVzc2VudGlhbCIgOiAiZmFsc2UiLAogICAgInJvbGUiIDogInNlcnZlciIs\nCiAgICAiZHNtb2RlIiA6ICJsb2NhbCIKICB9Cn0=\n', 'openstack/latest/user_data': 'I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogMApvdXRwdXQ6CiAg\nYWxsOiAnPj4gL3Zhci9sb2cvY2xvdWQtaW5pdC1vdXRwdXQubG9nJwpjaHBhc3N3ZDoKICBleHBp\ncmU6IGZhbHNlCnJ1bmNtZDoKLSAnc2VkIC1pICcnL15kYXRhc291cmNlX2xpc3Q6IC9kJycgL2V0\nYy9jbG91ZC9jbG91ZC5jZmc7IGVjaG8gJydkYXRhc291cmNlX2xpc3Q6CiAgWyJOb0Nsb3VkIiwg\nIkNvbmZpZ0RyaXZlIl0nJyA+PiAvZXRjL2Nsb3VkL2Nsb3VkLmNmZycK\n'}}}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': 'dade11cf-b6b7-4bb6-b069-a8c9bdc86eec', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '1'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'},
    {'index': '2', 'iface': 'ide', 'name': 'hdd', 'alias': 'ide0-1-1', 'specParams': {'path': ''}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': '9f8e1bce-2968-49a0-8762-bd6046957b17', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '1'}, 'device': 'cdrom', 'shared': 'false', 'path': '', 'type': 'disk'},
    {'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'volumeInfo': {'domainID': '40d2c2ac-d6b5-4e33-aecd-380a38a20212', 'volType': 'path', 'leaseOffset': 0, 'volumeID': 'c755b30e-67b6-49c9-9c00-518fa63098ab', 'leasePath': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm28__nfs__2014__09__22__12__37__50__682803/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab.lease', 'imageID': 'bc593676-8901-4114-839f-895f69264150', 'path': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm28__nfs__2014__09__22__12__37__50__682803/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab'}, 'index': '0', 'iface': 'virtio', 'apparentsize': '197120', 'alias': 'virtio-disk0', 'imageID': 'bc593676-8901-4114-839f-895f69264150', 'readonly': 'False', 'deviceType': 'disk', 'shared': 'false', 'truesize': '143360', 'type': 'disk', 'domainID': '40d2c2ac-d6b5-4e33-aecd-380a38a20212', 'reqsize': '0', 'format': 'cow', 'deviceId': 'bc593676-8901-4114-839f-895f69264150', 'poolID': 'e3b03197-b119-451d-b039-61a56e058eaa', 'device': 'disk', 'path': '/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', 'propagateErrors': 'off', 'optional': 'false', 'name': 'vda', 'volumeID': 'c755b30e-67b6-49c9-9c00-518fa63098ab', 'specParams': {}, 'volumeChain': [{'domainID': '40d2c2ac-d6b5-4e33-aecd-380a38a20212', 'volType': 'path', 'leaseOffset': 0, 'volumeID': 'c755b30e-67b6-49c9-9c00-518fa63098ab', 'leasePath': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm28__nfs__2014__09__22__12__37__50__682803/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab.lease', 'imageID': 'bc593676-8901-4114-839f-895f69264150', 'path': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm28__nfs__2014__09__22__12__37__50__682803/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab'}]},
    {'device': 'file', 'alias': 'ide0-1-0', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'},'deviceType': 'disk', 'type': 'disk'}

Comment 6 Francesco Romani 2014-09-30 10:03:04 UTC
I'm still not 100% sure because I can't yet reproduce it locally, but from the logs the sequence of events seems to be the following.

* [cinteg6]
  a VM is created with an unattached cdrom (hdc) and a cloud-init cdrom (hdd).
  Please note that the cloud-init path is local to the target hypervisor (not
  on a shared mount).
* [cinteg6->cinteg9]
  The VM is migrated. The fact that the first migration is automatic should not be
  relevant.
* [cinteg6->cinteg9]
  During migration, the hdd path is somehow lost. This confuses the target VDSM,
  which does the matching based on path (and it has been like this forever).
  This makes the target VDSM overwrite the data for 'hdc' with the data of 'hdd'.

This is how it looks (reconstructed) on the src [cinteg6] just before the migration:

             {'address': {'bus': '1',
                          'controller': '0',
                          'target': '0',
                          'type': 'drive',
                          'unit': '1'},
              'alias': 'ide0-1-1',
              'device': 'cdrom',
              'deviceId': 'dade11cf-b6b7-4bb6-b069-a8c9bdc86eec',
              'deviceType': 'disk',
              'iface': 'ide',
              'index': '3',
              'name': 'hdd',
              'path': '/var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.9adb85c26621d449eac2eb9c338be502.img',
              'readonly': 'True',
              'shared': 'false',
              'specParams': {'vmPayload': {'file': {'openstack/latest/meta_data.json': 'ewogICJsYXVuY2hfaW5kZXgiIDogIjAiLAogICJhdmFpbGFiaWxpdHlfem9uZSIgOiAibm92YSIs\nCiAgInV1aWQiIDogIjA2OTRiMGQxLTI4YmUtNGZkZC04MzQwLTk5ZDZjNDFlYThjNSIsCiAgIm1l\ndGEiIDogewogICAgImVzc2VudGlhbCIgOiAiZmFsc2UiLAogICAgInJvbGUiIDogInNlcnZlciIs\nCiAgICAiZHNtb2RlIiA6ICJsb2NhbCIKICB9Cn0=\n',
                                                    'openstack/latest/user_data': 'I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogMApvdXRwdXQ6CiAg\nYWxsOiAnPj4gL3Zhci9sb2cvY2xvdWQtaW5pdC1vdXRwdXQubG9nJwpjaHBhc3N3ZDoKICBleHBp\ncmU6IGZhbHNlCnJ1bmNtZDoKLSAnc2VkIC1pICcnL15kYXRhc291cmNlX2xpc3Q6IC9kJycgL2V0\nYy9jbG91ZC9jbG91ZC5jZmc7IGVjaG8gJydkYXRhc291cmNlX2xpc3Q6CiAgWyJOb0Nsb3VkIiwg\nIkNvbmZpZ0RyaXZlIl0nJyA+PiAvZXRjL2Nsb3VkL2Nsb3VkLmNmZycK\n'},
                                           'volId': 'config-2'}},
              'type': 'disk'},
             {'address': {'bus': '1',
                          'controller': '0',
                          'target': '0',
                          'type': 'drive',
                          'unit': '0'},
              'alias': 'ide0-1-0',
              'device': 'cdrom',
              'deviceId': '9f8e1bce-2968-49a0-8762-bd6046957b17',
              'deviceType': 'disk',
              'iface': 'ide',
              'index': '2',
              'name': 'hdc',
              'path': '',
              'readonly': 'True',
              'shared': 'false',
              'specParams': {'path': ''},
              'type': 'disk'},

This is how it looks (reconstructed) on the dst [cinteg9] after the migration:

             {'address': {u'bus': u'1',
                          u'controller': u'0',
                          u'target': u'0',
                          u'type': u'drive',
                          u'unit': u'1'},
              'alias': u'ide0-1-1',
              'device': 'cdrom',
              'deviceId': 'dade11cf-b6b7-4bb6-b069-a8c9bdc86eec',
              'deviceType': 'disk',
              'iface': 'ide',
              'index': '3',
              'name': u'hdd',
              'path': '',
              'readonly': 'True',
              'shared': 'false',
              'specParams': {'vmPayload': {'file': {'openstack/latest/meta_data.json': 'ewogICJsYXVuY2hfaW5kZXgiIDogIjAiLAogICJhdmFpbGFiaWxpdHlfem9uZSIgOiAibm92YSIs\nCiAgInV1aWQiIDogIjA2OTRiMGQxLTI4YmUtNGZkZC04MzQwLTk5ZDZjNDFlYThjNSIsCiAgIm1l\ndGEiIDogewogICAgImVzc2VudGlhbCIgOiAiZmFsc2UiLAogICAgInJvbGUiIDogInNlcnZlciIs\nCiAgICAiZHNtb2RlIiA6ICJsb2NhbCIKICB9Cn0=\n',
                                                    'openstack/latest/user_data': 'I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogMApvdXRwdXQ6CiAg\nYWxsOiAnPj4gL3Zhci9sb2cvY2xvdWQtaW5pdC1vdXRwdXQubG9nJwpjaHBhc3N3ZDoKICBleHBp\ncmU6IGZhbHNlCnJ1bmNtZDoKLSAnc2VkIC1pICcnL15kYXRhc291cmNlX2xpc3Q6IC9kJycgL2V0\nYy9jbG91ZC9jbG91ZC5jZmc7IGVjaG8gJydkYXRhc291cmNlX2xpc3Q6CiAgWyJOb0Nsb3VkIiwg\nIkNvbmZpZ0RyaXZlIl0nJyA+PiAvZXRjL2Nsb3VkL2Nsb3VkLmNmZycK\n'},
                                           'volId': 'config-2'}},
              'type': 'disk'},
             {'address': {u'bus': u'1',
                          u'controller': u'0',
                          u'target': u'0',
                          u'type': u'drive',
                          u'unit': u'1'},
              'alias': u'ide0-1-1',
              'device': 'cdrom',
              'deviceId': '9f8e1bce-2968-49a0-8762-bd6046957b17',
              'deviceType': 'disk',
              'iface': 'ide',
              'index': '2',
              'name': u'hdd',
              'path': '',
              'readonly': 'True',
              'shared': 'false',
              'specParams': {'path': ''},
              'type': 'disk'},

Note: index 2 is now 'hdd'; match the devices by deviceId.
Now, dst [cinteg9] is able to run correctly, but the failure shows up when the VM is migrated back to the src host [cinteg6].

It is not yet clear if this is a separate issue, possibly of lesser impact, or if this is the cause that leads, in turn, to this bogus device

             {'address': {u'bus': u'1',
                          u'controller': u'0',
                          u'target': u'0',
                          u'type': u'drive',
                          u'unit': u'0'},
              'alias': u'ide0-1-0',
              'device': u'file',
              'deviceType': u'disk',
              'type': u'disk'}

appearing, which seems to be somehow a relic of the old 'hdc'.
It is also not clear why this issue started to manifest itself: we had some changes in that timeframe in device handling (live merge), but I don't see how they could have affected this code path.
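As an illustration of the path-based matching described above, a minimal sketch (simplified, with deviceIds abbreviated from the dumps in this comment; not the real vdsm matching code) of how two cdroms that both report an empty path collide, so that 'hdc' inherits the 'hdd' data:

# Minimal sketch, not vdsm code: with both cdrom paths empty, path-based
# matching maps both configured drives onto the same libvirt-reported drive.
conf_drives = [
    {'deviceId': 'dade11cf', 'name': 'hdd', 'path': ''},  # cloud-init cdrom, path lost
    {'deviceId': '9f8e1bce', 'name': 'hdc', 'path': ''},  # empty cdrom
]
libvirt_drives = [
    {'name': 'hdd', 'alias': 'ide0-1-1', 'path': ''},
    {'name': 'hdc', 'alias': 'ide0-1-0', 'path': ''},
]

by_path = {}
for dev in libvirt_drives:
    by_path.setdefault(dev['path'], dev)   # first drive with path '' wins
for drv in conf_drives:
    drv.update(by_path[drv['path']])

print([(d['deviceId'], d['name'], d['alias']) for d in conf_drives])
# [('dade11cf', 'hdd', 'ide0-1-1'), ('9f8e1bce', 'hdd', 'ide0-1-1')]
# 'hdc' has been overwritten with 'hdd' data; matching by deviceId instead,
# as the note above suggests, would keep the two entries distinct.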

Comment 7 Ilanit Stein 2014-09-30 11:58:32 UTC
(In reply to Omer Frenkel from comment #3)
> Missing libvirt.log; the attachment has only libvirtd.log.
> Please note that the new libvirt has new logging, see
> https://bugzilla.redhat.com/show_bug.cgi?id=1141763#c15

As it turned out, the libvirt logs are not missing but were changed to contain less logging; clearing this needinfo flag.

Comment 8 Francesco Romani 2014-09-30 12:08:04 UTC
I can't yet reproduce this locally, and that is blocking the progress of the investigation.
Any chance of getting the verbose libvirt logs back?

What is needed is to append these two lines

--- cut here ---
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
log_filters="3:virobject 3:virfile 2:virnetlink     3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"
--- cut here ---

to /etc/libvirt/libvirtd.conf, then rerun the failing test.

Comment 9 Francesco Romani 2014-09-30 13:05:20 UTC
(In reply to Francesco Romani from comment #6)

> Note: index 2 is now 'hdd'; match the devices by deviceId.
> Now, dst [cinteg9] is able to run correctly, but the failure shows up when
> the VM is migrated back to the src host [cinteg6].
> 
> It is not yet clear if this is a separate issue, possibly of lesser impact,
> or if this is the cause that leads, in turn, to this bogus device
> 
>              {'address': {u'bus': u'1',
>                           u'controller': u'0',
>                           u'target': u'0',
>                           u'type': u'drive',
>                           u'unit': u'0'},
>               'alias': u'ide0-1-0',
>               'device': u'file',
>               'deviceType': u'disk',
>               'type': u'disk'}
> 
> appearing, which seems to be somehow a relic of the old 'hdc'.
> It is also not clear why this issue started to manifest itself: we had some
> changes in that timeframe in device handling (live merge), but I don't see
> how they could have affected this code path.

After a long and painful chase, it turns out that it is definitely possible that:
1. the destination path for the cloud-init fake cdrom device disappears, thus
2. the 'hdd' device overwrites the 'hdc' data, then
3. 'hdc' is detected as an 'unknown device' and is left with just these minimal fields ('deviceType' is added on the fly by Bridge.py).

So the disappearance of the cloud-init data path is our best lead yet.
To go further I need either:
1. a way to reproduce locally, or
2. the XML being used on the destination VDSM after the first migration.

It is not yet clear why this started to happen recently; no relevant changes come to mind.

Comment 10 Francesco Romani 2014-09-30 14:05:12 UTC
I'd just like to add the reason why I believe that the cloud-init device, hdd, "disappears"

After the first migration, on dst[cinteg9], we have:
Thread-212::DEBUG::2014-09-22 12:41:37,554::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'hdc', 'bootOrder': '', 'boot': [], 'readonly': True, 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'path': '', 'type': u'cdrom'}
Thread-212::DEBUG::2014-09-22 12:41:37,554::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdc', u'hdc'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}), 'path': ('', ''), 'type': (u'cdrom', u'cdrom')}
Thread-212::DEBUG::2014-09-22 12:41:37,555::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdc', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, None), 'path': ('', None), 'type': (u'cdrom', None)}
Thread-212::DEBUG::2014-09-22 12:41:37,555::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'hdd', 'bootOrder': '', 'boot': [], 'readonly': True, 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, 'path': '', 'type': u'cdrom'}
Thread-212::WARNING::2014-09-22 12:41:37,555::vm::5321::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::updating drive ide0-1-1 path from /var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.788d4b3d039d7c11a8a57dc8118770c4.img to 
Thread-212::DEBUG::2014-09-22 12:41:37,561::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', u'hdd'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}), 'path': ('', ''), 'type': (u'cdrom', u'cdrom')}
Thread-212::DEBUG::2014-09-22 12:41:37,562::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', u'hdd'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}), 'path': ('', ''), 'type': (u'cdrom', u'cdrom')}
Thread-212::WARNING::2014-09-22 12:41:37,562::vm::5342::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::updating drive ide0-1-1 config path from /var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.788d4b3d039d7c11a8a57dc8118770c4.img to 
Thread-212::DEBUG::2014-09-22 12:41:37,562::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, None), 'path': ('', None), 'type': (u'cdrom', None)}
Thread-212::DEBUG::2014-09-22 12:41:37,562::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, None), 'path': ('', None), 'type': (u'cdrom', None)}
Thread-212::DEBUG::2014-09-22 12:41:37,563::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'vda', 'bootOrder': '', 'boot': [], 'readonly': False, 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'path': u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', 'type': u'disk'}
Thread-212::DEBUG::2014-09-22 12:41:37,563::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'vda', u'vda'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, False), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}), 'path': (u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab'), 'type': (u'disk', u'disk')}
Thread-212::DEBUG::2014-09-22 12:41:37,563::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'vda', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, None), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, None), 'path': (u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', None), 'type': (u'disk', None)}



Compare with what happened seconds before on the src [cinteg6]:

Thread-161::DEBUG::2014-09-22 12:41:24,959::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'hdc', 'bootOrder': '', 'boot': [], 'readonly': True, 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, 'path': '', 'type': u'cdrom'}
Thread-161::DEBUG::2014-09-22 12:41:24,960::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdc', u'hdc'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}), 'path': ('', ''), 'type': (u'cdrom', u'cdrom')}
Thread-161::DEBUG::2014-09-22 12:41:24,960::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdc', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'0'}, None), 'path': ('', None), 'type': (u'cdrom', None)}
Thread-161::DEBUG::2014-09-22 12:41:24,961::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'hdd', 'bootOrder': '', 'boot': [], 'readonly': True, 'address': {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, 'path': u'/var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.9adb85c26621d449eac2eb9c338be502.img', 'type': u'cdrom'}
Thread-161::DEBUG::2014-09-22 12:41:24,961::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', u'hdd'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, True), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, {u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}), 'path': (u'/var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.9adb85c26621d449eac2eb9c338be502.img', u'/var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.9adb85c26621d449eac2eb9c338be502.img'), 'type': (u'cdrom', u'cdrom')}
Thread-161::DEBUG::2014-09-22 12:41:24,961::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'hdd', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (True, None), 'address': ({u'bus': u'1', u'controller': u'0', u'type': u'drive', u'target': u'0', u'unit': u'1'}, None), 'path': (u'/var/run/vdsm/payload/ec964bd3-8c65-49b4-b9a4-5022fc628d54.9adb85c26621d449eac2eb9c338be502.img', None), 'type': (u'cdrom', None)}
Thread-161::DEBUG::2014-09-22 12:41:24,962::vm::5311::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Looking for drive with attributes {'name': u'vda', 'bootOrder': '', 'boot': [], 'readonly': False, 'address': {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, 'path': u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', 'type': u'disk'}
Thread-161::DEBUG::2014-09-22 12:41:24,962::vm::5332::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'vda', u'vda'), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, False), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, {u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}), 'path': (u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab'), 'type': (u'disk', u'disk')}
Thread-161::DEBUG::2014-09-22 12:41:24,963::vm::5351::vm.Vm::(_getUnderlyingDriveInfo) vmId=`ec964bd3-8c65-49b4-b9a4-5022fc628d54`::Matched {'name': (u'vda', None), 'bootOrder': ('', None), 'boot': ([], None), 'readonly': (False, None), 'address': ({u'slot': u'0x06', u'bus': u'0x00', u'domain': u'0x0000', u'type': u'pci', u'function': u'0x0'}, None), 'path': (u'/rhev/data-center/e3b03197-b119-451d-b039-61a56e058eaa/40d2c2ac-d6b5-4e33-aecd-380a38a20212/images/bc593676-8901-4114-839f-895f69264150/c755b30e-67b6-49c9-9c00-518fa63098ab', None), 'type': (u'disk', None)}

Comment 11 Michal Skrivanek 2014-10-01 13:20:35 UTC
It doesn't reproduce when run manually on its own. It might be the sequence of steps or something setup-specific. I would appreciate help narrowing it down.

Comment 13 Francesco Romani 2014-10-02 08:25:33 UTC
(In reply to Francesco Romani from comment #10)
> I'd just like to add the reason why I believe that the cloud-init device,
> hdd, "disappears"

And indeed it does.

The first migration is causing all the issues. On the dst host during the first migration, we see:

Thread-227::DEBUG::2014-10-01 18:11:30,858::BindingXMLRPC::1132::vds::(wrapper) client [10.35.148.72]::call vmMigrationCreate with ({'guestFQDN': '', 'acpiEnable': 'true', 'emulatedMachine': 'rhel6.5.0', 'afterMigrationStatus': '', 'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 'vmId': '9bf662e3-9c72-450e-98e0-2a658b42c602', 'memGuaranteedSize': 1536, 'transparentHugePages': 'true', 'displaySecurePort': '5900', 'spiceSslCipherSuite': 'DEFAULT', 'cpuType': 'Conroe', 'smp': '1', 'pauseCode': 'NOERR', 'migrationDest': 'libvirt', 'numaTune': {'nodeset': '0', 'mode': 'interleave'}, 'custom': {}, 'vmType': 'kvm', '_srcDomXML': "<domain type='kvm' id='1'>\n  <name>migration_vm</name>\n  <uuid>9bf662e3-9c72-450e-98e0-2a658b42c602</uuid>\n  <memory unit='KiB'>1572864</memory>\n  <currentMemory unit='KiB'>1572864</currentMemory>\n  <vcpu placement='static' current='1'>16</vcpu>\n  <cputune>\n    <shares>1020</shares>\n  </cputune>\n  <numatune>\n    <memory mode='interleave' nodeset='0'/>\n  </numatune>\n  <sysinfo type='smbios'>\n    <system>\n      <entry name='manufacturer'>Red Hat</entry>\n      <entry name='product'>RHEV Hypervisor</entry>\n      <entry name='version'>6Server-6.6.0.1.el6</entry>\n      <entry name='serial'>7619fd27-c4f5-49c7-bca0-0d3e18478672</entry>\n      <entry name='uuid'>9bf662e3-9c72-450e-98e0-2a658b42c602</entry>\n    </system>\n  </sysinfo>\n  <os>\n    <type arch='x86_64' machine='rhel6.5.0'>hvm</type>\n    <boot dev='hd'/>\n    <smbios mode='sysinfo'/>\n  </os>\n  <features>\n    <acpi/>\n  </features>\n  <cpu mode='custom' match='exact'>\n    <model fallback='allow'>Conroe</model>\n    <topology sockets='16' cores='1' threads='1'/>\n  </cpu>\n  <clock offset='variable' adjustment='0' basis='utc'>\n    <timer name='rtc' tickpolicy='catchup'/>\n    <timer name='pit' tickpolicy='delay'/>\n    <timer name='hpet' present='no'/>\n  </clock>\n  <on_poweroff>destroy</on_poweroff>\n  <on_reboot>restart</on_reboot>\n  <on_crash>destroy</on_crash>\n  <devices>\n    <emulator>/usr/libexec/qemu-kvm</emulator>\n    <disk type='file' device='cdrom'>\n      <driver name='qemu' type='raw'/>\n      <source startupPolicy='optional'/>\n      <target dev='hdc' bus='ide'/>\n      <readonly/>\n      <serial></serial>\n      <alias name='ide0-1-0'/>\n      <address type='drive' controller='0' bus='1' target='0' unit='0'/>\n    </disk>\n    <disk type='file' device='cdrom'>\n      <driver name='qemu' type='raw'/>\n      <source file='/var/run/vdsm/payload/9bf662e3-9c72-450e-98e0-2a658b42c602.c6b55053c12d077f4f6ec07c7f6714e4.img' startupPolicy='optional'/>\n      <target dev='hdd' bus='ide'/>\n      <readonly/>\n      <serial></serial>\n      <alias name='ide0-1-1'/>\n      <address type='drive' controller='0' bus='1' target='0' unit='1'/>\n    </disk>\n    <disk type='file' device='disk' snapshot='no'>\n      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>\n      <source file='/rhev/data-center/a3a7280b-47fb-4360-9c55-8a2f2d29924a/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7'>\n        <seclabel model='selinux' relabel='no'/>\n      </source>\n      <target dev='vda' bus='virtio'/>\n      <serial>c1ce3ff9-eddd-4d98-8887-1489ac659511</serial>\n      <alias name='virtio-disk0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>\n    </disk>\n    <controller type='scsi' index='0' model='virtio-scsi'>\n      
<alias name='scsi0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>\n    </controller>\n    <controller type='virtio-serial' index='0' ports='16'>\n      <alias name='virtio-serial0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>\n    </controller>\n    <controller type='usb' index='0'>\n      <alias name='usb0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>\n    </controller>\n    <controller type='ide' index='0'>\n      <alias name='ide0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>\n    </controller>\n    <channel type='unix'>\n      <source mode='bind' path='/var/lib/libvirt/qemu/channels/9bf662e3-9c72-450e-98e0-2a658b42c602.com.redhat.rhevm.vdsm'/>\n      <target type='virtio' name='com.redhat.rhevm.vdsm'/>\n      <alias name='channel0'/>\n      <address type='virtio-serial' controller='0' bus='0' port='1'/>\n    </channel>\n    <channel type='unix'>\n      <source mode='bind' path='/var/lib/libvirt/qemu/channels/9bf662e3-9c72-450e-98e0-2a658b42c602.org.qemu.guest_agent.0'/>\n      <target type='virtio' name='org.qemu.guest_agent.0'/>\n      <alias name='channel1'/>\n      <address type='virtio-serial' controller='0' bus='0' port='2'/>\n    </channel>\n    <channel type='spicevmc'>\n      <target type='virtio' name='com.redhat.spice.0'/>\n      <alias name='channel2'/>\n      <address type='virtio-serial' controller='0' bus='0' port='3'/>\n    </channel>\n    <input type='mouse' bus='ps2'/>\n    <graphics type='spice' tlsPort='5900' autoport='yes' keymap='en-us' passwdValidTo='1970-01-01T00:00:01'>\n      <listen type='network' address='10.35.148.72' network='vdsm-rhevm'/>\n      <channel name='main' mode='secure'/>\n      <channel name='display' mode='secure'/>\n      <channel name='inputs' mode='secure'/>\n      <channel name='cursor' mode='secure'/>\n      <channel name='playback' mode='secure'/>\n      <channel name='record' mode='secure'/>\n      <channel name='smartcard' mode='secure'/>\n      <channel name='usbredir' mode='secure'/>\n    </graphics>\n    <sound model='ich6'>\n      <alias name='sound0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>\n    </sound>\n    <video>\n      <model type='qxl' ram='65536' vram='32768' heads='1'/>\n      <alias name='video0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>\n    </video>\n    <memballoon model='virtio'>\n      <alias name='balloon0'/>\n      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>\n    </memballoon>\n  </devices>\n  <seclabel type='dynamic' model='selinux' relabel='yes'>\n    <label>system_u:system_r:svirt_t:s0:c293,c409</label>\n    <imagelabel>system_u:object_r:svirt_image_t:s0:c293,c409</imagelabel>\n  </seclabel>\n</domain>\n", 'memSize': 1536, 'smpCoresPerSocket': '1', 'vmName': 'migration_vm', 'nice': '0', 'username': 'Unknown', 'bootMenuEnable': 'false', 'copyPasteEnable': 'true', 'displayIp': '10.35.148.72', 'keyboardLayout': 'en-us', 'displayPort': '-1', 'smartcardEnable': 'false', 'clientIp': '', 'fileTransferEnable': 'true', 'nicModel': 'rtl8139,pv', 'elapsedTimeOffset': 10.85105299949646, 'kvmEnable': 'true', 'displayNetwork': 'rhevm', 'devices': [{'specParams': {}, 'alias': 'sound0', 'deviceType': 'sound', 'deviceId': '38c59c55-20f7-40a6-9320-55679f22ea8b', 'address': {'slot': '0x03', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 
'device': 'ich6', 'type': 'sound'}, {'target': 1572864, 'specParams': {'model': 'virtio'}, 'alias': 'balloon0', 'deviceType': 'balloon', 'deviceId': '7967b03f-ee27-47d8-818b-95a468097a0a', 'address': {'slot': '0x07', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'memballoon', 'type': 'balloon'}, {'index': '0', 'alias': 'scsi0', 'specParams': {}, 'deviceType': 'controller', 'deviceId': 'f4e62c01-cafa-40ce-9097-0165f624bcd3', 'address': {'slot': '0x04', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'scsi', 'model': 'virtio-scsi', 'type': 'controller'}, {'specParams': {}, 'alias': 'virtio-serial0', 'deviceType': 'controller', 'deviceId': '2b7b6b52-18b0-42d9-89bb-33f064290ac6', 'address': {'slot': '0x05', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'virtio-serial', 'type': 'controller'}, {'specParams': {'vram': '32768', 'heads': '1'}, 'alias': 'video0', 'deviceType': 'video', 'deviceId': '61025711-7985-4e81-b973-d6cdbd1234c5', 'address': {'slot': '0x02', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'device': 'qxl', 'type': 'video'}, {'device': 'spice', 'specParams': {'displayNetwork': 'rhevm', 'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 'keyMap': 'en-us', 'displayIp': '10.35.148.72', 'copyPasteEnable': 'true'}, 'deviceType': 'graphics', 'tlsPort': '5900', 'type': 'graphics'}, {'index': '3', 'iface': 'ide', 'name': 'hdd', 'alias': 'ide0-1-1', 'shared': 'false', 'specParams': {'vmPayload': {'volId': 'config-2', 'file': {'openstack/latest/meta_data.json': 'ewogICJsYXVuY2hfaW5kZXgiIDogIjAiLAogICJhdmFpbGFiaWxpdHlfem9uZSIgOiAibm92YSIs\nCiAgInV1aWQiIDogImY5Yjg2MWM5LWM5OGEtNDhiNS1hYWM2LTU4ODI3ZjAxNjlhZCIsCiAgIm1l\ndGEiIDogewogICAgImVzc2VudGlhbCIgOiAiZmFsc2UiLAogICAgInJvbGUiIDogInNlcnZlciIs\nCiAgICAiZHNtb2RlIiA6ICJsb2NhbCIKICB9Cn0=\n', 'openstack/latest/user_data': 'I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogMApvdXRwdXQ6CiAg\nYWxsOiAnPj4gL3Zhci9sb2cvY2xvdWQtaW5pdC1vdXRwdXQubG9nJwpjaHBhc3N3ZDoKICBleHBp\ncmU6IGZhbHNlCnJ1bmNtZDoKLSAnc2VkIC1pICcnL15kYXRhc291cmNlX2xpc3Q6IC9kJycgL2V0\nYy9jbG91ZC9jbG91ZC5jZmc7IGVjaG8gJydkYXRhc291cmNlX2xpc3Q6CiAgWyJOb0Nsb3VkIiwg\nIkNvbmZpZ0RyaXZlIl0nJyA+PiAvZXRjL2Nsb3VkL2Nsb3VkLmNmZycK\n'}}}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': '6e2cb6d2-5c2b-4a3c-8036-27b10384181a', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '1'}, 'device': 'cdrom', 'path': '/var/run/vdsm/payload/9bf662e3-9c72-450e-98e0-2a658b42c602.c6b55053c12d077f4f6ec07c7f6714e4.img', 'type': 'disk'}, {'index': '2', 'iface': 'ide', 'name': 'hdc', 'alias': 'ide0-1-0', 'shared': 'false', 'specParams': {'path': ''}, 'readonly': 'True', 'deviceType': 'disk', 'deviceId': 'd44fc0e3-9a40-46eb-8277-9d02390926ad', 'address': {'bus': '1', 'controller': '0', 'type': 'drive', 'target': '0', 'unit': '0'}, 'device': 'cdrom', 'path': '', 'type': 'disk'}, {'address': {'slot': '0x06', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x0'}, 'reqsize': '0', 'index': '0', 'iface': 'virtio', 'apparentsize': '197120', 'specParams': {}, 'imageID': 'c1ce3ff9-eddd-4d98-8887-1489ac659511', 'readonly': 'False', 'deviceType': 'disk', 'shared': 'false', 'truesize': '143360', 'type': 'disk', 'domainID': '490d87e8-fa64-46b6-85a9-3cb3bd1deca2', 'volumeInfo': {'domainID': '490d87e8-fa64-46b6-85a9-3cb3bd1deca2', 'volType': 'path', 'leaseOffset': 0, 'volumeID': 
'62b69b7f-d196-41b9-aa56-be73189096a7', 'leasePath': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm01__nfs__2014__10__01__18__7__54__648306/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7.lease', 'imageID': 'c1ce3ff9-eddd-4d98-8887-1489ac659511', 'path': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm01__nfs__2014__10__01__18__7__54__648306/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7'}, 'format': 'cow', 'deviceId': 'c1ce3ff9-eddd-4d98-8887-1489ac659511', 'poolID': 'a3a7280b-47fb-4360-9c55-8a2f2d29924a', 'device': 'disk', 'path': '/rhev/data-center/a3a7280b-47fb-4360-9c55-8a2f2d29924a/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7', 'propagateErrors': 'off', 'optional': 'false', 'name': 'vda', 'volumeID': '62b69b7f-d196-41b9-aa56-be73189096a7', 'alias': 'virtio-disk0', 'volumeChain': [{'domainID': '490d87e8-fa64-46b6-85a9-3cb3bd1deca2', 'volType': 'path', 'leaseOffset': 0, 'volumeID': '62b69b7f-d196-41b9-aa56-be73189096a7', 'leasePath': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm01__nfs__2014__10__01__18__7__54__648306/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7.lease', 'imageID': 'c1ce3ff9-eddd-4d98-8887-1489ac659511', 'path': '/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm01__nfs__2014__10__01__18__7__54__648306/490d87e8-fa64-46b6-85a9-3cb3bd1deca2/images/c1ce3ff9-eddd-4d98-8887-1489ac659511/62b69b7f-d196-41b9-aa56-be73189096a7'}]}, {'device': 'usb', 'alias': 'usb0', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x2'}, 'deviceType': 'controller', 'type': 'controller'}, {'device': 'ide', 'alias': 'ide0', 'address': {'slot': '0x01', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'function': '0x1'}, 'deviceType': 'controller', 'type': 'controller'}, {'device': 'unix', 'alias': 'channel0', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}, 'deviceType': 'channel', 'type': 'channel'}, {'device': 'unix', 'alias': 'channel1', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}, 'deviceType': 'channel', 'type': 'channel'}, {'device': 'spicevmc', 'alias': 'channel2', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}, 'deviceType': 'channel', 'type': 'channel'}], 'status': 'Up', 'timeOffset': '0', 'maxVCpus': '16', 'guestIPs': '', 'display': 'qxl'},) {}

And further confirmation that *VDSM* prepared the paths correctly is here:


Thread-228::INFO::2014-10-01 18:11:30,948::clientIF::331::vds::(prepareVolumePath) prepared volume path: /var/run/vdsm/payload/9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img
Thread-228::INFO::2014-10-01 18:11:30,948::clientIF::331::vds::(prepareVolumePath) prepared volume path: 


This suggests the hdd (cloud-init) volume was recreated on the dst host. Furthermore, in the supervdsm logs:

MainProcess|Thread-228::DEBUG::2014-10-01 18:11:30,939::supervdsmServer::101::SuperVdsm.ServerCallback::(wrapper) call mkIsoFs with ('9bf662e3-9c72-450e-98e0-2a658b42c602', {'openstack/latest/meta_data.json': 'ewogICJsYXVuY2hfaW5kZXgiIDogIjAiLAogICJhdmFpbGFiaWxpdHlfem9uZSIgOiAibm92YSIs\nCiAgInV1aWQiIDogImY5Yjg2MWM5LWM5OGEtNDhiNS1hYWM2LTU4ODI3ZjAxNjlhZCIsCiAgIm1l\ndGEiIDogewogICAgImVzc2VudGlhbCIgOiAiZmFsc2UiLAogICAgInJvbGUiIDogInNlcnZlciIs\nCiAgICAiZHNtb2RlIiA6ICJsb2NhbCIKICB9Cn0=\n', 'openstack/latest/user_data': 'I2Nsb3VkLWNvbmZpZwpzc2hfcHdhdXRoOiB0cnVlCmRpc2FibGVfcm9vdDogMApvdXRwdXQ6CiAg\nYWxsOiAnPj4gL3Zhci9sb2cvY2xvdWQtaW5pdC1vdXRwdXQubG9nJwpjaHBhc3N3ZDoKICBleHBp\ncmU6IGZhbHNlCnJ1bmNtZDoKLSAnc2VkIC1pICcnL15kYXRhc291cmNlX2xpc3Q6IC9kJycgL2V0\nYy9jbG91ZC9jbG91ZC5jZmc7IGVjaG8gJydkYXRhc291cmNlX2xpc3Q6CiAgWyJOb0Nsb3VkIiwg\nIkNvbmZpZ0RyaXZlIl0nJyA+PiAvZXRjL2Nsb3VkL2Nsb3VkLmNmZycK\n'}, 'config-2') {}
MainProcess|Thread-228::DEBUG::2014-10-01 18:11:30,941::utils::738::root::(execCmd) /usr/bin/mkisofs -R -o /var/run/vdsm/payload/9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img -V config-2 /tmp/tmpgNgxNE (cwd None)
MainProcess|Thread-228::DEBUG::2014-10-01 18:11:30,947::utils::758::root::(execCmd) SUCCESS: <err> = 'Total translation table size: 0\nTotal rockridge attributes bytes: 763\nTotal directory bytes: 4458\nPath table size(bytes): 40\nMax brk space used 0\n179 extents written (0 MB)\n'; <rc> = 0
MainProcess|Thread-228::DEBUG::2014-10-01 18:11:30,947::supervdsmServer::108::SuperVdsm.ServerCallback::(wrapper) return mkIsoFs with /var/run/vdsm/payload/9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img
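For context, a minimal sketch (a simplified stand-in for the mkIsoFs helper visible in the log above; the output file name and layout are assumptions) of how the cloud-init payload ISO is rebuilt on the destination from the base64 'vmPayload' files:

# Minimal sketch, not the real supervdsm mkIsoFs: decode the base64 payload
# files into a temporary tree and pack them with mkisofs, as in the log above.
import base64
import os
import subprocess
import tempfile

def make_payload_iso(vm_id, files, volume_id='config-2',
                     out_dir='/var/run/vdsm/payload'):
    tmp = tempfile.mkdtemp()
    for rel_path, b64_content in files.items():
        dst = os.path.join(tmp, rel_path)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        with open(dst, 'wb') as f:
            f.write(base64.b64decode(b64_content))
    out = os.path.join(out_dir, '%s.payload.img' % vm_id)  # hypothetical name
    subprocess.check_call(['/usr/bin/mkisofs', '-R', '-o', out,
                           '-V', volume_id, tmp])
    return out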


So VDSM did its job.

But the libvirt configuration is messed up. The dst VDSM receives:

Thread-228::DEBUG::2014-10-01 18:11:33,438::vm::4817::vm.Vm::(_getUnderlyingVmInfo) vmId=`9bf662e3-9c72-450e-98e0-2a658b42c602`::VM device changed:
<devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk device="cdrom" type="file">
      <driver name="qemu" type="raw"/>
      <source startupPolicy="optional"/>
      <target bus="ide" dev="hdc"/>
      <readonly/>
      <serial/>
      <alias name="ide0-1-0"/>
      <address bus="1" controller="0" target="0" type="drive" unit="0"/>
    </disk>
    <disk device="cdrom" type="file">
      <driver name="qemu" type="raw"/>
      <source startupPolicy="optional"/>
      <target bus="ide" dev="hdd"/>
      <readonly/>
      <serial/>
      <alias name="ide0-1-1"/>
      <address bus="1" controller="0" target="0" type="drive" unit="1"/>
    </disk>

This comes from additional debug logging I added to VDSM in domDependentInit (VDSM re-fetches the XML domain data from libvirt once the VM has started).
And this causes all the havoc we have seen.
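To make the effect concrete, a minimal sketch (assuming a simplified view of how the dumped XML is consumed; not the real _getUnderlyingDriveInfo code): the optional cdrom whose payload file was dropped reports no source file, so the reconstructed drive ends up with an empty path, which is what later collides in the path-based matching:

# Minimal sketch, not vdsm code: the disk XML above carries a bare
# <source startupPolicy="optional"/>, so the extracted path is empty.
import xml.etree.ElementTree as ET

disk_xml = """
<disk device="cdrom" type="file">
  <driver name="qemu" type="raw"/>
  <source startupPolicy="optional"/>
  <target bus="ide" dev="hdd"/>
  <alias name="ide0-1-1"/>
  <address bus="1" controller="0" target="0" type="drive" unit="1"/>
</disk>
"""

disk = ET.fromstring(disk_xml)
source = disk.find('source')
path = source.get('file', '') if source is not None else ''
print(disk.find('target').get('dev'), repr(path))   # prints: hdd ''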

To wrap up:
- we thought it happens on RHEL 6.5 also, but we can't reproduce it here
- it seems to be a libvirt issue
- it seems to be a RHEL 6.6 regression

Comment 14 Francesco Romani 2014-10-02 08:26:38 UTC
Created attachment 943326 [details]
VDSM logs with more debug

Snippet of the VDSM logs from the first migration with more debug data, demonstrating the disappearance of the path.

Comment 15 Francesco Romani 2014-10-02 08:34:36 UTC
Hi Michal,

is the strange behaviour documented in https://bugzilla.redhat.com/show_bug.cgi?id=1145636#c13

a libvirt regression?

Comment 16 Michal Privoznik 2014-10-02 09:36:11 UTC
(In reply to Francesco Romani from comment #15)
> Hi Michal,
> 
> is the strange behaviour documented in
> https://bugzilla.redhat.com/show_bug.cgi?id=1145636#c13
> 
> a libvirt regression?

Well, you're setting the cdroms to be optional, so if the source file doesn't exist it's dropped. Can you attach libvirt debug logs along with a short description of what you think is behaving wrongly? My brain's too small to parse vdsm debug logs, sorry.

Comment 17 Francesco Romani 2014-10-02 11:29:31 UTC
(In reply to Michal Privoznik from comment #16)
> (In reply to Francesco Romani from comment #15)
> > Hi Michal,
> > 
> > is the strange behaviour documented in
> > https://bugzilla.redhat.com/show_bug.cgi?id=1145636#c13
> > 
> > a libvirt regression?
> 
> Well, you're setting the cdroms to be optional, so if the source file doesn't
> exist it's dropped. Can you attach libvirt debug logs along with a short
> description of what you think is behaving wrongly? My brain's too small to
> parse vdsm debug logs, sorry.

What I was expecting here is the drive to be dropped altogether, not just the source path. I was most likely just remembering wrongly.

So, we have to dig deeper.
On a well-behaving RHEL 6.5 host:
$ ls -lhZ d9be4550-7af1-4580-9430-4bfdaa7d50cf.f648da39aae979cd5d799be77f172429.img 
-rw-r-----. vdsm qemu system_u:object_r:virt_content_t:s0 d9be4550-7af1-4580-9430-4bfdaa7d50cf.f648da39aae979cd5d799be77f172429.img

On the failing RHEL 6.6 host:
# ls -lhZ 9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img 
-rw-r-----. vdsm qemu unconfined_u:object_r:virt_var_run_t:s0 9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img

then it is SELinux labeling again.

Comment 18 Michal Privoznik 2014-10-02 12:17:19 UTC
(In reply to Francesco Romani from comment #17)
> (In reply to Michal Privoznik from comment #16)
> > (In reply to Francesco Romani from comment #15)
> > > Hi Michal,
> > > 
> > > is the strange behaviour documented in
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1145636#c13
> > > 
> > > a libvirt regression?
> > 
> > Well, you're setting the cdroms to be optional, so if the source file doesn't
> > exist it's dropped. Can you attach libvirt debug logs along with a short
> > description of what you think is behaving wrongly? My brain's too small to
> > parse vdsm debug logs, sorry.
> 
> What I was expecting here is the drive to be dropped altogether, not just
> the source path. I was most likely just remembering wrongly.

You can't do that: it would break the machine ABI. We must keep the drive but with an empty source path. It's like ripping the CD out of the tray (CDs/floppies can disappear at any time in real HW too).

> 
> So, we have to dig deeper.
> On a well-behaving RHEL 6.5 host:
> $ ls -lhZ
> d9be4550-7af1-4580-9430-4bfdaa7d50cf.f648da39aae979cd5d799be77f172429.img 
> -rw-r-----. vdsm qemu system_u:object_r:virt_content_t:s0
> d9be4550-7af1-4580-9430-4bfdaa7d50cf.f648da39aae979cd5d799be77f172429.img
> 
> On the failing RHEL 6.6 host:
> # ls -lhZ
> 9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img 
> -rw-r-----. vdsm qemu unconfined_u:object_r:virt_var_run_t:s0
> 9bf662e3-9c72-450e-98e0-2a658b42c602.0defd0b2d811c0e399a713a0b684ce36.img
> 
> then it is SELinux labeling again.

Well, libvirt uses 'access($disk_src_path, F_OK) == 0' to determine whether the path is accessible or not (this applies to local storage, as in the case we're discussing).

The truth is, this check happens before libvirt even tries to relabel all the paths. However, libvirtd is running as root:root, so there's no label that would prevent libvirtd from touching the file.

Maybe there's a race somewhere? What if vdsm creates the file not upfront but somewhat later in the migration, for instance after libvirtd has checked the disk presence?

Again, libvirt debug logs would shed more light.
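As a rough model of the check described above (an assumption-laden sketch, not libvirt code; the path is a placeholder), the fate of an optional cdrom boils down to an existence/access check on its local source file:

# Rough model, not libvirt code: if the local payload file fails the
# access check, the optional cdrom loses its source instead of failing
# the migration, and arrives on the destination with an empty path.
import os

payload = '/var/run/vdsm/payload/example.img'   # placeholder path

if os.access(payload, os.F_OK):   # roughly access($disk_src_path, F_OK) == 0
    print('cdrom keeps its source file')
else:
    print('source dropped: cdrom arrives with an empty path')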

Comment 19 Ilanit Stein 2014-10-02 13:51:30 UTC
The same VM-start-on-dst-host error in vdsm.log occurred on a vt4 engine (my local setup) with rhel6.6 hosts and selinux set to Permissive.
Thread-1493::ERROR::2014-10-02 15:34:45,409::vm::2326::vm.Vm::(_startUnderlyingVm) vmId=`db35b270-56c9-4304-b0f8-92e758d7b3b8`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2266, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 3301, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/virt/vm.py", line 2071, in buildConfDevices
    self.normalizeDrivesIndices(devices[DISK_DEVICES])
  File "/usr/share/vdsm/virt/vm.py", line 2234, in normalizeDrivesIndices
    if drv['iface'] not in self._usedIndices:
KeyError: 'iface'

Steps:
1. host_1 was running 3 rhel vms.
2. Moved host_1 to maintenance => migration went OK.
3. Removed host_1.
4. Added host_1 back.
5. Moved host_2 to maintenance => got the above error; the VMs were not able to migrate.

Comment 20 Francesco Romani 2014-10-02 14:20:11 UTC
Upgraded a box of mine to RHEL 6.6 and migrated from RHEL 6.5; no luck: it works as expected.

We need to enable libvirt debug logs on the failing environment:

log_outputs="1:file:/var/log/libvirt/libvirtd.log"
log_filters="3:virobject 1:virfile 2:virnetlink 3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"

Comment 21 Michal Skrivanek 2014-10-03 11:11:59 UTC
Ilanit, can you please let us look at those 2 machines? The problematic one is host_2; we need to see more debug information. Thanks

Comment 22 Francesco Romani 2014-10-03 13:16:04 UTC
Managed to grab some debug logs, attached.

Highlighting the relevant lines:

2014-10-03 12:45:31.102+0000: 59520: debug : qemuDomainCheckDiskPresence:2006 : Checking for disk presence
2014-10-03 12:45:31.107+0000: 59520: debug : virStorageFileGetMetadata:1006 : path=/var/run/vdsm/payload/f41aaf55-a521-4bee-9484-e5c2100599e7.f0066193f44d186b66f2dd5cac35309b.img format=1 uid=107 gid=107 probe=0
2014-10-03 12:45:31.107+0000: 59520: debug : virStorageFileGetMetadataRecurse:938 : path=/var/run/vdsm/payload/f41aaf55-a521-4bee-9484-e5c2100599e7.f0066193f44d186b66f2dd5cac35309b.img format=1 uid=107 gid=107 probe=0
2014-10-03 12:45:31.107+0000: 59520: error : virStorageFileGetMetadataRecurse:952 : Failed to open file '/var/run/vdsm/payload/f41aaf55-a521-4bee-9484-e5c2100599e7.f0066193f44d186b66f2dd5cac35309b.img': No such file or directory
2014-10-03 12:45:31.107+0000: 59520: debug : qemuDomainCheckRemoveOptionalDisk:1937 : Dropping disk 'hdd' on domain 'rhel-1' (UUID 'f41aaf55-a521-4bee-9484-e5c2100599e7') due to inaccessible source '/var/run/vdsm/payload/f41aaf55-a521-4bee-9484-e5c2100599e7.f0066193f44d186b66f2dd5cac35309b.img'


permissions on files:

  File: `/var/run/vdsm/payload/82604221-514a-4844-8f1e-767f397ebc03.4ca50fab90c0e9eebea1a52d6ff17e84.img'
  Size: 366592    	Blocks: 728        IO Block: 4096   regular file
Device: 803h/2051d	Inode: 23093288    Links: 1
Access: (0640/-rw-r-----)  Uid: (   36/    vdsm)   Gid: (  107/    qemu)
Access: 2014-10-02 15:01:09.000000000 +0300
Modify: 2014-10-02 15:01:09.000000000 +0300
Change: 2014-10-02 15:01:09.000000000 +0300

  File: `/var/run/vdsm/payload/db35b270-56c9-4304-b0f8-92e758d7b3b8.8329070093a08e7513cfe0b8e08d2eb1.img'
  Size: 366592    	Blocks: 728        IO Block: 4096   regular file
Device: 803h/2051d	Inode: 23093282    Links: 1
Access: (0640/-rw-r-----)  Uid: (   36/    vdsm)   Gid: (  107/    qemu)
Access: 2014-10-02 15:01:05.000000000 +0300
Modify: 2014-10-02 15:01:05.000000000 +0300
Change: 2014-10-02 15:01:05.000000000 +0300

  File: `/var/run/vdsm/payload/f41aaf55-a521-4bee-9484-e5c2100599e7.143152bdd9cf5372422f0669f030cda5.img'
  Size: 366592    	Blocks: 728        IO Block: 4096   regular file
Device: 803h/2051d	Inode: 23093281    Links: 1
Access: (0640/-rw-r-----)  Uid: (   36/    vdsm)   Gid: (  107/    qemu)
Access: 2014-10-03 15:58:01.000000000 +0300
Modify: 2014-10-03 15:58:01.000000000 +0300
Change: 2014-10-03 15:58:01.000000000 +0300

groups:

uid 36 is vdsm

qemu:x:107:107:qemu user:/:/sbin/nologin
qemu:x:107:vdsm,sanlock

Comment 23 Francesco Romani 2014-10-03 13:17:27 UTC
Created attachment 943715 [details]
libvirt debug logs of a reproduction of the problem

uid mismatch. Apparently libvirt expects qemu/qemu, but vdsm sets vdsm/qemu.

Comment 24 Michal Privoznik 2014-10-03 15:26:58 UTC
(In reply to Francesco Romani from comment #23)
> Created attachment 943715 [details]
> libvirt debug logs of a reproduction of the problem
> 
> uid mismatch. Apparently libvirt expects qemu/qemu, but vdsm sets vdsm/qemu.

Correct. In the version you are using, libvirt mistakenly checks whether the disk is accessible as the user/group configured in qemu.conf, while the domain may be running under a different user/group pair. This was fixed upstream by:

commit 9bf629ab60ec0f25be5426e8598e98c89c244c5a
Author:     Peter Krempa <pkrempa>
AuthorDate: Fri Feb 7 18:42:27 2014 +0100
Commit:     Peter Krempa <pkrempa>
CommitDate: Mon Feb 10 15:49:59 2014 +0100

    qemu: Use correct permissions when determining the image chain
    
    The code took into account only the global permissions. The domains now
    support per-vm DAC labels and per-image DAC labels. Use the most
    specific label available.

which is contained in v1.2.2-rc1~147
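
(To make the fix's idea concrete, here is a minimal Python sketch, not libvirt's actual code, of picking the most specific DAC label; the numeric uid/gid values mirror the ones seen in comment #22.)

GLOBAL_DAC = (107, 107)  # qemu:qemu, the user/group configured in qemu.conf

def effective_dac(image_label=None, domain_label=None, global_label=GLOBAL_DAC):
    # Prefer the most specific DAC (uid, gid) pair that is available.
    if image_label is not None:   # per-image <seclabel> DAC override
        return image_label
    if domain_label is not None:  # per-VM DAC label
        return domain_label
    return global_label           # global default from qemu.conf

# Before the fix, the accessibility check effectively always used the global
# qemu:qemu pair, so a payload image owned by vdsm:qemu (36:107) looked
# inaccessible even though the domain itself could read it.
print(effective_dac(domain_label=(36, 107)))  # -> (36, 107)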

Comment 25 Ilanit Stein 2014-10-05 06:08:59 UTC
Clearing the needinfo, as I understand no further info is required.

Michal,

As libvirt.log was recently changed to contain very little information, this bug (see comment #3), and also several other bugs I handled lately, were missing libvirt information, for example https://bugzilla.redhat.com/show_bug.cgi?id=1045833

These bugs do not always reproduce, so it is good to have the logs.

Also, not having the logs forces us to reproduce with debug turned ON.

Can we please restore libvirt.log to be as detailed as it was before, or at least partly detailed?

Thank you,
Ilanit.

Comment 26 Francesco Romani 2014-10-06 12:03:04 UTC
Libvirt debug logging is a complex issue.

We knew that turning off debug would have made troubleshooting more complex, but on the other hand we cannot run with debug logs forever. Quite the opposite: VDSM is already too chatty.

To improve things, there should be an easy way to restore libvirt debug logs, so I filed
https://bugzilla.redhat.com/show_bug.cgi?id=1149656

For this issue, as https://bugzilla.redhat.com/show_bug.cgi?id=1145636#c24 confirms, this is a libvirt issue. IIUC the host you used for testing is not running the latest libvirt version, so please retry once you have upgraded to the latest libvirt snapshot for RHEL 6.6.

Comment 27 Dan Kenigsberg 2014-10-06 13:00:23 UTC
To enable more verbose libvirt logs, reapply

 log_outputs="1:file:/var/log/libvirt/libvirtd.log"
 log_filters="3:virobject 3:virfile 2:virnetlink 3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"

to /etc/libvirt/libvirtd.conf.

These settings were removed by http://gerrit.ovirt.org/#/c/31135/4/lib/vdsm/tool/configurators/libvirt.py
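
(A minimal Python sketch, not an official vdsm tool, of re-applying these settings; the config path and the restart step are assumptions for illustration.)

CONF = "/etc/libvirt/libvirtd.conf"
SETTINGS = [
    'log_outputs="1:file:/var/log/libvirt/libvirtd.log"',
    'log_filters="3:virobject 3:virfile 2:virnetlink 3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"',
]

def ensure_debug_logging(conf_path=CONF):
    # Append any missing debug-log settings to libvirtd.conf.
    with open(conf_path) as f:
        current = f.read()
    missing = [line for line in SETTINGS if line not in current]
    if missing:
        with open(conf_path, "a") as f:
            f.write("\n" + "\n".join(missing) + "\n")
    # libvirtd must be restarted (e.g. 'service libvirtd restart') for the
    # new settings to take effect.

if __name__ == "__main__":
    ensure_debug_logging()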

Comment 28 Ilanit Stein 2014-10-12 12:40:22 UTC
(In reply to comment #24)

The version you've mentioned (v1.2.2-rc1~147) is an upstream version.
This bug was reported for downstream, where the current libvirt version is:
libvirt-0.10.2-46.el6.x86_64
Does this downstream version contain the fix you refer to?
If not, can it be backported to downstream too?

Comment 29 Vinzenz Feenstra [evilissimo] 2014-10-13 07:31:09 UTC
*** Bug 1149493 has been marked as a duplicate of this bug. ***

Comment 31 Michal Skrivanek 2014-10-15 09:53:38 UTC
(In reply to Michal Privoznik from comment #24)
we don't use DAC, we only have a global

  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c159,c973</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c159,c973</imagelabel>
  </seclabel>

Comment 32 Michal Privoznik 2014-10-15 10:22:15 UTC
(In reply to Ilanit Stein from comment #28)
> (in reply to comment #24)
> 
> The version you've mentioned (v1.2.2-rc1~147) is for upstream.
> This bug was reported for downstream, where current libvirt version is:
> libvirt-0.10.2-46.el6.x86_64 
> Does this downstream version contain the fix you refer to?
> If not, can it be implemented to downstream too?

This bug has been cloned to libvirt and I've created a scratch build with the fix (backported from upstream) there: bug 1149669#c7. However, that build may have already been wiped out, so I'll provide a new one to test shortly.

Comment 33 Michal Privoznik 2014-10-16 09:14:02 UTC
As promised, I'm providing a new build. Can you please try it out?

https://brewweb.devel.redhat.com/taskinfo?taskID=8109125

Comment 34 Michal Skrivanek 2014-10-16 16:18:10 UTC
We're going with a fix in the engine after all; it should be enough.
Decreasing severity due to a possible workaround (switching to XML-RPC), hence this is not a 3.5 beta blocker.

Also note it's not that easy to trigger: you need to have a payload cdrom _and_ migrate the VM twice. For the sysprep/cloud-init use case, the most likely scenario is that you start a VM, perform sysprep/initial init, shut down, save it as a template, and use it from there (i.e. without a payload drive).
Of course, for all the other cases where you do want to keep a payload CD attached all the time, use the workaround...

Comment 35 Ilanit Stein 2014-10-19 06:37:38 UTC
Clearing the needinfo, as I understand from comment #34 that the fix will be in the engine.

Comment 36 Francesco Romani 2014-10-20 07:16:20 UTC
*** Bug 1152973 has been marked as a duplicate of this bug. ***

Comment 37 Michal Skrivanek 2014-10-22 13:46:46 UTC
VDSM-side changes are not required; this can move to MODIFIED once the engine-side fix is in.

Comment 38 Ilanit Stein 2014-10-30 14:03:24 UTC
Verified on VT8, following the test flow detailed in the description.

Comment 39 nijin ashok 2015-07-26 09:35:31 UTC
*** Bug 1224355 has been marked as a duplicate of this bug. ***

