Note: I am not sure in which Component this issue is located. Also I don't know which is the correct oVirt Team. Please correct if necessary. Thank you.

Description of problem:
I updated an engine installation from 3.2 to 3.6 and also updated the associated CentOS 6 hosts' vdsm from 3.10.x to 3.16.30. All existing VMs were in running state while the update was performed.

After a first failover test a VM could not be restarted. The reason according to the engine's log was this:

VM xxxxxxxx is down with error. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4: Duplicate ID 'virtio-serial0' for device

The VM could not be started on any host. A closer look at the engine database confirms that ALL existing VMs have this problem: 2 devices with alias='virtio-serial0', e.g.:

----
engine=# SELECT * FROM vm_device WHERE vm_device.device = 'virtio-serial' AND vm_id = 'cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' ORDER BY vm_id;
-[ RECORD 1 ]-------------+-------------------------------------------------------------
device_id                 | 2821d03c-ce88-4613-9095-e88eadcd3792
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   |
boot_order                | 0
spec_params               | { }
is_managed                | t
is_plugged                | f
is_readonly               | f
_create_date              | 2016-01-14 08:30:43.797161+01
_update_date              | 2016-02-10 10:04:56.228724+01
alias                     | virtio-serial0
custom_properties         | { }
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f
-[ RECORD 2 ]-------------+-------------------------------------------------------------
device_id                 | 29e0805f-d836-451a-9ec3-9031baa995e6
vm_id                     | cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec
type                      | controller
device                    | virtio-serial
address                   | {bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}
boot_order                | 0
spec_params               | { }
is_managed                | f
is_plugged                | t
is_readonly               | f
_create_date              | 2016-02-11 13:47:02.69992+01
_update_date              |
alias                     | virtio-serial0
custom_properties         |
snapshot_id               |
logical_name              |
is_using_scsi_reservation | f
----

My solution was this:

DELETE FROM vm_device WHERE vm_id='cbfa359f-d0b8-484b-8ec0-cf9b8e4bb3ec' AND vm_device.device = 'virtio-serial' AND address = '';

According to Arik Hada's reply on the ovirt-users mailing list, the better solution would be to remove the unmanaged device and then restart the VM. I did not have the time to try this yet!

Version-Release number of selected component (if applicable):
3.6.2.6-1.el6

How reproducible:
Arik, could you please describe how you reproduced it. Thank you.

all the best
Johannes Tiefenbacher
LINBIT VIE
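To get a quick picture of how many VMs are affected before touching anything, here is a minimal sketch (assuming only the vm_device columns shown in the record dump above; run it against the engine database with psql):

-- hedged sketch: list VMs that carry more than one virtio-serial controller row
SELECT vm_id, COUNT(*) AS virtio_serial_controllers
FROM vm_device
WHERE device = 'virtio-serial'
GROUP BY vm_id
HAVING COUNT(*) > 1
ORDER BY vm_id;

Each vm_id returned has duplicate virtio-serial controllers and is a candidate for the cleanup discussed in the following comments.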
Hello,

we also have a bunch of VMs on 3.6.2, upgraded from 3.5.6 without powering down the VMs. When powering a VM off and on, oVirt sometimes complains about the additional controller in the XML file and the VM cannot be powered on. I could not identify any pattern so far; only approx. 10% of the VMs are affected for now.

The additional virtio-serial0 device has no _update_date set, so identifying (and removing) it is quite easy:

select * from vm_device where alias='virtio-serial0' and _update_date is NULL;

best,
Dominique
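As a cross-check before deleting anything, a sketch (using only the vm_device columns shown in comment #0) that confirms the rows with no _update_date are also the unmanaged copies referred to in the next comment:

-- hedged sketch: the stray copy should show up as is_managed = 'f'
SELECT device_id, vm_id, alias, is_managed, _create_date
FROM vm_device
WHERE alias = 'virtio-serial0'
  AND _update_date IS NULL
  AND is_managed = 'f'
ORDER BY vm_id;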
(In reply to dominique.taffin from comment #1)
Hi Dominique,
The best solution would be to shut down these VMs and remove their unmanaged virtio-serial devices. If you cannot shut them down and you remove the unmanaged devices while they are running, the duplicated devices might appear again if any of the VM's devices is updated before the next time the VM is down.
the fix is simple enough and the bug affects running VMs - moving to 3.6.5
Looks like I've hit the same issue on my 3.4->3.5->3.6 environment, where I had an issue with 2 VMs not being able to get started with the following error:

2016-03-03 14:53:19,302 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-14) [] Correlation ID: 502f9b04, Job ID: 99142a2c-9903-47f9-990d-9c11b1d43aa2, Call Stack: null, Custom Event ID: -1, Message: VM SP_Compute_el_6_7-5 was started by admin@internal (Host: seal09.qa.lab.tlv.redhat.com).
2016-03-03 14:53:20,877 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-81) [] START, DestroyVDSCommand(HostName = seal09.qa.lab.tlv.redhat.com, DestroyVmVDSCommandParameters:{runAsync='true', hostId='ae5f7d8c-c120-44fd-bcb3-cb06d35b1bbf', vmId='21b8393b-b910-43f9-b9db-45d64140e5c3', force='false', secondsToWait='0', gracefully='false', reason='', ignoreNoVm='true'}), log id: 6eb2fdbf
2016-03-03 14:53:20,884 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.DestroyVDSCommand] (DefaultQuartzScheduler_Worker-81) [] FINISH, DestroyVDSCommand, log id: 6eb2fdbf
2016-03-03 14:53:20,892 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-81) [] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VM SP_Compute_el_6_7-5 is down with error. Exit message: XML error: Multiple 'virtio-serial' controllers with index '0'.
2016-03-03 14:53:20,892 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-81) [] Running on vds during rerun failed vm: 'null'
2016-03-03 14:53:20,893 INFO  [org.ovirt.engine.core.vdsbroker.VmAnalyzer] (DefaultQuartzScheduler_Worker-81) [] VM '21b8393b-b910-43f9-b9db-45d64140e5c3(SP_Compute_el_6_7-5) is running in db and not running in VDS 'seal09.qa.lab.tlv.redhat.com'

This issue prevents the customer from starting some VMs on their 3.6 environments.
Is there any WA currently available?
(In reply to Nikolai Sednev from comment #4)
Yes, to remove the unmanaged virtio-serial device. The fix will be available in 3.6.5.
(In reply to Arik from comment #5)
> (In reply to Nikolai Sednev from comment #4)
> Yes, to remove the unmanaged virtio-serial device.
> The fix will be available in 3.6.5.

Hi Arik,
How can I remove the virtio-serial device from my VMs?
If I'm getting to the edit VM, there is nothing selected in console->virtIOdevice.
(In reply to Nikolai Sednev from comment #6)
> (In reply to Arik from comment #5)
> > (In reply to Nikolai Sednev from comment #4)
> > Yes, to remove the unmanaged virtio-serial device.
> > The fix will be available in 3.6.5.
>
> Hi Arik,
> How can I remove the virtio-serial device from my VMs?
> If I'm getting to the edit VM, there is nothing selected in
> console->virtIOdevice.

Hi Nikolai,

you would have to remove it in the engine database.

First thing: make a backup of your database!!!

Then search for your virtio-serial devices:

SELECT device_id,vm_id,device,address,is_managed,_create_date,_update_date FROM vm_device WHERE vm_device.device = 'virtio-serial' ORDER BY vm_id;

Then you should see your virtio-serial doubles, their device_ids, and whether is_managed is true or false.

Arik mentioned above: "The best solution would be to shutdown these VMS and remove their _unmanaged_ virtio-serial devices...."

To view the unmanaged devices only, do this:

SELECT device_id,vm_id,device,address,is_managed,_create_date,_update_date FROM vm_device WHERE vm_device.device = 'virtio-serial' AND is_managed = 'f' ORDER BY vm_id;

Now delete them one by one and check in the engine GUI that it worked:

DELETE FROM vm_device WHERE vm_device.device_id = '<put_vm_device_id_here>';

Yes, you could delete them all at once, sure, but be careful, and again: back up your database first. This would delete all unmanaged virtio-serial devices in one go. Not sure if this is a good idea! Be careful!

DELETE FROM vm_device WHERE vm_device.device = 'virtio-serial' AND is_managed = 'f';

I did not test all the commands, hope there are no typos! I put them together from my command history and what people advised here!

all the best
Johannes
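A cautious way to run the bulk variant above, sketched with standard PostgreSQL against the same vm_device table: wrap it in a transaction so you can inspect exactly which rows would go and roll back if the list looks wrong. This is a sketch, not a supported procedure; take a database backup first regardless.

BEGIN;
-- show exactly which rows the bulk delete would remove
DELETE FROM vm_device
WHERE device = 'virtio-serial'
  AND is_managed = 'f'
RETURNING device_id, vm_id, alias;
-- if the returned rows look right, run: COMMIT;
-- otherwise run: ROLLBACK;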
The unmanaged virtio-serial devices were removed from Nikolai's DB. They have been removed from 3 VMs - 2 VMs that were down and the hosted-engine VM. If the hosted-engine VM is not restarted, the unmanaged device might appear again when one of its devices is updated or when the VM is migrated - so it is recommended to restart the hosted-engine VM before that.
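To see by name which VMs still carry the stray device (and therefore need the shutdown/restart described above), here is a sketch that joins against the VM name table; the vm_static table and its vm_guid/vm_name columns are an assumption here, so verify them against your own schema before relying on the output:

-- hedged sketch: map remaining unmanaged virtio-serial devices to VM names
-- (vm_static.vm_guid / vm_static.vm_name are assumed, not taken from this thread)
SELECT s.vm_name, d.device_id, d._create_date
FROM vm_device d
JOIN vm_static s ON s.vm_guid = d.vm_id
WHERE d.device = 'virtio-serial'
  AND d.is_managed = 'f'
ORDER BY s.vm_name;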
Did not reproduce on latest 3.6.5.1-0.1. Works for me.

Here is what was done:

1)Deploy HE over two hosts as follows:
------------------------rhevm-setup-3.4.5-0.3.el6ev-----------------------------
Hosts:
qemu-kvm-rhev-0.12.1.2-2.448.el6_6.4.x86_64
vdsm-4.14.18-7.el6ev.x86_64
ovirt-host-deploy-1.2.5-1.el6ev.noarch
rhevm-sdk-python-3.4.4.0-1.el6ev.noarch
libvirt-0.10.2-46.el6_6.6.x86_64
mom-0.4.0-1.el6ev.noarch
sanlock-2.8-2.el6_5.x86_64
ovirt-hosted-engine-ha-1.1.6-3.el6ev.noarch
ovirt-hosted-engine-setup-1.1.5-1.el6ev.noarch
Linux alma04.qa.lab.tlv.redhat.com 2.6.32-431.70.1.el6.x86_64 #1 SMP Wed Feb 24 16:53:51 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.6 (Santiago)

Engine:
rhevm-dwh-3.4.2-1.el6ev.noarch
rhevm-branding-rhev-3.4.0-4.el6ev.noarch
rhevm-doc-3.4.0-8.el6eng.noarch
rhevm-reports-setup-3.4.2-1.el6ev.noarch
rhevm-setup-plugins-3.4.5-1.el6ev.noarch
rhevm-setup-base-3.4.5-0.3.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.4.5-0.3.el6ev.noarch
rhevm-userportal-3.4.5-0.3.el6ev.noarch
rhevm-3.4.5-0.3.el6ev.noarch
rhevm-websocket-proxy-3.4.5-0.3.el6ev.noarch
rhevm-spice-client-x86-msi-3.4-4.el6_5.noarch
rhevm-image-uploader-3.4.3-1.el6ev.noarch
rhevm-guest-agent-common-1.0.9-5.el6ev.noarch
ovirt-host-deploy-java-1.2.5-1.el6ev.noarch
rhevm-cli-3.4.0.6-4.el6ev.noarch
ovirt-host-deploy-1.2.5-1.el6ev.noarch
rhevm-iso-uploader-3.4.4-1.el6ev.noarch
rhevm-webadmin-portal-3.4.5-0.3.el6ev.noarch
rhevm-dwh-setup-3.4.2-1.el6ev.noarch
rhevm-log-collector-3.4.5-2.el6ev.noarch
rhevm-backend-3.4.5-0.3.el6ev.noarch
rhevm-spice-client-x64-cab-3.4-4.el6_5.noarch
rhevm-restapi-3.4.5-0.3.el6ev.noarch
rhevm-dependencies-3.4.1-1.el6ev.noarch
rhevm-lib-3.4.5-0.3.el6ev.noarch
rhevm-setup-3.4.5-0.3.el6ev.noarch
rhevm-spice-client-x86-cab-3.4-4.el6_5.noarch
rhevm-reports-3.4.2-1.el6ev.noarch
rhevm-sdk-python-3.4.4.0-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.4.5-0.3.el6ev.noarch
rhevm-spice-client-x64-msi-3.4-4.el6_5.noarch
rhevm-setup-plugin-ovirt-engine-common-3.4.5-0.3.el6ev.noarch
rhevm-tools-3.4.5-0.3.el6ev.noarch
rhevm-dbscripts-3.4.5-0.3.el6ev.noarch
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Linux nsednev-he-2.qa.lab.tlv.redhat.com 2.6.32-504.43.1.el6.x86_64 #1 SMP Mon Jan 11 06:01:46 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
--------------------------------------------------------------------------------
2)Add Export domain.
3)Add NFS data storage domain.
4)Import guest-VMs using rhel-guest-images.
5)Set hosts into global maintenance.
6)Upgrade the engine as follows:
------------------------rhevm-setup-3.5.8-0.1.el6ev-----------------------------
Engine:
rhevm-setup-plugin-websocket-proxy-3.5.8-0.1.el6ev.noarch
rhevm-cli-3.5.0.6-1.el6ev.noarch
rhevm-doc-3.5.3-1.el6eng.noarch
rhevm-backend-3.5.8-0.1.el6ev.noarch
rhevm-sdk-python-3.5.6.0-1.el6ev.noarch
rhevm-dwh-setup-3.5.5-1.el6ev.noarch
rhevm-branding-rhev-3.5.0-4.el6ev.noarch
rhevm-userportal-3.5.8-0.1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.5.8-0.1.el6ev.noarch
rhevm-log-collector-3.5.4-2.el6ev.noarch
rhevm-spice-client-x86-msi-3.5-3.el6.noarch
rhevm-webadmin-portal-3.5.8-0.1.el6ev.noarch
rhevm-tools-3.5.8-0.1.el6ev.noarch
rhevm-reports-3.5.8-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-common-3.5.8-0.1.el6ev.noarch
rhevm-websocket-proxy-3.5.8-0.1.el6ev.noarch
rhevm-iso-uploader-3.5.1-1.el6ev.noarch
rhevm-dependencies-3.5.1-1.el6ev.noarch
rhevm-spice-client-x64-cab-3.5-3.el6.noarch
rhevm-lib-3.5.8-0.1.el6ev.noarch
rhevm-setup-base-3.5.8-0.1.el6ev.noarch
rhevm-reports-setup-3.5.8-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.5-3.el6.noarch
rhevm-extensions-api-impl-3.5.8-0.1.el6ev.noarch
ovirt-host-deploy-1.3.2-1.el6ev.noarch
rhevm-dbscripts-3.5.8-0.1.el6ev.noarch
rhevm-dwh-3.5.5-1.el6ev.noarch
rhevm-setup-plugins-3.5.4-1.el6ev.noarch
rhevm-image-uploader-3.5.0-4.el6ev.noarch
rhevm-spice-client-x86-cab-3.5-3.el6.noarch
ovirt-host-deploy-java-1.3.2-1.el6ev.noarch
rhevm-restapi-3.5.8-0.1.el6ev.noarch
rhevm-setup-3.5.8-0.1.el6ev.noarch
rhevm-guest-agent-common-1.0.10-2.el6ev.noarch
rhevm-3.5.8-0.1.el6ev.noarch
Linux nsednev-he-2.qa.lab.tlv.redhat.com 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.7 (Santiago)
--------------------------------------------------------------------------------
7)Release hosts from global maintenance.
8)Set second host into maintenance; all guest-VMs migrated to first host.
9)Upgrade 3.4->3.5 on second host.
10)Activate second host.
11)Set first host into maintenance.
12)All guest-VMs migrated to second host.
13)Upgrade 3.4->3.5 on first host.
14)Activate first host.
---------------------------------3.5.8-0.1-el6.7-------------------------------
Hosts:
libvirt-0.10.2-54.el6_7.6.x86_64
sanlock-2.8-2.el6_5.x86_64
ovirt-hosted-engine-ha-1.2.10-1.el6ev.noarch
vdsm-4.16.36-1.el6ev.x86_64
ovirt-host-deploy-1.3.2-1.el6ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.479.el6_7.4.x86_64
Linux alma04.qa.lab.tlv.redhat.com 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.7 (Santiago)
--------------------------------------------------------------------------------
15)Level up compatibility mode 3.4->3.5 for default host cluster.
16)Level up compatibility mode 3.4->3.5 for DC.
17)Create temporary host-cluster for el7.2 hosts.
18)Redeploy second host as el7.2/3.5 host #2 to the hosted-engine using the WA from https://bugzilla.redhat.com/show_bug.cgi?id=1308962.
-----------------------------------3.5.8-0.1-el7.2------------------------------
Host:
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
rhevm-sdk-python-3.5.6.0-1.el7ev.noarch
ovirt-hosted-engine-ha-1.2.10-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
mom-0.4.1-4.el7ev.noarch
ovirt-hosted-engine-setup-1.2.6.1-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-host-deploy-1.3.2-1.el7ev.noarch
vdsm-4.16.36-1.el7ev.x86_64
--------------------------------------------------------------------------------
19)Second host redeployed straight into the temporary host-cluster.
20)Set second host into maintenance and set it to use JSON-RPC via edit host.
21)Activate host.
22)Migrate all VMs from first host to second host using cross-host-cluster migration.
23)Set first el6.7/3.5 host into maintenance and remove it from the WEBUI.
24)Redeploy first host as el7.2/3.5 host #3 to the hosted-engine using the WA from https://bugzilla.redhat.com/show_bug.cgi?id=1308962 (deployment won't let you redeploy it as host #1 and will drop an error message, so as a WA use #3 or higher).
25)Return both hosts to the default host cluster, by setting them one by one into maintenance and cross-host-cluster-migrating guest-VMs from one to another.
26)Check that both hosts are using the JSON-RPC protocol (edit host->General->Advanced Parameters->Use JSON protocol->mark "V").
27)Remove the empty, temporary el7.2 host-cluster via the WEBUI.
28)Set on one of the hosts: "hosted-engine --set-maintenance --mode=global".
29)Proceed to the 3.5->3.6 upgrade and upgrade the engine first; if a serial console is needed, add it before running engine-setup.
------------------------rhevm-setup-3.6.5.1-0.1.el6-----------------------------
Engine:
rhevm-vmconsole-proxy-helper-3.6.5.1-0.1.el6.noarch
rhevm-backend-3.6.5.1-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.5.1-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-tools-backup-3.6.5.1-0.1.el6.noarch
rhevm-tools-3.6.5.1-0.1.el6.noarch
rhevm-reports-setup-3.6.5-1.el6ev.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
ovirt-vmconsole-1.0.0-1.el6ev.noarch
ovirt-host-deploy-1.4.1-1.el6ev.noarch
ovirt-host-deploy-java-1.4.1-1.el6ev.noarch
rhevm-3.6.5.1-0.1.el6.noarch
rhevm-sdk-python-3.6.5.0-1.el6ev.noarch
rhevm-extensions-api-impl-3.6.5.1-0.1.el6.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
ovirt-vmconsole-proxy-1.0.0-1.el6ev.noarch
rhevm-webadmin-portal-3.6.5.1-0.1.el6.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-lib-3.6.5.1-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.5.1-0.1.el6.noarch
ovirt-engine-extension-aaa-jdbc-1.0.6-1.el6ev.noarch
rhevm-setup-3.6.5.1-0.1.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-restapi-3.6.5.1-0.1.el6.noarch
rhevm-reports-3.6.5-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.5.1-0.1.el6.noarch
rhevm-websocket-proxy-3.6.5.1-0.1.el6.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-dbscripts-3.6.5.1-0.1.el6.noarch
rhevm-setup-base-3.6.5.1-0.1.el6.noarch
rhevm-setup-plugin-ovirt-engine-3.6.5.1-0.1.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-doc-3.6.0-6.el6eng.noarch
rhevm-userportal-3.6.5.1-0.1.el6.noarch
ovirt-setup-lib-1.0.1-1.el6ev.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
--------------------------------------------------------------------------------
30)Remove global maintenance from the hosts: "hosted-engine --set-maintenance --mode=none".
31)Upgrade hosts 3.5->3.6, one host at a time.
--------------------------------3.6.5.1-0.1-------------------------------------
Hosts:
rhev-release-3.6.5-2-001.noarch
vdsm-4.17.25-0.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.5.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-host-deploy-1.4.1-1.el7ev.noarch
mom-0.5.2-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
rhevm-sdk-python-3.6.5.0-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
ovirt-hosted-engine-ha-1.3.5.2-1.el7ev.noarch
Linux alma03.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)
--------------------------------------------------------------------------------
32)Level up compatibility mode 3.5->3.6 for default host cluster and DC.
33)Wait until the auto-import of the hosted_storage finishes; during this period the HE-VM might be automatically restarted as part of the process.

PASS.