Bug 1595489

Summary: Virtual machine lost its cdrom device
Summary: Virtual machine lost its cdrom device
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 4.2.3
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Germano Veit Michel <gveitmic>
Assignee: Arik <ahadas>
QA Contact: Nisim Simsolo <nsimsolo>
CC: ahadas, gveitmic, jortialc, lsurette, michal.skrivanek, mzamazal, nsimsolo, Rhev-m-bugs, sborella, shipatil, srevivo
Keywords: Rebase, ZStream
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Flags: ahadas: needinfo-
Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: Bug Fix
Doc Text:
This release ensures that VMs existing in Red Hat Virtualization Manager version 4.2.3 or earlier do not lose their CD-ROM device if the VMs are restarted in 4.2.3 or later versions.
Clones: 1596234 (view as bug list)
Bug Blocks: 1596234
oVirt Team: Virt
Type: Bug
Last Closed: 2019-05-08 12:37:51 UTC

Description Germano Veit Michel 2018-06-27 00:42:01 UTC
Description of problem:

After the following sequence of events, the VM is started without a cdrom device:

1. Upgrade CL from 4.1 to 4.2

2018-06-13 09:04:19,029+10 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] ....

2. Migrate VM to another Host (VM was not power cycled yet)

2018-06-13 09:06:51,626+10 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [499f4a18] START, MigrateVDSCommand( MigrateVDSCommandParameters:{hostId='5b0143c4-4a74-40af-91f9-2cbdf1372cef', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62', srcHost='192.168.18.113', dstVdsId='57f984fc-58b3-432e-b5db-61f0a9918a1e', dstHost='192.168.18.114:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='2500', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.18.114'}), log id: 53ab4a6b

2018-06-13 09:07:06,632+10 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] unmanaged disk with path '' is ignored    

2018-06-13 09:07:06,638+10 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] VM '33f4ee96-4863-48a8-ae9f-f67a341afc62' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='923802b9-9b05-4a6b-b67b-f2caa1b65578', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62'}', device='cdrom', type='DISK', specParams='[path=rhel-server-7.4-x86_64-dvd.iso]', address='', managed='true', plugged='false', readOnly='true', deviceAlias='ide0-1-0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice=''}'

3. Reboot the VM (now it's going to be power cycled due to the new cold reboot feature)

2018-06-15 10:24:25,149+10 INFO  [org.ovirt.engine.core.bll.RebootVmCommand] (EE-ManagedThreadFactory-engine-Thread-820) [ff2390d2-0726-4946-b60a-5905b584dd62] VM 'NAS03' is performing cold reboot; run once: 'false', running as volatile: 'false', has next run configuration: 'true' 

2018-06-15 10:26:01,202+10 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-15) [] add VM '33f4ee96-4863-48a8-ae9f-f67a341afc62'(NAS03) to cold reboot treatment  

2018-06-15 10:26:01,308+10 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-728) [49419eff] Running command: UpdateVmCommand internal: true. Entities affected :  ID: 33f4ee96-4863-48a8-ae9f-f67a341afc62 Type: VMAction group EDIT_VM_PROPERTIES with role type USER

Note that the timestamp above matches the _update_date of the CDROM device, and that the device address is now NULL.

engine=> select device_id,device,address,is_managed,is_plugged,_update_date,alias from vm_device WHERE vm_id ='33f4ee96-4863-48a8-ae9f-f67a341afc62' AND device ='cdrom';
              device_id               | device | address | is_managed | is_plugged |         _update_date          |  alias   
--------------------------------------+--------+---------+------------+------------+-------------------------------+----------
 923802b9-9b05-4a6b-b67b-f2caa1b65578 | cdrom  |         | t          | f          | 2018-06-15 10:26:01.308944+00 | ide0-1-0

2018-06-15 10:26:01,399+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-728) [762c4295] EVENT_ID: SYSTEM_UPDATE_VM(253), VM NAS03 configuration was updated by system. 

2018-06-15 10:26:01,997+10 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [2dee0d82] START, CreateVDSCommand( CreateVDSCommandParameters:{hostId='7aadd959-0c4d-4c65-af6d-14c8c958cb55', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62', vm='VM [NAS03]'}), log id: 11a9d517 

2018-06-15 10:26:02,009+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [2dee0d82] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">                       
  ...
  <uuid>33f4ee96-4863-48a8-ae9f-f67a341afc62</uuid>                                                                             
  ...
  No cdrom device in the XML

Now the VM cannot attach CDROMs anymore.

Version-Release number of selected component (if applicable):
rhvm-4.2.3.8-0.1.el7.noarch
vdsm-4.20.27.2-1.el7ev

Comment 2 Germano Veit Michel 2018-06-27 00:44:53 UTC
Not sure if related: https://gerrit.ovirt.org/#/c/92445/

Working on reproducing it.

Comment 3 Germano Veit Michel 2018-06-27 02:07:33 UTC
Didn't reproduce following the same events on our labs. I'll attach vdsm logs.

Comment 6 Arik 2018-06-27 07:00:33 UTC
(In reply to Germano Veit Michel from comment #3)
> Didn't reproduce following the same events on our labs. I'll attach vdsm
> logs.

Germano, note that this VM must have been started before the engine was updated to 4.2.3 (because the alias of the cd-rom is 'ide0-1-0' rather than a user-alias like 'ua-923802b9-9b05-4a6b-b67b-f2caa1b65578'). It is unlikely to happen for VMs that are started in 4.2.3 and above since we now use user-aliases. So this should be taken into account while trying to reproduce that.
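[Editor's note] The alias distinction above can be sketched as follows. This is an illustrative Python sketch only, not ovirt-engine code; the function name and prefix constant are assumptions based on the alias formats quoted in this comment. VMs started on 4.2.3+ get a user-alias that embeds the device UUID after a 'ua-' prefix, while VMs started earlier carry legacy libvirt aliases like 'ide0-1-0' from which no device ID can be recovered:

```python
# Illustrative sketch (not actual ovirt-engine code): recovering a
# device ID from a libvirt alias. Engine-generated user-aliases embed
# the device UUID after a 'ua-' prefix; legacy aliases do not.
import uuid

USER_ALIAS_PREFIX = "ua-"

def device_id_from_alias(alias):
    """Return the device UUID embedded in a user-alias, or None for
    legacy aliases that carry no ID."""
    if not alias.startswith(USER_ALIAS_PREFIX):
        return None
    try:
        return str(uuid.UUID(alias[len(USER_ALIAS_PREFIX):]))
    except ValueError:
        return None

# VM started on 4.2.3+: the ID is recoverable from the alias
print(device_id_from_alias("ua-923802b9-9b05-4a6b-b67b-f2caa1b65578"))
# VM started before the upgrade: legacy alias, no embedded ID -> None
print(device_id_from_alias("ide0-1-0"))
```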

Comment 7 Arik 2018-06-27 07:06:04 UTC
(In reply to Germano Veit Michel from comment #0)
> 2018-06-13 09:07:06,632+10 WARN 
> [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter]
> (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] unmanaged disk with
> path '' is ignored    

This means that we got a device without an ID, or with a different ID than it had before.

> 
> 2018-06-13 09:07:06,638+10 ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring]
> (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] VM
> '33f4ee96-4863-48a8-ae9f-f67a341afc62' managed non pluggable device was
> removed unexpectedly from libvirt:
> 'VmDevice:{id='VmDeviceId:{deviceId='923802b9-9b05-4a6b-b67b-f2caa1b65578',
> vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62'}', device='cdrom', type='DISK',
> specParams='[path=rhel-server-7.4-x86_64-dvd.iso]', address='',
> managed='true', plugged='false', readOnly='true', deviceAlias='ide0-1-0',
> customProperties='[]', snapshotId='null', logicalName='null', hostDevice=''}'

This means that no device was reported with the ID 923802b9-9b05-4a6b-b67b-f2caa1b65578 so we have no correlation for the cd-rom in the database and therefore unplug it.
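[Editor's note] The unplug behaviour described above can be sketched as follows. This is a simplified illustration, not the actual VmDevicesMonitoring logic; the function and field names are assumptions. Devices reported by libvirt are matched against the database by device ID, and any managed database device that finds no match is treated as removed and is unplugged:

```python
# Simplified illustration of the correlation step described above
# (not the actual VmDevicesMonitoring code): managed DB devices with
# no matching reported device ID are marked unplugged.

def correlate(db_devices, reported_ids):
    """db_devices: list of dicts with 'device_id', 'managed', 'plugged'.
    reported_ids: set of device IDs reported by libvirt for this VM.
    Unmatched managed devices are unplugged, mirroring the
    'removed unexpectedly from libvirt' warning in the engine log."""
    for dev in db_devices:
        if dev["managed"] and dev["device_id"] not in reported_ids:
            dev["plugged"] = False  # no correlation -> treated as removed
    return db_devices

devices = [{"device_id": "923802b9", "device": "cdrom",
            "managed": True, "plugged": True}]
# The cd-rom's ID is not among the reported IDs, so it gets unplugged:
correlate(devices, reported_ids={"some-other-id"})
print(devices[0]["plugged"])  # False
```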

This reminds me of previous bugs we had where the ID of devices of VMs that were started in previous versions was not preserved when those VMs were migrated to a new version of VDSM.

Milan, were the fixes in that area already available in 4.2.3?

Comment 8 Michal Skrivanek 2018-06-27 07:09:14 UTC
Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

Comment 9 Germano Veit Michel 2018-06-27 07:18:58 UTC
(In reply to Arik from comment #6)
> (In reply to Germano Veit Michel from comment #3)
> > Didn't reproduce following the same events on our labs. I'll attach vdsm
> > logs.
> 
> Germano, note that this VM must have been started before the engine was
> updated to 4.2.3 (because the alias of the cd-rom is 'ide0-1-0' rather than
> a user-alias like 'ua-923802b9-9b05-4a6b-b67b-f2caa1b65578'). It is unlikely
> to happen for VMs that are started in 4.2.3 and above since we now use
> user-aliases. So this should be taken into account while trying to reproduce
> that.

Ohh, I would never have noticed this! Thanks, I can try again tomorrow if you think it's necessary.

(In reply to Michal Skrivanek from comment #8)
> Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

Yes it was ejected (no path).

Comment 10 Arik 2018-06-27 09:13:27 UTC
(In reply to Arik from comment #7)
> This reminds me of previous bugs we had where the ID of devices of VMs that
> were started in previous versions was not preserved when those VMs were
> migrated to a new version of VDSM.
> 
> Milan, were the fixes in that area already available in 4.2.3?

Well, that is incorrect, since we're talking about cluster level 4.2, so the engine ignores the device IDs. We need to improve the correlation logic for cd-roms.

Comment 11 Milan Zamazal 2018-06-27 09:18:05 UTC
(In reply to Michal Skrivanek from comment #8)
> Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

This is not related to that bug -- <source> element is present in the domain XML.

Comment 13 Nisim Simsolo 2018-12-12 15:02:02 UTC
Verified: 
ovirt-engine-4.3.0-0.2.master.20181203100035.git0ee44c1.el7
vdsm-4.30.3-71.git7d6039c.el7.x86_64
libvirt-client-4.5.0-10.el7.x86_64
qemu-kvm-ev-2.12.0-18.el7_6.1.1.x86_64
sanlock-3.6.0-1.el7.x86_64

Verification scenario:
1. Run a VM on a cluster with compatibility version 4.2.
   From the engine db, verify the VM has a cdrom device and the cdrom address is not null.
2. Upgrade the cluster from 4.2 to 4.3.
   Verify the cdrom device still exists with the correct address.
3. Migrate the VM and afterwards reboot it.
4. Verify the cdrom device still exists with the correct address.
5. Attach a cdrom to the VM and verify the cdrom device can be accessed from within the VM.

Comment 15 errata-xmlrpc 2019-05-08 12:37:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085