Bug 1596234 - [downstream clone - 4.2.5] Virtual machine lost its cdrom device
Summary: [downstream clone - 4.2.5] Virtual machine lost its cdrom device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.3
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ovirt-4.2.5
: ---
Assignee: Arik
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On: 1595489
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-28 12:49 UTC by RHV bug bot
Modified: 2021-09-09 14:49 UTC (History)
10 users (show)

Fixed In Version: ovirt-engine-4.2.5
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Previously, when a virtual machine's cluster level was updated (for example, from 4.1 to 4.2), ejected CD-ROM devices could be lost, and the virtual machine was sometimes unable to use CDs anymore. This has now been fixed so that CD-ROM devices remain intact during upgrades regardless of whether they are in use or ejected.
Clone Of: 1595489
Environment:
Last Closed: 2018-07-31 17:49:18 UTC
oVirt Team: Virt
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43540 0 None None None 2021-09-09 14:49:59 UTC
Red Hat Knowledge Base (Solution) 3500601 0 None None None 2018-06-28 12:50:51 UTC
Red Hat Product Errata RHBA-2018:2318 0 None None None 2018-07-31 17:50:03 UTC
oVirt gerrit 92599 0 master MERGED core: fix the correlation logic of cd-roms 2020-10-08 13:35:00 UTC
oVirt gerrit 92606 0 ovirt-engine-4.2 MERGED core: fix the correlation logic of cd-roms 2020-10-08 13:35:09 UTC

Description RHV bug bot 2018-06-28 12:49:55 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1595489 +++
======================================================================

Description of problem:

After a sequence of events, the VM is now started without a cdrom device. Here is the sequence of events:

1. Upgrade CL from 4.1 to 4.2

2018-06-13 09:04:19,029+10 INFO  [org.ovirt.engine.core.bll.UpdateClusterCommand] ....

2. Migrate VM to another Host (VM was not power cycled yet)

2018-06-13 09:06:51,626+10 INFO  [org.ovirt.engine.core.vdsbroker.MigrateVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-77) [499f4a18] START, MigrateVDSCommand( MigrateVDSCommandParameters:{hostId='5b0143c4-4a74-40af-91f9-2cbdf1372cef', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62', srcHost='192.168.18.113', dstVdsId='57f984fc-58b3-432e-b5db-61f0a9918a1e', dstHost='192.168.18.114:54321', migrationMethod='ONLINE', tunnelMigration='false', migrationDowntime='0', autoConverge='true', migrateCompressed='false', consoleAddress='null', maxBandwidth='2500', enableGuestEvents='true', maxIncomingMigrations='2', maxOutgoingMigrations='2', convergenceSchedule='[init=[{name=setDowntime, params=[100]}], stalling=[{limit=1, action={name=setDowntime, params=[150]}}, {limit=2, action={name=setDowntime, params=[200]}}, {limit=3, action={name=setDowntime, params=[300]}}, {limit=4, action={name=setDowntime, params=[400]}}, {limit=6, action={name=setDowntime, params=[500]}}, {limit=-1, action={name=abort, params=[]}}]]', dstQemu='192.168.18.114'}), log id: 53ab4a6b

2018-06-13 09:07:06,632+10 WARN  [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] unmanaged disk with path '' is ignored    

2018-06-13 09:07:06,638+10 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] VM '33f4ee96-4863-48a8-ae9f-f67a341afc62' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='923802b9-9b05-4a6b-b67b-f2caa1b65578', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62'}', device='cdrom', type='DISK', specParams='[path=rhel-server-7.4-x86_64-dvd.iso]', address='', managed='true', plugged='false', readOnly='true', deviceAlias='ide0-1-0', customProperties='[]', snapshotId='null', logicalName='null', hostDevice=''}'

3. Reboot the VM (now its going to be power cycled due to the new cold reboot feature)

2018-06-15 10:24:25,149+10 INFO  [org.ovirt.engine.core.bll.RebootVmCommand] (EE-ManagedThreadFactory-engine-Thread-820) [ff2390d2-0726-4946-b60a-5905b584dd62] VM 'NAS03' is performing cold reboot; run once: 'false', running as volatile: 'false', has next run configuration: 'true' 

2018-06-15 10:26:01,202+10 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-15) [] add VM '33f4ee96-4863-48a8-ae9f-f67a341afc62'(NAS03) to cold reboot treatment  

2018-06-15 10:26:01,308+10 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (EE-ManagedThreadFactory-engine-Thread-728) [49419eff] Running command: UpdateVmCommand internal: true. Entities affected :  ID: 33f4ee96-4863-48a8-ae9f-f67a341afc62 Type: VMAction group EDIT_VM_PROPERTIES with role type USER

Look at the timestamp above and the _update_date of the CDROM device, and the device address is NULL.

engine=> select device_id,device,address,is_managed,is_plugged,_update_date,alias from vm_device WHERE vm_id ='33f4ee96-4863-48a8-ae9f-f67a341afc62' AND device ='cdrom';
              device_id               | device | address | is_managed | is_plugged |         _update_date          |  alias   
--------------------------------------+--------+---------+------------+------------+-------------------------------+----------
 923802b9-9b05-4a6b-b67b-f2caa1b65578 | cdrom  |         | t          | f          | 2018-06-15 10:26:01.308944+00 | ide0-1-0

2018-06-15 10:26:01,399+10 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-728) [762c4295] EVENT_ID: SYSTEM_UPDATE_VM(253), VM NAS03 configuration was updated by system. 

2018-06-15 10:26:01,997+10 INFO  [org.ovirt.engine.core.vdsbroker.CreateVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [2dee0d82] START, CreateVDSCommand( CreateVDSCommandParameters:{hostId='7aadd959-0c4d-4c65-af6d-14c8c958cb55', vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62', vm='VM [NAS03]'}), log id: 11a9d517 

2018-06-15 10:26:02,009+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateBrokerVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-41) [2dee0d82] VM <?xml version="1.0" encoding="UTF-8"?><domain type="kvm" xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0">                       
  ...
  <uuid>33f4ee96-4863-48a8-ae9f-f67a341afc62</uuid>                                                                             
  ...
  No cdrom device in the XML

Now the VM cannot attach CDROMs anymore.

Version-Release number of selected component (if applicable):
rhvm-4.2.3.8-0.1.el7.noarch
vdsm-4.20.27.2-1.el7ev

(Originally by Germano Veit Michel)

Comment 3 RHV bug bot 2018-06-28 12:50:05 UTC
Not sure if related: https://gerrit.ovirt.org/#/c/92445/

Working on reproducing it.

(Originally by Germano Veit Michel)

Comment 4 RHV bug bot 2018-06-28 12:50:09 UTC
Didn't reproduce following the same events on our labs. I'll attach vdsm logs.

(Originally by Germano Veit Michel)

Comment 7 RHV bug bot 2018-06-28 12:50:20 UTC
(In reply to Germano Veit Michel from comment #3)
> Didn't reproduce following the same events on our labs. I'll attach vdsm
> logs.

Germano, note that this VM must have been started before the engine was updated to 4.2.3 (because the alias of the cd-rom is 'ide0-1-0' rather than a user-lias like 'ua-923802b9-9b05-4a6b-b67b-f2caa1b65578'). It is unlikely to happen for VMs that are started in 4.2.3 and above since we now use user-aliases. So this should be taken into account while trying to reproduce that.

(Originally by Arik Hadas)

Comment 8 RHV bug bot 2018-06-28 12:50:23 UTC
(In reply to Germano Veit Michel from comment #0)
> 2018-06-13 09:07:06,632+10 WARN 
> [org.ovirt.engine.core.vdsbroker.libvirt.VmDevicesConverter]
> (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] unmanaged disk with
> path '' is ignored    

This means that we got a device without ID or with a different ID than it used to have before.

> 
> 2018-06-13 09:07:06,638+10 ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring]
> (EE-ManagedThreadFactory-engineScheduled-Thread-50) [] VM
> '33f4ee96-4863-48a8-ae9f-f67a341afc62' managed non pluggable device was
> removed unexpectedly from libvirt:
> 'VmDevice:{id='VmDeviceId:{deviceId='923802b9-9b05-4a6b-b67b-f2caa1b65578',
> vmId='33f4ee96-4863-48a8-ae9f-f67a341afc62'}', device='cdrom', type='DISK',
> specParams='[path=rhel-server-7.4-x86_64-dvd.iso]', address='',
> managed='true', plugged='false', readOnly='true', deviceAlias='ide0-1-0',
> customProperties='[]', snapshotId='null', logicalName='null', hostDevice=''}'

This means that no device was reported with the ID 923802b9-9b05-4a6b-b67b-f2caa1b65578 so we have no correlation for the cd-rom in the database and therefore unplug it.

This reminds me of previous bugs we had where the ID of devices of VMs that were started in previous versions was not preserved when those VMs were migrated to a new version of VDSM.

Milan, were the fixes in that area already available in 4.2.3?

(Originally by Arik Hadas)

Comment 9 RHV bug bot 2018-06-28 12:50:27 UTC
Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

(Originally by michal.skrivanek)

Comment 10 RHV bug bot 2018-06-28 12:50:30 UTC
(In reply to Arik from comment #6)
> (In reply to Germano Veit Michel from comment #3)
> > Didn't reproduce following the same events on our labs. I'll attach vdsm
> > logs.
> 
> Germano, note that this VM must have been started before the engine was
> updated to 4.2.3 (because the alias of the cd-rom is 'ide0-1-0' rather than
> a user-lias like 'ua-923802b9-9b05-4a6b-b67b-f2caa1b65578'). It is unlikely
> to happen for VMs that are started in 4.2.3 and above since we now use
> user-aliases. So this should be taken into account while trying to reproduce
> that.

Ohh, I would never had noticed this! Thanks, I can try again tomorrow if you think it's necessary.

(In reply to Michal Skrivanek from comment #8)
> Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

Yes it was ejected (no path).

(Originally by Germano Veit Michel)

Comment 11 RHV bug bot 2018-06-28 12:50:33 UTC
(In reply to Arik from comment #7)
> This reminds me of previous bugs we had where the ID of devices of VMs that
> were started in previous versions was not preserved when those VMs were
> migrated to a new version of VDSM.
> 
> Milan, were the fixes in that area already available in 4.2.3?

Well, that is incorrect since we're talking about cluster level 4.2 so the engine ignores the device IDs. Need to improve the correlation logic for cd-roms.

(Originally by Arik Hadas)

Comment 12 RHV bug bot 2018-06-28 12:50:36 UTC
(In reply to Michal Skrivanek from comment #8)
> Germano, can you please check if the CDROM was ejected? I.e. bug 1593568

This is not related to that bug -- <source> element is present in the domain XML.

(Originally by Milan Zamazal)

Comment 14 Nisim Simsolo 2018-07-10 10:53:20 UTC
Verified:
rhvm-4.2.5-0.1.el7ev
vdsm-4.20.33-1.el7ev.x86_64
libvirt-client-3.9.0-14.el7_5.6.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.4.x86_64
sanlock-3.6.0-1.el7.x86_64

Verification scenario: 
1. Run VM with cdrom device.
2. Upgrade CL from 4.1 to 4.2.
3. Verify cdrom device is still available and accessible from VM. 
   Migrate VM.
4. Verify cdrom device is available and accessible from VM.
   Reboot VM.
5. After VM reboot completed, verify cdrom device is available and accessible.
Verify cdrom device is listed in engine db:
engine=# select device_id,device,address,is_managed,is_plugged,_update_date,alias from vm_device WHERE vm_id ='c4608698-f036-4345-b52a-c71c2cb4c00c' AND device ='cdrom';
              device_id               | device |                       address                       | is_managed | is_plugged |         _update_date          |  alias   
--------------------------------------+--------+-----------------------------------------------------+------------+------------+-------------------------------+----------
 be5e4018-4c12-4b4d-81d8-2309ab00cafa | cdrom  | {type=drive, bus=1, controller=0, target=0, unit=0} | t          | t          | 2018-07-10 13:35:44.498705+03 | ide0-1-0
(1 row)

Comment 16 errata-xmlrpc 2018-07-31 17:49:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2318


Note You need to log in before you can comment on or make changes to this bug.