
Bug 1870108

Summary: VM devices may get temporarily unplugged on VM boot
Product: [oVirt] vdsm
Component: Core
Version: ---
Reporter: Arik <ahadas>
Assignee: Milan Zamazal <mzamazal>
QA Contact: Qin Yuan <qiyuan>
CC: bugs, dholler, lrotenbe
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Target Milestone: ovirt-4.4.3
Target Release: 4.40.28
Flags: ahadas: ovirt-4.4?, ahadas: planning_ack?, ahadas: devel_ack+, ahadas: testing_ack?
Hardware: Unspecified
OS: Unspecified
Fixed In Version: vdsm-4.40.28
Doc Type: Bug Fix
Doc Text:
When booting a newly created VM, Engine could log errors about some of the VM's devices being unplugged and temporarily show them as unplugged in the Web UI, even though they were actually plugged. This has been fixed and should no longer happen.
Story Points: ---
Last Closed: 2020-11-11 06:41:38 UTC
Type: Bug
oVirt Team: Virt
Attachments:
- relevant vdsm, libvirt and engine logs
- vdsm.log

Description Arik 2020-08-19 11:28:28 UTC
We've seen that the first dumpxml VDSM reports comes without device addresses, e.g., for a virtio-scsi controller:

(1) On create we send:

<controller type="scsi" model="virtio-scsi" index="0">
   <driver iothread="1"/>
   <alias name="ua-b500575d-f8d1-4b0e-a011-f9215f4802f1"/>
</controller>


(2) Then we get an xml report (dumpxml) with:

<controller index="0" model="virtio-scsi" type="scsi">
   <driver iothread="1" />
   <alias name="ua-b500575d-f8d1-4b0e-a011-f9215f4802f1" />
</controller>


(3) Then we get another xml report with:

<controller type='scsi' index='0' model='virtio-scsi'>
   <driver iothread='1'/>
   <alias name='ua-b500575d-f8d1-4b0e-a011-f9215f4802f1'/>
   <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
</controller>


And thus, after (2) we see in engine.log:

ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmDevicesMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-9) [] VM 'c5df2234-ae71-4b71-83bd-48210ad0f0ca' managed non pluggable device was removed unexpectedly from libvirt: 'VmDevice:{id='VmDeviceId:{deviceId='b500575d-f8d1-4b0e-a011-f9215f4802f1', vmId='c5df2234-ae71-4b71-83bd-48210ad0f0ca'}', device='virtio-scsi', type='CONTROLLER', specParams='[ioThreadId=1]', address='', managed='true', plugged='false', readOnly='false', deviceAlias='', customProperties='[]', snapshotId='null', logicalName='null', hostDevice='null'}'
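
To make the difference between reports (2) and (3) concrete, here is a minimal Python sketch (illustrative only; the Engine itself is Java and does not use this code) that checks whether a device element already carries a libvirt-assigned address:

import xml.etree.ElementTree as ET

# Report (2): the controller as VDSM reports it before the XML has been
# refreshed from libvirt -- no <address> element yet.
REPORT_2 = """
<controller index="0" model="virtio-scsi" type="scsi">
   <driver iothread="1"/>
   <alias name="ua-b500575d-f8d1-4b0e-a011-f9215f4802f1"/>
</controller>
"""

# Report (3): the same controller after libvirt has assigned a PCI address.
REPORT_3 = """
<controller type='scsi' index='0' model='virtio-scsi'>
   <driver iothread='1'/>
   <alias name='ua-b500575d-f8d1-4b0e-a011-f9215f4802f1'/>
   <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x0'/>
</controller>
"""

def has_address(device_xml):
    # A device without an <address> child is what the engine ends up treating
    # as unplugged (address='' in the error above).
    return ET.fromstring(device_xml).find('address') is not None

print(has_address(REPORT_2))  # False -> looks unplugged
print(has_address(REPORT_3))  # True  -> plugged, address assigned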

Some initial thoughts:
1. The engine doesn't hold the hash of the initial xml it sends, so the first time VDSM reports 'stats' with a certain hash, the engine queries the dumpxml.
2. It may be a timing issue, as it doesn't happen in all environments (e.g., VDSM sets the xml it gets from the engine, and since it takes some time to get the updated xml from libvirt, that's what VDSM reports back; the engine doesn't realize it's the same xml it sent to VDSM). See the sketch below.
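
A minimal Python sketch of the hash-driven monitoring described in points 1 and 2 above; all names (devices_hash, EngineMonitorSketch, on_stats) are hypothetical, not the actual Engine/VDSM API:

import hashlib

def devices_hash(devices_xml):
    # VDSM includes an opaque hash of its current device XML in getAllVmStats.
    return hashlib.md5(devices_xml.encode('utf-8')).hexdigest()

class EngineMonitorSketch:
    def __init__(self):
        # Point 1: the engine does not keep the hash of the initial XML it
        # sent, so the very first hash it sees always looks "new".
        self.last_hash = None

    def on_stats(self, reported_hash, dumpxml):
        # dumpxml is a callable standing in for the dumpxmls verb.
        if reported_hash != self.last_hash:
            self.last_hash = reported_hash
            # Point 2: if this happens before VDSM has refreshed the XML from
            # libvirt, the returned XML still lacks addresses.
            return dumpxml()
        return None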

Comment 1 Arik 2020-08-19 11:31:49 UTC
Dominik, could you please attach VDSM and libvirt debug logs?

Comment 2 Arik 2020-08-19 11:33:53 UTC
More initial thoughts by Milan:

I can think about two possibilities:

- We report the XML obtained from Engine before we update it from
  libvirt.  I don't think this can happen unless Engine can call
  dumpxmls on the VM before it is reported as fully started.

- A timing issue in libvirt or similar.

Comment 3 Dominik Holler 2020-08-19 13:14:20 UTC
Created attachment 1711874 [details]
relevant vdsm, libvirt and engine logs

(In reply to Arik from comment #1)
> Dominik, could you please attach VDSM and libvirt debug logs?

Please let me know soon if a logfile is missing.

Comment 4 Milan Zamazal 2020-08-19 13:23:15 UTC
Dominik, I can't see vdsm.log in the attachment.

Comment 5 Dominik Holler 2020-08-19 13:33:41 UTC
Created attachment 1711883 [details]
vdsm.log

(In reply to Milan Zamazal from comment #4)
> Dominik, I can't see vdsm.log in the attachment.

thanks for checking quickly!

Comment 6 Milan Zamazal 2020-08-19 15:35:01 UTC
Thank you, Dominik, for the logs. Engine calls getAllVmStats, followed by dumpxmls, perhaps because it sees an "updated" hash as Arik mentions in Comment 0. If that falls into the window between starting and finishing the VM creation in Vdsm, it gets the XML not yet updated from libvirt. If I insert a sleep into Vdsm VM initialization, I obtain an XML without the address too.

Now the question is: what's the right way to fix it? I don't think attempting to compute the initial hash on the Engine side would be a good idea. Ignoring the missing addresses would help with that particular problem, but it would be better not to process XML that lacks the additions made by libvirt at all. Is there a way to achieve that without modifying both Engine and Vdsm?

Comment 7 Arik 2020-08-19 15:51:16 UTC
Yeah, it might be (unless I'm missing something) that if VDSM doesn't report the 'hash' in the stats during that period (until we get an updated xml from libvirt), the engine won't trigger dumpxmls calls.
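
A minimal Python sketch of that idea; VmSketch and its methods are hypothetical names, not VDSM's real internals:

import hashlib

class VmSketch:
    def __init__(self, engine_xml):
        self._xml = engine_xml            # XML received from the engine on create
        self._updated_from_libvirt = False

    def update_from_libvirt(self, libvirt_xml):
        self._xml = libvirt_xml           # now contains libvirt-assigned addresses
        self._updated_from_libvirt = True

    def stats(self):
        stats = {'status': 'Up'}
        if self._updated_from_libvirt:
            # Only advertise the hash once the XML is authoritative; without a
            # hash in the stats, the engine has no reason to call dumpxmls.
            stats['hash'] = hashlib.md5(self._xml.encode('utf-8')).hexdigest()
        return stats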

Comment 8 Milan Zamazal 2020-08-19 16:39:58 UTC
Good, then it could be an easy fix. I'll check whether it works.

Comment 9 Qin Yuan 2020-09-14 01:45:02 UTC
Verified with:
vdsm-4.40.28-1.el8ev.x86_64
ovirt-engine-4.4.3.1-0.7.el8ev.noarch

Steps:
1. Create a new VM and start it.
2. Check the engine log to verify that there is no dumpxml without addresses and no device-unplugged message.

Results:
1. During the VM start process, there is no dumpxml without addresses and no device-unplugged message.

Comment 10 Sandro Bonazzola 2020-11-11 06:41:38 UTC
This bugzilla is included in the oVirt 4.4.3 release, published on November 10th, 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.