Bug 1496395 - [Memory hot unplug] After commit snapshot with memory hot unplug failed since device not found
Summary: [Memory hot unplug] After commit snapshot with memory hot unplug failed since...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Milan Zamazal
QA Contact: Pedut
Docs Contact: Rolfe Dlugy-Hegwer
URL:
Whiteboard:
Depends On: 1645022
Blocks:
 
Reported: 2017-09-27 09:42 UTC by Israel Pinto
Modified: 2019-02-13 07:44 UTC (History)

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: Bug Fix
Doc Text:
Previously, memory hot unplug did not work in virtual machines started from snapshots. This has been fixed in the current release: Memory hot unplug works in virtual machines started from snapshots.
Clone Of:
Environment:
Last Closed: 2019-02-13 07:44:54 UTC
oVirt Team: Virt
rule-engine: ovirt-4.3+
mtessun: planning_ack+
rule-engine: devel_ack+
rule-engine: testing_ack+


Attachments
engine_log (400.28 KB, application/x-xz)
2017-09-27 09:42 UTC, Israel Pinto
screenshot (296.98 KB, image/png)
2017-09-27 09:44 UTC, Israel Pinto
vdsm (236.33 KB, application/x-xz)
2017-09-27 09:54 UTC, Israel Pinto
dumpxml (13.58 KB, text/plain)
2017-09-27 09:55 UTC, Israel Pinto
updated logs (564.59 KB, application/x-xz)
2018-10-31 11:10 UTC, Pedut
updated logs (674.98 KB, application/x-xz)
2018-11-01 11:41 UTC, Pedut


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 95037 0 master MERGED core: Add user aliases to memory devices 2018-10-25 11:49:46 UTC

Description Israel Pinto 2017-09-27 09:42:36 UTC
Created attachment 1331352 [details]
engine_log

Description of problem:
Memory hot unplug fails on a VM that was started after committing a snapshot with memory.


Version-Release number of selected component (if applicable):
Software version: 4.2.0-0.0.master.20170917124606.gita804ef7.el7.centos


Steps to Reproduce:
1. Create VM with OS and run it
2. Hotplug memory to VM
3. Create snapshot with memory
4. Stop VM
5. Preview Snapshot 
6. Commit snapshot
7. Run VM
8. Check that the memory device exists under VM device tab
9. Hot unplug memory
 
Actual results:
Memory hot unplug fails with a General Exception: "Device instance for device identified by alias dimm1 and type memory not found".

Expected results:
Memory hot unplug succeeds.


Additional info:
engine log:
2017-09-27 12:05:54,612+03 ERROR [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-5) [5e2b4378-d242-4b02-b926-f616b843be54] Failed in 'HotUnplugMemoryVDS' method
2017-09-27 12:05:54,625+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [5e2b4378-d242-4b02-b926-f616b843be54] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command HotUnplugMemoryVDS failed: General Exception: ('Device instance for device identified by alias dimm1 and type memory not found',)
2017-09-27 12:05:54,626+03 ERROR [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-5) [5e2b4378-d242-4b02-b926-f616b843be54] Command 'HotUnplugMemoryVDSCommand(HostName = host_mixed_1, Params:{hostId='263620b9-1567-4e16-984d-2acc45487c50'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotUnplugMemoryVDS, error = General Exception: ('Device instance for device identified by alias dimm1 and type memory not found',), code = 100
2017-09-27 12:05:54,626+03 INFO  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-5) [5e2b4378-d242-4b02-b926-f616b843be54] FINISH, HotUnplugMemoryVDSCommand, log id: 2c1b8769
2017-09-27 12:05:54,644+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [5e2b4378-d242-4b02-b926-f616b843be54] EVENT_ID: MEMORY_HOT_UNPLUG_FAILED(2,047), Failed to hot unplug memory device (b1a5b471-eaa9-44ce-9e94-f534eda40815) of size 896 out of VM 'Test_memory_hot_plug_unplug': General Exception: ('Device instance for device identified by alias dimm1 and type memory not found',)

Comment 1 Israel Pinto 2017-09-27 09:44:30 UTC
Created attachment 1331353 [details]
screenshot

Comment 2 Israel Pinto 2017-09-27 09:54:22 UTC
Created attachment 1331356 [details]
vdsm

Comment 3 Israel Pinto 2017-09-27 09:55:50 UTC
Created attachment 1331359 [details]
dumpxml

Comment 4 Michal Skrivanek 2017-09-27 16:24:35 UTC
Between which of your steps to reproduce was the XML dump taken?

Comment 5 Israel Pinto 2017-09-28 07:02:05 UTC
It wasn't taken between steps; it was taken at the end.

Comment 6 Tomas Jelinek 2017-10-03 13:30:00 UTC
So, the reason it happens is this:
- when the snapshot with memory is taken, libvirt stores its own OVF with VM configuration
- engine stores its own in DB
- then, when the VM is started again, engine builds a libvirt XML from the OVF stored in the DB and sends it to VDSM
- this libvirtxml does not contain the memory devices, because of this code in LibvirtVmXmlBuilder.writeDevices():
  ....
  case WATCHDOG:
      writeWatchdog(device);
      break;
  case MEMORY:
      // memory devices are only used for hot-plug
      break;
  case VIDEO:
      writeVideo(device);
      break;
  ....
so, the memory devices are skipped.
- VDSM then builds the VM representation from this XML (without the memory devices)
- libvirt builds the VM from its own OVF (with the memory devices)

So, it looks like the only missing part here is that writeDevices() also needs to write the memory devices.
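
To make the mismatch concrete, here is a minimal sketch that compares the device types in two domain XML documents. Both XML snippets are hypothetical and heavily trimmed: one stands in for the XML the engine builds (no memory device, per the writeDevices() excerpt above), the other for what libvirt restores from the snapshot. The function names are made up for illustration and are not Engine or Vdsm APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed domain XML as built by the engine:
# the MEMORY case is skipped, so no <memory> element is written.
ENGINE_XML = """
<domain type='kvm'>
  <devices>
    <watchdog model='i6300esb'/>
    <video><model type='qxl'/></video>
  </devices>
</domain>
"""

# Hypothetical, trimmed domain XML as restored by libvirt:
# the hot-plugged DIMM (896 MiB = 917504 KiB) is still present.
LIBVIRT_XML = """
<domain type='kvm'>
  <devices>
    <watchdog model='i6300esb'/>
    <video><model type='qxl'/></video>
    <memory model='dimm'>
      <target><size unit='KiB'>917504</size><node>0</node></target>
      <alias name='dimm1'/>
    </memory>
  </devices>
</domain>
"""


def device_types(domain_xml):
    """Return the set of element names found directly under <devices>."""
    root = ET.fromstring(domain_xml)
    return {dev.tag for dev in root.findall("./devices/*")}


def missing_device_types(engine_xml, libvirt_xml):
    """Device types that libvirt has but the engine-built XML omits."""
    return device_types(libvirt_xml) - device_types(engine_xml)


print(missing_device_types(ENGINE_XML, LIBVIRT_XML))
```

With these two inputs, the only device type reported as missing is the memory device.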

@Arik: any thoughts? Do you think it should be done, or are there any risks in doing it?

Comment 7 Michal Skrivanek 2017-12-06 13:19:11 UTC
Memory devices should be restored correctly on the libvirt side. It seems it should be enough for Vdsm to report them correctly; they will then reappear in Engine and can be unplugged. Milan, please check that the Vdsm side uses the libvirt XML and not anything in vmconf.

Comment 8 Milan Zamazal 2017-12-21 20:59:33 UTC
It seems the problem is that _srcDomXML doesn't contain device aliases. Memory hotunplug identifies the DIMM by its alias, which is missing when devices are initialized, so it can't be found. After Vdsm is restarted and initializes devices from libvirt rather than _srcDomXML, hotunplug works.

Perhaps a followup device update from libvirt is missing when restoring from snapshot. I'm not sure though whether the assigned aliases are the same between different runs (before and after snapshot).

Comment 9 Milan Zamazal 2018-09-14 15:39:55 UTC
The problem still exists: The memory hotplug XML doesn't contain any alias, the alias is added to the domain XML by libvirt, and _srcDomXML doesn't contain the alias.

Comment 10 Milan Zamazal 2018-10-18 11:30:50 UTC
The cause of the problem is that memory devices don't have user aliases. Libvirt-assigned aliases are removed from the migratable domain XML returned by libvirt. That means memory devices can no longer be identified in the snapshot's _srcDomXML. Live migration doesn't suffer from this problem.

Storage and network devices have user aliases, lease devices don't have any aliases at all, so all the hotunpluggable devices other than memory should be fine. There are other devices that don't have user aliases and lose their aliases in snapshots. I'm not sure whether that is a problem or not.
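
The effect can be modelled with a short sketch. It only simulates the behaviour described in this comment (generated aliases dropped, user aliases kept); it is not libvirt or Vdsm code. The "ua-" prefix is the one libvirt requires for user-defined aliases. The alias "ua-nic-1", the sample XML and both function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

USER_ALIAS_PREFIX = "ua-"  # prefix libvirt requires for user-defined aliases

# Hypothetical <devices> section of a running domain.
RUNNING_DEVICES = """
<devices>
  <memory model='dimm'><alias name='dimm1'/></memory>
  <interface type='bridge'><alias name='ua-nic-1'/></interface>
</devices>
"""


def strip_generated_aliases(devices_xml):
    """Model of the described migratable XML: drop aliases that are not
    user-defined, keep those starting with the user alias prefix."""
    root = ET.fromstring(devices_xml)
    for dev in root:
        # findall() returns a list, so removing while looping is safe.
        for alias in dev.findall("alias"):
            if not alias.get("name", "").startswith(USER_ALIAS_PREFIX):
                dev.remove(alias)
    return root


def find_by_alias(devices, dev_type, alias):
    """Return the device of the given type and alias, or raise LookupError."""
    for dev in devices.findall(dev_type):
        node = dev.find("alias")
        if node is not None and node.get("name") == alias:
            return dev
    raise LookupError(
        "device with alias %s and type %s not found" % (alias, dev_type))


saved = strip_generated_aliases(RUNNING_DEVICES)
print(find_by_alias(saved, "interface", "ua-nic-1").tag)
try:
    find_by_alias(saved, "memory", "dimm1")
except LookupError as exc:
    print(exc)
```

In this model the network device is still found after the round trip, while the lookup of the DIMM by its generated alias fails, which matches the "device not found" error in the engine log.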

As for remedy, there are several options (in the order of preference):

1. Find out why live migrations are OK and check whether there is some avoidable difference regarding device handling in Vdsm between file and host migrations.
2. Provide user aliases for memory devices from Engine.
3. Provide user aliases for memory devices in Vdsm.
4. Identify memory devices by something other than aliases.
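
As a rough illustration of options 2 and 3, the sketch below builds a memory device element that carries a user alias. Deriving the alias as "ua-" plus the device ID is an assumption made for this example, as are the function name and the reading of the logged size 896 as MiB. The device ID is the one from the engine log in the description.

```python
import xml.etree.ElementTree as ET


def memory_device_xml(device_id, size_mib, node=0):
    """Build a <memory> device element with a user alias.

    The alias format ("ua-" + device ID) is an assumption for this sketch,
    not a confirmed Engine or Vdsm convention.
    """
    mem = ET.Element("memory", model="dimm")
    target = ET.SubElement(mem, "target")
    size = ET.SubElement(target, "size", unit="KiB")
    size.text = str(size_mib * 1024)  # MiB to KiB
    ET.SubElement(target, "node").text = str(node)
    ET.SubElement(mem, "alias", name="ua-" + device_id)
    return ET.tostring(mem, encoding="unicode")


print(memory_device_xml("b1a5b471-eaa9-44ce-9e94-f534eda40815", 896))
```

Because the alias is chosen by the caller rather than generated by libvirt, the same name can be used to look the device up again after the VM is restored from a snapshot.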

Comment 11 Milan Zamazal 2018-10-25 11:58:44 UTC
Solution 2 has been implemented and merged (oVirt gerrit 95037, "core: Add user aliases to memory devices").

Comment 12 Pedut 2018-10-31 11:09:37 UTC
After memory hot unplug, the VM memory remains the same (the memory devices listed under the VM devices tab remain unchanged).

Comment 13 Pedut 2018-10-31 11:10:30 UTC
Created attachment 1499339 [details]
updated logs

Comment 14 Milan Zamazal 2018-10-31 14:30:35 UTC
Pedut, I can't see any hot unplug action in the provided logs. So either the logs are wrong or it's a completely different problem and hot unplug wasn't initiated at all. Did you try to remove all the hot plugged memory devices by pressing the hot unplug buttons next to them?

Comment 15 Pedut 2018-10-31 15:40:49 UTC
Milan, you're right, I uploaded the wrong logs.

Comment 16 Milan Zamazal 2018-11-01 10:40:47 UTC
Pedut, could you please upload the right logs?

Comment 17 Pedut 2018-11-01 11:41:17 UTC
Created attachment 1499940 [details]
updated logs

Comment 18 Milan Zamazal 2018-11-01 15:39:04 UTC
The new failure is different from the original one. The following error is now reported: "unplug of device was rejected by the guest". The guest OS may not be properly set up for memory hot unplug.

Pedut, does memory hot unplug work for the same VM without making a snapshot? Even after making the guest OS busy with I/O such as with 

  find / -xdev -type f -exec cat {} \; >/dev/null

And what version of guest OS do you run?

Comment 19 Pedut 2018-11-04 11:39:20 UTC
It doesn't work even without making a snapshot. I opened a bug related to this.
The version of the guest OS is Red Hat Enterprise Linux Server 7.6 (Maipo).

Comment 20 Milan Zamazal 2018-11-05 08:21:29 UTC
According to the information provided by Pedut, the "device not found" problem is no longer present, and the problem that memory hot unplug doesn't work in her testing is unrelated to snapshots, so I'm moving the bug back to MODIFIED.

Comment 21 Pedut 2018-11-13 08:16:17 UTC
Verified on 4.2.7.3-0.0.master.20181015151121.gitd6e9af9.el7 according to the described steps.

Comment 22 Sandro Bonazzola 2019-02-13 07:44:54 UTC
This bugzilla is included in oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be resolved in oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

