Bug 1883483 - Engine HotUnplugs the same memory device multiple times when it's requested to HotUnplug multiple memory devices in given size
Summary: Engine HotUnplugs the same memory device multiple times when it's requested t...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.32
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ovirt-4.4.3
: ---
Assignee: Liran Rotenberg
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-09-29 11:07 UTC by Qin Yuan
Modified: 2020-11-11 06:42 UTC (History)
4 users (show)

Fixed In Version: vdsm-4.40.33
Doc Type: Bug Fix
Doc Text:
Previously, when trying to unplug a DIMM to the VM the action succeed but, the DIMM wasn't removed on the engine side. This could result in trying to remove the same DIMM resulting in errors. Now, the VDSM will update the engine and the DIMM will be removed as expected.
Clone Of:
Environment:
Last Closed: 2020-11-11 06:42:37 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+


Attachments (Terms of Use)
engine log (430.52 KB, text/plain)
2020-09-29 11:07 UTC, Qin Yuan
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 111510 0 master MERGED virt: fix DIMM removal 2021-01-20 16:32:37 UTC

Description Qin Yuan 2020-09-29 11:07:15 UTC
Created attachment 1717506 [details]
engine log

Description of problem:
When request engine to HotUnplug multiple memory devices in given size, such as 15x256M, it unplugs the same one 256M memory device 15 times, but not all the 15 memory devices. 
Each time engine reports Hotunplug of memory device(with the same id) was successfully requested on VM, memory size updated to desired value, though HotUnplug action actually failed:

2020-09-26 23:08:52,477+03 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-281) [vms_update_de5618c5-1dda-4605] Lock Acquired to object 'EngineLock:{exclusiveLocks='[memory_hotplug_test=VM_NAME]', sharedLocks='[65dad12f-6f02-424b-9e89-5e52da24c6b8=VM]'}'
2020-09-26 23:08:52,505+03 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-281) [vms_update_de5618c5-1dda-4605] Running command: UpdateVmCommand internal: false. Entities affected :  ID: 65dad12f-6f02-424b-9e89-5e52da24c6b8 Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2020-09-26 23:08:52,526+03 INFO  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [56038c5d] START, HotUnplugMemoryVDSCommand(HostName = host_mixed_3, Params:{hostId='e926be11-3e35-4e8a-9671-6fa7176e8132'}), log id: 1993cc7c
2020-09-26 23:08:52,585+03 INFO  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [56038c5d] FINISH, HotUnplugMemoryVDSCommand, return: , log id: 1993cc7c
2020-09-26 23:08:52,601+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-281) [56038c5d] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (e85017bb-7e82-45b3-8330-6f4d1b06f174) of size 256MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 4864MB to 4608MB}.
2020-09-26 23:09:14,075+03 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-281) [vms_update_12296b3c-d744-4401] Lock Acquired to object 'EngineLock:{exclusiveLocks='[memory_hotplug_test=VM_NAME]', sharedLocks='[65dad12f-6f02-424b-9e89-5e52da24c6b8=VM]'}'
2020-09-26 23:09:14,115+03 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-281) [vms_update_12296b3c-d744-4401] Running command: UpdateVmCommand internal: false. Entities affected :  ID: 65dad12f-6f02-424b-9e89-5e52da24c6b8 Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2020-09-26 23:09:14,156+03 INFO  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] START, HotUnplugMemoryVDSCommand(HostName = host_mixed_3, Params:{hostId='e926be11-3e35-4e8a-9671-6fa7176e8132'}), log id: 6ddd55ee
2020-09-26 23:09:14,168+03 WARN  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] Unexpected return value: Status [code=85, message=Failed to hotunplug memory: {'vmId': '65dad12f-6f02-424b-9e89-5e52da24c6b8', 'reason': "device not found: model 'dimm' memory device not present in the domain configuration"}]
2020-09-26 23:09:14,168+03 ERROR [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] Failed in 'HotUnplugMemoryVDS' method
2020-09-26 23:09:14,169+03 WARN  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] Unexpected return value: Status [code=85, message=Failed to hotunplug memory: {'vmId': '65dad12f-6f02-424b-9e89-5e52da24c6b8', 'reason': "device not found: model 'dimm' memory device not present in the domain configuration"}]
2020-09-26 23:09:14,182+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-281) [75856d8b] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_3 command HotUnplugMemoryVDS failed: Failed to hotunplug memory: {'vmId': '65dad12f-6f02-424b-9e89-5e52da24c6b8', 'reason': "device not found: model 'dimm' memory device not present in the domain configuration"}
2020-09-26 23:09:14,182+03 ERROR [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] Command 'HotUnplugMemoryVDSCommand(HostName = host_mixed_3, Params:{hostId='e926be11-3e35-4e8a-9671-6fa7176e8132'})' execution failed: VDSGenericException: VDSErrorException: Failed to HotUnplugMemoryVDS, error = Failed to hotunplug memory: {'vmId': '65dad12f-6f02-424b-9e89-5e52da24c6b8', 'reason': "device not found: model 'dimm' memory device not present in the domain configuration"}, code = 85
2020-09-26 23:09:14,182+03 INFO  [org.ovirt.engine.core.vdsbroker.HotUnplugMemoryVDSCommand] (default task-281) [75856d8b] FINISH, HotUnplugMemoryVDSCommand, return: , log id: 6ddd55ee
2020-09-26 23:09:14,196+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-281) [75856d8b] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (e85017bb-7e82-45b3-8330-6f4d1b06f174) of size 256MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 4608MB to 4352MB}.


Version-Release number of selected component (if applicable):
ovirt-engine-4.4.3.3-0.19.el8ev.noarch
vdsm-4.40.31-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run VM (build on base of template OS  Linux release 8.2 (Ootpa) ) with:
   Memory Size= 1024 MB; Maximum memory = 16384 MB; Physical Memory Guaranteed = 512 MB.
   kernel parameter "movable_node".
2. Hotplug 15x256M memory devices(update VM 15 times with +256M memory size each time via API)
3. HotUnplug 15x256M memory devices(update VM 15 times with -256M memory size each time via API)


Actual results:
1. After step 3, VM memory size was updated to 1024M on engine side, but `free -m` on VM showed only one 256M memory device was removed.
2. In engine log:
1) the same memory device of size 256MB was unplugged 15 times.
2) only the first unplug action succeeded, the other 14 times failed because the memory device not present in the domain configuration.
3) engine still shows Hot unplug of memory device of size 256MB was successfully requested on VM, memory size reduced 256M, though unplug action actually failed.


Expected results:
1. All memory devices, not only the first one, should be removed on VM.
2. Memory size shouldn't be reduced on engine side when the unplug action actually failed.


Additional info:

Comment 1 Liran Rotenberg 2020-09-29 12:43:40 UTC
Hi Qin,
The error shown above is known. The DIMM removal is to some extent unreliable.
Can you verify UI works? If so, my guts telling me that it is possible when we moved to new libvirt version it might take different time when removing the DIMM.
When you try to execute it and the first DIMM removal didn't finish, it will try to remove it again(API), this can cause the error above.
I would suggest adjusting the waiting time between the DIMM removals according to the logs / UI actions.

Regarding the expected results, we may consider to resolve the memory size on engine side in case of error.
But first, let's try to figure if we have a real problem removing DIMMs or just different time to do so.

Comment 2 Qin Yuan 2020-09-30 09:05:44 UTC
Tried on UI, hot plug one 256M memory device, there is one dimm added on Vm Devices tab, then click HotUnplug button to remove the dimm, the dimm keeps stay on Vm Devices tab though the memory size already got reduced in the guest.

Comment 3 Qin Yuan 2020-10-13 08:28:06 UTC
Verified with:
vdsm-4.40.33-1.el8ev.x86_64
ovirt-engine-4.4.3.6-0.13.el8ev.noarch

Steps:
The same as in description.

Results:
Hot unplug of 15x256M memory devices succeeded. All DIMMs disappeared on UI, `free -m` on the VM showed correct memory size.

engine.log:
2020-10-13 09:34:26,670+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-107) [7cbe6908] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (11a38265-8eab-4c83-9527-4d6bacac9c73) of size 256MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 4608MB to 4352MB}.
2020-10-13 09:34:51,459+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-108) [6f22df5e] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (c949d676-e93b-49ea-99ca-61d2db1e4af2) of size 256MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 4352MB to 4096MB}.
2020-10-13 09:35:16,374+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-107) [44ced136] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (eed2adb5-bc33-4469-9914-9021ba65e8a4) of size 256MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 4096MB to 3840MB}.
...
2020-10-13 09:39:56,778+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-109) [2e0760fd] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (4665325d-191f-4207-8871-4f726498c266) of size 128MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 1280MB to 1024MB}.
2020-10-13 09:39:56,947+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-109) [187cd83c] EVENT_ID: MEMORY_HOT_UNPLUG_SUCCESSFULLY_REQUESTED(2,046), Hot unplug of memory device (3a534e69-ac5a-4906-8236-5fe8a864a96d) of size 128MB was successfully requested on VM 'memory_hotplug_test'. Physical memory guaranteed updated from 1280MB to 1024MB}.

Comment 4 Sandro Bonazzola 2020-11-11 06:42:37 UTC
This bugzilla is included in oVirt 4.4.3 release, published on November 10th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.3 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.