Bug 1674497 - Memory Hot-unplug fails to remove DIMM
Summary: Memory Hot-unplug fails to remove DIMM
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.3.1
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2
Assignee: Rolfe Dlugy-Hegwer
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-11 13:29 UTC by Liran Rotenberg
Modified: 2023-08-08 02:37 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Previously, hot-unplugging memory on RHEL 8 guests generated an error because the memory DIMM was in use. This prevented the removal of that memory from that VM. To work around this issue, add `movable_node` to the virtual machine's kernel command-line parameters, link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/configuring-kernel-command-line-parameters_system-design-guide#setting-kernel-command-line-parameters_configuring-kernel-command-line-parameters[as described here].
Clone Of:
Environment:
Last Closed: 2020-10-04 16:34:56 UTC
oVirt Team: Docs
Target Upstream Version:
Embargoed:


Attachments
logs (306.09 KB, application/x-xz)
2019-02-11 13:33 UTC, Liran Rotenberg
RHEL8 qemu guest memory hotplug steps (1.53 KB, text/plain)
2019-02-13 15:21 UTC, Masayoshi Mizuma (Fujitsu)

Description Liran Rotenberg 2019-02-11 13:29:47 UTC
Description of problem:
After creating a VM using oVirt, hot-plug 256MB DIMMs 15 times (16 DIMMs in total). Memory hot-unplug then fails.
No error is shown on the oVirt side, and the guest's CPU load goes to 100%:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   64 root      20   0       0      0      0 D  98.3   0.0   7106:54 kworker/u32:1+kacpi_hotplug
10847 root      20   0       0      0      0 R   1.7   0.0   0:01.13 kworker/0:1-mm_percpu_wq

Version-Release number of selected component (if applicable):
Guest:
Red Hat Enterprise Linux release 8.0 Beta (Ootpa)
qemu-guest-agent-2.12.0-41.el8+2104+3e32e6f8.x86_64
Host:
Red Hat Enterprise Linux Server release 7.6 (Maipo)
vdsm-4.30.6-1.el7ev.x86_64
libvirt-4.5.0-10.el7_6.4.x86_64
Engine:
Red Hat Enterprise Linux Server release 7.6 (Maipo)
ovirt-engine-4.3.0-0.8.rc2.el7.noarch

How reproducible:
100%. Reproduced more than once.

Steps to Reproduce:
1. Create a VM(1GB memory, 16 max memory), RHEL8 image, BIOS.
2. Hot-plug 256MB of memory 15 times.
3. Start removing the DIMMs from the VM's devices.

Actual results:
Not all of the DIMMs were removed. In the UI, all of the DIMMs remained listed.
# free -m
              total        used        free      shared  buff/cache   available
Mem:           4593         355        3978          14         260        4150
Swap:          1023           0        1023

Starting with 1GB of memory, plugging 3840MB (15 x 256MB) results in 4864MB. After unplugging, the amount of memory stays the same.

The VM's CPU load went to 100%, caused by the kacpi_hotplug kworker process.

Expected results:
The DIMMs are removed correctly, the VM's memory is reduced, the oVirt UI reports the VM devices correctly, and the VM's CPU load does not get stuck at 100%.
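
The memory sizes in this report can be checked with a little arithmetic. The following is only an illustrative sketch: the constants come from the description above (1GB base memory, 15 hot-plugged 256MB DIMMs), while the function and variable names are made up for this example.

```python
# Sizes taken from the report: 1 GB base memory, 15 hot-plugged 256 MB DIMMs.
BASE_MB = 1024
DIMM_MB = 256
HOTPLUGGED_DIMMS = 15


def total_memory_mb(base_mb: int, dimm_mb: int, dimm_count: int) -> int:
    """Return the memory configured for the VM with dimm_count DIMMs plugged."""
    return base_mb + dimm_mb * dimm_count


plugged_mb = DIMM_MB * HOTPLUGGED_DIMMS
after_plug = total_memory_mb(BASE_MB, DIMM_MB, HOTPLUGGED_DIMMS)
after_unplug = total_memory_mb(BASE_MB, DIMM_MB, 0)

print(f"hot-plugged: {plugged_mb} MB")              # 3840
print(f"expected after plug: {after_plug} MB")      # 4864
print(f"expected after unplug: {after_unplug} MB")  # 1024
```

A successful unplug of all 15 DIMMs would therefore bring the configured memory back to 1024MB. The total printed by `free -m` is somewhat lower than the configured amount, since part of the RAM is reserved by the kernel.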

Comment 1 Liran Rotenberg 2019-02-11 13:33:25 UTC
Created attachment 1529010
logs

Comment 2 Laszlo Ersek 2019-02-11 15:00:56 UTC
Hi Liran,

the BZ Component field is "edk2", but comment #0 says, "Create a VM(1GB memory, 16 max memory), RHEL8 image, *BIOS*." (emphasis mine).

Also, regarding "dmesg.log" from the attachment ("logs.tar.xz"), it indeed terminates with

[ 1049.684974] memory memory35: Offline failed.

however, the dmesg doesn't indicate UEFI firmware (no EFI memmap, no references to EFI etc). To me it looks like a SeaBIOS VM. Should we correct the BZ Component? Thanks.

Comment 3 Laszlo Ersek 2019-02-11 15:06:26 UTC
In addition, "mem_test.log-20190210" contains no references to "pflash" or "OVMF" -- I'd expect those on the QEMU command line, for setting Component=edk2. Thanks.

Comment 4 Liran Rotenberg 2019-02-11 15:10:11 UTC
(In reply to Laszlo Ersek from comment #2)
> Hi Liran,
> 
> the BZ Component field is "edk2", but comment #0 says, "Create a VM(1GB
> memory, 16 max memory), RHEL8 image, *BIOS*." (emphasis mine).
> 
> Also, regarding "dmesg.log" from the attachment ("logs.tar.xz"), it indeed
> terminates with
> 
> [ 1049.684974] memory memory35: Offline failed.
> 
> however, the dmesg doesn't indicate UEFI firmware (no EFI memmap, no
> references to EFI etc). To me it looks like a SeaBIOS VM. Should we correct
> the BZ Component? Thanks.

Sure, moved to seabios.
Thanks.

Comment 7 Igor Mammedov 2019-02-12 15:25:48 UTC
This is the first hot-unplug BZ for RHEL8; as far as I'm aware, it is still impossible to unplug memory reliably upstream.
There was some work done in that area (like removing time-outs and continuing to attempt memory removal, which could improve the likelihood of removal and explains the CPU load).

RHEL8 is probably the same as RHEL7 in memory hot-unplug area (Baoquan He backported many fixes from upstream into RHEL7)

Long story in relevant BZs from RHEL7 (probably should be cloned to RHEL8)
  Bug 1245892 - hot-unhotplug guest memory fail most of the time because it is in use
  Bug 1258312 When un-hotplug memory failed, libvirt gives user a wrong message

Reassigning bug back to kernel and CCing people involved in fixing them.

Comment 8 Yumei Huang 2019-02-13 01:56:55 UTC
I agree with Igor. We also have bug 1654978 for RHEL8. This one might be a duplicate.

Comment 9 Baoquan He 2019-02-13 06:04:55 UTC
Hi Igor,

(In reply to Igor Mammedov from comment #7)
> This is the first hot-unplug BZ for RHEL8; as far as I'm aware, it is still
> impossible to unplug memory reliably upstream.
> There was some work done in that area (like removing time-outs and continuing
> to attempt memory removal, which could improve the likelihood of removal and
> explains the CPU load).
> 
> RHEL8 is probably the same as RHEL7 in memory hot-unplug area (Baoquan He
> backported many fixes from upstream into RHEL7)

RHEL8 is probably different from RHEL7 with respect to memory hotplug; at least on x86_64, we have gotten good test results.

In RHEL7, since the virt team didn't apply the necessary udev rule, a memory block may not be onlined as online_movable, which is necessary for memory hotplug in RHEL7 because of a memory defect.

So as far as I know, RHEL8 is in a different state than RHEL7. As for the upstream kernel, it behaves very well for memory hotplug on x86_64. There seems to be a regression on the ppc platform, which is under discussion upstream.

Thanks
Baoquan
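
For anyone who wants to see how the guest has onlined its memory blocks, the sketch below reads the per-block `state` and `valid_zones` files from sysfs. It is a minimal, read-only example: the sysfs path is the standard Linux location, but the function name and the returned layout are assumptions made for this illustration, and `valid_zones` may be absent on older kernels.

```python
from pathlib import Path
from typing import Dict

# Standard sysfs location for memory blocks on Linux. It is a parameter of
# the function below so that the code can be pointed at a test directory.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def read_memory_blocks(root: Path = SYSFS_MEMORY) -> Dict[str, Dict[str, str]]:
    """Map each memoryN directory to the contents of its state/valid_zones files."""
    blocks: Dict[str, Dict[str, str]] = {}
    for block_dir in sorted(root.glob("memory[0-9]*")):
        info: Dict[str, str] = {}
        for attr in ("state", "valid_zones"):
            attr_file = block_dir / attr
            # Skip attributes that this kernel does not expose.
            if attr_file.is_file():
                info[attr] = attr_file.read_text().strip()
        blocks[block_dir.name] = info
    return blocks
```

Calling `read_memory_blocks()` inside the guest returns one entry per memory block. A hot-plugged block whose `valid_zones` value is `Movable` was onlined as movable, which is the condition described above as necessary for a later offline to succeed.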

Comment 10 Masayoshi Mizuma (Fujitsu) 2019-02-13 15:21:41 UTC
Created attachment 1534440
RHEL8 qemu guest memory hotplug steps

Comment 11 Masayoshi Mizuma (Fujitsu) 2019-02-13 15:24:16 UTC
Hello,

Could you share the detailed steps to reproduce?

I tried memory hotplug on a RHEL8 QEMU guest and it worked well.
I attached the steps I ran in comment 10.

Before memory hot-remove:

              total        used        free      shared  buff/cache   available
Mem:        9048584      248312     8445376        8768      354896     8648988
Swap:       2097148           0     2097148

After memory hot-remove:

              total        used        free      shared  buff/cache   available
Mem:        8000008      246520     7398584        8768      354904     7643260
Swap:       2097148           0     2097148

Thanks,
Masa

Comment 12 Baoquan He 2019-02-20 08:20:35 UTC
Please refer to the comments:

https://bugzilla.redhat.com/show_bug.cgi?id=1654978#c26

So I think this is not a bug; I suggest closing it as NOTABUG.

Thanks
Baoquan

Comment 13 Baoquan He 2019-03-05 05:28:54 UTC
Per the comment below, I would like to close this bug as NOTABUG.
https://bugzilla.redhat.com/show_bug.cgi?id=1654978#c23

Please reopen it if any concern is raised.

Thanks
Baoquan

Comment 14 Liran Rotenberg 2019-03-26 14:42:58 UTC
I'm re-opening this bug.

I retested again, using RHEL7.6 hosts and RHEL8 guest (kernel-4.18.0-80.el8.x86_64)

I hot-plugged 5 DIMMs, each of 256MB.
When I try to unplug each DIMM I see on the guest VM:

With balloon device on the VM:
"Offlined pages 32768
memory memory38: Offline failed."

Without balloon device on the VM:
"memory memory38: Offline failed."

On each DIMM it's another memory number.

On RHV side, the operation is successful, but on the guest it's clearly not, the DIMM also stays in the VM devices tab. 

This looks like what Igor mentioned in comment #7.
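
The guest messages quoted above can be summarized mechanically. The sketch below extracts the memory blocks that failed to go offline and converts the "Offlined pages" count to MiB. It assumes 4 KiB base pages (the x86_64 default), and the function name and regular expressions are written for this example only, based on the two message formats that appear in this report.

```python
import re
from typing import List, Tuple

PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB base pages, the x86_64 default

OFFLINE_FAILED = re.compile(r"memory (memory\d+): Offline failed\.")
OFFLINED_PAGES = re.compile(r"Offlined pages (\d+)")


def summarize_offline_log(log_text: str) -> Tuple[List[str], int]:
    """Return (blocks that failed to offline, MiB reported as offlined)."""
    failed = OFFLINE_FAILED.findall(log_text)
    offlined_pages = sum(int(n) for n in OFFLINED_PAGES.findall(log_text))
    offlined_mib = offlined_pages * PAGE_SIZE_BYTES // (1024 * 1024)
    return failed, offlined_mib


sample = """Offlined pages 32768
memory memory38: Offline failed."""
print(summarize_offline_log(sample))  # (['memory38'], 128)
```

Under the 4 KiB assumption, 32768 pages correspond to 128 MiB, which is half of one 256MB DIMM. This would be consistent with one part of the DIMM going offline while memory38 did not, although the log excerpt alone does not confirm that.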

Comment 15 Yumei Huang 2019-03-27 02:18:14 UTC
(In reply to Liran Rotenberg from comment #14)
> I'm re-opening this bug.
> 
> I retested again, using RHEL7.6 hosts and RHEL8 guest
> (kernel-4.18.0-80.el8.x86_64)

Hi Liran, 
Did you add 'movable_node' to the guest kernel command line, as suggested by Baoquan in bug 1654978? It seems to work for me. Thanks.

> I hot-plugged 5 DIMMs, each of 256MB.
> When I try to unplug each DIMM I see on the guest VM:
> 
> With balloon device on the VM:
> "Offlined pages 32768
> memory memory38: Offline failed."
> 
> Without balloon device on the VM:
> "memory memory38: Offline failed."
> 
> On each DIMM it's another memory number.
> 
> On RHV side, the operation is successful, but on the guest it's clearly not,
> the DIMM also stays in the VM devices tab. 
> 
> This looks like what Igor mentioned in comment #7.

Comment 16 Liran Rotenberg 2019-03-28 13:24:53 UTC
Hi Yumei,
I just tested it with 'movable_node' added to the guest kernel command line.

That solved the issue here.

I'm moving this bug to documentation, for any user who wishes to use el8 guests.

Thanks!
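
To confirm that a running guest was actually booted with the workaround, one can look for `movable_node` in `/proc/cmdline`. The sketch below is a minimal illustration of that check: `/proc/cmdline` is the standard Linux file, while the helper names are hypothetical and chosen for this example.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if param appears as a bare flag or as param=value."""
    for token in cmdline.split():
        if token == param or token.startswith(param + "="):
            return True
    return False


def running_kernel_has(param: str,
                       cmdline_path: Path = Path("/proc/cmdline")) -> bool:
    """Check the command line that the running kernel was booted with."""
    return has_kernel_param(cmdline_path.read_text(), param)
```

Inside the guest, `running_kernel_has("movable_node")` should return True after the parameter has been added to the boot entry and the guest has been rebooted. The check matches whole tokens only, so a different parameter that merely starts with the same characters is not counted.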

Comment 21 Rolfe Dlugy-Hegwer 2020-10-04 16:34:56 UTC
Published as known issue in RHV 4.4.2 release notes.

