Bug 979257 - [Hyper-V][RHEL6.5][RFE]in-kernel online support for memory hot-add [NEEDINFO]
[Hyper-V][RHEL6.5][RFE]in-kernel online support for memory hot-add
Status: CLOSED DEFERRED
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.5
Unspecified Unspecified
unspecified Severity unspecified
: rc
: ---
Assigned To: Vitaly Kuznetsov
Virtualization Bugs
: FutureFeature
Depends On:
Blocks: 923342 927417 1091679 1091680
  Show dependency treegraph
 
Reported: 2013-06-28 01:23 EDT by jason wang
Modified: 2018-07-02 10:56 EDT (History)
19 users (show)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 1038941 1091679 1091680 (view as bug list)
Environment:
Last Closed: 2015-12-01 11:21:33 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
kraxel: needinfo? (kay)


Attachments (Terms of Use)
Patch to support in-kernel onlining (2.76 KB, application/mbox)
2013-09-12 15:27 EDT, K. Y. Srinivasan
no flags Details
patch to online memory in the balloon driver (2.15 KB, application/mbox)
2013-09-12 15:28 EDT, K. Y. Srinivasan
no flags Details

  None (edit)
Description jason wang 2013-06-28 01:23:41 EDT
Description of problem:

Hyperv balloon driver waits for the memory to be online after doing hot-add. This behavior needs udev rules to set the memory online like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}="online"

So let's add this rules.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Kay Sievers 2013-07-02 05:58:39 EDT
What is the reason that the kernel does not add these "devices"? The kernel
has such defaults for all other devices, what makes memory so special here?

Such unconditional defaults almost never belong into userspace tools. We usually
avoid to ship such defaults and require to kernel to carry that out, if such a
default is needed.
Comment 2 jason wang 2013-07-02 22:57:11 EDT
(In reply to Kay Sievers from comment #1)
> What is the reason that the kernel does not add these "devices"? The kernel
> has such defaults for all other devices, what makes memory so special here?
> 
> Such unconditional defaults almost never belong into userspace tools. We
> usually
> avoid to ship such defaults and require to kernel to carry that out, if such
> a
> default is needed.

CC K.Y. for the answer.
Comment 3 K. Y. Srinivasan 2013-07-03 08:53:22 EDT
Kay,

As far as I know, the hot added memory blocks need to be brought online and the preferred way is to write the appropriate sysfs variable from user land and hence the udev rules. I would prefer not to involve the user land code in getting the memory regions "online". Let me know how I can accomplish this.
Comment 4 jason wang 2013-07-05 04:35:34 EDT
K.Y.:

Is it possible to online the page automatically by:

- export memory_block_change_state()
- schedule a workqueue to online the memory explicitly by calling memory_block_change_state()?

Or let's just remove the final waiting for the online in hv_balloon?

Thanks
Comment 5 K. Y. Srinivasan 2013-07-05 10:09:07 EDT
If we can do the onlining of the memory synchronously that would be ideal. We would not need the delay in that case. The hot add operation is already handled in a work context and so we can do this without scheduling another work for.

When I did this work initially upstream, I got the sense that mm maintainers did not want to export the API that would allow for in-kernel onlining of memory. This is obviously the ideal solution. If you can, please submit the patch.
Comment 6 jason wang 2013-07-08 03:21:44 EDT
Kay, 

According to comment #5. Upstream does not want to provides the mechanism that doing the online synchronously. (Even if we want to do some changes, it's not a short-term task). The udev involvement were also documented in kernel Documentation/memory-hotplug.txt. Maybe we can reconsider this?
Comment 7 Kay Sievers 2013-07-15 05:36:15 EDT
Upstream udev does not want to provide unconditional, always enabled kernel
policy in udev. :)

It is just the wrong way to do things that way. Defaults should be set
where they are coming from, not in tools running later switching
them around. Udev is just not meant to carry out unconditional defaults.
If udev rules always enable something in the kernel, it can be enabled right
away in the kernel itself.
Comment 8 K. Y. Srinivasan 2013-07-15 10:20:47 EDT
Jason,

I am ok with getting rid of the user level (udev) involvement here. I am fine with what you had suggested earlier (onlining the page synchronously after hot adding the page). Could you send such a patch out (on top of the patches that are currently circulating on the balloon driver).
Comment 9 jason wang 2013-07-16 03:05:52 EDT
(In reply to K. Y. Srinivasan from comment #8)
> Jason,
> 
> I am ok with getting rid of the user level (udev) involvement here. I am
> fine with what you had suggested earlier (onlining the page synchronously
> after hot adding the page). Could you send such a patch out (on top of the
> patches that are currently circulating on the balloon driver).

K.Y:

I wish I can help, but I'm not an expert on mm. I believe you are more suitable than me to do this since most of the driver were wrote by you.

Thanks
Comment 10 K. Y. Srinivasan 2013-07-19 10:07:25 EDT
Ok; I will put these patches together. However, I am not sure if the upstream guys will accept it or not. So, I would recommend having the udev rules in RHEL 6.5 as our default position. Please pick up the other fixes I have submitted for the balloon driver.
Comment 13 jason wang 2013-08-05 23:02:05 EDT
K.Y:

Since we don't need the check of memory onling in balloon driver. And consider the in-kernel memory online may still need some time to finish. I wonder how about just document this behaviour and notice that it's the user/admin's responsibility to do the memory online manually or let them to add a default udev rules if they wish. (Looks like this is just what xen currently does).

Thanks
Comment 14 K. Y. Srinivasan 2013-08-06 09:43:28 EDT
No; that would not work. I dropped the check for onlining in the balloon driver only because we expect onlining to occur albeit after some delay. We cannot have admin do the onlining as is done in Xen. The reason for this is that Xen does not have a DM policy engine running on the host as is the case on Windows. Since the host expects that hot-added memory is available in the guest, it will include the hot-added memory in its policy decisions for memory management. So, if we just require human intervention for onlining, the scheme will not work. Minimally we need the udev rules if we cannot get the in-kernel onlining patch accepted.
Comment 15 jason wang 2013-08-06 22:46:41 EDT
Oh get it. 

Kay, I believe you've notice the upstream discussion of in-kernel memory online, looks like it could not be achieved for a short of period (lack of infrastructure support or need redesign). And As K.Y. states in comment #14, we could not let the admin to do this instead. We'd better re-consider the udev rules for this? What's your opinion?

Thanks
Comment 16 jason wang 2013-09-11 02:59:00 EDT
Kay, any more thought on this?

Thanks
Comment 17 Harald Hoyer 2013-09-11 07:28:04 EDT
(In reply to K. Y. Srinivasan from comment #14)
> No; that would not work. I dropped the check for onlining in the balloon
> driver only because we expect onlining to occur albeit after some delay.

err... and if udev if faster than your expected "delay"??
Comment 18 Kay Sievers 2013-09-11 08:01:12 EDT
Userspace is not the right tool to solve kernel-internal issues.

Please contain that change in the kernel and do not rely userspace-delay-loops
to work around kernel driver behavior.

Udev is not meant to carry-out global kernel policy. This rule cannot be carried
in the upstream package, because it is the wrong way of doing things. The kernel
sends an event for a device and expects userspace to *always* change a flag on
the very *same* device? That's just not how things should work.

Such a change should not also not be done in RHEL6's udev, because we should
not make such global unconditional, always-on changes to the behavior in a
base package.

Please change the kernel instead, and also do the QA for this change for the
kernel, instead of relying userspace to carry it out; just because it seems
simpler to do it that way, does not make it correct to it there.

Doing this from inside the kernel will also allow to limit the change to drivers
which actually expect that behavior, while doing it unconditionally for all
setups from userspace might cause unforeseen side-effects.
Comment 19 K. Y. Srinivasan 2013-09-11 13:25:34 EDT
Jason,

I had submitted a patch to the kernel that would bring memory online without involving the userspace. As you know the mm folks were not in favor of accepting it. Kay was involved in this discussion as well. We have that as the fallback position. For what it is worth, this patch improves the user experience significantly. The system is really responsive even under heavy memory load. If you don't have that patch; I can resubmit it to RedHat.
Comment 20 jason wang 2013-09-12 01:30:03 EDT
(In reply to K. Y. Srinivasan from comment #19)
> Jason,
> 
> I had submitted a patch to the kernel that would bring memory online without
> involving the userspace. As you know the mm folks were not in favor of
> accepting it. Kay was involved in this discussion as well. We have that as
> the fallback position. For what it is worth, this patch improves the user
> experience significantly. The system is really responsive even under heavy
> memory load. If you don't have that patch; I can resubmit it to RedHat.

Sure, please resubmit. Generally, if the patch was not accepted upstream and not RHEL specific, it may get very little possibility to be merged into RHEL. But it the patch does very small change, I can try to submit.

Thanks
Comment 21 K. Y. Srinivasan 2013-09-12 15:27:34 EDT
Created attachment 796988 [details]
Patch to support in-kernel onlining

The current machinery for hot-adding memory requires having user
level to bring the memory segments online. Export the necessary functionality
to bring the memory segment online without involving user space code.
Comment 22 K. Y. Srinivasan 2013-09-12 15:28:46 EDT
Created attachment 796989 [details]
patch to online memory in the balloon driver

Leverage the newly exported functionality to bring memory online
without involving user level code.
Comment 24 jason wang 2013-09-30 00:53:22 EDT
K.Y:

Some of our engineer does not like this since it was not upstream. Is there any chance that the patch can be accepted in the near future? If it could, since we still have time for 6.6, let's make it for 6.6 then.
Comment 25 K. Y. Srinivasan 2013-10-09 11:11:11 EDT
Jason,

I don't know if (much less when) this patch will be accepted. The current maintainers of the memory hot add infrastructure seem to think what they have now is the way it should be. Given, how important this feature is to our customers, I suggest we package the udev rules for hot-adding memory as has been done by all other Distros today.
Comment 27 Gerd Hoffmann 2014-03-03 06:24:16 EST
(In reply to K. Y. Srinivasan from comment #19)
> Jason,
> 
> I had submitted a patch to the kernel that would bring memory online without
> involving the userspace. As you know the mm folks were not in favor of
> accepting it. Kay was involved in this discussion as well.

Do you have a pointer to this discussion?  Google finds me only this ...
http://comments.gmane.org/gmane.linux.kernel.mm/103912
... which doesn't look like the discussion you are referring to.
Comment 28 K. Y. Srinivasan 2014-03-03 18:44:38 EST
Gerd,

I forwarded you some emails on this thread that capture the gist of where we are - Kay does not want udev rules to control these kind of events and Dave and others do not want the in-kernel mechanism for onlining. This is just a mechanism and can be used at will - hyper-V balloon driver would use it.

In some previous email with RredHat it appeared that you were ok with udev rules and only wanted it to be Hyper-V specific. Has Kay signed off on that. If that is the case it should be easy enough to structure such a rule: We could invoke a Hyper-V specific onlining script that gets invoked from the standard udev rule and this hyper-v specific udev rule would check to see if the hyperv balloon driver is loaded before onlining the memory. I am sure there are other ways we could structure this as well.

If Kay is ok with this, we can work on such a script.
Comment 29 Gerd Hoffmann 2014-03-04 05:14:25 EST
> I forwarded you some emails on this thread that capture the gist of where we
> are 

Thanks, I'll have a look.

> - Kay does not want udev rules to control these kind of events and Dave
> and others do not want the in-kernel mechanism for onlining. This is just a
> mechanism and can be used at will - hyper-V balloon driver would use it.

I agree with Kay here.  It's pretty pointless to do the round-trip to userspace if userspace simply onlines memory unconditionally.  It makes things complicated for no good reason.

> In some previous email with RedHat it appeared that you were ok with udev
> rules and only wanted it to be Hyper-V specific.

Tested dynamic memory meanwhile.  Problem is there is no easy way for udev to see that the memory was hotplugged by the hyperv-balloon driver and online it only in that case.  The memory nodes in sysfs have neither some name/id attribute nor a symlink to the balloon driver.

I think we should continue to push for the in-kernel solution.  I think the udev rule should only used as temporary stopgap, and of course it need to be limited to hyperv.  A somewhat crude way to do it would be this:

SUBSYSTEM=="memory", TEST=="/sys/bus/vmbus/drivers/hv_balloon", ACTION=="add", ATTR{state}="online"

That tests for the hyperv balloon driver to be loaded.  Not exactly the same as whenever the memory has been hotplugged by hyperv balloon driver.  But as the driver is autoloaded in hyperv guests only and there is only this one way to hotplug memory in hyperv it should be close enough in practice.

> Has Kay signed off on that.

Kay?
Comment 30 K. Y. Srinivasan 2014-03-04 05:22:54 EST
As far as the hyper-V specific udev rule, I was thinking on similar lines (to what you have) here. We will wait for Kay's input.
Comment 31 Gerd Hoffmann 2014-03-04 08:19:30 EST
> I forwarded you some emails on this thread that capture the gist of where we
> are

So it boils down to these requests from Dave:

<quote>
1. Come up with an interface that specifies a default policy for
   newly-added memory sections.  Currently, added memory gets "added",
   but not "onlined", and the default should stay doing that.
2. Make sure that we at least WARN_ONCE() if someone tries to online an
   already-kernel-onlined memory section.  That way, if someone trips
   over this new policy, we have a _chance_ of explaining to them what
   is going on.
3. Think about what we do in the failure case where we are able to
   "add", but fail to "online" in the kernel.  Do we tear the
   newly-added structures down and back out of the operation, or do
   we leave the memory added, but offline (what happens in the normal
   case now)?
</quote>

There is also the problem that you must allocate memory (so you can manage the new pages) before the new memory becomes available, which might be a problem under memory pressure.

On (1):  I think online_memory_block() is fine.  I suggest to submit it again.

On (2):  Adding a patch which does WARN_ON on a double online is easily done ;)

On (3):  Error handling is a valid concern.  I think this belongs into the hotplug driver, not in the kernel memory hotplug core.  Different drivers probably want do different things here.  For hyper-v (and other hypervisors) it makes sense to remove_memory() and signal the failure back to the hypervisor so it can unmap the memory as the guest isn't going to use it anyway.  When it comes to physical dimms: those are not going to go away just because onlining them failed, so the hotplug driver probably should not remove_memory() then.  So, I'd suggest to fix patch #2 to catch+handle online_memory_block() failures + resubmit upstream.

On the memory allocation issue:  I don't think that'll be a big issue in practice as (a) the balloon driver monitors memory usage and signals the need for additional memory early enough to the hypervisor and (b) memory is hotplugged in small (128 MB) chunks, so the amount of memory needed to manage the new memory isn't *that* big.  I suggest to explain that in the commit message.

Feel free to Cc: me when submitting the patches so I can help in the discussions.
Comment 32 K. Y. Srinivasan 2014-03-04 19:47:51 EST
Thanks Gerd; I will resubmit them with some changes in a while. Are we now settled on having the udev rule as an intermediate solution?
Comment 33 Gerd Hoffmann 2014-03-05 04:55:13 EST
(In reply to K. Y. Srinivasan from comment #32)
> Thanks Gerd; I will resubmit them with some changes in a while. Are we now
> settled on having the udev rule as an intermediate solution?

I think it's ok.  No answer from Kay yet.

It's not urgent though, RHEL 6.6 is still in early development phase so we have a fair chance to get the in-kernel bits accepted upstream in time.
Comment 34 Luiz Capitulino 2014-04-04 14:14:31 EDT
My understanding is that this is still pending an upstream solution, setting conditional NACK.
Comment 36 Ademar Reis 2014-05-21 12:14:22 EDT
(In reply to Luiz Capitulino from comment #34)
> My understanding is that this is still pending an upstream solution, setting
> conditional NACK.

Condnack was cleared because of the devel_ack+. Fixed.
Comment 39 Andrew Cook 2015-10-27 22:33:37 EDT
Has there been any movement on this bug?
Just ran across it on a fedora install, manually added the udev entry
Comment 40 Vitaly Kuznetsov 2015-12-01 11:21:33 EST
This is still a missing feature upstream and it won't probably make it to RHEL6 ever. Closing the bug.
Comment 41 Andrew Cook 2015-12-01 17:26:48 EST
It's still present in Fedora 23 but no bug seems to be open against it
Comment 42 Vitaly Kuznetsov 2015-12-02 07:33:18 EST
This is a missing kernel feature and I'm not sure that creating a Fedora bug will help but I think it's possible to add a udev rule (the same we actually have in RHEL) as a workaround:

/lib/udev/rules.d/40-redhat.rules

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"

# Memory hotadd request
SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

Note You need to log in before you can comment on or make changes to this bug.