The control plane needs alerts on system resource usage, full stop. There has been no shortage of outages, fire drills, and system degradations caused by master VM instances running low on resources or by system components (kubelet, cri-o) being starved of resources. Regardless of the cause of these issues, alerts for high steady-state usage must exist; otherwise administrators will never know that something needs to be done. See the post-mortem for more details: https://docs.google.com/document/d/1VfwmECbpCnDTOb0JVE37wcEQm4KnGwbatgIynTa6Wvg/edit#
I think monitoring the state of a node, and alerting based on it, should be handled by the node team.
I think we need to rename the Memory manager component, as it deals with hugepages. I believe this goes to the kubelet subcomponent.