Bug 2002545
| Summary: | [GSS][RFE] Adjust memory limits due to mds pods getting oom killed during pod start (mds replay) | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | MAYANK PANDEY <mpandey> |
| Component: | ocs-operator | Assignee: | Jose A. Rivera <jrivera> |
| Status: | CLOSED WONTFIX | QA Contact: | Elad <ebenahar> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.6 | CC: | ajuarez, assingh, bniver, hnallurv, jrivera, madam, muagarwa, nravinas, ocs-bugs, odf-bz-bot, pdonnell, sostapov |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-30 10:50:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 4
Scott Ostapovicz
2021-09-09 18:39:23 UTC
Travis, please see comment #7 from Patrick about making Rook transiently increase MDS memory.
The resource limits on the MDS are currently set by the ocs-operator, as seen here [1]:
"mds": {
    Requests: corev1.ResourceList{
        corev1.ResourceCPU:    resource.MustParse("3"),
        corev1.ResourceMemory: resource.MustParse("8Gi"),
    },
    Limits: corev1.ResourceList{
        corev1.ResourceCPU:    resource.MustParse("3"),
        corev1.ResourceMemory: resource.MustParse("8Gi"),
    },
},
K8s does not allow the limits to be changed at pod runtime; if the requests or limits are changed, the pod is restarted. But if the MDS really needs to burst to 12Gi at times, it seems we should leave the "requests" at 8Gi and increase the "limits" to 12Gi so the MDS won't be killed prematurely. So please move this to the ocs-operator component for this change if there is no other way to constrain the MDS memory.
[1] https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/defaults/resources.go#L32-L41
Please see comment 12. This would have other implications for the QoS class of the MDS pod: having different requests and limits bumps it down a level (from Guaranteed to Burstable), so it has fewer guarantees of survivability when node resources become constrained. At this point this is basically an RFE, and given where we are in the schedule I'm pushing this to ODF 4.10.
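To make the QoS trade-off concrete, here is a minimal sketch in plain Go (standard library only) that approximates how Kubernetes classifies a single-container pod, applied to the current MDS settings and to the proposed 8Gi request / 12Gi limit. The `Resources` type and the `qosClass` function are hypothetical names used for illustration; they are not part of ocs-operator, Rook, or the Kubernetes API. Quantities are compared as plain strings, which is a simplification ("8Gi" and "8192Mi" would be treated as different).

```go
package main

import "fmt"

// Resources is a simplified, hypothetical stand-in for one container's
// requests and limits. An empty string means "not set".
type Resources struct {
	CPURequest, CPULimit       string
	MemoryRequest, MemoryLimit string
}

// qosClass approximates the Kubernetes QoS rules for a single-container pod:
//   - BestEffort: no requests or limits at all
//   - Guaranteed: CPU and memory limits are both set and equal to the requests
//   - Burstable:  everything else
//
// Assumption: values are compared as strings, not parsed as quantities.
func qosClass(r Resources) string {
	if r.CPURequest == "" && r.CPULimit == "" &&
		r.MemoryRequest == "" && r.MemoryLimit == "" {
		return "BestEffort"
	}

	// Kubernetes defaults an unset request to the limit when a limit is set.
	cpuReq, memReq := r.CPURequest, r.MemoryRequest
	if cpuReq == "" {
		cpuReq = r.CPULimit
	}
	if memReq == "" {
		memReq = r.MemoryLimit
	}

	if r.CPULimit != "" && r.MemoryLimit != "" &&
		cpuReq == r.CPULimit && memReq == r.MemoryLimit {
		return "Guaranteed"
	}
	return "Burstable"
}

func main() {
	// Current ocs-operator defaults quoted in the comment above.
	current := Resources{
		CPURequest: "3", CPULimit: "3",
		MemoryRequest: "8Gi", MemoryLimit: "8Gi",
	}
	// Proposed change: keep the memory request, raise only the memory limit.
	proposed := Resources{
		CPURequest: "3", CPULimit: "3",
		MemoryRequest: "8Gi", MemoryLimit: "12Gi",
	}

	fmt.Println("current  ->", qosClass(current))  // current  -> Guaranteed
	fmt.Println("proposed ->", qosClass(proposed)) // proposed -> Burstable
}
```

Under these assumptions, raising only the memory limit gives the MDS headroom during replay but moves the pod out of the Guaranteed class, which is the survivability concern raised above.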