Bug 2065838
| Summary: | [RFE] Need to auto increase MDS memory limit when MDS is reporting oversized cache | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Mike Hackett <mhackett> |
| Component: | ocs-operator | Assignee: | Mudit Agarwal <muagarwa> |
| Status: | NEW --- | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.8 | CC: | bkunal, etamir, mmuench, muagarwa, odf-bz-bot, shan, sostapov, tnielsen, vumrao |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Mike Hackett
2022-03-18 20:25:31 UTC
As discussed with the Rook team, this is not something the Rook operator will do (dynamically change the pod memory resources). Today the memory resources are built and passed by ocs-operator. This component would likely be responsible for editing those resources to increase or decrease the memory available in the MDS pod. We probably need to react to an alert coming from Prometheus, which would then trigger an adjustment of that pod's memory resources.

Moving to ocs-operator.

Adjusting the resource limits dynamically may not be desirable, since it requires an update to the MDS pod spec, which restarts the MDS. Instead of dynamically updating the limits, we should consider:

1. Overriding the limits in the StorageCluster CR with the "mds" key when the workload requires it.
2. Setting limits higher than the requests instead of using the same value for both.

See the resource requests/limits currently set to 8Gi here: https://github.com/red-hat-storage/ocs-operator/blob/e871f8953e3a32bc82b27a174ae6fe7f85a22d3e/controllers/defaults/resources.go#L32-L41

For now, the first option at least helps those clusters with higher MDS loads.

Since this is not a blocker, moving out to ODF 4.12.

We'll definitely need some prioritization on this, as we don't have the bandwidth to take in something like this without adjusting the schedule to accommodate it.
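
To make the two options above concrete, here is a minimal Python sketch of how the effective MDS memory resources could be resolved. It is not the ocs-operator code (which is written in Go). The `Resources` type, the `DEFAULTS` table, the `resolve` function, and the `limit_factor` parameter are all hypothetical names introduced for illustration. Only the "mds" key and the 8Gi default come from the discussion above.

```python
from dataclasses import dataclass

GI = 1024 ** 3  # bytes in one GiB


@dataclass(frozen=True)
class Resources:
    """Hypothetical stand-in for a pod's memory request and limit."""
    request_bytes: int
    limit_bytes: int


# Assumed defaults: request and limit both 8Gi, as described in this bug.
DEFAULTS = {"mds": Resources(8 * GI, 8 * GI)}


def resolve(component, overrides, limit_factor=1.0):
    """Return the effective resources for a component.

    Option 1: an explicit override (for example under the "mds" key) wins.
    Option 2: otherwise keep the default request and scale the limit by
    limit_factor, so the limit can sit above the request.
    """
    if limit_factor < 1.0:
        # A limit below the request is not a valid pod spec.
        raise ValueError("limit_factor must be >= 1.0")
    if component in overrides:
        return overrides[component]
    base = DEFAULTS[component]
    return Resources(base.request_bytes, int(base.request_bytes * limit_factor))


if __name__ == "__main__":
    # Option 1: per-cluster override for a heavy MDS workload.
    print(resolve("mds", {"mds": Resources(8 * GI, 16 * GI)}))
    # Option 2: default request, limit raised to 1.5x the request.
    print(resolve("mds", {}, limit_factor=1.5))
```

Either way the values are fixed when the pod spec is generated, so the MDS restarts only when an administrator deliberately changes them, not in response to every alert.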