Not sure there is anything to do here until it gets reproduced.
Travis, please see comment #7 from Patrick about making Rook transiently increase the MDS memory.
The resource limits for the MDS are currently set by the ocs operator, as seen here [1]:

    "mds": {
        Requests: corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("3"),
            corev1.ResourceMemory: resource.MustParse("8Gi"),
        },
        Limits: corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("3"),
            corev1.ResourceMemory: resource.MustParse("8Gi"),
        },
    },

K8s does not allow the limits to be changed at pod runtime. If the requests or limits are changed, the pod will be restarted. But if the MDS really needs to burst to 12Gi sometimes, it seems like we should leave the "requests" at 8Gi and increase the "limits" to 12Gi so the MDS won't be killed prematurely. So please move this to the ocs operator component for this change, unless there is another way to constrain the MDS memory.

[1] https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/defaults/resources.go#L32-L41
Please see comment #12.
This would have other implications for the QoS class of the MDS pod. Having different requests and limits bumps it down a level, which gives it fewer guarantees of survivability when node resources become constrained. At this point this is basically an RFE, and given where we are in the schedule, I'm pushing this to ODF 4.10.
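To illustrate the QoS concern, here is a minimal sketch of the documented Kubernetes QoS rules applied to the current and proposed MDS settings. It assumes a single-container pod and uses a hypothetical, simplified Resources type in place of corev1.ResourceRequirements. It also assumes requests are set explicitly, so the API server's defaulting of requests from limits is not modeled. Under these assumptions, the current 8Gi/8Gi settings classify as Guaranteed, while an 8Gi request with a 12Gi limit classifies as Burstable.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified stand-in for a single container's
// corev1.ResourceRequirements. Quantities are plain integers (millicores and
// bytes), and a zero value means "not set".
type Resources struct {
	CPURequestMilli, CPULimitMilli int64
	MemRequestBytes, MemLimitBytes int64
}

// Gi is one gibibyte in bytes.
const Gi = int64(1) << 30

// qosClass applies the Kubernetes QoS rules to a single-container pod:
//   - BestEffort: no CPU or memory requests or limits at all.
//   - Guaranteed: CPU and memory limits are set, and requests equal limits.
//   - Burstable: anything else.
func qosClass(r Resources) string {
	if r.CPURequestMilli == 0 && r.CPULimitMilli == 0 &&
		r.MemRequestBytes == 0 && r.MemLimitBytes == 0 {
		return "BestEffort"
	}
	if r.CPULimitMilli != 0 && r.MemLimitBytes != 0 &&
		r.CPURequestMilli == r.CPULimitMilli &&
		r.MemRequestBytes == r.MemLimitBytes {
		return "Guaranteed"
	}
	return "Burstable"
}

func main() {
	// Current ocs-operator "mds" defaults: requests == limits (3 CPU, 8Gi).
	current := Resources{
		CPURequestMilli: 3000, CPULimitMilli: 3000,
		MemRequestBytes: 8 * Gi, MemLimitBytes: 8 * Gi,
	}
	// Proposal from this thread: keep the 8Gi request, raise the limit to 12Gi.
	proposed := Resources{
		CPURequestMilli: 3000, CPULimitMilli: 3000,
		MemRequestBytes: 8 * Gi, MemLimitBytes: 12 * Gi,
	}

	fmt.Println("current:", qosClass(current))   // current: Guaranteed
	fmt.Println("proposed:", qosClass(proposed)) // proposed: Burstable
}
```

The trade-off follows from this. A Burstable pod that uses more memory than it requested is a candidate for eviction under node memory pressure before Guaranteed pods are. Raising only the limit therefore avoids OOM kills between 8Gi and 12Gi, but it weakens the pod's protection when the node is constrained.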