Description of problem:

Presently, growing a deployed ODF environment requires manually modifying the StorageCluster definition to increment the 'count' parameter of the default storageDeviceSet (assuming the appropriate provisioner is available to create the volumes that back the new OSDs). The MVP for the desired capability would allow a capacity threshold to be defined that, once crossed, automatically instantiates a new set of OSDs by incrementing that count. Illustrative YAML for the StorageCluster device set and the MachineSet autoscaler is included under "Additional info" below.

- To account for transient spikes, the threshold should have to be exceeded for a minimum period of time (e.g. 2 or 4 hours, but optimally user-definable in minutes) before scaling is triggered.
- A user-definable scale-factor integer may be desired to control the increment applied to count.
- Optionally, a user-defined index of the storageDeviceSet to be scaled (default: 0), to accommodate deployments with multiple storageDeviceSets defined.
- In an environment where dynamically provisioned Machines are used for ODF worker nodes, the MachineSet autoscaler should be configured to create new nodes once the current nodes are at capacity and cannot house any more OSDs.
- To ensure a cluster does not enter a runaway state and scale without bound given effectively unlimited resources (as could be available on a cloud platform), a user-definable limit should cap the maximum value permitted for count, effectively bounding how large the cluster can grow automatically.
- A value tracking the current state of this automated capacity-scaling feature should be maintained so that additional scaling operations are not attempted while one is in progress or awaiting resources.
- Optionally, a cool-down timer could ensure another scaling operation is not attempted within a user-defined period after the previous one (e.g. 12+ hours).
- Optionally, a way to temporarily suspend this auto-scale capability would be beneficial for migrations and other known transient events (planned AZ outage, upgrade, etc.).
- A high-severity alert should be generated if the cluster cannot perform the scaling operation (e.g. the scale limit has been reached, the cool-down period has not elapsed, MachineSet auto-scaling is not functioning or has hit its limit, the StorageCluster cannot be modified, or the new OSDs cannot be provisioned for any other reason).

Version of all relevant components (if applicable):

- Optimally 4.8+ EUS (the customer is running 4.6 EUS today, but it is desired that this feature be available in the next EUS release they will deploy).

Does this issue impact your ability to continue to work with the product?

- This customer will be deploying a large number of ODF environments and does not want to manage storage scaling individually for each environment.
- Products that compete with ODF do offer this capability, and it is likely to be a common RFP requirement for customers using that procurement process, especially if seeded by competitors who know this is a capability we do not have.

Is there any workaround available to the best of your knowledge?

- No; it would likely require developing a custom operator that monitors the state of the environment through Prometheus metrics and then modifies the StorageCluster accordingly (a sketch of the modification such an operator would apply is under "Additional info" below).

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

2

Is this issue reproducible?

N/A

Can this issue be reproduced from the UI?

N/A
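Additional info:

For reference, the manual scaling operation described above amounts to editing the storageDeviceSets entry of the StorageCluster CR. The stanza below is only a sketch of a typical default deployment; the device-set name, storage class, and OSD size are assumptions, not values from the customer environment.

  apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    name: ocs-storagecluster
    namespace: openshift-storage
  spec:
    storageDeviceSets:
    - name: ocs-deviceset            # assumed default device-set name
      count: 1                       # the field that must currently be incremented by hand
      replica: 3
      dataPVCTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 512Gi         # illustrative OSD size
          storageClassName: gp2      # assumed dynamic provisioner
          volumeMode: Block

Each increment of count adds one OSD per replica (typically three new OSDs), so the proposed scale-factor and maximum-count settings would bound both how fast and how far the cluster grows.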
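Regarding the MachineSet autoscaler point above: if the cluster autoscaler is enabled, a MachineAutoscaler per MachineSet along the lines of the following would add worker nodes whenever newly created OSD pods cannot be scheduled on the existing nodes. The names and replica bounds are hypothetical.

  apiVersion: autoscaling.openshift.io/v1beta1
  kind: MachineAutoscaler
  metadata:
    name: odf-workers-us-east-1a            # hypothetical name
    namespace: openshift-machine-api
  spec:
    minReplicas: 3
    maxReplicas: 6                          # node-level cap, analogous to the proposed maximum count
    scaleTargetRef:
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      name: cluster-abc123-odf-us-east-1a   # hypothetical MachineSet backing the ODF workers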
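The workaround mentioned above (a custom operator or script watching Prometheus capacity metrics and editing the StorageCluster) would ultimately issue something equivalent to the patch below; the resource name, namespace, and target count are assumptions for a default deployment.

  oc patch storagecluster ocs-storagecluster -n openshift-storage \
    --type json \
    -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 2}]'

The requested feature would perform this same modification automatically, subject to the threshold, cool-down, and maximum-count controls listed in the description.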