Description of problem:

I was monitoring a storage node cluster, periodically checking the storage node admin UI for alerts. I was primarily looking at the list view. Eventually I started seeing alerts for a couple of nodes due to high disk usage. My first inclination was to look at the disk metrics in the details view for each node. The values reported by the metrics did not indicate any problems. Then I realized that this is likely because they are aggregate values for the past 8 hours. I wound up logging onto each machine and inspecting the file systems from the command line. The alerts were 100% correct. The data files for the raw_metrics table were starting to take up a lot of space.

This is far from ideal. Users are likely to go to the storage node details view when alerts are generated, just like I did. While the metrics reported are correct, they are not helpful for dealing with the alerts, because an 8-hour aggregate can look healthy while the current value is already over the alert threshold. I think we need to display current values somewhere to help users deal with the situation. We may want to get some UXD input on this as well.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
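To illustrate why the aggregate view can look fine while an alert is firing, here is a minimal sketch. The sample values, the one-sample-per-hour spacing, the 90% threshold, and the `summarize` helper are all made up for illustration; they are not taken from my environment or from the storage node code.

```python
from statistics import mean

# Hypothetical disk-usage samples (percent full), one per hour over an 8-hour window.
samples = [40, 42, 45, 50, 58, 70, 85, 96]

# Hypothetical alert threshold (percent full).
ALERT_THRESHOLD = 90


def summarize(samples, threshold):
    """Compare the windowed average against the most recent sample."""
    average = mean(samples)
    current = samples[-1]
    return {
        "average": average,
        "current": current,
        "average_over_threshold": average >= threshold,
        "current_over_threshold": current >= threshold,
    }


summary = summarize(samples, ALERT_THRESHOLD)
print(summary)
```

With these numbers the 8-hour average sits well below the threshold even though the latest sample is above it, which is roughly the situation a user would be in when opening the details view after an alert.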
Created attachment 828891 screenshot of alerts
Created attachment 828892 screenshot of metrics
I have attached a couple of screenshots from my test environment. One shows the list view after alerts have been generated, and the other shows the details view of a node for which several alerts have fired.