Description of problem
======================

When there is no free disk space left on the /var/lib/carbon partition (e.g.
because a lot of archived data remains from previous unmanage tasks), the
import cluster task finishes with success and WA doesn't report any error
directly in the web UI or via alerts, but the Grafana dashboards don't show
any data (as no files can be written into the
/var/lib/carbon/whisper/tendrl/clusters/<cluster-id> directory).

This is an edge case which we can address by a combination of:

* increased error detection/reporting (e.g. monitoring free space on the
  /var/lib/carbon partition and reporting an alert)
* description of this case in the debugging guide

Version-Release number
======================

RHGS WA components on tendrl server machine:

tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-6.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-6.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-3.el7rhgs.noarch

Other WA components:

grafana-4.3.2-3.el7rhgs.x86_64
python-carbon-0.9.15-2.1.el7rhgs.noarch
python-whisper-0.9.15-1.1.el7rhgs.noarch

How reproducible
================

100%

Steps to Reproduce
==================

1. Prepare RHGS trusted storage pool
2. Prepare separate partitions for etcd and graphite data on RHGS WA server
3. Install RHGS WA using tendrl-ansible
4. Generate a large file in the /var/lib/carbon partition so that there is no
   free space left there
5. Import the cluster

Actual results
==============

The ImportCluster task finishes with success, and all components of the
trusted storage pool are shown in the WA interface. There are no errors
reported by WA directly (via UI, task details or alerts). Grafana dashboard
shows no data.

Only an empty directory structure was created in the
/var/lib/carbon/whisper/tendrl/clusters/<cluster-id> directory, as any attempt
to write data there fails:

```
# pwd
/var/lib/carbon/whisper/tendrl/clusters/<cluster-id>
# find . -type d | wc -l
301
# find . -type f | wc -l
0
```

The carbon log file, /var/log/carbon/console.log, contains error messages
about this:

```
11/06/2018 08:11:13 :: 'Error creating /var/lib/carbon/whisper/tendrl/clusters/84ffce52-031b-415f-a8a0-c878043dfd89/nodes/mbukatov-usm1-gl5/bricks/|mnt|brick_gama_disperse_2|2/device/vdc/mount_utilization/total.wsp'
```

Expected results
================

This is an edge case. We may consider monitoring disk usage of the carbon
partition and reporting the problem via alerts if needed.

Asking doc team: would this case be worth mentioning in the
debugging/troubleshooting guide for RHGS WA 3.4?
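
To make the monitoring idea more concrete, here is a minimal sketch of a free space check for the carbon partition. It is not how WA detects or reports anything today: the function names `free_fraction` and `check_partition` and the 10% threshold are assumptions made up for illustration, and wiring the result into the WA alerting pipeline is left out entirely. It uses `os.statvfs` so that it runs on both Python 2 and Python 3.

```python
import os

# Hypothetical threshold: warn when less than 10% of the partition is free.
MIN_FREE_FRACTION = 0.10


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of space available to
    unprivileged users on the filesystem that contains path."""
    stats = os.statvfs(path)
    if stats.f_blocks == 0:
        # Some pseudo filesystems report zero blocks; treat them as full.
        return 0.0
    return float(stats.f_bavail) / float(stats.f_blocks)


def check_partition(path, min_free=MIN_FREE_FRACTION):
    """Return a warning string when free space is below min_free,
    otherwise return None."""
    fraction = free_fraction(path)
    if fraction < min_free:
        return "low disk space on %s: %.1f%% free" % (path, fraction * 100)
    return None


# Example call for the partition from this report (the path must exist):
#   warning = check_partition("/var/lib/carbon")
#   if warning:
#       print(warning)
```

A check like this could run periodically on the WA server. A threshold based on a fraction rather than absolute bytes is only one option, and it may need tuning for large partitions.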

Additional Information
----------------------

When I free some space in /var/lib/carbon, the data starts to gradually
appear on the dashboard. I haven't checked in detail how long it takes or
whether all data will appear eventually.
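
For anyone who wants to watch the recovery, the two `find` commands from the actual results can be approximated in Python as shown below. This is a rough sketch only: `count_tree` and `has_only_directories` are hypothetical helper names, not part of tendrl or carbon, and the cluster directory has to be filled in by hand.

```python
import os


def count_tree(root):
    """Count directories and regular files under root, roughly like
    `find . -type d | wc -l` and `find . -type f | wc -l`.
    Symbolic links are not counted as either."""
    if not os.path.isdir(root):
        raise ValueError("not a directory: %s" % root)
    dirs = 1  # find also counts the starting directory itself
    files = 0
    for current, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if not os.path.islink(os.path.join(current, name)):
                dirs += 1
        for name in filenames:
            full = os.path.join(current, name)
            if os.path.isfile(full) and not os.path.islink(full):
                files += 1
    return dirs, files


def has_only_directories(root):
    """Return True for the symptom from this report: a directory
    structure exists under root, but it contains no files at all."""
    dirs, files = count_tree(root)
    return dirs > 1 and files == 0


# Example call (replace <cluster-id> with the real cluster id):
#   count_tree("/var/lib/carbon/whisper/tendrl/clusters/<cluster-id>")
```

Running `count_tree` repeatedly after freeing space should show the file count growing while the directory count stays the same, assuming carbon is able to create the whisper files again.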