Bug 1589801

Summary: no error reported by WA ui when importing cluster without free disk space on /var/lib/carbon partition
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Martin Bukatovic <mbukatov>
Component: web-admin-tendrl-node-agent
Assignee: gowtham <gshanmug>
Status: CLOSED WONTFIX
QA Contact: sds-qe-bugs
Severity: low
Docs Contact:
Priority: unspecified
Version: rhgs-3.4
CC: asriram, nthomas, rhs-bugs, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-08 19:43:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Bukatovic 2018-06-11 12:16:11 UTC
Description of problem
======================

When there is no free disk space left on the /var/lib/carbon partition (e.g.
because we have a lot of archived data from previous unmanage tasks), the import
cluster task finishes with success and WA doesn't report any error directly
in the web ui or via alerts, but the Grafana dashboards don't show any data
(as no files can be written into the /var/lib/carbon/whisper/tendrl/clusters/<cluster-id> directory).

This is an edge case which we can address by a combination of:

 * increased error detection/reporting (e.g. monitoring free space on
   the /var/lib/carbon partition and reporting an alert)
 * description of this case in the debugging guide
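
The free space check from the first item could be prototyped roughly as
follows. This is only a sketch and not Tendrl code: the function name
`check_free_space`, the 10% threshold and the message text are assumptions
made for illustration. The only real API used is `shutil.disk_usage` from
the Python 3 standard library.

```python
import shutil


def check_free_space(path="/var/lib/carbon", min_free_fraction=0.10, usage=None):
    """Return an alert message if free space on `path` is too low, else None.

    Hypothetical helper: the name, the default threshold and the message
    text are illustrative assumptions, not part of Tendrl.
    """
    # `usage` can be passed in as a (total, used, free) tuple for testing;
    # otherwise the filesystem holding `path` is queried.
    if usage is None:
        usage = shutil.disk_usage(path)
    total, _used, free = usage
    if total <= 0:
        return "cannot determine size of filesystem holding %s" % path
    free_fraction = free / total
    if free_fraction < min_free_fraction:
        return "low disk space on %s: %.1f%% free" % (path, free_fraction * 100)
    return None
```

Calling `check_free_space()` periodically on the WA server and turning any
non-None result into an alert would cover the scenario described in this bug.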

Version-Release number
======================

RHGS WA components on tendrl server machine:

tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-6.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-6.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-3.el7rhgs.noarch

Other WA components:

grafana-4.3.2-3.el7rhgs.x86_64
python-carbon-0.9.15-2.1.el7rhgs.noarch
python-whisper-0.9.15-1.1.el7rhgs.noarch

How reproducible
================

100%

Steps to Reproduce
==================

1. Prepare RHGS trusted storage pool
2. Prepare separate partitions for etcd and graphite data on RHGS WA server
3. Install RHGS WA using tendrl-ansible
4. Generate a large file in the /var/lib/carbon partition so that there is no
   free space left there
5. Import the cluster

Actual results
==============

The ImportCluster task finishes with success, and all components of the
trusted storage pool are shown in the WA interface.

There are no errors reported by WA directly (via ui, task details or alerts).

Grafana dashboard shows no data. Only an empty directory structure was created
in the /var/lib/carbon/whisper/tendrl/clusters/<cluster-id> directory, as any
attempt to write data there fails:

```
# pwd
/var/lib/carbon/whisper/tendrl/clusters/<cluster-id>
# find . -type d | wc -l
301
# find . -type f | wc -l
0
```

The carbon log file, /var/log/carbon/console.log, contains error messages about
this:

```
11/06/2018 08:11:13 :: 'Error creating /var/lib/carbon/whisper/tendrl/clusters/84ffce52-031b-415f-a8a0-c878043dfd89/nodes/mbukatov-usm1-gl5/bricks/|mnt|brick_gama_disperse_2|2/device/vdc/mount_utilization/total.wsp'
```
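
Messages like the one above could be detected automatically. The sketch below
is an illustration only: the regular expression is derived from the single log
line quoted in this report (other carbon versions may format the message
differently), and the function name `failed_whisper_paths` is hypothetical.

```python
import re

# Pattern based on the one log line quoted in this report (assumption:
# the message is always "'Error creating <path>.wsp'").
ERROR_CREATING = re.compile(r"'Error creating (?P<path>.+\.wsp)'")


def failed_whisper_paths(log_lines):
    """Return the .wsp paths that carbon reported it could not create."""
    paths = []
    for line in log_lines:
        match = ERROR_CREATING.search(line)
        if match:
            paths.append(match.group("path"))
    return paths
```

Passing an open file object for /var/log/carbon/console.log as `log_lines`
would work, since the function only iterates over lines. A non-empty result
right after an import would suggest that whisper files can't be written.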

Expected results
================

This is an edge case. We may consider monitoring disk usage of the carbon
partition and reporting the problem via alerts if needed.

Comment 3 Martin Bukatovic 2018-06-11 12:28:18 UTC
Asking doc team: would this case be worth mentioning in the
debugging/troubleshooting guide for RHGS WA 3.4?

Comment 4 Martin Bukatovic 2018-06-11 12:43:33 UTC
Additional Information
----------------------

When I free some space in /var/lib/carbon, the data starts to gradually appear
on the dashboard. I haven't checked in detail how long it takes or whether all
data will appear eventually.