Bug 1647322

Summary: WA should detect and report problems with carbon initialization
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Martin Bukatovic <mbukatov>
Component: web-admin-tendrl-commons Assignee: Timothy Asir <tjeyasin>
Status: CLOSED ERRATA QA Contact: Sweta Anandpara <sanandpa>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.4 CC: amukherj, nthomas, rcyriac, rhinduja, rhs-bugs, sanandpa, tjeyasin
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-monitoring-integration-1.6.3-22.el7rhgs.noarch Doc Type: Bug Fix
Doc Text:
Previously, Tendrl did not set an owner for the /var/lib/carbon/whisper/tendrl directory. When this directory was not owned by the 'carbon' user, carbon-cache could not create whisper files in this location. Tendrl now sets the owner of the directory to the 'carbon' user so that whisper files can be created.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-30 12:23:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1696807    

Description Martin Bukatovic 2018-11-07 08:23:41 UTC
Description of problem
======================

When carbon fails to create its database for some reason, WA doesn't notice
the problem and reports the cluster import as successful, even though no data
can be shown on any dashboard.

Version-Release number of selected component
============================================

# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-8.el7rhgs.noarch
tendrl-api-1.6.3-7.el7rhgs.noarch
tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-11.el7rhgs.noarch

[root@mbukatov-usm1-server ~]# rpm -qa | egrep '(carbon|grafana|collectd)' | grep -v tendrl | sort
carbon-selinux-1.5.4-2.el7rhgs.noarch
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
grafana-4.3.2-3.el7rhgs.x86_64
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
python-carbon-0.9.15-2.1.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Prepare a trusted storage pool with a few volumes to import
2. Install WA using tendrl-ansible
3. On the WA server, make sure that the directory
   /var/lib/carbon/whisper/tendrl exists and is empty, but that the
   carbon user can't write there (e.g. run
   chown root /var/lib/carbon/whisper/tendrl)
4. Import cluster

Alternative reproducer when you have cluster already imported:

1. Unmanage the cluster
2. Wait until the cluster shows up again in the tendrl interface of WA
3. On WA server, run: chown root /var/lib/carbon/whisper/tendrl
4. On WA server, run: rmdir /var/lib/carbon/whisper/tendrl/cluster
5. Run import cluster again

Actual results
==============

The import task finishes with success, but there are no data points
in the dashboard, because carbon was unable to initialize its database.
This can be seen in /var/log/carbon/console.log, which contains
many error messages like:

```
07/11/2018 09:16:27 :: 'Error creating /var/lib/carbon/whisper/tendrl/clusters/969a2d08-3e24-4f56-8a09-61575106f8b9/nodes/mbukatov-usm1-gl3/brick_count/up.wsp'
07/11/2018 09:16:27 :: "[Errno 13] Permission denied: '/var/lib/carbon/whisper/tendrl/clusters'"
```
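
A check for this condition could start from the log itself. The following is only a minimal sketch, assuming the error lines keep the format shown in the excerpt above; the function name `permission_denied_paths` is hypothetical and not part of Tendrl or carbon.

```python
import re

# Matches the errno line from the console.log excerpt above, e.g.
#   07/11/2018 09:16:27 :: "[Errno 13] Permission denied: '/var/lib/...'"
# Assumption: carbon-cache keeps writing the error in this form.
PERMISSION_DENIED = re.compile(r"\[Errno 13\] Permission denied: '([^']+)'")


def permission_denied_paths(lines):
    """Return the distinct paths named in 'Permission denied' log lines.

    Paths are returned in order of first appearance; lines that do not
    contain the errno 13 message are ignored.
    """
    seen = []
    for line in lines:
        match = PERMISSION_DENIED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Calling it with the lines of /var/log/carbon/console.log, for example `permission_denied_paths(open(path))`, would return the directories carbon could not write to, so that repeated errors collapse into one entry per path.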

One can also directly check that the clusters directory (which normally
holds the carbon database) is missing:

```
# ls -l /var/lib/carbon/whisper/tendrl/
total 0
drwxr-xr-x. 3 root root 22 Nov  7 08:57 archive
drwxr-xr-x. 2 root root 26 Nov  7 09:14 names
# 
```

Expected results
================

WA performs some validation during the last phase of the cluster import and
notifies the user about the problem.
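
One possible shape for such a validation is sketched below. It is an illustration only, not the check that was eventually implemented: the function name `whisper_dir_problems` is hypothetical, and the expected owner `carbon` is taken from the reproducer in this report.

```python
import os
import pwd
import stat


def whisper_dir_problems(path, expected_owner="carbon"):
    """Return a list of human-readable problems with the whisper directory.

    An empty list means the directory exists, is owned by expected_owner
    and is writable by its owner.
    """
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return ["%s does not exist" % path]
    if not stat.S_ISDIR(info.st_mode):
        return ["%s is not a directory" % path]

    # Resolve the numeric owner to a user name; fall back to the uid
    # itself when the account has no passwd entry.
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)

    problems = []
    if owner != expected_owner:
        problems.append(
            "%s is owned by %s, expected %s" % (path, owner, expected_owner)
        )
    if not info.st_mode & stat.S_IWUSR:
        problems.append("%s is not writable by its owner" % path)
    return problems
```

With the reproducer from this report applied, `whisper_dir_problems("/var/lib/carbon/whisper/tendrl")` would be expected to report the root ownership, which the import task could then surface to the user instead of finishing silently.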

Additional info
===============

When the problem with access rights is fixed, carbon recovers and
the database and dashboard are populated (data points appear in the
dashboard almost immediately).

```
# chown carbon /var/lib/carbon/whisper/tendrl
# ls -l /var/lib/carbon/whisper/tendrl/clusters/
total 0
drwxr-xr-x. 8 carbon carbon 124 Nov  7 09:20 969a2d08-3e24-4f56-8a09-61575106f8b9
#
```
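
The same repair could be done programmatically. The sketch below only mirrors the manual `chown carbon` command above and is not the code from the actual fix; the helper names `ensure_owned_by` and `ensure_owned_by_user` are hypothetical.

```python
import os
import pwd


def ensure_owned_by(path, uid):
    """Chown path to uid if someone else owns it; return True if changed."""
    if os.stat(path).st_uid == uid:
        return False
    # -1 leaves the group unchanged, like `chown carbon PATH` does.
    os.chown(path, uid, -1)
    return True


def ensure_owned_by_user(path, user="carbon"):
    """Same as ensure_owned_by, but takes a user name.

    Raises KeyError if the user does not exist on this host.
    """
    return ensure_owned_by(path, pwd.getpwnam(user).pw_uid)
```

Run as root, `ensure_owned_by_user("/var/lib/carbon/whisper/tendrl")` would change the owner only when needed, so calling it repeatedly is harmless.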

Comment 2 Martin Bukatovic 2018-11-07 08:30:08 UTC
There are other self-monitoring/error reporting bugs reported for WA and carbon,
e.g. BZ 1589801.

Comment 3 gowtham 2019-04-01 16:43:36 UTC
PR: https://github.com/Tendrl/monitoring-integration/pull/596, assigning permission to the carbon user while creating an alias

Comment 20 errata-xmlrpc 2019-10-30 12:23:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3251