Description of problem
======================

When carbon fails to create its database for some reason, WA doesn't notice the problem and reports the cluster import as successful, even though no data can be shown on any dashboard.

Version-Release number of selected component
============================================

# rpm -qa | grep tendrl | sort
tendrl-ansible-1.6.3-8.el7rhgs.noarch
tendrl-api-1.6.3-7.el7rhgs.noarch
tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-11.el7rhgs.noarch

[root@mbukatov-usm1-server ~]# rpm -qa | egrep '(carbon|grafana|collectd)' | grep -v tendrl | sort
carbon-selinux-1.5.4-2.el7rhgs.noarch
collectd-5.7.2-3.1.el7rhgs.x86_64
collectd-ping-5.7.2-3.1.el7rhgs.x86_64
grafana-4.3.2-3.el7rhgs.x86_64
libcollectdclient-5.7.2-3.1.el7rhgs.x86_64
python-carbon-0.9.15-2.1.el7rhgs.noarch

How reproducible
================

100 %

Steps to Reproduce
==================

1. Prepare a trusted storage pool with a few volumes to import
2. Install WA using tendrl-ansible
3. On the WA server, make sure that the directory /var/lib/carbon/whisper/tendrl exists and is empty, but that the carbon user can't write to it (e.g. run chown root /var/lib/carbon/whisper/tendrl)
4. Import the cluster

Alternative reproducer when you have a cluster already imported:

1. Unmanage the cluster
2. Wait until the cluster shows up again in the Tendrl interface of WA
3. On the WA server, run: chown root /var/lib/carbon/whisper/tendrl
4. On the WA server, run: rmdir /var/lib/carbon/whisper/tendrl/cluster
5. Run import cluster again

Actual results
==============

The import task finishes with success, but there are no data points in the dashboard, because carbon was unable to initialize its database. This can be seen in /var/log/carbon/console.log, which contains tons of error messages like:

```
07/11/2018 09:16:27 :: 'Error creating /var/lib/carbon/whisper/tendrl/clusters/969a2d08-3e24-4f56-8a09-61575106f8b9/nodes/mbukatov-usm1-gl3/brick_count/up.wsp'
07/11/2018 09:16:27 :: "[Errno 13] Permission denied: '/var/lib/carbon/whisper/tendrl/clusters'"
```

One can also directly check that the clusters directory (which normally holds the carbon database) is missing:

```
# ls -l /var/lib/carbon/whisper/tendrl/
total 0
drwxr-xr-x. 3 root root 22 Nov 7 08:57 archive
drwxr-xr-x. 2 root root 26 Nov 7 09:14 names
#
```

Expected results
================

WA performs some validation during the last phase of import cluster and notifies the user about the problem.

Additional info
===============

When the problem with access rights is fixed, carbon recovers and both the database and the dashboard are populated (data points appear in the dashboard almost immediately).

```
# chown carbon /var/lib/carbon/whisper/tendrl
# ls -l /var/lib/carbon/whisper/tendrl/clusters/
total 0
drwxr-xr-x. 8 carbon carbon 124 Nov 7 09:20 969a2d08-3e24-4f56-8a09-61575106f8b9
#
```
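
To illustrate the kind of validation asked for under "Expected results", here is a minimal sketch of a pre-flight check that WA could run before reporting the import as successful. It is not taken from the Tendrl code base: the function names `can_write_dir` and `check_whisper_dir` are hypothetical, and the check looks only at classic owner/group/other permission bits (no ACLs, no SELinux), which is enough to catch the `chown root` case from the reproducer.

```python
import os
import pwd
import stat


def can_write_dir(path, uid, gids):
    """Return True if a non-root user with `uid` and group ids `gids`
    may create entries in directory `path`.

    Judged by classic permission bits only; ACLs, SELinux and the
    root override are deliberately ignored in this sketch.
    """
    try:
        st = os.stat(path)
    except OSError:
        return False  # missing or inaccessible path counts as not writable
    if not stat.S_ISDIR(st.st_mode):
        return False
    # Creating a file needs both write and search (execute) permission.
    if st.st_uid == uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (st.st_mode & needed) == needed


def check_whisper_dir(path="/var/lib/carbon/whisper/tendrl", user="carbon"):
    """Hypothetical validation step: return an error message, or None if OK."""
    try:
        entry = pwd.getpwnam(user)
    except KeyError:
        return "user %r does not exist" % user
    gids = set(os.getgrouplist(user, entry.pw_gid))
    if not can_write_dir(path, entry.pw_uid, gids):
        return "%s is not writable by user %r" % (path, user)
    return None


if __name__ == "__main__":
    problem = check_whisper_dir()
    print(problem or "carbon storage directory looks writable")
```

A check of this kind only detects the permission problem described in this report. Scanning /var/log/carbon/console.log for "Error creating" messages would be a complementary way to catch other causes.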
There are other self-monitoring/error-reporting bugs reported for WA and carbon, e.g. BZ 1589801.
PR: https://github.com/Tendrl/monitoring-integration/pull/596 (assigns permission to the carbon user while creating an alias)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3251