Bug 1514733 - Cluster import fails because of collectd service not having started on one of the nodes
Summary: Cluster import fails because of collectd service not having started on one of the nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nishanth Thomas
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-18 06:39 UTC by Sweta Anandpara
Modified: 2017-12-18 04:37 UTC
CC List: 4 users

Fixed In Version: tendrl-node-agent-1.5.4-4.el7rhgs.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 04:37:04 UTC
Target Upstream Version:




Links
System ID: Red Hat Product Errata RHEA-2017:3478
Priority: normal
Status: SHIPPED_LIVE
Summary: RHGS Web Administration packages
Last Updated: 2017-12-18 09:34:49 UTC

Description Sweta Anandpara 2017-11-18 06:39:21 UTC
Description of problem:
=======================
While importing a 3-node cluster into a Tendrl server that already had one successfully imported cluster, the import failed with the message 'Could not find atom tendrl.objects.Cluster.atoms.ConfigureMonitoring'.

The cluster import appeared to progress normally until the expected 'Service collectd running on node <>' message never appeared for one of the nodes of the 3-node cluster. When checked on the backend, the 'collectd' service was dead (inactive) on that node, but running on the other two nodes.
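For anyone hitting this, below is a minimal sketch of how the collectd state on each storage node can be checked from the Tendrl server. It is not part of Tendrl; Python 3.7+, passwordless root SSH, and the hostnames (taken from the task log, with the third node left as a placeholder) are all assumptions.

#!/usr/bin/env python3
# Minimal sketch, not part of Tendrl: report whether collectd is active on
# each storage node. Assumes Python 3.7+ and passwordless root SSH from the
# Tendrl server; hostnames come from the task log above, and the third node
# is a placeholder.
import subprocess

NODES = [
    "dhcp42-243.lab.eng.blr.redhat.com",
    "dhcp42-206.lab.eng.blr.redhat.com",
    # "third-node.example.com",   # hypothetical third node of the cluster
]

for node in NODES:
    # 'systemctl is-active' prints "active"/"inactive"/"failed" and exits
    # non-zero when the unit is not running, so the exit code is ignored
    # here and only the printed state is used.
    result = subprocess.run(
        ["ssh", "root@" + node, "systemctl", "is-active", "collectd"],
        capture_output=True, text=True,
    )
    state = result.stdout.strip() or "unknown"
    print("{}: collectd is {}".format(node, state))

Any node that reports something other than 'active' corresponds to the missing 'Service collectd running on node <>' message in the task log.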

error    Failed post-run: tendrl.objects.Cluster.atoms.ConfigureMonitoring for flow: Import existing Gluster Cluster    18 Nov 2017 06:59:23    
info    Running Flow monitoring.flows.NewClusterDashboard    18 Nov 2017 06:59:22    
info    Processing Job 90885685-5b3f-4fc2-9074-280294a47d57    18 Nov 2017 06:59:22
error    Could not find atom tendrl.objects.Cluster.atoms.ConfigureMonitoring    18 Nov 2017 06:59:22
info    Released lock (e6a30215-3261-475a-bcd9-78071d7ff3ae) for Node (dec9e261-e263-41c1-9324-fc9ced7d4d62)    18 Nov 2017 06:59:22
info    Job (4b7d2892-fdce-41a2-8cbe-a34a5720c466): Finished Flow tendrl.flows.ImportCluster    18 Nov 2017 06:59:14
info    Service collectd running on node dhcp42-243.lab.eng.blr.redhat.com    18 Nov 2017 06:59:14
info    Job (85e39abf-c114-4f83-8cb5-0e44835a7f0d): Finished Flow tendrl.flows.ImportCluster    18 Nov 2017 06:59:14
info    Service collectd running on node dhcp42-206.lab.eng.blr.redhat.com    18 Nov 2017 06:59:14


Screenshot of the tasks and /var/log/messages have been copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>

Version-Release number of selected component (if applicable):
=============================================================
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch


On storage node: 
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
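For reference, a minimal sketch of how package lists like the two above can be collected on a node. It is not part of any Tendrl tooling; Python 3.7+ and rpm on the node are assumptions, and it is effectively 'rpm -qa | grep ^tendrl'.

#!/usr/bin/env python3
# Minimal sketch: print the installed tendrl-* packages on the local node,
# sorted, roughly equivalent to 'rpm -qa | grep ^tendrl'.
import subprocess

output = subprocess.run(
    ["rpm", "-qa"], capture_output=True, text=True, check=True
).stdout

for pkg in sorted(line for line in output.splitlines() if line.startswith("tendrl")):
    print(pkg)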


How reproducible:
=================
1:1


Additional info:
================
The setup has been left in the same state, in case it needs to be looked at.

Comment 2 Petr Penicka 2017-11-20 13:25:36 UTC
Giving pm_ack and 3.3.z+ since both qa_ack and dev_ack are already given.

Comment 4 Sweta Anandpara 2017-11-22 08:16:25 UTC
Validated the same on build tendrl-node-agent-1.5.4-5.el7rhgs.noarch

Cluster import succeeded without any issues. Collectd service on every storage node is up and running. 

After discussing it with the developer (Nishanth): a fix that went into gluster/monitoring integration ended up fixing this issue as well. In other words, the issue mentioned in this bugzilla is a one-off case and will not be easy to reproduce, and hence to verify.

Moving the bug to its final state for RHGS 3.3.1.

Comment 6 errata-xmlrpc 2017-12-18 04:37:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478

