Bug 1514733 - Cluster import fails because of collectd service not having started on one of the nodes
Summary: Cluster import fails because of collectd service not having started on one of the nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nishanth Thomas
QA Contact: Sweta Anandpara
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-18 06:39 UTC by Sweta Anandpara
Modified: 2017-12-18 04:37 UTC
CC List: 4 users

Fixed In Version: tendrl-node-agent-1.5.4-4.el7rhgs.noarch.rpm
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 04:37:04 UTC
Target Upstream Version:




Links
System ID: Red Hat Product Errata RHEA-2017:3478
Priority: normal
Status: SHIPPED_LIVE
Summary: RHGS Web Administration packages
Last Updated: 2017-12-18 09:34:49 UTC

Description Sweta Anandpara 2017-11-18 06:39:21 UTC
Description of problem:
=======================
While importing a 3-node cluster into a Tendrl server that already had one successfully imported cluster, the import failed with the message 'Could not find atom tendrl.objects.Cluster.atoms.ConfigureMonitoring'.

The cluster import appeared to progress normally until the expected 'Service collectd running on node <>' message never appeared for one of the nodes of the 3-node cluster. When checked on the backend, the 'collectd' service was dead (inactive) on that node, but running on the other two nodes.
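For anyone hitting this, below is a minimal sketch of how the collectd state on each storage node can be checked from the Tendrl server. It is not part of Tendrl; Python 3.7+, passwordless root SSH, and the hostnames (taken from the task log, with the third node left as a placeholder) are all assumptions.

#!/usr/bin/env python3
# Minimal sketch, not part of Tendrl: report whether collectd is active on
# each storage node. Assumes Python 3.7+ and passwordless root SSH from the
# Tendrl server; hostnames come from the task log above, and the third node
# is a placeholder.
import subprocess

NODES = [
    "dhcp42-243.lab.eng.blr.redhat.com",
    "dhcp42-206.lab.eng.blr.redhat.com",
    # "third-node.example.com",   # hypothetical third node of the cluster
]

for node in NODES:
    # 'systemctl is-active' prints "active"/"inactive"/"failed" and exits
    # non-zero when the unit is not running, so the exit code is ignored
    # here and only the printed state is used.
    result = subprocess.run(
        ["ssh", "root@" + node, "systemctl", "is-active", "collectd"],
        capture_output=True, text=True,
    )
    state = result.stdout.strip() or "unknown"
    print("{}: collectd is {}".format(node, state))

Any node that reports something other than 'active' corresponds to the missing 'Service collectd running on node <>' message in the task log.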

error    Failed post-run: tendrl.objects.Cluster.atoms.ConfigureMonitoring for flow: Import existing Gluster Cluster    18 Nov 2017 06:59:23    
info    Running Flow monitoring.flows.NewClusterDashboard    18 Nov 2017 06:59:22    
info    Processing Job 90885685-5b3f-4fc2-9074-280294a47d57    18 Nov 2017 06:59:22
error    Could not find atom tendrl.objects.Cluster.atoms.ConfigureMonitoring    18 Nov 2017 06:59:22
info    Released lock (e6a30215-3261-475a-bcd9-78071d7ff3ae) for Node (dec9e261-e263-41c1-9324-fc9ced7d4d62)    18 Nov 2017 06:59:22
info    Job (4b7d2892-fdce-41a2-8cbe-a34a5720c466): Finished Flow tendrl.flows.ImportCluster    18 Nov 2017 06:59:14
info    Service collectd running on node dhcp42-243.lab.eng.blr.redhat.com    18 Nov 2017 06:59:14
info    Job (85e39abf-c114-4f83-8cb5-0e44835a7f0d): Finished Flow tendrl.flows.ImportCluster    18 Nov 2017 06:59:14
info    Service collectd running on node dhcp42-206.lab.eng.blr.redhat.com    18 Nov 2017 06:59:14


Screenshot of the tasks and /var/log/messages have been copied to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>

Version-Release number of selected component (if applicable):
=============================================================
tendrl-grafana-plugins-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-3.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-ui-1.5.4-2.el7rhgs.noarch


On storage node: 
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-2.el7rhgs.noarch
tendrl-gluster-integration-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
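For reference, a minimal sketch of how package lists like the two above can be collected on a node. It is not part of any Tendrl tooling; Python 3.7+ and rpm on the node are assumptions, and it is effectively 'rpm -qa | grep ^tendrl'.

#!/usr/bin/env python3
# Minimal sketch: print the installed tendrl-* packages on the local node,
# sorted, roughly equivalent to 'rpm -qa | grep ^tendrl'.
import subprocess

output = subprocess.run(
    ["rpm", "-qa"], capture_output=True, text=True, check=True
).stdout

for pkg in sorted(line for line in output.splitlines() if line.startswith("tendrl")):
    print(pkg)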


How reproducible:
=================
1:1


Additional info:
================
The setup has been left in the same state, in case it needs to be looked at.

Comment 2 Petr Penicka 2017-11-20 13:25:36 UTC
Giving pm_ack and 3.3.z+ since both qa_ack and dev_ack are already given.

Comment 4 Sweta Anandpara 2017-11-22 08:16:25 UTC
Validated the same on build tendrl-node-agent-1.5.4-5.el7rhgs.noarch

Cluster import succeeded without any issues. Collectd service on every storage node is up and running. 

After discussing it with the developer (Nishanth): a fix that went into gluster/monitoring integration ended up fixing this issue as well. In other words, the issue mentioned in this bugzilla is a one-off case and will not be easy to reproduce, and hence to verify.

Moving the bug to its final state for RHGS 3.3.1.

Comment 6 errata-xmlrpc 2017-12-18 04:37:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478

