Bug 1516242

Summary: When rpm package installation fails on Could not resolve host error during Import Cluster operation, information about this problem is not provided
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Martin Bukatovic <mbukatov>
Component: web-admin-tendrl-node-agent Assignee: Shubhendu Tripathi <shtripat>
Status: CLOSED ERRATA QA Contact: Lubos Trilety <ltrilety>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: ltrilety, mkudlej, rhs-bugs, sanandpa, sankarshan
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-node-agent-1.5.4-8.el7rhgs.noarch Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-18 04:37:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
* screenshot of task details page with error messages (flags: none)
* response from ${TENDRL_SERVER}/api/1.0/jobs/${JOB_ID}/messages api call (flags: none)

Description Martin Bukatovic 2017-11-22 10:48:46 UTC
Created attachment 1357364
screenshot of task details page with error messages

Description of problem
======================

When rpm package installation fails with a "Could not resolve host" error
during the Import Cluster operation, no information about the problem is
provided.

This is similar to a more serious upstream issue, reported (and now fixed) as:

https://github.com/Tendrl/node-agent/issues/627

but it differs in the reproducer and in the type of failure that goes
undetected and unreported in this BZ. Compared to the original upstream issue
Tendrl/node-agent/issues/627, this one seems to have a lower priority.

Version-Release number
======================

tendrl-node-agent-1.5.4-3.el7rhgs.noarch

[root@usm1-gl1 ~]# rpm -qa | grep tendrl | sort
tendrl-collectd-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-3.el7rhgs.noarch
tendrl-node-agent-1.5.4-3.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch

[root@usm1-server ~]# rpm -qa | grep tendrl | sort
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-commons-1.5.4-3.el7rhgs.noarch
tendrl-grafana-plugins-1.5.4-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-4.el7rhgs.noarch
tendrl-node-agent-1.5.4-3.el7rhgs.noarch
tendrl-notifier-1.5.4-2.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-ui-1.5.4-3.el7rhgs.noarch

Steps to Reproduce
==================

1. Prepare machines with a GlusterFS storage pool, including a gluster volume.
2. Install RHGS WA using tendrl-ansible.
3. Pick one storage server node and break the rpm repository of the RHGS WA
   channel by specifying an invalid hostname in the repo URL
   (e.g. download.example.com).
4. Verify that installation of tendrl-gluster-integration is not possible
   on the machine selected in the previous step.
   yum install should fail with the error:
   > "Could not resolve host: download.example.com; Name or service not known"
5. Import the gluster trusted storage pool with a volume (prepared in step #1).
6. Wait until the import task finishes (it's expected to fail).
7. Check the details in task details page.
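
The precondition from step 4 can also be checked without running yum. The
following is a minimal sketch, not part of Tendrl: `host_resolves` is a
hypothetical helper, and the hostname is the example value from step 3. It
only tests DNS resolution, which is the failure yum reports in this scenario.

```python
import socket


def host_resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves, False on a DNS lookup failure.

    `resolver` is injectable (an assumption of this sketch) so the check
    can be exercised without real network access.
    """
    try:
        resolver(hostname, None)
    except socket.gaierror:
        # Same class of failure as yum's "Could not resolve host" error.
        return False
    return True

# Example call on the storage node selected in step 3:
#   host_resolves("download.example.com")
```

If the call returns False on the selected node, the reproducer is set up as
intended.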

Actual results
==============

Even though the installation of the tendrl-gluster-integration package failed
on one storage server, there is no direct indication of this on the task
details page.

Errors reported include:

* Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster
* Failed atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster
* Error: Cluster data sync still incomplete. Timing out
* Failed atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster

but none of them mentions the yum installation failure.

When I download the task messages via curl like this:

```
$ curl "${TENDRL_SERVER}/api/1.0/jobs/${JOB_ID}/messages" -H "Authorization: Bearer ${TENDRL_TOKEN}" | jq '.' > job.pretty.json
```

I can grep within the error messages, but I don't see anything interesting:

```
$ grep yum job.pretty.json
$ grep fail job.pretty.json
      "message": "Failure in Job 4cd6bfee-d3ef-49cf-8c24-b72827e47225 Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster\n"
      "message": "Failure in Job 5c0c6cab-a4ea-4892-adaf-3001822d3efe Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Cluster data sync still incomplete. Timing out\n"
      "message": "Failure in Job 38ae87de-d101-429f-a499-c84f18f93a1a Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Cluster data sync still incomplete. Timing out\n"
      "message": "Failure in Job e282dca2-af32-49f5-bb87-a9610e9b4fef Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Cluster data sync still incomplete. Timing out\n"
      "message": "Failure in Job bba9853d-3961-4bbe-87ab-8f02c5562a1c Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Cluster data sync still incomplete. Timing out\n"
      "message": "Failure in Job b7680ecf-e0a8-4dc9-8cd6-538f6fe807f6 Flow tendrl.flows.ImportCluster with error:\nTraceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/jobs/__init__.py\", line 218, in process_job\n    the_flow.run()\n  File \"/usr/lib/python2.7/site-packages/tendrl/commons/flows/import_cluster/__init__.py\", line 103, in run\n    raise ex\nAtomExecutionFailedError: Atom Execution failed. Error: Error executing atom: tendrl.objects.Cluster.atoms.ImportCluster on flow: Import existing Gluster Cluster\n"
```

```
$ grep gluster-integration job.pretty.json
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl6.example.com"
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl2.example.com"
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl5.example.com"
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl1.example.com"
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl3.example.com"
      "message": "Installing tendrl-gluster-integration on Node mbukatov-usm1-gl4.example.com"
      "message": "Generating configuration for tendrl-gluster-integration on Node mbukatov-usm1-gl6.example.com"
      "message": "Running tendrl-gluster-integration on Node mbukatov-usm1-gl6.example.com"
      "message": "Generating configuration for tendrl-gluster-integration on Node mbukatov-usm1-gl3.example.com"
      "message": "Running tendrl-gluster-integration on Node mbukatov-usm1-gl3.example.com"
      "message": "Generating configuration for tendrl-gluster-integration on Node mbukatov-usm1-gl5.example.com"
      "message": "Running tendrl-gluster-integration on Node mbukatov-usm1-gl5.example.com"
      "message": "Generating configuration for tendrl-gluster-integration on Node mbukatov-usm1-gl2.example.com"
      "message": "Running tendrl-gluster-integration on Node mbukatov-usm1-gl2.example.com"
      "message": "Generating configuration for tendrl-gluster-integration on Node mbukatov-usm1-gl4.example.com"
      "message": "Running tendrl-gluster-integration on Node mbukatov-usm1-gl4.example.com"

```
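
The same search can be scripted. The sketch below assumes only that
job.pretty.json is valid JSON with the text of each entry stored under a
"message" key, as the grep output shows; the nesting of the response is not
known here, so the helper walks the whole document. The function names are
hypothetical, and unlike the grep calls above the match is case-insensitive.

```python
import json


def iter_messages(node):
    """Yield every string stored under a "message" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "message" and isinstance(value, str):
                yield value
            else:
                yield from iter_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_messages(item)


def find_messages(data, keywords):
    """Return the messages that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [m for m in iter_messages(data)
            if any(k in m.lower() for k in lowered)]


def load_and_find(path, keywords):
    """Load a saved job messages file and search it."""
    with open(path) as handle:
        return find_messages(json.load(handle), keywords)

# Example: load_and_find("job.pretty.json", ["yum", "resolve host"])
```

An empty result for keywords such as "yum" or "resolve host" confirms that
the installation failure is not reported anywhere in the job messages.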

Expected results
================

Tendrl should report an error stating that installation of the package
failed, with details indicating why (based on the yum error) if possible.

In this case, the reason is that yum can't resolve the hostname of the RHGS WA
repo.
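
To illustrate the kind of reporting expected, here is a minimal sketch of an
install step that keeps the installer's own error output and includes it in
the raised error. This is not how tendrl-node-agent implements the import
flow or the fix; `PackageInstallError` and `run_install` are hypothetical
names used only for this example.

```python
import subprocess


class PackageInstallError(Exception):
    """Hypothetical error type that carries the installer's own output."""


def run_install(cmd, node):
    """Run an install command; on failure, raise with its output attached."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Prefer stderr, fall back to stdout, so the cause is never dropped.
        detail = result.stderr.strip() or result.stdout.strip() or "no output"
        raise PackageInstallError(
            "Installation failed on node {} (exit code {}): {}".format(
                node, result.returncode, detail))
    return result.stdout

# Example call (assumes yum is the installer on the node):
#   run_install(["yum", "-y", "install", "tendrl-gluster-integration"],
#               "mbukatov-usm1-gl1.example.com")
```

With the broken repository from the reproducer, the raised message would
carry the "Could not resolve host" text, which is the detail missing from the
task details page.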

Comment 2 Martin Bukatovic 2017-11-22 10:49:50 UTC
Created attachment 1357365
response from ${TENDRL_SERVER}/api/1.0/jobs/${JOB_ID}/messages api call

Comment 4 Lubos Trilety 2017-11-29 14:26:27 UTC
Tested with:
tendrl-node-agent-1.5.4-8.el7rhgs.noarch

There is now an error that makes clear what happened and why the import failed.

Comment 6 errata-xmlrpc 2017-12-18 04:37:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478