Bug 1588357 - Sometimes import flow and unmanage flow is failing
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-commons
Version: 3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: gowtham
QA Contact: Filip Balák
Keywords: TestBlocker
Depends On:
Blocks: 1503137 1570048
Reported: 2018-06-07 02:47 EDT by gowtham
Modified: 2018-09-04 03:08 EDT (History)

See Also:
Fixed In Version: tendrl-commons-1.6.3-7.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 03:07:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Github https://github.com/Tendrl/commons/issues/983 None None None 2018-06-07 02:49 EDT
Red Hat Product Errata RHSA-2018:2616 None None None 2018-09-04 03:08 EDT

Description gowtham 2018-06-07 02:47:31 EDT
Description of problem:
If a setup has been running continuously, the unmanage and import flows work fine. If node-agent is stopped, or a node or the server is down, for some time, the import and unmanage flows fail.

Version-Release number of selected component (if applicable):


How reproducible:
It is reproducible using the following steps:

Steps to Reproduce:
1. Stop node-agent on the server and on a few storage nodes for 200 seconds. After 200 seconds, the status watcher field in each node_context object is deleted. The nodes are not marked as down, because only the server monitors the nodes and marks them as down, and in this case the server is down as well. As a result, the status watcher field of the node_context object in etcd is not created again.

2. Start node-agent on the server and on the other storage nodes. The node-agent sync then fails with this exception:
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: Traceback (most recent call last):
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: self.run()
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: File "/usr/lib/python2.7/site-packages/tendrl/node_agent/node_sync/__init__.py", line 52, in run
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: NS.node_context.save(ttl=_sync_ttl)
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: File "/usr/lib/python2.7/site-packages/tendrl/commons/objects/node_context/__init__.py", line 98, in save
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: etcd_utils.refresh(status, ttl)
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: File "/usr/lib/python2.7/site-packages/tendrl/commons/utils/etcd_utils.py", line 82, in refresh
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: raise ex
Jun 07 06:22:50 tendrl-server tendrl-node-agent[6578]: EtcdKeyNotFound: Key not found : /nodes/3d389d10-2e02-43f1-9e40-9104825798a4/NodeContext/status

3. The status watcher field no longer exists, but node-agent still tries to refresh it, which raises the EtcdKeyNotFound exception.

4. If the import is already done, trigger unmanage. It fails: the job is not picked up by the nodes, so it times out.

5. If the import is not done yet, trigger import. It fails for the same reason.

6. With node-agent down, all flows fail.
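
The failure in steps 1 to 3 can be illustrated with a minimal sketch. It uses a small in-memory store with per-key TTL instead of a real etcd client, so every name here (FakeTTLStore, KeyNotFound, ManualClock, save_status, demo) is hypothetical and not part of tendrl-commons. The fallback in save_status is only one plausible way to handle the missing key; it is not claimed to be what the upstream PR does.

```python
class KeyNotFound(Exception):
    """Hypothetical stand-in for etcd's EtcdKeyNotFound."""


class ManualClock(object):
    """Controllable clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class FakeTTLStore(object):
    """In-memory key/value store with per-key TTL (assumed etcd-like behaviour)."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _alive(self, key):
        entry = self._data.get(key)
        return entry is not None and entry[1] > self._clock()

    def write(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def read(self, key):
        if not self._alive(key):
            raise KeyNotFound(key)
        return self._data[key][0]

    def refresh(self, key, ttl):
        # Extends the TTL only if the key still exists, otherwise raises.
        if not self._alive(key):
            raise KeyNotFound(key)
        self._data[key] = (self._data[key][0], self._clock() + ttl)


def save_status(store, key, ttl, default="UP"):
    """Refresh the status key; recreate it if it has already expired."""
    try:
        store.refresh(key, ttl)
    except KeyNotFound:
        store.write(key, default, ttl)


def demo():
    clock = ManualClock()
    store = FakeTTLStore(clock)
    key = "/nodes/<node-id>/NodeContext/status"  # placeholder node id
    store.write(key, "UP", ttl=200)  # TTL value chosen for illustration

    clock.advance(201)  # node-agent stopped for longer than the TTL

    try:
        store.refresh(key, 200)  # plain refresh, as in the traceback
        plain_refresh_failed = False
    except KeyNotFound:
        plain_refresh_failed = True

    save_status(store, key, 200)  # refresh with fallback to create
    return plain_refresh_failed, store.read(key)
```

Calling demo() shows that a plain refresh fails once the key has expired, while a refresh that falls back to creating the key lets the sync continue.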


Actual results:
When I stop the server and some storage nodes for some time and then start them again, the import and unmanage flows fail.


Expected results:
Import and unmanage should work when the server and nodes are up.

Additional info:
Comment 2 gowtham 2018-06-07 02:52:43 EDT
The upstream PR for this is merged: https://github.com/Tendrl/commons/pull/984
Comment 3 Martin Bukatovic 2018-06-07 03:01:11 EDT
(In reply to gowtham from comment #0)
> Version-Release number of selected component (if applicable):

Could you provide the versions of the affected builds by running the following
on a machine from the cluster:

```
# rpm -qa | grep tendrl | sort
```
Comment 6 Nishanth Thomas 2018-06-07 04:25:14 EDT
tendrl-ansible-1.6.3-4.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-6.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-4.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-4.el7rhgs.noarch
tendrl-node-agent-1.6.3-6.el7rhgs.noarch
tendrl-notifier-1.6.3-3.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-3.el7rhgs.noarch
Comment 10 Filip Balák 2018-06-25 05:03:45 EDT
I tested the reproducer from this BZ and BZ 1570048 a few times and everything seems OK. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-3.el7rhgs.noarch
tendrl-api-httpd-1.6.3-3.el7rhgs.noarch
tendrl-commons-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-5.el7rhgs.noarch
tendrl-node-agent-1.6.3-7.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-4.el7rhgs.noarch
Comment 12 errata-xmlrpc 2018-09-04 03:07:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
