Bug 1462807 - GlusterCreateBrick job fails and there are no messages [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Storage Console
Classification: Red Hat
Component: Gluster Integration
Version: 3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: alpha
Target Release: 3-alpha
Assigned To: Shubhendu Tripathi
QA Contact: sds-qe-bugs
Depends On:
Blocks:
Reported: 2017-06-19 11:10 EDT by Filip Balák
Modified: 2017-06-27 00:34 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
shtripat: needinfo? (rkanade)


Attachments: None
Description Filip Balák 2017-06-19 11:10:34 EDT
Description of problem:
I try to create bricks via the API. The job is created but remains in the `new` state for a long time and then fails. After it fails, the `hostname/api/1.0/jobs/:job_id:/messages` API call returns an empty list, and the `hostname/api/1.0/jobs` API call starts returning `{"errors":{"message":"Invalid JSON received."}}` (as described in BZ 1460762). On the nodes passed to the API call, the directories given as the path in the GlusterCreateBrick call (/bricks/fs_gluster01) are not created.
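For reference, these are roughly the calls I use to check the job afterwards (a sketch; `:access_token:` and `:job_id:` are placeholders, as in the reproduction steps below):
```
# Messages for the failed job; the response body is an empty list.
curl -H 'Authorization: Bearer :access_token:' http://hostname/api/1.0/jobs/:job_id:/messages

# Listing all jobs then starts failing with the error from BZ 1460762:
# {"errors":{"message":"Invalid JSON received."}}
curl -H 'Authorization: Bearer :access_token:' http://hostname/api/1.0/jobs
```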

Version-Release number of selected component (if applicable):
tendrl-alerting-3.0-alpha.3.el7scon.noarch
tendrl-api-3.0-alpha.4.el7scon.noarch
tendrl-api-doc-3.0-alpha.4.el7scon.noarch
tendrl-api-httpd-3.0-alpha.4.el7scon.noarch
tendrl-commons-3.0-alpha.9.el7scon.noarch
tendrl-dashboard-3.0-alpha.4.el7scon.noarch
tendrl-node-agent-3.0-alpha.9.el7scon.noarch
tendrl-performance-monitoring-3.0-alpha.7.el7scon.noarch

How reproducible:
Probably 100%.
I tried it a few times and it behaved the same way every time.

Steps to Reproduce:
1. Import a cluster with 4 gluster nodes.
2. Restart all machines at the same time. (I do this because I load machine state from snapshots.)
3. After a few minutes, run:
```
curl -X POST -H 'Authorization: Bearer :access_token:' -d '{":node1_id:": {"/bricks/fs_gluster01": {"brick_name": "brick"}}, ":node2_id:": {"/bricks/fs_gluster01": {"brick_name": "brick"}}, ":node3_id:": {"/bricks/fs_gluster01": {"brick_name": "brick"}}, ":node4_id:": {"/bricks/fs_gluster01": {"brick_name": "brick"}}}' http://hostname/api/1.0/:cluster_id:/GlusterCreateBrick
```
4. Check `hostname/api/1.0/jobs/:job_id:` with the job_id returned in the response from the previous step (see the sketch below).
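For example (a sketch; the same Authorization header and placeholders as in step 3 are assumed):
```
curl -H 'Authorization: Bearer :access_token:' http://hostname/api/1.0/jobs/:job_id:
# The returned job stays in status "new" for a long time and then switches to failed.
```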

Actual results:
The job remains in the `new` state for a long time. After a while it fails, and there are no messages describing what failed. It also causes BZ 1460762.

Expected results:
The job should finish and create the bricks, or, if it fails, it should provide a message describing the error.

Additional info:
Comment 3 Nishanth Thomas 2017-06-20 00:08:06 EDT
I believe that it is failing due to https://github.com/Tendrl/gluster-integration/issues/315

Can you confirm? If you need any help, talk to @shubhendu
Comment 4 Filip Balák 2017-06-20 06:58:00 EDT
That might be the case. In /nodes/:node_id:/NodeContext/tags I see provisioner tags on two gluster nodes. The GlusterCreateBrick task also does not work from the UI after the machines are restarted.
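For the record, this is roughly how I look at the tags (a sketch; it assumes the path above is an etcd v2 key and that etcdctl is available on the server node):
```
# Show the NodeContext tags stored for a node (:node_id: is a placeholder).
etcdctl get /nodes/:node_id:/NodeContext/tags
```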
Comment 5 Anup Nivargi 2017-06-20 13:04:03 EDT
Upstream fix at https://github.com/Tendrl/api/pull/213
Comment 6 Shubhendu Tripathi 2017-06-21 04:55:51 EDT
Filip, is this the scenario where the cluster was created from the tendrl UI earlier and, after a cleanup of etcd, the same cluster was imported? If that's the case, it could be related to https://github.com/Tendrl/gluster-integration/issues/315 as mentioned by Nishanth.

Also, in the reproduction steps I see step 2, `Restart machines at the same time. (I do this because of loading machine state from snapshots)`. If the nodes are still starting and `tendrl-gluster-integration` does not come up as part of the restart, there is nothing to pick up the create-brick job and it times out.

In the latest builds all the tendrl services are now marked for restart, so with the latest build I expect the create-bricks job to be picked up after the nodes come back up.
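For example, after the nodes come back up, something like this could be used to confirm the services were restarted (a sketch; the exact unit names are assumed):
```
# Check that the tendrl daemons are running again after the reboot.
systemctl status tendrl-node-agent tendrl-gluster-integration

# Make sure they are enabled so they start automatically on boot.
systemctl enable tendrl-node-agent tendrl-gluster-integration
```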

I would suggest trying once with the latest builds. Also, if it is related to https://github.com/Tendrl/gluster-integration/issues/315, that is still being worked on and will be available in a later build.
Comment 7 Filip Balák 2017-06-23 11:00:19 EDT
The cluster was created with the gluster CLI and then imported into tendrl.
It might have been caused by an inactive `tendrl-gluster-integration` service. There should be some message in that case.
Comment 10 Shubhendu Tripathi 2017-06-27 00:34:08 EDT
@rohan, any thoughts on comment #7?
