Bug 1312404

Summary: [API] tasks Accept Node finish before node is actually ready
Product: [Red Hat Storage] Red Hat Storage Console Reporter: Daniel Horák <dahorak>
Component: core Assignee: Nishanth Thomas <nthomas>
core sub component: provisioning QA Contact: sds-qe-bugs
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified CC: dahorak, mbukatov, nthomas
Version: 2 Keywords: Reopened, TestBlocker
Target Milestone: ---   
Target Release: 2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-05 08:45:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1298114, 1339605    
Bug Blocks:    

Description Daniel Horák 2016-02-26 15:55:14 UTC
Description of problem:
  When I accept an unmanaged node (no matter whether via the API or from the web UI), it launches two related tasks:
    * Accept Node: ...
    * Initialize Node: ...

  The first task just adds the node to the database, which is very quick. The second task configures/installs the required services on the node itself, which usually takes longer (tens of seconds).

  The problem is that only the first task is returned as the result of accepting the node, and when this task finishes (which is basically immediately), the client (web UI or API client) thinks that the node is ready for further actions (e.g. cluster creation).
  There is no connection between the "Accept Node" task and the "Initialize Node" task.


Version-Release number of selected component (if applicable):
  rhscon-ceph-0.0.6-8.el7.x86_64,
  rhscon-core-0.0.8-7.el7.x86_64,
  rhscon-ui-0.0.16-1.el7.noarch
  rhscon-agent-0.0.3-2.el7.noarch

How reproducible:
  100%


Steps to Reproduce:
1. Accept an "unmanaged" node (via the API or web UI; if via the web UI, watch the response e.g. in the FireBug console).
2. Check the task mentioned in the response to the previous step.
3. Also check all other running tasks.

Actual results:
  The task "Accept Node..." completes immediately, while the task "Initialize Node..." is still running, and there is no connection between the two tasks.

Expected results:
  The task returned in response to accepting the node finishes only when the node is ready (that is, when the "Initialize Node..." task has also finished).

Additional info:
  This might be the core issue of bug 1310746.

Comment 2 Nishanth Thomas 2016-03-10 10:01:09 UTC
This is how it is designed.
There are two steps in the accept process:
1. Accept the node by contacting salt-master - 1st task
2. Once the communication channel between the salt-master and salt-minion is up, USM gets an event from salt-master, upon which USM retrieves the node details from the node and populates the DB - 2nd task

Based on the state of the node you can figure out whether the node is ready to be consumed for cluster creation.
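
A client-side wait along these lines could be sketched as follows. This is only a minimal sketch: `get_state` is a hypothetical caller-supplied function that returns the node's current state (or None while the node is not listed yet), and the `ready_state` value and the timing parameters are assumptions, since the thread does not name the state that marks a usable node.

```python
import time


def wait_for_state(get_state, ready_state, timeout=120.0, interval=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns ready_state or the timeout expires.

    get_state is a caller-supplied callable (hypothetical) returning the
    node's current state, or None if the node is not listed yet.
    Returns True when ready_state was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == ready_state:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; the state is always checked at least once, even with a zero timeout.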

This is not a bug and I am closing this

Comment 3 Daniel Horák 2016-04-07 09:04:16 UTC
I have a few notes/questions on this:

1) From my point of view, it would be good to have the two tasks connected - e.g. why isn't the second task a subtask of the first one?
It would be much easier for automation: accept all the nodes and then continue with other tasks only once all the nodes are accepted and initialized.

2) It is quite difficult to track a particular node through the accept process, because an unaccepted node has no 'nodeid', so it is not possible to check it repeatedly after sending the accept request. The only connecting value is 'hostname'.

3) Related to the previous note, is it possible to get/find/filter nodes not only by 'nodeid' but also by hostname? (This wouldn't be the best solution for the previous notes, more of a workaround, but it would definitely be handy functionality.)
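
For illustration, the hostname lookup asked about in note 3 can be approximated on the client side. The sketch below assumes the node list has already been decoded into a list of dicts carrying a 'hostname' key; that record layout is an assumption, not something confirmed in this report.

```python
def find_by_hostname(nodes, hostname):
    """Return the first node record whose 'hostname' matches, else None.

    nodes is assumed to be a list of dicts decoded from the node list;
    the exact record layout is an assumption.
    """
    for node in nodes:
        if node.get("hostname") == hostname:
            return node
    return None
```

Because an unaccepted node has no 'nodeid' yet, the hostname is the only key that can be used this way during the accept process.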

Comment 5 Nishanth Thomas 2016-04-28 09:33:54 UTC
(In reply to Daniel Horák from comment #3)
> I have a few notes/questions on this:
> 
> 1) From my point of view, it would be good to have the two tasks connected -
> e.g. why isn't the second task a subtask of the first one?
> It would be much easier for automation: accept all the nodes and then
> continue with other tasks only once all the nodes are accepted and
> initialized.
> 

It's not possible because those are separate execution paths. Once the accept is triggered, the API does the accept and returns. After that, once the communication channel is ready, salt-master triggers an event, and based on that the initialization task starts. So there is no way to connect these.

> 2) It is quite difficult to track a particular node through the accept
> process, because an unaccepted node has no 'nodeid', so it is not possible to
> check it repeatedly after sending the accept request. The only connecting
> value is 'hostname'.

You can filter the nodes based on the state ("initializing") and figure out the status of initialization.
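
One way to get such an overview on the client is to group the node list by state. The sketch below assumes each node record is a dict with 'hostname' and 'state' keys; the only state value named in this thread is "initializing", and any other value in the example is a placeholder.

```python
from collections import defaultdict


def hostnames_by_state(nodes):
    """Group node hostnames by their reported 'state' value.

    nodes is assumed to be a list of dicts with 'hostname' and 'state'
    keys; a record without a 'state' key is grouped under None.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[node.get("state")].append(node.get("hostname"))
    return dict(groups)
```

Looking up the "initializing" key in the result then lists the nodes whose initialization has not finished.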

> 
> 3) Related to the previous note, is it possible to get/find/filter nodes not
> only by 'nodeid' but also by hostname? (This wouldn't be the best solution
> for the previous notes, more of a workaround, but it would definitely be
> handy functionality.)

If you still feel that this is required, please raise an RFE to address this.

Hope this answered your questions. I will go ahead and close this bug.

Comment 6 Daniel Horák 2016-05-06 12:26:59 UTC
(In reply to Nishanth Thomas from comment #5)
> (In reply to Daniel Horák from comment #3)
> > 2) It is quite difficult to track a particular node through the accept
> > process, because an unaccepted node has no 'nodeid', so it is not possible to
> > check it repeatedly after sending the accept request. The only connecting
> > value is 'hostname'.
> 
> You can filter the nodes based on the state ("initializing") and figure out
> the status of initialization.

Is it possible to filter the nodes directly in the API GET nodes request?
Something like this:

  ${SKYRINGSERVER}:8080/api/v1/nodes?state=initializing

For me it returns "ERROR 500: Internal Server Error."

Comment 7 Nishanth Thomas 2016-05-10 05:11:30 UTC
Right, this is not implemented. Raise an RFE if you want it to be implemented.
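
Since the `state` query parameter is not handled by the server, the filtering has to happen on the client after fetching the full list. The following is a rough sketch only: it assumes that GET /api/v1/nodes answers with a JSON array of node objects carrying a 'state' key, that the base URL includes a scheme, and it leaves out any authentication the server may require. The `opener` parameter is a hypothetical hook for testing without a live server.

```python
import json
from urllib.request import urlopen


def fetch_nodes(base_url, opener=urlopen):
    """GET <base_url>/api/v1/nodes and decode the body as JSON.

    Assumes the endpoint answers with a JSON array of node objects;
    authentication is not handled here.
    """
    with opener(base_url + "/api/v1/nodes") as response:
        return json.loads(response.read().decode("utf-8"))


def filter_by_state(nodes, state):
    """Return only the node records whose 'state' equals the given value."""
    return [node for node in nodes if node.get("state") == state]
```

With a base URL such as "http://<server>:8080", `filter_by_state(fetch_nodes(base_url), "initializing")` would give the client-side equivalent of the request attempted in comment 6.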

Comment 8 Martin Bukatovic 2016-05-20 16:47:15 UTC
This misunderstanding seems to be a result of missing design and
documentation of the REST API.

Comment 9 Martin Bukatovic 2016-05-20 16:50:48 UTC
Is there a way one could start an Accept host workflow and wait until it actually finishes (which includes both accept and initialize tasks) from the API?

Or do we have this process (how to do this via API) properly documented?

If there is no answer or the answer is no, this is a valid bug and so
I would reopen this BZ as it would not make sense to create a new one
for the same issue again.

Comment 10 Nishanth Thomas 2016-05-24 05:36:52 UTC
I already mentioned this a couple of times in this thread itself

"
This is how it is designed.
There are two steps in the accept process:
1. Accept the node by contacting salt-master - 1st task
2. Once the communication channel between the salt-master and salt-minion is up, USM gets an event from salt-master, upon which USM retrieves the node details from the node and populates the DB - 2nd task

Based on the state of the node you can figure out whether the node is ready to be consumed for cluster creation.

"

So you need to look at the 'STATE' of the node to see whether it is usable or not. This must be covered as part of the documentation and you can raise a bug against doc if it is not covered.
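
For automation that accepts several nodes at once, the same state check can be applied to a whole set of hostnames. The sketch below is illustrative only: it assumes node records are dicts with 'hostname' and 'state' keys, and `ready_state` is a caller-supplied value because the state name that marks a usable node is not given in this thread.

```python
def pending_hostnames(nodes, hostnames, ready_state):
    """Return the hostnames from `hostnames` that are not yet in ready_state.

    A hostname that does not appear in `nodes` at all counts as pending.
    """
    state_of = {node.get("hostname"): node.get("state") for node in nodes}
    return [name for name in hostnames if state_of.get(name) != ready_state]
```

A caller would repeat this check against a freshly fetched node list until the returned list is empty, and only then continue with cluster creation.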

Comment 11 Martin Bukatovic 2016-05-25 12:52:51 UTC
(In reply to Nishanth Thomas from comment #10) 
> So you need to look at the 'STATE' of the node to see whether it is usable
> or not. This must be covered as part of the documentation and you can raise
> a bug against doc if it is not covered.

Ok, so I created a new BZ for documentation of this particular use case (BZ 1339605), which is now blocking this BZ. Moreover, I changed the component of
BZ 1298114 because this kind of documentation should be written by developers
upstream along with the code, as we already discussed some time ago.

Based on the state of BZ 1339605, we may decide to reopen this BZ if we
identify an issue with the way the API is designed. We see a risk that
the API was not properly designed with all use cases in mind, and missing
documentation and examples hinder us from properly discussing this here.