Bug 1332984 - show list of not imported nodes
Summary: show list of not imported nodes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: core
Version: 2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 2
Assignee: Shubhendu Tripathi
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On: 1356503 1362510
Blocks:
 
Reported: 2016-05-04 13:16 UTC by Martin Kudlej
Modified: 2018-11-19 05:32 UTC (History)
CC List: 0 users

Fixed In Version: rhscon-ceph-0.0.17-1.el7scon
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 05:32:44 UTC
Embargoed:


Attachments
Import cluster flow with failed nodes (78.17 KB, image/png)
2016-05-24 10:10 UTC, Shubhendu Tripathi


Links
Gerrithub.io 275253 (Last Updated: 2016-05-24 10:14:56 UTC)

Description Martin Kudlej 2016-05-04 13:16:22 UTC
Description of problem:
There can be issues/problems/timeouts during cluster import. Currently, if any node is not imported, the import is still marked as successful.
I think that if any node is not imported, the import should be marked as "failed".
There should also be a way to fix this issue, with documented steps to complete the import for nodes that timed out. The list of nodes that were not imported should be visible in the "import" task, together with a clear reason why each node was not imported.

Version-Release number of selected component (if applicable):
ceph-ansible-1.0.5-5.el7scon.noarch
ceph-installer-1.0.6-1.el7scon.noarch
rhscon-ceph-0.0.11-1.el7scon.x86_64
rhscon-core-0.0.14-1.el7scon.x86_64
rhscon-ui-0.0.28-1.el7scon.noarch

How reproducible:
whenever a node import times out

Steps to Reproduce:
1. create cluster
2. unmanage cluster
3. forget cluster
4. wait until all nodes are visible in USM as available (a quick check from the salt-master is sketched after this list)
5. import cluster
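
A minimal sketch of confirming from the salt-master that the storage nodes are reachable before triggering the import (assumption: the '*' target glob matches your node names):

    # list the minion keys already accepted by the salt-master
    salt-key -l accepted
    # check that all minions respond
    salt '*' test.ping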

Actual results:
If any node is not imported, the import is still marked as successful. The user does not know about this issue and cannot fix it.

Expected results:
If any node is not imported, the import is marked as failed. The user knows about this issue and sees a list of the nodes that were not imported, with a proper message. The user can fix the issue by pressing a "Retry import" button.

Additional info:
bigfin.log:
2016-05-04T13:47:15.869+02:00 ERROR    import_cluster.go:455 PopulateNodeNetworkDetails] Node mkudlej-usm1-node3.os1.phx2.redhat.com still in initializing state. Continuing to other
2016-05-04T13:49:26+0000 INFO     saltwrapper.py:49 saltwrapper.wrapper] args=(<salt.client.LocalClient object at 0x33a3810>, ['mkudlej-usm1-node1.os1.phx2.redhat.com'], 'cmd.run', ["lsblk --all --bytes --noheadings --output='NAME,KNAME,FSTYPE,MOUNTPOINT,UUID,PARTUUID,MODEL,SIZE,TYPE,PKNAME,VENDOR' --path --raw"]), kwargs={'expr_form': 'list'}
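
For reference, the wrapped call in the second log line corresponds roughly to the following salt CLI invocation (a sketch only; the hostname is taken from the log above, and 'expr_form': 'list' maps to the -L targeting option):

    salt -L 'mkudlej-usm1-node1.os1.phx2.redhat.com' cmd.run \
         "lsblk --all --bytes --noheadings --output='NAME,KNAME,FSTYPE,MOUNTPOINT,UUID,PARTUUID,MODEL,SIZE,TYPE,PKNAME,VENDOR' --path --raw"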

Comment 2 Shubhendu Tripathi 2016-05-24 10:10:48 UTC
Created attachment 1160973 [details]
Import cluster flow with failed nodes

Comment 3 Shubhendu Tripathi 2016-05-24 10:12:04 UTC
Below are the steps to verify the fix:

1. Invoke import cluster from the UI
2. Select a mon node from the nodes list
3. It shows the list of participating nodes of the cluster
4. Now bring down salt-minion on one of the nodes of the cluster, say an OSD node (see the sketch below)
5. Submit the import cluster request
6. During the import, the failed node is listed in the task steps as an error


Attached is a screenshot of the import cluster flow with one failed node, for reference.
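
For step 4, a simple way to bring down the minion on the chosen node (a sketch; run these on the OSD node itself):

    # stop the minion so the import sees this node as failed
    systemctl stop salt-minion
    # confirm it is no longer running
    systemctl status salt-minion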

Comment 4 Martin Kudlej 2016-07-14 07:21:58 UTC
I still see this issue. I tried the reproducer from comment #3 with one monitor and one node; the monitor was reinitialized, but the node cannot be reinitialized even though it is part of the cluster.
ceph-ansible-1.0.5-25.el7scon.noarch
ceph-installer-1.0.12-4.el7scon.noarch
rhscon-ceph-0.0.31-1.el7scon.x86_64
rhscon-core-0.0.32-1.el7scon.x86_64
rhscon-core-selinux-0.0.32-1.el7scon.noarch
rhscon-ui-0.0.46-1.el7scon.noarch

Comment 5 Shubhendu Tripathi 2016-07-14 09:23:08 UTC
Martin, the feature is developed in such a way that:
 - If nodes are not already accepted, they get accepted during the import cluster flow
 - If a node fails to be accepted during import, for example due to a slow network, it is clearly marked as a failed node in the import cluster task steps

Now, if a few nodes fail to get accepted/initialized during import and are reported as failed in the task, triggering a re-initialize from the UI alone is not going to help: the node might get accepted/initialized, but there is no way for it to be marked as participating in the cluster.

To work around this situation, there are a few options:
1. Forget the cluster in USM and try to import it again after restarting the salt-minion services on the storage nodes
2. OR, prior to starting the import, if the number of nodes in the cluster is large, the nodes are already listed as un-accepted in USM. Accept the nodes before starting the import cluster flow and then trigger the import.

With option 2, if the number of nodes is really large (say hundreds), it is better to accept the nodes in batches, because due to a salt-master limitation of serial accept execution a few nodes might time out (a sketch of batch acceptance is shown below).
This way the chance of a node accept/initialize failure during import is removed and the cluster import should be smooth.
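
A minimal sketch of accepting pending keys in batches from the salt-master (the node names are placeholders; adjust the batch size to your environment):

    # list the keys still waiting for acceptance
    salt-key -l unaccepted
    # accept a small batch of nodes at a time to avoid timeouts
    for node in node1.example.com node2.example.com node3.example.com; do
        salt-key -y -a "$node"
    done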

Reverting to ON_QA. I will raise a doc BZ to add a troubleshooting section for import cluster node failures.

Comment 6 Shubhendu Tripathi 2016-07-14 09:30:11 UTC
Created doc BZ#1356503 to document the troubleshooting section in the admin guide for this.

Comment 7 Martin Kudlej 2016-08-08 11:44:12 UTC
Tested with 
ceph-ansible-1.0.5-32.el7scon.noarch
ceph-installer-1.0.14-1.el7scon.noarch
rhscon-ceph-0.0.40-1.el7scon.x86_64
rhscon-core-0.0.41-1.el7scon.x86_64
rhscon-core-selinux-0.0.41-1.el7scon.noarch
rhscon-ui-0.0.52-1.el7scon.noarch
and it works as described in comment #5.

