Bug 1455693 - Single point of failure on Calamari server node
Summary: Single point of failure on Calamari server node
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: core
Version: 2
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-25 20:32 UTC by Stuart James
Modified: 2018-11-19 05:43 UTC
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 05:42:49 UTC
Embargoed:


Attachments
Pool creation failure (11.64 KB, image/png)
2017-05-25 20:32 UTC, Stuart James
Initial cluster import monitor selection (36.60 KB, image/png)
2017-05-25 20:32 UTC, Stuart James

Description Stuart James 2017-05-25 20:32:17 UTC
Created attachment 1282374 [details]
Pool creation failure

Description of problem:

When importing a cluster, you must select a single monitor node, and that node must be running calamari-server. The cluster imports successfully, but RHSCON does not use the other monitor nodes that also run calamari-server for resilience. If you turn off the selected monitor node (simulating a device failure), RHSCON can no longer perform common operations such as creating a pool.


Version-Release number of selected component (if applicable):
calamari-server-1.5.5-1.el7cp.x86_64
python-cephfs-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
ceph-base-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch


How reproducible:
Every time

Steps to Reproduce:
1. Import cluster from RHSCON
2. Turn off monitor node used during import
3. Attempt to create a pool

Actual results:
Failure to create pool

Expected results:
Pool should be created

Additional info:
The import process detects all monitor nodes, and all of them should be listed as possible calamari-server hosts. It appears that the monitor node used to import the cluster is hard-coded as the only calamari-server. If that monitor node is down, RHSCON should simply use one of the other monitor nodes.


May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch block_device_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: rbd du --cluster ceph -p rbd --format=json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch slu_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph osd df --cluster ceph -f json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-Unable to fetch PG Details from mon rhceph1.kaycero.com of cluster ceph.Error: Error executing request: Error executing the request: Get https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/sync_object/pg_summary?format=json: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch cluster_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph df --cluster ceph. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host

Comment 3 Stuart James 2017-05-25 20:32:53 UTC
Created attachment 1282375 [details]
Initial cluster import monitor selection

Comment 4 Shubhendu Tripathi 2018-11-19 05:42:49 UTC
This product is EOL now

