1455693 – Single point of failure on Calamari server node

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Red Hat Storage Console
Classification:	Red Hat Storage
Component:	core
Sub Component:
Version:	2
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3
Assignee:	Nishanth Thomas
QA Contact:	sds-qe-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-05-25 20:32 UTC by Stuart James
Modified:	2018-11-19 05:43 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-11-19 05:42:49 UTC
Embargoed:

Attachments
Pool creation failure (11.64 KB, image/png) 2017-05-25 20:32 UTC, Stuart James
Initial cluster import monitor selection (36.60 KB, image/png) 2017-05-25 20:32 UTC, Stuart James

Description Stuart James 2017-05-25 20:32:17 UTC

Created attachment 1282374
Pool creation failure

Description of problem:

When importing a cluster, you must select a single monitor node, and that node must be running calamari-server. The cluster imports, but RHSCON does not use the other monitor nodes that run calamari-server for resilience. If you turn off the selected monitor node (simulating a device failure), RHSCON can no longer perform common operations.


Version-Release number of selected component (if applicable):
calamari-server-1.5.5-1.el7cp.x86_64
python-cephfs-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
ceph-base-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch


How reproducible:
Every time

Steps to Reproduce:
1. Import cluster from RHSCON
2. Turn off monitor node used during import
3. Attempt to create a pool

Actual results:
Failure to create pool

Expected results:
Pool should be created

Additional info:
The import process detects all monitor nodes. These monitor nodes should all be listed as possible calamari-server endpoints, but it appears that the monitor node used to import the cluster is hard-coded as the only calamari-server. If a monitor node is down, RHSCON should simply use one of the other monitor nodes.
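
To illustrate the requested behaviour, here is a minimal Go sketch of trying each monitor's calamari-server endpoint in turn and using the first one that answers. It is not taken from the skyring source. The names tryMonitors, requestFunc and errUnreachable are hypothetical, and so is the second hostname (rhceph2.kaycero.com); only rhceph1.kaycero.com and port 8002 come from the logs in this report. The actual HTTP request is replaced by a stand-in function so the failover logic can be read in isolation.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnreachable simulates the "no route to host" dial failure seen in the logs.
var errUnreachable = errors.New("no route to host")

// requestFunc performs one request against a single calamari-server endpoint.
// Hypothetical stand-in for the real HTTPS call made by the console.
type requestFunc func(endpoint string) (string, error)

// tryMonitors calls do against each endpoint in order and returns the first
// successful response. If every endpoint fails, it returns an error that
// wraps the last failure.
func tryMonitors(endpoints []string, do requestFunc) (string, error) {
	if len(endpoints) == 0 {
		return "", errors.New("no calamari endpoints configured")
	}
	var lastErr error
	for _, ep := range endpoints {
		resp, err := do(ep)
		if err == nil {
			return resp, nil
		}
		lastErr = fmt.Errorf("%s: %w", ep, err)
	}
	return "", fmt.Errorf("all %d calamari endpoints failed, last error: %w",
		len(endpoints), lastErr)
}

func main() {
	// Assumed monitor list; the second host is invented for illustration.
	endpoints := []string{
		"https://rhceph1.kaycero.com:8002",
		"https://rhceph2.kaycero.com:8002",
	}

	// Simulate the monitor used at import time being powered off.
	fake := func(ep string) (string, error) {
		if ep == endpoints[0] {
			return "", errUnreachable
		}
		return "response from " + ep, nil
	}

	resp, err := tryMonitors(endpoints, fake)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(resp)
}
```

With this approach, the loss of the import-time monitor would only cost one failed attempt per request rather than blocking pool creation outright. A real implementation would presumably also need timeouts and some memory of which endpoint last succeeded.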


May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch block_device_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: rbd du --cluster ceph -p rbd --format=json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch slu_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph osd df --cluster ceph -f json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-Unable to fetch PG Details from mon rhceph1.kaycero.com of cluster ceph.Error: Error executing request: Error executing the request: Get https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/sync_object/pg_summary?format=json: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch cluster_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph df --cluster ceph. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host

Comment 3 Stuart James 2017-05-25 20:32:53 UTC

Created attachment 1282375
Initial cluster import monitor selection

Comment 4 Shubhendu Tripathi 2018-11-19 05:42:49 UTC

This product is EOL now
