Bug 1455693

Summary: Single point of failure on Calamari server node
Product: [Red Hat Storage] Red Hat Storage Console
Reporter: Stuart James <stuartjames>
Component: core
Assignee: Nishanth Thomas <nthomas>
core sub component: events
QA Contact: sds-qe-bugs
Status: CLOSED EOL
Docs Contact:
Severity: medium
Priority: unspecified
Version: 2
Target Milestone: ---
Target Release: 3
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 05:42:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                 Flags
Pool creation failure                       none
Initial cluster import monitor selection    none

Description Stuart James 2017-05-25 20:32:17 UTC
Created attachment 1282374
Pool creation failure

Description of problem:

When importing a cluster, you must select a single monitor node, and that node must be running calamari-server. The cluster imports successfully, but RHSCON does not use the other monitor nodes that also run calamari-server for resilience. If you turn off the selected monitor node (simulating a device failure), RHSCON can no longer perform common operations such as creating a pool.


Version-Release number of selected component (if applicable):
calamari-server-1.5.5-1.el7cp.x86_64
python-cephfs-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
ceph-base-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch


How reproducible:
Every time

Steps to Reproduce:
1. Import the cluster from RHSCON, selecting one monitor node.
2. Turn off the monitor node used during the import.
3. Attempt to create a pool.

Actual results:
Pool creation fails (see attachment 1282374, "Pool creation failure").

Expected results:
The pool should be created, with RHSCON using another monitor node that runs calamari-server.

Additional info:
The import process detects all monitor nodes, so all of them should be listed as candidate calamari-server hosts. Instead, the monitor node used to import the cluster appears to be hard-coded as the only calamari-server. If that monitor node is down, RHSCON should fall back to one of the other monitor nodes. The skyring log below shows every request still going to rhceph1.kaycero.com (172.16.67.128:8002) and failing with "no route to host":


May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch block_device_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: rbd du --cluster ceph -p rbd --format=json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch slu_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph osd df --cluster ceph -f json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-Unable to fetch PG Details from mon rhceph1.kaycero.com of cluster ceph.Error: Error executing request: Error executing the request: Get https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/sync_object/pg_summary?format=json: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch cluster_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph df --cluster ceph. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host

Comment 3 Stuart James 2017-05-25 20:32:53 UTC
Created attachment 1282375
Initial cluster import monitor selection

Comment 4 Shubhendu Tripathi 2018-11-19 05:42:49 UTC
This product is EOL now