Bug 1455693

Summary:

Single point of failure on Calamari server node

Product:

[Red Hat Storage] Red Hat Storage Console

Reporter:

Stuart James <stuartjames>

Component:

core

Assignee:

Nishanth Thomas <nthomas>

core sub component:

events

QA Contact:

sds-qe-bugs

Status:

CLOSED EOL

Severity:

medium

Priority:

unspecified

Hardware:

All

OS:

Linux

Last Closed:

2018-11-19 05:42:49 UTC

Type:

Bug

Attachments:

Description	Flags
Pool creation failure	none
Initial cluster import monitor selection	none

Description Stuart James 2017-05-25 20:32:17 UTC

Created attachment 1282374
Pool creation failure

Description of problem:

When importing a cluster, you must select a single monitor node, and that node must be running calamari-server. The cluster imports, but RHSCON does not use the other monitor nodes that run calamari-server for resilience. If you turn off the selected monitor node (simulating a device failure), RHSCON can no longer perform common operations.


Version-Release number of selected component (if applicable):
calamari-server-1.5.5-1.el7cp.x86_64
python-cephfs-10.2.5-37.el7cp.x86_64
ceph-selinux-10.2.5-37.el7cp.x86_64
libcephfs1-10.2.5-37.el7cp.x86_64
ceph-base-10.2.5-37.el7cp.x86_64
ceph-mon-10.2.5-37.el7cp.x86_64
ceph-common-10.2.5-37.el7cp.x86_64
rhscon-core-0.0.45-1.el7scon.x86_64
rhscon-ceph-0.0.43-1.el7scon.x86_64
rhscon-core-selinux-0.0.45-1.el7scon.noarch
rhscon-ui-0.0.60-1.el7scon.noarch


How reproducible:
Every time

Steps to Reproduce:
1. Import cluster from RHSCON
2. Turn off monitor node used during import
3. Attempt to create a pool

Actual results:
Failure to create pool

Expected results:
Pool should be created

Additional info:
The import process detects all monitor nodes, and all of them should be listed as possible calamari-server hosts. Instead, it appears that the monitor node used to import the cluster is hard-coded as the only calamari-server. If that monitor node is down, RHSCON should simply use one of the other monitor nodes.


May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch block_device_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: rbd du --cluster ceph -p rbd --format=json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch slu_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph osd df --cluster ceph -f json. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:126 func1] skyring:25558e53-f529-4367-8d61-e37d65caf6ee-Unable to fetch PG Details from mon rhceph1.kaycero.com of cluster ceph.Error: Error executing request: Error executing the request: Get https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/sync_object/pg_summary?format=json: dial tcp 172.16.67.128:8002: getsockopt: no route to host
May 26 00:03:33 rhscon.example.com skyring[12637]: 2017-05-26T00:03:33.117+01:00 ERROR    monitoring.go:96 getStatsFromCalamariApi] skyring:25558e53-f529-4367-8d61-e37d65caf6ee - Failed to fetch cluster_utilization metrics from cluster dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: ceph df --cluster ceph. error: Error executing request: Error executing the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: getsockopt: no route to host
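
For triage, the errors above can be reduced to the metric that failed and the endpoint that could not be reached. The following is a small Python sketch, assuming the message layout seen in these lines ("Failed to fetch <name> metrics" and a trailing "dial tcp <ip>:<port>: <reason>"). The function name parse_failure and the regular expressions are illustrative and not part of skyring; SAMPLE is the message portion of the last log line above, with the syslog prefix omitted.

```python
import re
from typing import Optional

# Patterns inferred from the log lines in this report (an assumption, not a
# documented skyring log format).
METRIC_RE = re.compile(r"Failed to fetch (\w+) metrics")
DIAL_RE = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+): (.*)$")


def parse_failure(line: str) -> Optional[dict]:
    """Extract the failed metric and unreachable endpoint from one log line.

    Returns None if the line carries no "dial tcp" connection error. The
    "metric" value is None for lines that do not name a metric.
    """
    dial = DIAL_RE.search(line.strip())
    if dial is None:
        return None
    metric = METRIC_RE.search(line)
    return {
        "metric": metric.group(1) if metric else None,
        "address": dial.group(1),
        "port": int(dial.group(2)),
        "reason": dial.group(3),
    }


# Message portion of the cluster_utilization line above.
SAMPLE = (
    "Failed to fetch cluster_utilization metrics from cluster "
    "dbea3e52-91f5-480a-92c6-09246ae8d12a.Err Failed to execute command: "
    "ceph df --cluster ceph. error: Error executing request: Error executing "
    "the request: Post https://rhceph1.kaycero.com:8002/api/v2/cluster/"
    "dbea3e52-91f5-480a-92c6-09246ae8d12a/cli: dial tcp 172.16.67.128:8002: "
    "getsockopt: no route to host"
)

parsed = parse_failure(SAMPLE)
print(parsed)
```

Every line in this excerpt that contains a connection error points at the same address and port, which is consistent with a single calamari-server endpoint being used for all requests.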

Comment 3 Stuart James 2017-05-25 20:32:53 UTC

Created attachment 1282375
Initial cluster import monitor selection

Comment 4 Shubhendu Tripathi 2018-11-19 05:42:49 UTC

This product is EOL now