Bug 1430104 - Console import fails by attempting to import cluster network interface
Summary: Console import fails by attempting to import cluster network interface
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Calamari
Version: 2.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 2.2
Assignee: Boris Ranto
QA Contact: Martin Kudlej
URL:
Whiteboard:
Duplicates: 1414918 1434608
Depends On:
Blocks: 1356451
Reported: 2017-03-07 21:24 UTC by Tupper Cole
Modified: 2020-06-11 13:23 UTC (History)

Fixed In Version: calamari-server-1.5.5-1.el7cp calamari_server_1.5.5-2redhat1xenial
Doc Type: Bug Fix
Doc Text:
Previously, the Calamari API attempted to resolve the Ceph cluster network address instead of the Ceph public network address. However, the cluster network address does not have to be resolvable. Consequently, when Calamari failed to resolve the cluster network address, it failed to recognize Ceph hosts. With this update, Calamari attempts to resolve the public network address instead of the cluster one. As a result, Calamari recognizes the hosts correctly. Note that Red Hat Storage Console 2 can import a Ceph cluster only if the primary Ceph host name is bound to the public network on every Ceph node.
Clone Of:
Environment:
Last Closed: 2017-04-17 14:31:56 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 2944461 | 0 | None | None | None | 2017-05-10 03:41:10 UTC
Red Hat Knowledge Base (Solution) | 2977771 | 0 | None | None | None | 2017-03-31 15:46:15 UTC
Red Hat Product Errata | RHBA-2017:0978 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 2.2 bug fix update | 2017-04-17 18:31:39 UTC

Description Tupper Cole 2017-03-07 21:24:16 UTC
Created attachment 1260980 [details]
Node list

Description of problem: When trying to import a cluster, the console sees all accepted nodes. After choosing the Calamari node for import, the import of storage nodes fails. The failure shows the DNS name of the cluster network interface rather than that of the public network interface for OSD nodes.


Version-Release number of selected component (if applicable): RHCS 2.1, RHSC 2


How reproducible: Seems consistent. I've done it 20 times on this one cluster.


Steps to Reproduce:
1. Have Ceph nodes with DNS entries for both the public and cluster interfaces.
2. Add the nodes to the console.
3. Import the cluster.

Actual results: Import fails because the OSD nodes are "Not found". This occurs even though the nodes are reachable on that IP address from all other nodes.




Expected results:


Additional info:
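
The mismatch described under "Actual results" can be illustrated with a minimal Python sketch. It is not Calamari or console code: the host names, the ACCEPTED_NODES set, and the import_status helper are all hypothetical. It only shows why looking up OSD nodes by their cluster-network name yields "Not Found" when, as assumed here, the accepted nodes are registered under their public-network name.

```python
# Hypothetical data: every name below is invented for illustration only.
# Assumption: the console knows accepted nodes by their public-network FQDN.
ACCEPTED_NODES = {"osd1.public.example.com", "osd2.public.example.com"}

# Each OSD node has one DNS name per network interface.
OSD_NAMES = {
    "osd1": {
        "public": "osd1.public.example.com",
        "cluster": "osd1.cluster.example.com",
    },
    "osd2": {
        "public": "osd2.public.example.com",
        "cluster": "osd2.cluster.example.com",
    },
}


def import_status(osd_names, accepted_nodes, network):
    """Match each OSD against the accepted nodes using the name on `network`.

    Returns a dict mapping the OSD key to "Found" or "Not Found".
    """
    return {
        osd: "Found" if names[network] in accepted_nodes else "Not Found"
        for osd, names in osd_names.items()
    }


# Matching by the cluster-network name reproduces the reported symptom.
print(import_status(OSD_NAMES, ACCEPTED_NODES, "cluster"))
# Matching by the public-network name is what the import needs.
print(import_status(OSD_NAMES, ACCEPTED_NODES, "public"))
```

In this sketch the only difference between the failing and the working lookup is which interface's name is used, which is consistent with the symptom that the nodes are reachable yet reported as missing.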

Comment 2 Tupper Cole 2017-03-07 21:25:15 UTC
Created attachment 1260981 [details]
import_failed

Comment 4 Tupper Cole 2017-03-07 23:02:11 UTC
At the suggestion of someone in GSS, I changed the cluster network to match the public network, and this *somehow* served as a workaround. I would think that a console would not touch the cluster_network at all for importing nodes. What's that about? 

This needs to be fixed.

Comment 18 Vikhyat Umrao 2017-03-31 15:33:24 UTC
*** Bug 1434608 has been marked as a duplicate of this bug. ***

Comment 20 Vikhyat Umrao 2017-03-31 15:55:40 UTC
Upstream patch that fixes this issue: https://github.com/ceph/calamari/pull/515

Comment 29 Ken Dreyer (Red Hat) 2017-04-06 20:47:23 UTC
*** Bug 1414918 has been marked as a duplicate of this bug. ***

Comment 30 arkady kanevsky 2017-04-06 20:58:50 UTC
When do you expect it to be pushed to CDN?

Comment 31 Federico Lucifredi 2017-04-07 23:12:00 UTC
Hi Arkady,
  We are looking at this right now -- are you working to a deadline that we need to account for?

Comment 32 arkady kanevsky 2017-04-07 23:18:30 UTC
Federico,
We are dev complete already. We can wait until the end of next week to lock the bits. Can you release the fix to CDN by that time?
If not, we will need to put it into the overcloud image.
We expect that you will still be able to support it in the field.
Thanks,
Arkady

Comment 35 Martin Kudlej 2017-04-12 19:52:06 UTC
We were able to reproduce it with:
  * calamari-server calamari-server-1.5.3-1.el7cp (on RHEL 7.3)
  * calamari-server 1.5.3-2redhat1xenial (on Ubuntu 16.04 Xenial)

And tested and VERIFIED with:
  * calamari-server calamari-server-1.5.5-1.el7cp (on RHEL 7.3)
  * calamari-server 1.5.5-2redhat1xenial (on Ubuntu 16.04 Xenial)


With the older version, on the "Select Monitor Host" page (during the Import Cluster procedure), all hosts are properly listed with the hostname (FQDN) assigned to the IP address of the public network interface. On the next step, "Cluster Summary", however, the OSD nodes are listed with the hostname (FQDN) assigned to the IP address of the cluster network interface and are marked as "Not Found" in the Status column, so it is not possible to import the cluster.

With the updated version of calamari-server, all the hosts on the "Cluster Summary" page are listed with the hostnames assigned to the public network, and it is possible to import the cluster properly.

We also tested it with three networks (public network, cluster network, and a third auxiliary network).

Both scenarios (with two and three networks) work only when the "main" hostname of the Ceph hosts (which is also the salt minion ID) is the hostname assigned to the public network and the Console (Skyring server) has access to the public network.
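
The condition above (the main hostname of each Ceph host must belong to the public network) could be checked before an import with a short Python sketch. This is an illustration under assumptions, not part of calamari-server or the console: the function name, the example host name, and the CIDR are hypothetical, and the resolver is injectable so the check can be tried without real DNS.

```python
import ipaddress
import socket


def hostname_on_public_network(hostname, public_cidr, resolve=socket.gethostbyname):
    """Return True if `hostname` resolves to an address inside `public_cidr`.

    `resolve` defaults to the system resolver; pass a stub to test offline.
    """
    network = ipaddress.ip_network(public_cidr)
    address = ipaddress.ip_address(resolve(hostname))
    return address in network


if __name__ == "__main__":
    # Hypothetical values: substitute the node's main hostname (its salt
    # minion ID) and the cluster's actual public network range.
    def fake_resolver(name):
        return "192.0.2.10"

    print(hostname_on_public_network("osd1.example.com", "192.0.2.0/24", fake_resolver))
```

A False result for any node would suggest that its main hostname is tied to a different interface, which is the situation in which the import was observed to fail.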

Comment 37 errata-xmlrpc 2017-04-17 14:31:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0978

