Bug 1430104

Summary: Console import fails by attempting to import cluster network interface
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Tupper Cole <tcole>
Component: Calamari Assignee: Boris Ranto <branto>
Calamari sub component: Back-end QA Contact: Martin Kudlej <mkudlej>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: alan_bishop, amainkar, arkady_kanevsky, branto, ceph-eng-bugs, flucifre, gmeno, jcall, j_t_williams, kdreyer, ksquizza, mhackett, mkarnik, mkudlej, nthomas, rkanade, sankarshan, vikumar, vumrao
Version: 2.2   
Target Milestone: rc   
Target Release: 2.2   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: calamari-server-1.5.5-1.el7cp calamari_server_1.5.5-2redhat1xenial Doc Type: Bug Fix
Doc Text:
Previously, the Calamari API attempted to resolve the Ceph cluster network address instead of the Ceph public network address. However, the cluster network address does not have to be resolvable. Consequently, when Calamari failed to resolve the cluster network address, it failed to recognize Ceph hosts. With this update, Calamari attempts to resolve the public network address instead of the cluster one. As a result, Calamari recognizes the hosts correctly. Note that Red Hat Storage Console 2 can import a Ceph cluster only if the primary Ceph host name is bound to the public network on every Ceph node.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-04-17 14:31:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1356451    

Description Tupper Cole 2017-03-07 21:24:16 UTC
Created attachment 1260980 [details]
Node list

Description of problem: When trying to import a cluster, the console sees all accepted nodes. After choosing the calamari node for import, import of storage nodes fails. The failure shows the DNS name for the cluster network interface rather than the public network interface for OSD nodes.


Version-Release number of selected component (if applicable): RHCS 2.1, RHSC 2


How reproducible: Seems consistent. I've done it 20 times on this one cluster.


Steps to Reproduce:
1. Have Ceph nodes with DNS entries for public and cluster interfaces
2. Add nodes to console
3. Import cluster.

Actual results: Import fails because the OSD nodes are "Not found". This occurs even though the nodes are reachable on that IP address from all other nodes.




Expected results:


Additional info:

Comment 2 Tupper Cole 2017-03-07 21:25:15 UTC
Created attachment 1260981 [details]
import_failed

Comment 4 Tupper Cole 2017-03-07 23:02:11 UTC
At the suggestion of someone in GSS, I changed the cluster network to match the public network, and this *somehow* served as a workaround. I would think that a console would not touch the cluster_network at all for importing nodes. What's that about? 

This needs to be fixed.

Comment 18 Vikhyat Umrao 2017-03-31 15:33:24 UTC
*** Bug 1434608 has been marked as a duplicate of this bug. ***

Comment 20 Vikhyat Umrao 2017-03-31 15:55:40 UTC
upstream patch which fixes this issue: https://github.com/ceph/calamari/pull/515

Comment 29 Ken Dreyer (Red Hat) 2017-04-06 20:47:23 UTC
*** Bug 1414918 has been marked as a duplicate of this bug. ***

Comment 30 arkady kanevsky 2017-04-06 20:58:50 UTC
When do you expect it to be pushed to CDN?

Comment 31 Federico Lucifredi 2017-04-07 23:12:00 UTC
Hi Arkady,
  We are looking at this right now -- are you working to a deadline that we need to account for?

Comment 32 arkady kanevsky 2017-04-07 23:18:30 UTC
Federico,
we are dev complete already. We can wait till the end of next week to lock the bits. Can you release the fix to CDN by that time?
If not we will need to put it into overcloud image.
Expect that you will still be able to support it in the field.
Thanks,
Arkady

Comment 35 Martin Kudlej 2017-04-12 19:52:06 UTC
We were able to reproduce it with:
  * calamari-server calamari-server-1.5.3-1.el7cp (on RHEL 7.3)
  * calamari-server 1.5.3-2redhat1xenial (on Ubuntu 16.04 Xenial)

And tested and VERIFIED with:
  * calamari-server calamari-server-1.5.5-1.el7cp (on RHEL 7.3)
  * calamari-server 1.5.5-2redhat1xenial (on Ubuntu 16.04 Xenial)


On the "Select Monitor Host" page (during the Import Cluster procedure), all hosts are properly listed with the hostname (FQDN) assigned to the IP address of the public network interface. But on the next step, "Cluster Summary", the OSD nodes are listed with the hostname (FQDN) assigned to the IP address of the cluster network interface and marked as "Not Found" in the Status column, and it is not possible to import the cluster.

With the updated version of calamari-server, all the hosts on the "Cluster Summary" page are listed with the hostnames assigned to the public network, and it is possible to properly import the cluster.
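
To illustrate the difference in behavior, here is a minimal Python sketch of choosing a host's public network address from the addresses it reports. This is not Calamari's code and is not taken from the upstream patch; the function name pick_public_address, the sample addresses and the 10.0.0.0/24 public network are all assumptions made up for the example.

```python
import ipaddress


def pick_public_address(addresses, public_network):
    """Return the first address that lies inside public_network, or None.

    Hypothetical helper for illustration only; not part of Calamari.
    """
    network = ipaddress.ip_network(public_network)
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


# Assumed example: an OSD host with one cluster and one public address.
osd_addresses = ["192.168.100.11", "10.0.0.11"]
print(pick_public_address(osd_addresses, "10.0.0.0/24"))
```

Only the address inside the public network would then be resolved to a hostname, so a cluster network address that has no usable DNS entry never enters the lookup.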

We also tested it with three networks (Public network, Cluster network and a 3rd auxiliary network).

Both scenarios (with two and three networks) work only when the "main" hostname of the Ceph hosts (which is also the salt minion ID) is the hostname assigned to the Public network and the Console (Skyring server) has access to the Public network.
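
The precondition above can be checked before attempting an import. The following is a rough sketch in Python, assuming IPv4 and a known public network CIDR; the helper names (resolve_ipv4, hostname_on_public_network), the host names and the addresses are hypothetical and are not part of Calamari, Skyring or salt.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Resolve hostname to its IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def hostname_on_public_network(hostname, public_network, resolver=resolve_ipv4):
    """Return True if hostname resolves to an address in public_network."""
    network = ipaddress.ip_network(public_network)
    return any(ipaddress.ip_address(a) in network for a in resolver(hostname))


# Demo with a fake resolver so the sketch runs without DNS.
# Host names and addresses below are invented for the example.
fake_dns = {
    "osd1.example.com": ["10.0.0.11"],               # public network name
    "osd1-cluster.example.com": ["192.168.100.11"],  # cluster network name
}
for name in fake_dns:
    ok = hostname_on_public_network(name, "10.0.0.0/24", resolver=fake_dns.__getitem__)
    print(name, ok)
```

On a real node, one would call the function with the default resolver and the host's main hostname (the one used as the salt minion ID), for example hostname_on_public_network(socket.getfqdn(), "10.0.0.0/24"), substituting the actual public network.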

Comment 37 errata-xmlrpc 2017-04-17 14:31:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0978