Created attachment 1260980 [details]
Node list

Description of problem:
When trying to import a cluster, the console sees all accepted nodes. After choosing the calamari node for import, import of storage nodes fails. The failure shows the DNS name for the cluster network interface rather than the public network interface for OSD nodes.

Version-Release number of selected component (if applicable):
RHCS 2.1, RHSC 2

How reproducible:
Seems consistent. I've done it 20 times on this one cluster.

Steps to Reproduce:
1. Have ceph nodes with DNS entries for public and cluster interfaces
2. Add nodes to console
3. Import cluster.

Actual results:
Import fails because the OSD nodes are "Not found". This occurs even though the nodes are reachable on that IP address from all other nodes.

Expected results:

Additional info:
Created attachment 1260981 [details] import_failed
At the suggestion of someone in GSS, I changed the cluster network to match the public network, and this *somehow* served as a workaround. I would think that a console would not touch the cluster_network at all for importing nodes. What's that about? This needs to be fixed.
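For anyone else hitting this, the workaround amounts to something like the ceph.conf fragment below. This is only a sketch: the subnet is a placeholder from the documentation range, not the one used on this cluster, and it assumes both options are set in the [global] section.

```ini
[global]
# Placeholder subnet; substitute the real public network.
public network = 192.0.2.0/24
# Workaround: point the cluster network at the same subnet as the public network.
cluster network = 192.0.2.0/24
```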
*** Bug 1434608 has been marked as a duplicate of this bug. ***
upstream patch which fixes this issue: https://github.com/ceph/calamari/pull/515
*** Bug 1414918 has been marked as a duplicate of this bug. ***
When do you expect it to be pushed to CDN?
Hi Arkady, We are looking at this right now -- are you working to a deadline that we need to account for?
Federico, we are already dev complete. We can wait until the end of next week to lock the bits. Can you release the fix to CDN by then? If not, we will need to put it into the overcloud image. We expect that you will still be able to support it in the field. Thanks, Arkady
We were able to reproduce it with:
* calamari-server calamari-server-1.5.3-1.el7cp (on RHEL 7.3)
* calamari-server 1.5.3-2redhat1xenial (on Ubuntu 16.04 Xenial)

Tested and VERIFIED with:
* calamari-server calamari-server-1.5.5-1.el7cp (on RHEL 7.3)
* calamari-server 1.5.5-2redhat1xenial (on Ubuntu 16.04 Xenial)

With the old version, on the "Select Monitor Host" page (during the Import Cluster procedure) all hosts are properly listed with the hostname (FQDN) assigned to the IP address of the public network interface. But on the next step, "Cluster Summary", the OSD nodes are listed with the hostname (FQDN) assigned to the IP address of the cluster network interface and marked as "Not Found" in the Status column, and it is not possible to import the cluster.

With the updated version of calamari-server, all the hosts on the "Cluster Summary" page are listed with the hostnames assigned to the public network, and it is possible to properly import the cluster.

We also tested it with three networks (Public network, Cluster network and a 3rd auxiliary network). Both scenarios (with two and three networks) work only when the "main" hostname of the Ceph hosts (which is also the salt minion ID) is the hostname assigned to the Public network and the Console (Skyring server) has access to the Public network.
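The last condition (the main hostname / salt minion ID must be the one assigned to the Public network) can be checked before an import with something like the sketch below. It is not part of calamari or the Console; the function name, hostnames and addresses are made up for illustration, and it assumes you already have each minion ID's resolved IP address (for example from DNS) and the public network in CIDR form.

```python
import ipaddress


def hosts_outside_public_network(host_ips, public_cidr):
    """Return the hostnames whose address is NOT inside public_cidr.

    host_ips: dict mapping hostname (salt minion ID) -> IP address string.
    public_cidr: the Ceph public network, e.g. "192.0.2.0/24".
    """
    network = ipaddress.ip_network(public_cidr, strict=False)
    return sorted(
        name
        for name, ip in host_ips.items()
        if ipaddress.ip_address(ip) not in network
    )


if __name__ == "__main__":
    # Hypothetical example data: osd2 is registered under its
    # cluster-network name, which is the situation described above.
    minions = {
        "mon1.public.example.com": "192.0.2.10",
        "osd1.public.example.com": "192.0.2.11",
        "osd2.cluster.example.com": "198.51.100.12",
    }
    for name in hosts_outside_public_network(minions, "192.0.2.0/24"):
        print(name, "does not resolve into the public network")
```

Any host the function returns would need its main hostname (and minion ID) switched to the public-network name before retrying the import.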
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0978