Description of problem:
I am trying to import the Ceph cluster into USM. This cluster was upgraded from 1.3.2 to 2.0.

Version-Release number of selected component (if applicable):
Ceph: 10.2.2-21.el7cp (d06518f91ef88a8a7263dfa9c6ee51d5f9163abc)
calamari-server-1.4.5-1.el7cp.x86_64
rhscon-ceph-0.0.33-1.el7scon.x86_64
rhscon-ui-0.0.47-1.el7scon.noarch
rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-core-0.0.34-1.el7scon.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a Ceph 1.3.2 cluster with nodes:
   - Admin + calamari server (1.3.3.1)
   - MON
   - OSD
2. Install the USM packages on the Admin node (rhscon-core, rhscon-ceph, rhscon-ui) and install rhscon-agent on the MONs and OSDs.
3. Set up the USM server (skyring-setup).
4. Upgrade the Ceph cluster from 1.3.2 to 2.0.
5. Install calamari-lite (1.4.5.1) on the first MON node, and configure /etc/salt/minion.d/ceph_agent.conf.
6. The salt-keys are still accepted by the older calamari server, so delete the salt-keys on the admin node. These keys are then ready to be accepted by the USM server (a command sketch is included after the logs below).
7. Import the cluster into USM using the GUI.

Actual results:
Import is failing.

Expected results:
Import should succeed.

Additional info:
The systems are still in this state:
USM: magna009
MON: magna031
OSD: magna046, magna052

Error seen in the UI:
"Failed to retrive cluster information from the selected host 'magna031.ceph.redhat.com'. Please select a monitor host and try again"

Skyring.log:
2016-07-14T13:22:08.396Z INFO auth.go:138 Login] User: admin already logged in
2016-07-14T13:23:04.9Z ERROR request-router.go:69 GetProviderFromClusterId] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Error getting details for cluster: 00000000-0000-0000-0000-000000000000. error: not found
2016-07-14T13:23:04.9Z ERROR request-router.go:82 RouteProviderEvents] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Error getting provider for cluster: 00000000-0000-0000-0000-000000000000
2016-07-14T13:23:04.9Z ERROR handler.go:457 node_lost_handler] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Event:Node connectivity changed could not be handled for node: magna031.ceph.redhat.com. error: Error getting provider for cluster: 00000000-0000-0000-0000-000000000000
2016-07-14T13:23:04.901Z ERROR mailnotifier.go:33 MailNotifier] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Unable to read MailNotifier from DB: <nil>
2016-07-14T13:23:04.901Z WARNING mail_notifier.go:119 getNotifier] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Unable to read MailNotifier from DB: can't find Mail Notifier
2016-07-14T13:23:04.901Z WARNING mail_notifier.go:159 MailNotify] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Could not Get the notifier. Error: can't find Mail Notifier
2016-07-14T13:23:04.901Z ERROR audit_logger.go:46 AuditLog] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Could not send mail for event: Node connectivity changed
2016-07-14T13:24:11.067Z ERROR import_cluster.go:89 GET_ClusterNodesForImport] admin:74706d8c-d227-42cd-ac87-b1ace0f946be-Error getting import cluster details

Bigfin.log:
2016-07-14T11:33:46.121Z WARNING handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T11:33:46.121Z ERROR handler.go:61 HttpGet] Session: &{0xc82019a090 <nil> 0xc820032028 0}
2016-07-14T11:33:46.121Z ERROR handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T11:33:46.122Z ERROR handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T11:33:46.276Z ERROR handler.go:67 HttpGet] CSRF:
2016-07-14T11:33:46.276Z ERROR handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T11:33:46.328Z ERROR api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T11:33:46.328Z ERROR import_cluster.go:49 GetClusterNodesForImport] admin:dc9d6ff5-7dc1-460c-befa-e262314c594e-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T11:38:34.265Z ERROR handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T11:38:34.265Z ERROR handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T11:38:34.28Z WARNING handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T11:38:34.28Z ERROR handler.go:61 HttpGet] Session: &{0xc820192000 <nil> 0xc820154010 0}
2016-07-14T11:38:34.28Z ERROR handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T11:38:34.28Z ERROR handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T11:38:34.43Z ERROR handler.go:67 HttpGet] CSRF:
2016-07-14T11:38:34.43Z ERROR handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T11:38:34.482Z ERROR api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T11:38:34.482Z ERROR import_cluster.go:49 GetClusterNodesForImport] admin:b7b0fd08-c505-4fe2-b4d1-03953a0f56f9-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T12:46:55.173Z ERROR handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T12:46:55.173Z ERROR handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T12:46:55.189Z WARNING handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T12:46:55.189Z ERROR handler.go:61 HttpGet] Session: &{0xc82014a240 <nil> 0xc82010a2f8 0}
2016-07-14T12:46:55.189Z ERROR handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T12:46:55.189Z ERROR handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T12:46:55.364Z ERROR handler.go:67 HttpGet] CSRF:
2016-07-14T12:46:55.364Z ERROR handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T12:46:55.416Z ERROR api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T12:46:55.416Z ERROR import_cluster.go:49 GetClusterNodesForImport] admin:e3ce6e01-827c-4b29-b439-d959ed4b3e11-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T13:24:10.858Z ERROR handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T13:24:10.859Z ERROR handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T13:24:10.873Z WARNING handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T13:24:10.873Z ERROR handler.go:61 HttpGet] Session: &{0xc8201ac000 <nil> 0xc8201b0000 0}
2016-07-14T13:24:10.873Z ERROR handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T13:24:10.873Z ERROR handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T13:24:11.015Z ERROR handler.go:67 HttpGet] CSRF:
2016-07-14T13:24:11.015Z ERROR handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T13:24:11.067Z ERROR api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T13:24:11.067Z ERROR import_cluster.go:49 GetClusterNodesForImport] admin:74706d8c-d227-42cd-ac87-b1ace0f946be-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
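Command sketch for step 6 above: the standard way to move the minion keys from the old calamari master to the USM server is with salt-key. The exact sequence used here is an assumption; hostnames are the ones from this setup.

    # On the old admin/calamari node: list and drop the minion keys it still holds
    salt-key -L
    salt-key -d 'magna031.ceph.redhat.com'
    salt-key -d 'magna046.ceph.redhat.com'
    salt-key -d 'magna052.ceph.redhat.com'

    # On each MON/OSD node (assumption): restart the minion so it re-registers
    # against the USM server's salt-master
    systemctl restart salt-minion

    # On the USM server (magna009): accept the keys once they show up as pending
    salt-key -L
    salt-key -a 'magna031.ceph.redhat.com'
    salt-key -a 'magna046.ceph.redhat.com'
    salt-key -a 'magna052.ceph.redhat.com'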
Hi Tejas, did you configure calamari user "admin" with password "admin"? See Bug 1345983.
Gregory, we need some help from you to understand why the CSRF token could be coming back blank on a setup that was migrated to the higher versions of Ceph and calamari. After the migration we ran "calamari-ctl initialize" with the admin username and password. We suspected firewalld and SELinux and tried with both disabled as well, but no luck. Please have a look at the logs and let us know if you can make out something.
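While waiting on that, here is a rough way to poke the same calamari endpoints by hand. This is only a sketch, not bigfin's exact request sequence; host and port are taken from the bigfin.log above, and the cookie name assumes calamari keeps Django's default csrftoken.

    # Does the login endpoint hand out a csrftoken cookie? A blank CSRF in
    # bigfin.log suggests it does not. (-k in case the cert is self-signed)
    curl -k -c /tmp/calamari-cookies.txt -o /dev/null -w '%{http_code}\n' \
        https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
    grep csrftoken /tmp/calamari-cookies.txt

    # An unauthenticated GET should reproduce the "Authentication credentials
    # were not provided." body seen in bigfin.log
    curl -k 'https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json'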
I tried simulating the issue today and found import cluster working with one workaround. The steps followed are as below:

Step-1: Created a Ceph 1.3.2 cluster with 1 MON and 2 OSD nodes (8 OSDs in total). Thanks to Tejas for getting this in place :)
Step-2: Followed the Red Hat Ceph Storage Installation Guide, sections 5.1/5.2, and upgraded the MON and OSD nodes to the higher version (2.0).
Step-3: In this case calamari-1.3.3 was installed on the MON node only, so removed all calamari-related rpms (diamond, graphite, salt-minion, calamari) using "yum remove".
Step-4: After the upgrade, rebooted all the MON and OSD nodes and verified that "ceph -s" works fine.
Step-5: Created a fresh node, installed the latest USM server bits and executed skyring-setup.
Step-6: On the MON and OSD nodes, added the USM agent repo and bootstrapped the nodes with ceph-installer using the ceph-installer APIs /setup/ and /setup/agent.
Step-7: Verified that the salt-minions are up and running on the MON and OSD nodes and that the salt-master is running on the USM server node. Also verified that the salt-minions on the MON and OSD nodes properly refer to the USM server node's salt-master by looking at /etc/salt/minion.d/ceph_agent.conf.
Step-8: Opened the USM UI and selected the import cluster option. All 3 nodes (1 MON + 2 OSDs) get listed. Selected the MON node where calamari is now up and running. Continued, and it listed all the participating nodes of the cluster. Submitted, and the cluster import started as a task. There was a glitch here while importing the cluster, as the "rbd ls" command was hanging on the MON node. Suspected the default-created pool "rbd" and removed it using "ceph osd pool delete rbd rbd --yes-i-really-really-mean-it" (see the command sketch below).
Step-9: Once the default-created pool "rbd" was removed, the import cluster task, which until then had been waiting for calamari to respond, completed immediately.

So, this way I find the USM import cluster feature working perfectly fine.

@Gregory, I have a question here: why does the default-created pool "rbd" cause an issue and make the "rbd ls" command hang? The cluster was and is in WARN state only.

@Tejas, as discussed in person, I tried this whole flow with RHEV-M VMs; you would like to simulate the same once more with the magna nodes in the QE setup.
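For reference, a rough command sketch of steps 3, 6, 7 and 8 above. The package names and the exact ceph-installer invocation are my best recollection and should be treated as assumptions, not the verified commands.

    # Step-3 (on the MON node): drop the old calamari 1.3 stack
    # (exact package names can vary between installs)
    yum remove calamari-server diamond graphite-web salt-minion

    # Step-6 (on each MON/OSD node, assumption about the exact invocation):
    # the ceph-installer setup endpoints on the USM server return bootstrap
    # scripts that can be piped to a shell
    curl http://<usm-server-fqdn>:8181/setup/agent/ | bash

    # Step-7: confirm the minion points at the USM server's salt-master
    cat /etc/salt/minion.d/ceph_agent.conf
    systemctl status salt-minion

    # Step-8: if "rbd ls" hangs and the default "rbd" pool is unused, remove it
    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it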
Today Tejas and I tried the import cluster flow with a cluster migrated from 1.3.2 to 2.0, and it properly imported the cluster into USM. I suspect that in the old setup there was some cleanup issue with the old calamari version: we need to do a "calamari-ctl clear" for the old data and then "calamari-ctl initialize", and then it works fine (see the sketch below). Tejas, please move/close the BZ accordingly. Thanks
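Concretely, on the calamari (first MON) node the reset boils down to roughly the following. The credentials/email are the throwaway values used in this setup; this is a sketch, not a polished procedure.

    # Wipe the stale calamari state left over from the 1.3 install
    calamari-ctl clear --yes-i-am-sure

    # Re-initialize calamari with the admin credentials USM expects
    calamari-ctl initialize --admin-username admin --admin-password admin --admin-email junk

    # Restart supervisord once so calamari-lite picks up the fresh configuration
    systemctl restart supervisord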
(In reply to Tejas from comment #7)
> Thanks Shubhendu for helping me with this bug. I was not aware of the 2 steps
> that needed to be done for this. So I will move this bug to Doc, as we need
> this upgrade process to be documented.
>
> Here is the list of steps for the calamari and USM setup on a Ceph upgrade:
>
> 1. Create a Ceph 1.3.2 cluster with nodes:
>    - Admin + calamari server (1.3.3.1)
>    - MON
>    - OSD
>
> 2. Install the USM packages on the Admin node (rhscon-core, rhscon-ceph,
>    rhscon-ui) and install rhscon-agent on the MONs and OSDs.
>
> 3. Set up the USM server (skyring-setup).
>
> 4. Upgrade the Ceph cluster from 1.3.2 to 2.0.
>
> 5. Remove the calamari salt packages, diamond and graphite packages from the
>    Admin node.
>
> 6. Also remove the /etc/salt/minion.d/calamari.conf and
>    /etc/salt/pki/minion/* files on all the MON and OSD nodes.
>
> 7. Install the latest calamari-server (say calamari-lite 1.4.5.1) on the
>    first MON node, and configure /etc/salt/minion.d/ceph_agent.conf.
>
> 8. Run "calamari-ctl clear --yes-i-am-sure" and "calamari-ctl initialize
>    --admin-username admin --admin-password admin --admin-email junk" on the
>    calamari node. Then restart the supervisord service once on that node.
>
> 9. The salt-keys are still accepted by the older calamari server, so delete
>    the salt-keys on the admin node. These keys are then ready to be accepted
>    by the USM server.
>
> 10. Import the cluster into USM using the GUI.
>
> Thanks,
> Tejas
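A rough command sketch for steps 5 and 6 above (the package set in step 5 is an assumption; the file paths in step 6 are as listed). Step 9 is the salt-key handling already sketched in the bug description.

    # Step 5 (on the Admin node) - exact package names may differ per install
    yum remove calamari-server salt-master salt-minion diamond graphite-web

    # Step 6 (on every MON and OSD node)
    rm -f /etc/salt/minion.d/calamari.conf
    rm -f /etc/salt/pki/minion/*
    # Assumption: restart the minion so it regenerates keys for the new master
    systemctl restart salt-minion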
Thanks Aron. About comment 12 step 2, yes you are right. I suggest we ask the customers to delete the older packages first, before asking them to install the new RHSC. That is, steps 2 and 3 should come before step 1.

Shubhendu, can you let us know if our thinking is right?

Thanks,
Tejas
That's correct, I feel. We might ask the user to manually remove the older version of salt from the storage nodes and then migrate the cluster to the higher version of Ceph. What do you say, Tejas?
Thanks Shubhendu. Yes, we can ask customers to delete the older version before the Ceph upgrade. But I feel the Ceph upgrade has nothing to do with the calamari and USM setup on the new admin node; we can do these steps even after a Ceph upgrade.

Thanks,
Tejas
I feel this section is clear to the customer.