Bug 1356619 - [USM] Import Cluster fails on an upgraded ceph cluster from 1.3.2 to 2.0
Summary: [USM] Import Cluster fails on an upgraded ceph cluster from 1.3.2 to 2.0
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Storage Console
Classification: Red Hat Storage
Component: documentation
Version: 2
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2
Assignee: Aron Gunn
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-14 13:27 UTC by Tejas
Modified: 2018-11-19 05:31 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 05:31:40 UTC
Embargoed:



Description Tejas 2016-07-14 13:27:47 UTC
Description of problem:
I am trying to import the Ceph cluster into USM. This cluster was upgraded from 1.3.2 to 2.0.

Version-Release number of selected component (if applicable):
Ceph: 10.2.2-21.el7cp (d06518f91ef88a8a7263dfa9c6ee51d5f9163abc)
calamari-server-1.4.5-1.el7cp.x86_64
rhscon-ceph-0.0.33-1.el7scon.x86_64
rhscon-ui-0.0.47-1.el7scon.noarch
rhscon-core-selinux-0.0.34-1.el7scon.noarch
rhscon-core-0.0.34-1.el7scon.x86_64



How reproducible:
Always

Steps to Reproduce:
1. Create a Ceph 1.3.2 cluster with the following nodes:
- Admin + calamari server (1.3.3.1)
- MON
- OSD

2. Install the USM packages on the Admin node (rhscon-core, rhscon-ceph, rhscon-ui)
and install the rhscon-agent on the MONs and OSDs.

3. Set up the USM server (skyring-setup)

4. Upgrade the Ceph cluster from 1.3.2 to 2.0.

5. Install calamari-lite (1.4.5.1) on the first MON node, and configure
/etc/salt/minion.d/ceph_agent.conf

6. The salt keys are still accepted by the older calamari-server, so delete the salt keys on the Admin node. Now these keys are ready to be accepted by the USM server (see the command sketch after these steps).

7. Import the cluster to USM using the GUI.
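
For reference, a minimal command sketch of steps 5 and 6, assuming the USM server's hostname is usm-server.example.com (a placeholder) and the first MON is magna031.ceph.redhat.com; the exact keys expected in ceph_agent.conf may differ, this only assumes the minion needs to point at the USM salt-master:

# On the first MON node: point the local salt-minion at the USM server
cat > /etc/salt/minion.d/ceph_agent.conf << 'EOF'
master: usm-server.example.com
EOF
systemctl restart salt-minion

# On the old Admin/calamari node: delete the stale minion keys so the
# 1.3.x calamari-server no longer holds them
salt-key -L                               # list accepted/pending keys
salt-key -d magna031.ceph.redhat.com -y   # repeat for each MON/OSD minion

# On the USM server: the minions should now show up as pending; accept them
salt-key -L
salt-key -a magna031.ceph.redhat.com -y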

Actual results:
Import is failing

Expected results:
Import should succeed

Additional info:


The systems are still in this state:
USM: magna009
MON: magna031
OSD: magna046 magna052



Error seen from UI:
"Failed to retrive cluster information from the selected host 'magna031.ceph.redhat.com'. Please select a monitor host and try again"





Skyring.log:
2016-07-14T13:22:08.396Z INFO     auth.go:138 Login] User: admin already logged in
2016-07-14T13:23:04.9Z ERROR    request-router.go:69 GetProviderFromClusterId] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Error getting details for cluster: 00000000-0000-0000-0000-000000000000. error: not found
2016-07-14T13:23:04.9Z ERROR    request-router.go:82 RouteProviderEvents] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Error getting provider for cluster: 00000000-0000-0000-0000-000000000000
2016-07-14T13:23:04.9Z ERROR    handler.go:457 node_lost_handler] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Event:Node connectivity changed could not be handled for node: magna031.ceph.redhat.com. error: Error getting provider for cluster: 00000000-0000-0000-0000-000000000000
2016-07-14T13:23:04.901Z ERROR    mailnotifier.go:33 MailNotifier] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Unable to read MailNotifier from DB: <nil>
2016-07-14T13:23:04.901Z WARNING  mail_notifier.go:119 getNotifier] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Unable to read MailNotifier from DB: can't find Mail Notifier
2016-07-14T13:23:04.901Z WARNING  mail_notifier.go:159 MailNotify] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Could not Get the notifier. Error: can't find Mail Notifier
2016-07-14T13:23:04.901Z ERROR    audit_logger.go:46 AuditLog] skyring:bd6fde6a-99dd-4c06-99cc-9917e329d6e0-Could not send mail for event: Node connectivity changed
2016-07-14T13:24:11.067Z ERROR    import_cluster.go:89 GET_ClusterNodesForImport] admin:74706d8c-d227-42cd-ac87-b1ace0f946be-Error getting import cluster details


Bigfin.log:
2016-07-14T11:33:46.121Z WARNING  handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T11:33:46.121Z ERROR    handler.go:61 HttpGet] Session: &{0xc82019a090 <nil> 0xc820032028 0}
2016-07-14T11:33:46.121Z ERROR    handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T11:33:46.122Z ERROR    handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T11:33:46.276Z ERROR    handler.go:67 HttpGet] CSRF: 
2016-07-14T11:33:46.276Z ERROR    handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T11:33:46.328Z ERROR    api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T11:33:46.328Z ERROR    import_cluster.go:49 GetClusterNodesForImport] admin:dc9d6ff5-7dc1-460c-befa-e262314c594e-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T11:38:34.265Z ERROR    handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T11:38:34.265Z ERROR    handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T11:38:34.28Z WARNING  handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T11:38:34.28Z ERROR    handler.go:61 HttpGet] Session: &{0xc820192000 <nil> 0xc820154010 0}
2016-07-14T11:38:34.28Z ERROR    handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T11:38:34.28Z ERROR    handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T11:38:34.43Z ERROR    handler.go:67 HttpGet] CSRF: 
2016-07-14T11:38:34.43Z ERROR    handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T11:38:34.482Z ERROR    api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T11:38:34.482Z ERROR    import_cluster.go:49 GetClusterNodesForImport] admin:b7b0fd08-c505-4fe2-b4d1-03953a0f56f9-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T12:46:55.173Z ERROR    handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T12:46:55.173Z ERROR    handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T12:46:55.189Z WARNING  handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T12:46:55.189Z ERROR    handler.go:61 HttpGet] Session: &{0xc82014a240 <nil> 0xc82010a2f8 0}
2016-07-14T12:46:55.189Z ERROR    handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T12:46:55.189Z ERROR    handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T12:46:55.364Z ERROR    handler.go:67 HttpGet] CSRF: 
2016-07-14T12:46:55.364Z ERROR    handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T12:46:55.416Z ERROR    api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T12:46:55.416Z ERROR    import_cluster.go:49 GetClusterNodesForImport] admin:e3ce6e01-827c-4b29-b439-d959ed4b3e11-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster
2016-07-14T13:24:10.858Z ERROR    handler.go:167 csrfTokenFromSession] Cokkies: []
2016-07-14T13:24:10.859Z ERROR    handler.go:174 csrfTokenFromSession] Returning blank XSRF
2016-07-14T13:24:10.873Z WARNING  handler.go:60 HttpGet] Session seems invalidated. Trying to login again.
2016-07-14T13:24:10.873Z ERROR    handler.go:61 HttpGet] Session: &{0xc8201ac000 <nil> 0xc8201b0000 0}
2016-07-14T13:24:10.873Z ERROR    handler.go:62 HttpGet] Mon: magna031.ceph.redhat.com
2016-07-14T13:24:10.873Z ERROR    handler.go:63 HttpGet] URL: https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
2016-07-14T13:24:11.015Z ERROR    handler.go:67 HttpGet] CSRF: 
2016-07-14T13:24:11.015Z ERROR    handler.go:68 HttpGet] Get URL: https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json
2016-07-14T13:24:11.067Z ERROR    api.go:852 GetCluster] Resp Body: {"detail": "Authentication credentials were not provided."}
2016-07-14T13:24:11.067Z ERROR    import_cluster.go:49 GetClusterNodesForImport] admin:74706d8c-d227-42cd-ac87-b1ace0f946be-Error getting cluster details. error: json: cannot unmarshal object into Go value of type []backend.CephCluster

Comment 2 Daniel Horák 2016-07-15 12:26:54 UTC
Hi Tejas,
did you configure calamari user "admin" with password "admin"?

See Bug 1345983.

Comment 3 Shubhendu Tripathi 2016-07-15 13:06:39 UTC
Gregory,

We need some help from you to understand why the CSRF token comes back blank on a setup that was migrated to higher versions of Ceph and Calamari.

After the migration we ran "calamari-ctl initialize" with the admin user name and password.

We suspected firewalld and SELinux and tried with both disabled as well, but no luck.

Please have a look at the logs (see the sketch of the setup checks below) and let us know if you can make out something.
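
For completeness, a rough sketch of those checks, assuming they are run on the calamari (MON) node and using a placeholder admin e-mail; the curl calls simply reproduce by hand the endpoints bigfin is hitting in the log above:

# Temporarily rule out firewalld and SELinux
systemctl stop firewalld
setenforce 0
getenforce            # should now report Permissive

# Re-initialize calamari with explicit admin credentials
calamari-ctl initialize --admin-username admin --admin-password admin --admin-email admin@example.com

# Manually hit the same endpoints bigfin calls, to inspect the responses
curl -k https://magna031.ceph.redhat.com:8002/api/v2/auth/login/
curl -k "https://magna031.ceph.redhat.com:8002/api/v2/cluster?format=json"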

Comment 5 Shubhendu Tripathi 2016-07-18 14:53:45 UTC
I tried simulating the issue today and found the import cluster flow working with one workaround. The steps followed are below:

Step-1: Created a ceph 1.3.2 cluster with 1 MON and 2 OSD nodes (total 8 osds). Thanks to Tejas for getting this in place :)

Step-2: Followed the Red Hat Ceph Storage Installation Guide, sections 5.1/5.2, and upgraded the MON and OSD nodes to the higher version (2.0).

Step-3: In this case calamari-1.3.3 was installed on the MON node only, so all calamari-related RPMs (diamond, graphite, salt-minion, calamari) were removed using "yum remove".

Step-4: After the upgrade, rebooted all the MON and OSD nodes and verified that "ceph -s" works fine.

Step-5: Created a fresh node, installed the latest USM server bits, and executed skyring-setup.

Step-6: On the MON and OSD nodes, added the USM agent repo and bootstrapped the nodes with ceph-installer using the ceph-installer APIs /setup/ and /setup/agent.

Step-7: Verified that the salt-minions are up and running on the MON and OSD nodes and that the salt-master is running on the USM server node. Verified that the salt-minions on the MON and OSD nodes properly point to the USM server's salt-master by looking at /etc/salt/minion.d/ceph_agent.conf (see the sketch after Step-9).

Step-8: Opened the USM UI and selected the import cluster option. All 3 nodes (1 MON + 2 OSDs) were listed. Selected the MON node where calamari is now up and running. Continued, and it listed all the participating nodes of the cluster. Submitted, and the cluster import started as a task.

There was a glitch here while importing the cluster, as the "rbd ls" command was hanging on the MON node. Suspected the default "rbd" pool and removed it using the command "ceph osd pool delete rbd rbd --yes-i-really-really-mean-it".

Step-9: Once the default "rbd" pool was removed, the import cluster task, which had been waiting for calamari to respond, completed immediately.
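
A rough sketch of the Step-7 verification and the Step-9 workaround, using the standard salt and ceph CLIs:

# On each MON/OSD node: confirm the minion is running and points at the USM server
systemctl status salt-minion
cat /etc/salt/minion.d/ceph_agent.conf

# On the USM server: confirm the master is running and sees the minions
systemctl status salt-master
salt-key -L

# Workaround: remove the default rbd pool that makes "rbd ls" hang
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
rbd ls                # should now return promptly (with empty output)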

So, with this workflow, I find the USM import cluster feature working perfectly fine.

@Gregory, I have a question here: why does the default "rbd" pool cause an issue, with the "rbd ls" command hanging? The cluster was and is only in WARN state.

@Tejas, as discussed in person, I tried this whole flow with RHEV-M VMs; you would like to simulate the same once more with the magna nodes in the QE setup.

Comment 6 Shubhendu Tripathi 2016-07-19 06:49:02 UTC
Today Tejas and I tried the import cluster flow with a cluster migrated from 1.3.2 to 2.0, and USM properly imported the cluster.

I suspect the old setup had a cleanup issue with the older calamari version: we need to do a "calamari-ctl clear" for the old installation and then a "calamari-ctl initialize", and then it works fine (see the sketch below).

Tejas, please move/close the BZ accordingly.
Thanks
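
A short sketch of that cleanup sequence, run on the calamari (MON) node; the credential values are the throwaway ones quoted in comment 8 below:

# Wipe the stale 1.3.x calamari state and re-initialize calamari-lite
calamari-ctl clear --yes-i-am-sure
calamari-ctl initialize --admin-username admin --admin-password admin --admin-email junk

# Restart supervisord so calamari-lite picks up the fresh configuration
systemctl restart supervisord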

Comment 8 Shubhendu Tripathi 2016-07-19 09:01:45 UTC
(In reply to Tejas from comment #7)
> Thanks Shubhendu for helping me with this bug. I was not aware of 2 steps
> that needed to be done for this. So I will move this bug to Doc, as we need
> this upgrade process to be documented.
> 
> Here are the list of steps for Calamari and USM setup on a ceph upgrade:
> 
> 1. Create a Ceph 1.3.2 cluster with nodes :
> - Admin + calamari server (1.3.3.1)
> - MON
> - OSD
> 
> 2. Install the USM packages on the Admin node (rhscon-core, rhscon-ceph,
> rhscon-ui) and install the rhscon-agent on the MONs and OSDs.
> 
> 3. Set up the USM server (skyring-setup)
> 
> 4. Upgrade the Ceph cluster from 1.3.2 to 2.0.
> 
> 5. Remove the Calamari Salt packages, and the diamond and graphite packages, from the Admin node.
> 
> 6. Also remove the /etc/salt/minion.d/calamari.conf and
> /etc/salt/pki/minion/* files on all the MON and OSD nodes.
> 
> 7. Install latest calamari-server (say calamari-lite 1.4.5.1) on the first MON node, and configure /etc/salt/minion.d/ceph_agent.conf

8. Run "calamari-ctl clear --yes-i-am-sure" and "calamari-ctl initialize --admin-username admin --admin-password admin --admin-email junk" on the calamari node. Then restart the supervisord service once on that node. (A consolidated command sketch of steps 5 through 8 follows after this comment.)

> 
> 9. The salt keys are still accepted by the older calamari-server, so delete
> the salt keys on the Admin node. Now these keys are ready to be accepted by the
> USM server.
> 
> 10. Import the cluster to USM using the GUI.
> 
> 
> Thanks,
> Tejas
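
A consolidated, hedged sketch of steps 5 through 8; the package names are approximate and should be adjusted to what is actually installed:

# Step 5 - on the old Admin node: remove the 1.3.x calamari stack
yum remove calamari-server salt-master salt-minion diamond graphite-web

# Step 6 - on every MON and OSD node: drop the old calamari salt config and keys
rm -f /etc/salt/minion.d/calamari.conf
rm -rf /etc/salt/pki/minion/*

# Step 7 - on the first MON node: install calamari-lite and point it at the USM server
yum install calamari-server            # the 1.4.x build provides calamari-lite
vi /etc/salt/minion.d/ceph_agent.conf  # set the USM server as the salt master
systemctl restart salt-minion

# Step 8 - on the same node, exactly as quoted above:
calamari-ctl clear --yes-i-am-sure
calamari-ctl initialize --admin-username admin --admin-password admin --admin-email junk
systemctl restart supervisord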

Comment 15 Tejas 2016-07-27 04:51:28 UTC
Thanks Aron.
About comment 12 step 2, yes you are right. I suggest we ask the customers to delete the older packages first, before asking them to install the new RHSC.

That is, steps 2 and 3 should come before step 1.

Shubhendu,
    Can you let us know if our thinking is right?

Thanks,
Tejas

Comment 16 Shubhendu Tripathi 2016-07-27 06:33:14 UTC
That's correct, I feel. We might ask the user to manually remove the older version of salt from the storage nodes and then migrate the cluster to the higher version of Ceph.

What do you say, Tejas?

Comment 17 Tejas 2016-07-27 07:39:55 UTC
Thanks Shubhendu, yes, we can ask customers to delete the older version before the Ceph upgrade.
But I feel the Ceph upgrade has nothing to do with the Calamari and USM setup on the new Admin node. We can do these steps even after a Ceph upgrade.

Thanks,
Tejas

Comment 20 Tejas 2016-08-01 13:58:07 UTC
I feel this section is clear to the customer.

