Bug 1374224 - [RFE] RHCS-2 add a tool to rebuild mon store from OSD
Summary: [RFE] RHCS-2 add a tool to rebuild mon store from OSD
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 2.1
Assignee: Kefu Chai
QA Contact: shylesh
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1374225 1378288 1383917
Reported: 2016-09-08 09:43 UTC by Vikhyat Umrao
Modified: 2019-11-14 09:01 UTC

Fixed In Version: RHEL: ceph-10.2.3-4.el7cp Ubuntu: ceph_10.2.3-5redhat1
Doc Type: Enhancement
Doc Text:
.Support for rebuilding Monitor store from OSD nodes
The `ceph-objectstore-tool` and `ceph-monstore-tool` utilities now enable you to rebuild the Monitor database and keyring files from OSD nodes. This ability is especially useful when all Monitors fail to boot at the same time because of an underlying `leveldb` corruption. For details, see the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide/#all_monitors_failed_to_start_because_of_a_corrupted_store[All Monitors Failed to Start Because of a Corrupted Store] section in the Administration Guide for Red Hat Ceph Storage 2.
Clone Of:
Clones: 1374225
Environment:
Last Closed: 2016-11-22 19:30:39 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 17179 None None None 2016-09-08 09:43:32 UTC
Red Hat Product Errata RHSA-2016:2815 normal SHIPPED_LIVE Moderate: Red Hat Ceph Storage security, bug fix, and enhancement update 2017-03-22 02:06:33 UTC

Description Vikhyat Umrao 2016-09-08 09:43:32 UTC
Description of problem:

[RFE] add a tool to rebuild mon store from OSD

Backport: http://tracker.ceph.com/issues/17179

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.0

Comment 1 Vikhyat Umrao 2016-09-21 07:37:54 UTC
jewel backport PR : https://github.com/ceph/ceph/pull/11126

Comment 20 Ramakrishnan Periyasamy 2016-11-09 16:15:01 UTC
As per this https://bugzilla.redhat.com/show_bug.cgi?id=1378288#c13

unable to verify this bug.

Comment 22 Kefu Chai 2016-11-10 06:12:17 UTC
talked over IRC with shylesh.


please check http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds for details. what you are missing is the cap fields in the keyring


instead of


[mon.]
    key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
[client.admin]
    key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==

we should have a keyring file that looks like this:

[mon.]
    key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
    caps mon = "allow *"
[client.admin]
    key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==
    auid = 0
    caps mds = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

when running "ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring".
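
A quick way to spot the problem described above is to list the keyring sections that carry no `caps` line. The following is a minimal sketch and not part of the Ceph tooling: the function name `sections_without_caps` is made up for illustration, and it assumes the INI-like keyring layout shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper: print every keyring section that has no "caps" line.
# Assumes "[entity]" headers followed by indented "name = value" lines,
# as in the keyring examples in this comment.
sections_without_caps() {
    awk '
        /^\[.*\]$/ {                          # start of a new [entity] section
            if (name != "" && !has_caps) print name
            name = substr($0, 2, length($0) - 2)
            has_caps = 0
            next
        }
        /^[ \t]*caps[ \t]/ { has_caps = 1 }   # any "caps <type> = ..." line
        END { if (name != "" && !has_caps) print name }
    ' "$1"
}
```

Run as `sections_without_caps /path/to/admin.keyring`. For the first keyring above it would print both `mon.` and `client.admin`; for the second it would print nothing.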

Comment 23 shylesh 2016-11-10 13:56:06 UTC
(In reply to Kefu Chai from comment #22)

@Kefu,

I performed the following steps and succeeded up to getting the store from the OSDs; I was able to replace the mon store and bring up the mons.

1. export ms=/home/ubuntu/monstore; mkdir /home/ubuntu/monstore
2. for host in `cat osds`; do
       rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms";
       ssh root@$host 'export ms="/home/ubuntu/monstore"; for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path "$ms"; done';
       rsync -avz root@$host:"$ms" "$ms";
   done

The above script successfully brought the store data back from all the OSDs, and this is what the directory structure looks like:

3. ceph-authtool /path/to/admin.keyring -n client.admin \
  --cap mon allow 'allow *' --cap osd 'allow *' 

4. ceph-monstore-tool /home/ubuntu/monstore/monstore rebuild -- --keyring /path/admin.keyring

5. mv /home/ubuntu/monstore/monstore/store.db /var/lib/ceph/mon/ceph-magna105/store.db

6. chown -R ceph:ceph /var/lib/ceph/mon/ceph-magna105/store.db

7. bring up all mons and osds

Result:-
=====
All the mons are up and ceph commands are working.
I had 9 OSDs, but only 3 were able to come up because of an authentication problem.

After the import, the auth list looks like this:

[root@magna105 ~]# ceph auth list
installed auth entries:

osd.1
        key: AQAKCvVX1OcDFhAAcVgFE/mA9wPzzAaT8M8suA==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.3
        key: AQAWCvVX2A8BERAAPvZZD3ZHtc36muKpCyAW5w==
        caps: [mon] allow profile osd
        caps: [osd] allow *
osd.6
        key: AQAhCvVXPwn3ORAAvpuM+3MkMMSoKhs44qE92g==
        caps: [mon] allow profile osd
        caps: [osd] allow *
client.admin
        key: AQBzCfVXOcOmMhAA1ZKm6ZnwhrCdt3NTN7/JtQ==
        caps: [mon] allow *
        caps: [osd] allow *


Out of 9 OSDs, only 3 have auth info, and these OSDs are from the last node in the for loop that fetched the store data. That means the store data from the first 2 hosts was not considered while rebuilding the mon store.

Not sure if it's a problem with the for loop I wrote or with ceph-monstore-tool.

You can have a look at magna105 , the setup is still intact.
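
To make the observation above easier to check, the OSD entries in a saved `ceph auth list` output can be compared against the OSD ids the cluster is expected to have. This is a minimal sketch: `missing_osd_auth` is a hypothetical helper, and the ids 0 to 8 in the usage note are an assumption, since the thread only says there were 9 OSDs.

```shell
#!/bin/sh
# Hypothetical helper: report expected OSDs that have no auth entry.
# $1: file holding saved `ceph auth list` output (entity names such as
#     "osd.1" on a line of their own, as in the listing above)
# remaining arguments: the OSD ids expected to exist (assumed, not from the thread)
missing_osd_auth() {
    auth_file=$1
    shift
    for id in "$@"; do
        # -x matches the whole line, so "osd.1" does not match "osd.10"
        grep -qx "osd\.$id" "$auth_file" || echo "osd.$id"
    done
}
```

For example, `ceph auth list > /tmp/auth.txt` followed by `missing_osd_auth /tmp/auth.txt 0 1 2 3 4 5 6 7 8` would print one line per OSD id that has no entry in the saved listing.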

Comment 24 Samuel Just 2016-11-10 20:58:46 UTC
I really can't tell what's going on without more information, but it really seems like your for loop is wrong.  Perhaps you should revert to using the form in the documentation and add an echo so you can verify what it's doing?
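
The suggestion above could be sketched as follows. This is one possible reading and is not confirmed anywhere in this thread: when rsync is given a source directory without a trailing slash, it copies the directory itself into the destination. That would nest the store as `$ms/monstore` on each pass, which would be consistent with the `/home/ubuntu/monstore/monstore` path used in step 4 of comment 23. The `run` helper, the `DRY_RUN` variable, and the `collect_from_hosts` name are hypothetical; the path, the `root@` login, and the `ceph-objectstore-tool` invocation are taken from comment 23.

```shell
#!/bin/sh
# Sketch only, under the assumptions stated above.
ms=/home/ubuntu/monstore

# Echo every command before running it; with DRY_RUN set, only echo.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# $1: file listing one OSD host per line
collect_from_hosts() {
    for host in $(cat "$1"); do
        # Trailing slashes make rsync copy the directory contents, so the
        # store stays at $ms on both sides instead of nesting one level deeper.
        run rsync -avz "$ms/" "root@$host:$ms/"
        run rm -rf "$ms"
        # \$osd is expanded on the remote host; $ms is expanded locally.
        run ssh "root@$host" "for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms; done"
        run rsync -avz "root@$host:$ms/" "$ms/"
    done
}
```

Setting `DRY_RUN=1` and then calling `collect_from_hosts osds` prints the four commands for each host without executing them, which makes it easy to confirm that every host receives the store accumulated so far before its OSDs add to it.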

Comment 25 Ken Dreyer (Red Hat) 2016-11-10 21:08:07 UTC
FYI Kefu's written automated tests that exercise ceph-monstore-tool.

https://github.com/ceph/ceph-qa-suite/blob/master/tasks/rebuild_mondb.py

https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton/all/rebuild-mondb.yaml

Not to interrupt the current debugging session, but from what I understand, QE should be able to use those to verify ceph-monstore-tool's behavior without writing new scripts.

Comment 39 errata-xmlrpc 2016-11-22 19:30:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html
