Description of problem:
[RFE] Add a tool to rebuild the mon store from OSDs

Backport: http://tracker.ceph.com/issues/17179

Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 2.0
jewel backport PR : https://github.com/ceph/ceph/pull/11126
Unable to verify this bug, as per https://bugzilla.redhat.com/show_bug.cgi?id=1378288#c13.
Talked over IRC with shylesh.

Please check http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds for details. What you are missing is the caps fields in the keyring. Instead of

[mon.]
    key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
[client.admin]
    key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==

we should have a keyring file that looks like this:

[mon.]
    key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
    caps mon = "allow *"
[client.admin]
    key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==
    auid = 0
    caps mds = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

when running "ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring".
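A quick way to spot the problem described above is to list the keyring sections that carry no caps line before running the rebuild. The following is only a sketch: the function name `sections_missing_caps` is hypothetical, and it assumes the keyring uses the plain `[entity]` / `key = ...` / `caps ... = ...` layout shown in this comment.

```shell
# Hypothetical helper: print every keyring section that has no "caps" line.
# Assumes the INI-like keyring layout shown in the comment above.
sections_missing_caps() {
    awk '
        /^\[.*\]/ {
            # A new [entity] section starts: report the previous one if it had no caps.
            if (name != "" && !has_caps) print name
            name = $0
            sub(/^\[/, "", name)      # strip the leading "["
            sub(/\].*$/, "", name)    # strip the trailing "]"
            has_caps = 0
            next
        }
        /^[ \t]*caps[ \t]/ { has_caps = 1 }
        END { if (name != "" && !has_caps) print name }
    ' "$1"
}
```

For example, `sections_missing_caps /path/to/admin.keyring` prints nothing when every entity has at least one caps entry. Missing caps can then be added with ceph-authtool and its `--cap` option, as described on the linked troubleshooting page.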
(In reply to Kefu Chai from comment #22)
> talked over IRC with shylesh.
>
> please check
> http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
> for details. what you are missing is the cap fields in the keyring
>
> instead of
>
> [mon.]
> key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
> [client.admin]
> key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==
>
> we should have a keyring file looks like,
>
> [mon.]
> key = AQBY5B1Y55SqARAAuCOaRjhvoUig1WTQWa1jOQ==
> caps mon = "allow *"
> [client.admin]
> key = AQBY5B1YuHvaAhAApmJ1Qm6HwPJFxvmSRRS3hw==
> auid = 0
> caps mds = "allow *"
> caps mon = "allow *"
> caps osd = "allow *"
>
> when "ceph-monstore-tool /tmp/mon-store rebuild -- --keyring
> /path/to/admin.keyring".

@Kefu, I performed the following steps and succeeded up to getting the store from the OSDs. I was able to replace the mon store and bring up the mons.

1. export ms=/home/ubuntu/monstore; mkdir /home/ubuntu/monstore

2. for host in `cat osds`; do rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms"; ssh root@$host 'export ms="/home/ubuntu/monstore"; for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path "$ms"; done'; rsync -avz root@$host:"$ms" "$ms"; done

   The above script successfully brought the store data from all the OSDs, and this is how the directory structure looks like

3. ceph-authtool /path/to/admin.keyring -n client.admin \
   --cap mon allow 'allow *' --cap osd 'allow *'

4. ceph-monstore-tool /home/ubuntu/monstore/monstore rebuild -- --keyring /path/admin.keyring

5. mv /home/ubuntu/monstore/monstore/store.db /var/lib/ceph/mon/ceph-magna105/store.db

6. chown -R ceph:ceph /var/lib/ceph/mon/ceph-magna105/store.db

7. Bring up all mons and OSDs.

Result:-
=====
All the mons are up and ceph commands are working.
I had 9 OSDs, but only 3 were able to come up because of an authentication problem. After the import, the auth list looks like this:

[root@magna105 ~]# ceph auth list
installed auth entries:

osd.1
    key: AQAKCvVX1OcDFhAAcVgFE/mA9wPzzAaT8M8suA==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.3
    key: AQAWCvVX2A8BERAAPvZZD3ZHtc36muKpCyAW5w==
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.6
    key: AQAhCvVXPwn3ORAAvpuM+3MkMMSoKhs44qE92g==
    caps: [mon] allow profile osd
    caps: [osd] allow *
client.admin
    key: AQBzCfVXOcOmMhAA1ZKm6ZnwhrCdt3NTN7/JtQ==
    caps: [mon] allow *
    caps: [osd] allow *

Out of 9 OSDs only 3 have auth info, and these OSDs are from the last node in the for loop that brought the store data. That means the store data from the first 2 hosts was not considered while rebuilding the mon store. I am not sure whether the problem is with the for loop I wrote or with ceph-monstore-tool.

You can have a look at magna105; the setup is still intact.
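To compare the rebuilt auth database against the expected set of OSDs, the entity names can be pulled out of the auth listing. This is a minimal sketch: the function name `osd_auth_entries` is hypothetical, and it assumes each entity name appears alone on its own line, as in the listing above.

```shell
# Hypothetical helper: read `ceph auth list` output on stdin and print the
# osd.N entity names, sorted numerically by OSD id.
osd_auth_entries() {
    grep -E '^osd\.[0-9]+$' | sort -t. -k2 -n
}
```

Piping the listing through it, for example `ceph auth list | osd_auth_entries | wc -l`, gives the number of OSDs that have auth entries, which can be checked against the 9 OSDs expected here.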
I really can't tell what's going on without more information, but it really seems like your for loop is wrong. Perhaps you should revert to using the form in the documentation and add an echo so you can verify what it's doing?
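Along the lines of the echo suggestion, a dry-run version of the collection loop could look like the sketch below. It only prints the commands for each host and executes nothing. The function name `print_collect_cmds` and the example host names are hypothetical. The trailing slashes on the rsync paths are an assumption: without them rsync nests the source directory inside the destination, which would match the `/home/ubuntu/monstore/monstore` path seen in the reported steps, but this thread does not confirm that as the cause.

```shell
# Dry run: print the commands the collection loop would execute for each host.
# Usage: print_collect_cmds <mon-store-dir> <host>...
print_collect_cmds() {
    store=$1
    shift
    for host in "$@"; do
        # Assumption: trailing slashes make rsync copy the directory contents
        # rather than creating a nested copy of the directory on the other side.
        echo "rsync -avz $store/ root@$host:$store/"
        # \$osd stays literal so that the remote shell expands it, not the local one.
        echo "ssh root@$host 'for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $store; done'"
        echo "rsync -avz root@$host:$store/ $store/"
    done
}
```

Running `print_collect_cmds /home/ubuntu/monstore $(cat osds)` shows, host by host, which directory is pushed, which store path ceph-objectstore-tool updates, and which directory is pulled back, so a mismatch between those paths becomes visible before anything is changed.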
FYI Kefu's written automated tests that exercise ceph-monstore-tool.

https://github.com/ceph/ceph-qa-suite/blob/master/tasks/rebuild_mondb.py
https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton/all/rebuild-mondb.yaml

Not to interrupt the current debugging session, but from what I understand, QE should be able to use those to verify ceph-monstore-tool's behavior without writing new scripts.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2815.html