Bug 1969383

Summary: [RHCS-baremetal] issue with Recovering the Ceph Monitor Store on 4.x
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: skanta
Component: RADOS
Assignee: skanta
Status: CLOSED NOTABUG
QA Contact: skanta
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.2
CC: akupczyk, assingh, bhubbard, ceph-eng-bugs, gsitlani, hfukumot, kjosy, nojha, rzarzyns, sseshasa, vereddy, vumrao
Target Milestone: ---   
Target Release: 5.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-10-19 19:07:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1973033, 1995854, 1995859    
Bug Blocks:    
Attachments:
Log file (flags: none)

Description skanta 2021-06-08 10:27:50 UTC
Description of problem:

      This is an internal bug to track verification of the steps in the document https://access.redhat.com/solutions/4202871


We performed the steps mentioned in the above document and noticed the following:

1) Ensure all OSD containers are stopped. 

    Command - podman stop <container-id>
   
   Issues:
   -------
             1.1. After stopping all OSD containers, the "ceph osd tree" output
                  still shows 5 OSDs as UP even though the services are down (noticed on bare metal as well).
             1.2. The services were still running; stopped them by
                  executing "systemctl stop <servicename>". This is not the
                  case with 5.0. If this is expected in 4.x, please ignore.
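
A quick way to confirm point 1.2 on a given node is to check for live ceph-osd processes directly; this is a generic sketch, not a command from the document:

```shell
# Sketch: check whether any ceph-osd process is still alive on this node.
# pgrep exits non-zero when no process matches the name.
if pgrep -x ceph-osd >/dev/null 2>&1; then
    echo "ceph-osd still running:"
    pgrep -ax ceph-osd
else
    echo "no ceph-osd processes left"
fi
```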
 
2) Add Ceph repos on Ceph nodes based on their roles
       Observations:
       ------------
                    The repos listed need to be updated for 4.x.

3) Install packages on the nodes based on their role
   
   for Ceph-MON nodes:
   # yum install -y ceph-mon ceph-test rsync  -------> Successfully installed

   for Ceph Storage nodes:
   # yum install -y ceph-osd ceph-test rsync --------> Successfully installed

4) On the Ceph storage nodes mount all Ceph-data disks into temporary locations
        Issues:
        --------
       4.1. The document says the data partitions can be listed with:
            # ceph-disk list

             This needs to change to: # ceph-volume lvm list

       4.2. # mkdir /mnt/ceph-tmp-001
            # mount /dev/sda1 /mnt/ceph-tmp-001
        
The mount step failed with the following error on non-encrypted OSDs:

  Error output:
  -------------
        [root@ceph-pdhiran-1623053041789-node3-osd cephuser]# mount /dev/vg1/data-lv4  /mnt/ceph-tmp-001
mount: /mnt/ceph-tmp-001: wrong fs type, bad option, bad superblock on /dev/mapper/vg1-data--lv4, missing codepage or helper program, or other error.
[root@ceph-pdhiran-1623053041789-node3-osd cephuser]# cd /var/lib/ceph/osd/


On encrypted OSDs:

 [root@ceph-pdhiran-1623053041789-node10-osd cephuser]# mount /dev/vg1/data-lv1 /mnt/ceph-tmp-001
mount: /mnt/ceph-tmp-001: unknown filesystem type 'crypto_LUKS'.
[root@ceph-pdhiran-1623053041789-node10-osd cephuser]#
      
            
Ceph Version:
-------------            
ceph version 14.2.11-179.el8cp (29de9ae52bcc20e38eb86cb8e4163bff2d1719c8) nautilus (stable)

Comment 2 skanta 2021-06-09 06:35:42 UTC
Document Reference- https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index?lb_target=production#recovering-the-ceph-monitor-store

Performed the steps as mentioned in the doc and faced an issue at Step 2 of the "4.7.1. Recovering the Ceph Monitor store when using BlueStore" chapter.

The variable "$osd_nodes" is never initialized or exported in the script.

Error Snippet:-
---------------

         [root@mon ~]# for host in $osd_nodes; do
                echo "$host"
                rsync -avz $ms $host:$ms
                rsync -avz $db $host:$db
                rsync -avz $db_slow $host:$db_slow
               ------------------------------------------
               -----------------------------------------------
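
A minimal fix is to initialize the variable (and the path variables the loop uses) before the loop. The host names below are placeholders, and the ms/db/db_slow values are assumptions following the variable names in the documented script:

```shell
# Hypothetical initialization for the documented loop; host names are
# placeholders, ms/db/db_slow values are assumptions.
ms=/tmp/monstore/
db=/root/db/
db_slow=/root/db.slow/
osd_nodes="node3-osd node4-osd node5-osd"   # space-separated OSD host names

for host in $osd_nodes; do
    echo "$host"
    # rsync -avz $ms $host:$ms            # as in the documented procedure
    # rsync -avz $db $host:$db
    # rsync -avz $db_slow $host:$db_slow
done
```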

Comment 3 skanta 2021-06-10 08:11:00 UTC
As a workaround, I executed the script with "for host in ceph-bharath-1623199697633-node3-mon-osd ceph-bharath-1623199697633-node4-osd-client ceph-bharath-1623199697633-node5-osd ceph-bharath-1623199697633-node6-osd ceph-bharath-1623199697633-node7-osd".

  Execution failed with error messages. For more details, please check the attached log file.

Comment 4 skanta 2021-06-10 08:12:12 UTC
Created attachment 1789773 [details]
Log file

Comment 5 skanta 2021-06-17 06:45:03 UTC
Performed the following steps to recover the MON-


PROCEDURE
---------

Perform the steps as the root user on the installer node:

    ssh-keygen

Copy the ssh key to all OSD nodes:

    ssh-copy-id root@<osd nodes>

    cd /root/
    ms=/tmp/monstore/
    mkdir $ms
    	

Create a script file with the following code.

NOTE: I included a sleep after stopping and starting the service. This needs to be modified: check that the service has actually stopped or started before proceeding with the further steps.

vi recover.sh

#!/bin/bash -x

ms=/tmp/monstore/
rm -rf $ms
mkdir $ms

for host in ceph-bharath-1623727358372-node3-mon-osd ceph-bharath-1623727358372-node4-osd-client ceph-bharath-1623727358372-node5-osd ceph-bharath-1623727358372-node6-osd ceph-bharath-1623727358372-node7-osd;
do
echo "The Host name is :$host"

ssh -l root $host "rm -rf  $ms"
ssh -l root $host "mkdir $ms"

scp -r $ms"store.db" $host:$ms
rm -rf $ms
mkdir $ms

#ssh -l root $host "mkdir $ms"

ssh -t root@$host <<'EOT'
ms=/tmp/monstore
for osd in /var/lib/ceph/osd/ceph-*;
do
     IN=$osd
     arrIN=(${IN//-/ })
     systemctl stop ceph-osd@${arrIN[1]}.service
     sleep 5
     echo "ceph-objectstore-tool --type bluestore --data-path $osd --op update-mon-db --no-mon-config --mon-store-path $ms"
     ceph-objectstore-tool --type bluestore --data-path $osd --op update-mon-db --no-mon-config --mon-store-path $ms
     systemctl start ceph-osd@${arrIN[1]}.service
     sleep 5
     echo "Pulling data finished in OSD-$arrIN"
done
EOT

scp -r $host:$ms"*" $ms
echo "Finished Pulling data: $host"

done

Execute ./recover.sh
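
As noted above, the fixed "sleep 5" should be replaced with an actual state check. A sketch of a generic polling helper (the helper name and retry count are my own; the ceph-osd@<id>.service unit name follows the script above):

```shell
# Sketch: poll for a state instead of a fixed sleep.
# wait_until WANT CMD...: rerun CMD until its output equals WANT,
# retrying up to 30 times one second apart.
wait_until() {
    want=$1; shift
    for _ in $(seq 1 30); do
        [ "$("$@" 2>/dev/null)" = "$want" ] && return 0
        sleep 1
    done
    return 1
}

# Intended use inside the loop (systemctl is-active prints e.g.
# "active" or "inactive"):
#   systemctl stop ceph-osd@${arrIN[1]}.service
#   wait_until inactive systemctl is-active ceph-osd@${arrIN[1]}.service
```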



 Create a file containing all the keyrings and use it in the rebuild step below; here the keyrings are appended to /etc/ceph/ceph.client.admin.keyring. First, generate a mon. key in the admin keyring:

[root@MON]# ceph-authtool /etc/ceph/ceph.client.admin.keyring -n mon. --cap mon 'allow *' --gen-key

After generating the key in the /etc/ceph/ceph.client.admin.keyring file, append all the existing keyrings to that file.

Example:-
   
[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]# find / -name *.keyring
/etc/ceph/ceph.client.admin.keyring
/etc/ceph/ceph.mgr.ceph-bharath-1623839999591-node1-mon-mgr-installer.keyring
/etc/ceph/ceph.client.crash.keyring
/var/lib/ceph/bootstrap-mds/ceph.keyring
/var/lib/ceph/bootstrap-mgr/ceph.keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring
/var/lib/ceph/bootstrap-rbd/ceph.keyring
/var/lib/ceph/bootstrap-rbd-mirror/ceph.keyring
/var/lib/ceph/bootstrap-rgw/ceph.keyring
/var/lib/ceph/tmp/ceph.mon..keyring
[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]#

[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]# cat /etc/ceph/ceph.mgr.ceph-bharath-1623839999591-node1-mon-mgr-installer.keyring  /etc/ceph/ceph.client.crash.keyring  /var/lib/ceph/bootstrap-mds/ceph.keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring /var/lib/ceph/bootstrap-rbd/ceph.keyring /var/lib/ceph/bootstrap-rbd-mirror/ceph.keyring /var/lib/ceph/bootstrap-rgw/ceph.keyring /var/lib/ceph/tmp/ceph.mon..keyring >>/etc/ceph/ceph.client.admin.keyring
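
The manual cat of each keyring can also be done with find, excluding the admin keyring itself so it is not appended to itself. A sketch; the helper name is my own and the example paths are the ones above:

```shell
# collect_keyrings DIR ADMIN: append every *.keyring found under DIR,
# except ADMIN itself, to ADMIN (mirrors the manual `cat ... >>` above).
collect_keyrings() {
    dir=$1; admin=$2
    find "$dir" -name '*.keyring' ! -path "$admin" -exec cat {} + >> "$admin"
}

# e.g. collect_keyrings /var/lib/ceph /etc/ceph/ceph.client.admin.keyring
```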


[root@MON]# ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap

[root@MON]# monmaptool /tmp/monmap --print

Note: a "No such file or directory" error means the monmap is missing.

Example- 

[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]# monmaptool /tmp/monmap --print
monmaptool: monmap file /tmp/monmap
monmaptool: couldn't open /tmp/monmap: (2) No such file or directory
[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]#

If the monmap is missing, create a new one:

[root@MON]# monmaptool --create --add <mon-id> <mon-a-ip> --enable-all-features --clobber /root/monmap.mon-a --fsid <fsid>

Get the mon-id, mon-a-ip, and fsid values from the /etc/ceph/ceph.conf file.

Example -

   [root@ceph-bharath-1623839999591-node1-mon-mgr-installer ceph-ceph-bharath-1623839999591-node1-mon-mgr-installer]# cat /etc/ceph/ceph.conf 
[client]
rgw crypt require ssl = False
rgw crypt s3 kms encryption keys = testkey-1=YmluCmJvb3N0CmJvb3N0LWJ1aWxkCmNlcGguY29uZgo= testkey-2=aWIKTWFrZWZpbGUKbWFuCm91dApzcmMKVGVzdGluZwo=

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 10.0.208.0/22
fsid = 345ecf3f-1494-4b35-80cb-1df54355362b
mon host = [v2:10.0.210.146:3300,v1:10.0.210.146:6789],[v2:10.0.209.3:3300,v1:10.0.209.3:6789],[v2:10.0.208.15:3300,v1:10.0.208.15:6789]
mon initial members = ceph-bharath-1623839999591-node1-mon-mgr-installer,ceph-bharath-1623839999591-node2-mon,ceph-bharath-1623839999591-node3-mon-osd
mon_max_pg_per_osd = 1024
osd pool default crush rule = -1
osd_default_pool_size = 2
osd_pool_default_pg_num = 64
osd_pool_default_pgp_num = 64
public network = 10.0.208.0/22

[mon]
mon_allow_pool_delete = True

[root@ceph-bharath-1623839999591-node1-mon-mgr-installer ceph-ceph-bharath-1623839999591-node1-mon-mgr-installer]#
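
Pulling those values out of ceph.conf can be scripted with awk. A sketch, demonstrated on a minimal sample file (the helper name is my own; it handles simple "key = value" lines only, not values that themselves contain "="):

```shell
# get_conf_value FILE KEY: print the value of a "KEY = value" line.
# Splits on " = "; only handles values without embedded "=".
get_conf_value() {
    awk -F' *= *' -v key="$2" '$1 == key {print $2; exit}' "$1"
}

# Demonstrated on a sample; on a real node the file is /etc/ceph/ceph.conf:
cat > /tmp/ceph.conf.sample <<'EOF'
[global]
fsid = 345ecf3f-1494-4b35-80cb-1df54355362b
mon initial members = node1,node2,node3
EOF

get_conf_value /tmp/ceph.conf.sample fsid
```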


[root@ceph-bharath-1623839999591-node1-mon-mgr-installer /]# monmaptool --create --addv ceph-bharath-1623839999591-node2-mon  [v2:10.0.209.3:3300,v1:10.0.209.3:6789]  --addv ceph-bharath-1623839999591-node1-mon-mgr-installer [v2:10.0.210.146:3300,v1:10.0.210.146:6789] --addv ceph-bharath-1623839999591-node3-mon-osd  [v2:10.0.208.15:3300,v1:10.0.208.15:6789] --enable-all-features  --clobber /root/monmap.mon-a  --fsid 345ecf3f-1494-4b35-80cb-1df54355362b


To check the  generated monmap

[root@MON]# monmaptool /root/monmap.mon-a --print

Import the monmap and rebuild the Monitor store:


[root@MON]# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring --monmap /root/monmap.mon-a
[root@MON]# chown -R ceph:ceph /tmp/monstore

Repeat this step for all Ceph Monitor nodes:


[root@MON]#mv /var/lib/ceph/mon/ceph-HOSTNAME/store.db /var/lib/ceph/mon/ceph-HOSTNAME/store.db.corrupted   

Replace the corrupted store. Repeat this step for all Ceph Monitor nodes:

[root@MON]#scp -r /tmp/monstore/store.db HOSTNAME:/var/lib/ceph/mon/ceph-HOSTNAME/

Replace HOSTNAME with the host name of the Monitor node

Example: 
scp -r /tmp/monstore/store.db  ceph-bharath-1623839999591-node2-mon:/var/lib/ceph/mon/ceph-ceph-bharath-1623839999591-node2-mon/

Change the owner of the new store. Repeat this step for all Ceph Monitor nodes:

[root@MON]# chown -R ceph:ceph /var/lib/ceph/mon/ceph-HOSTNAME/store.db

Replace HOSTNAME with the host name of the Monitor node


Example :
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph-bharath-1623839999591-node3-mon-osd/store.db

Start all the Ceph Monitor daemons:

 [root@MON]# systemctl start ceph-mon@<HOSTNAME>

Example :
            [root@ceph-bharath-1623839999591-node2-mon /]# systemctl start ceph-mon

Comment 6 skanta 2021-06-17 06:59:59 UTC
The procedure in the "4.7.1. Recovering the Ceph Monitor store when using BlueStore" chapter of the Troubleshooting Guide needs to be modified.

The script provided in the guide was modified as follows:

#!/bin/bash -x

ms=/tmp/monstore/
rm -rf $ms
mkdir $ms

for host in ceph-bharath-1623727358372-node3-mon-osd ceph-bharath-1623727358372-node4-osd-client ceph-bharath-1623727358372-node5-osd ceph-bharath-1623727358372-node6-osd ceph-bharath-1623727358372-node7-osd;
do
echo "The Host name is :$host"

ssh -l root $host "rm -rf  $ms"
ssh -l root $host "mkdir $ms"

scp -r $ms"store.db" $host:$ms
rm -rf $ms
mkdir $ms

#ssh -l root $host "mkdir $ms"

ssh -t root@$host <<'EOT'
ms=/tmp/monstore
for osd in /var/lib/ceph/osd/ceph-*;
do
     IN=$osd
     arrIN=(${IN//-/ })
     systemctl stop ceph-osd@${arrIN[1]}.service
     sleep 5
     echo "ceph-objectstore-tool --type bluestore --data-path $osd --op update-mon-db --no-mon-config --mon-store-path $ms"
     ceph-objectstore-tool --type bluestore --data-path $osd --op update-mon-db --no-mon-config --mon-store-path $ms
     systemctl start ceph-osd@${arrIN[1]}.service
     sleep 5
     echo "Pulling data finished in OSD-$arrIN"
done
EOT

scp -r $host:$ms"*" $ms
echo "Finished Pulling data: $host"

done



For testing purposes, I hardcoded the OSD host names and included the sleep commands in the script.

The developer has to verify this and provide a refined script. To track the issue, I am raising a dependency bug.