Bug 1973033
| Summary: | [RHCS-baremetal] - Provided shell script in the chapter "Recovering the Ceph Monitor store when using BlueStore" is failing | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | skanta |
| Component: | RADOS | Assignee: | Neha Ojha <nojha> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Manohar Murthy <mmurthy> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.2 | CC: | akupczyk, assingh, bhubbard, bkunal, ceph-eng-bugs, linuxkidd, nojha, rzarzyns, sseshasa, vereddy, vumrao |
| Target Milestone: | --- | ||
| Target Release: | 5.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-30 16:21:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1969383 | ||
Description
skanta
2021-06-17 07:16:18 UTC
@vikhyat - Please let me know if any further information is required.
Do I need to assign this ticket to a developer?
I think we do not need a RADOS developer for the script :). Let me ask one of our Ceph support team members to help you. @linuxkidd, can you please help Bharath here? This is for the Monitor DB restore script.
I see three errors in the script from the documentation:
1: The 'ssh' command is missing an 's' - "sh -t $host" should be "ssh -t $host"
2: Bash does not end the here-document passed to 'ssh' at EOF unless the EOF delimiter starts at the beginning of the line; thus, the heavily indented (for visual aesthetics) EOF in the docs does not work.
3: The ceph-objectstore-tool (COT) line needs '--no-mon-config'.
- I'm not sure if this is strictly required in all cases, but it has been needed in the most recent times I've used COT.
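Error 2 can be reproduced locally without ssh. The sketch below is a hypothetical demo, not part of the documented procedure: it feeds two tiny scripts to `bash -c`. In the first, the `EOF` delimiter starts at column 0; in the second, it is indented with spaces, so bash never recognizes it and the command after it becomes here-document text instead of being executed.

```bash
#!/usr/bin/env bash
# Hypothetical demo: how bash decides where a here-document ends.

# Delimiter at column 0: the here-document ends, then "echo after" runs.
good_script=$'cat <<EOF\ninside\nEOF\necho after'

# Delimiter indented with spaces: bash does not treat it as the delimiter,
# so everything up to end of input (including "echo after") is swallowed
# as here-document text. bash also warns "delimited by end-of-file" on
# stderr, which is discarded here.
bad_script=$'cat <<EOF\ninside\n    EOF\necho after'

good=$(bash -c "$good_script" 2>/dev/null)
bad=$(bash -c "$bad_script" 2>/dev/null)

printf 'good:\n%s\n\nbad:\n%s\n' "$good" "$bad"
```

Note that `<<-` only strips leading tab characters, not spaces, so a space-indented delimiter like the one in the docs never terminates the here-document.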
Adding stop/start commands should not be needed, since the cluster would be down.
- In the "Prerequisites" before the script, the Containerized deployments section states that all OSD containers should be stopped.
- The docs should be updated so that the Bare-metal deployments section also states that all OSD services should be stopped.
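For the bare-metal prerequisite, a minimal sketch of stopping every OSD on each node might look like the following. It assumes package-based installs where the systemd target `ceph-osd.target` groups the `ceph-osd@<id>` units, and the host names are hypothetical. By default it only prints the commands (`DRY_RUN=1`); it is an illustration, not a documented step.

```bash
#!/usr/bin/env bash
# Sketch (assumption): stop all OSD daemons on bare-metal OSD nodes before
# running ceph-objectstore-tool. ceph-osd.target is assumed to group the
# ceph-osd@<id> units on each node.
# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_osds_on_nodes() {
    local osd_node
    for osd_node in "$@"; do
        run ssh "$osd_node" systemctl stop ceph-osd.target
        # A surviving ceph-osd process would make ceph-objectstore-tool fail
        # with "Mount failed with '(11) Resource temporarily unavailable'".
        run ssh "$osd_node" '! pgrep -x ceph-osd' \
            || echo "WARNING: ceph-osd is still running on $osd_node"
    done
}

# Hypothetical host names; replace with the OSD nodes in the environment.
stop_osds_on_nodes osdnode1 osdnode2 osdnode3
```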
There are some other optimizations that can be done in the script...
- It's not necessary to 'rm -rf' the paths on each loop iteration. Instead, the rsync command can be modified (with --delete) so the destination contents stay pure without wasting 'rm' cycles.
- Modify the 'pull' rsync to remove the source files from the OSD nodes, so that the space consumed during the process is freed.
- Rename the 'host' variable to 'osd_node' for clarity.
I still recommend using rsync over scp, because rsync transfers only modified files and can remove the source files without a separate ssh/rm command sequence.
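Both rsync optimizations depend on the trailing / that the script's NOTE asks for: `rsync -a --delete src/ host:dst/` syncs the contents of `src` into `dst`, while `rsync -a src host:dst/` would create `dst/src` instead. A small pre-flight guard, hypothetical and not part of the documented script, could enforce this before any transfer starts:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: refuse to continue unless every path that
# is handed to rsync ends in '/', so --delete operates on directory contents.
require_trailing_slash() {
    local path
    for path in "$@"; do
        if [[ "$path" != */ ]]; then
            echo "ERROR: '$path' must end with a trailing /" >&2
            return 1
        fi
    done
}

ms=/tmp/monstore/
db=/root/db/
db_slow=/root/db.slow/
require_trailing_slash "$ms" "$db" "$db_slow" && echo "paths OK"
```

Calling it right after the variable assignments, followed by `|| exit 1`, would stop the loop before any `--delete` runs against a mistyped path.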
Here's my final, recommended (but untested) script:
```bash
## --------------------------------------------------------------------------
## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
## with a trailing / otherwise rsync will not operate properly.
## --------------------------------------------------------------------------
ms=/tmp/monstore/
db=/root/db/
db_slow=/root/db.slow/
mkdir -p $ms $db $db_slow
## --------------------------------------------------------------------------
## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
## the list of OSD nodes in the environment.
## --------------------------------------------------------------------------
osd_nodes="osdnode1 osdnode2 osdnode3..."
for osd_node in $osd_nodes; do
    echo "Operating on $osd_node"
    rsync -avz --delete $ms $osd_node:$ms
    rsync -avz --delete $db $osd_node:$db
    rsync -avz --delete $db_slow $osd_node:$db_slow
    ssh -t $osd_node <<EOF
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
done
EOF
    rsync -avz --delete --remove-source-files $osd_node:$ms $ms
    rsync -avz --delete --remove-source-files $osd_node:$db $db
    rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
done
## --------------------------------------------------------------------------
## End of script
## --------------------------------------------------------------------------
```
@skanta Please test the modified script above. Once it is confirmed functional, the documentation should be updated as specified.
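In the here-document that the script passes to ssh, unescaped variables such as `$ms` are expanded by the local shell before anything is sent, while escaped ones such as `\$osd` reach the OSD node literally and are expanded there by the remote shell. The sketch below prints the text the remote shell would receive without contacting any host; the `osd_node` value is a hypothetical placeholder.

```bash
#!/usr/bin/env bash
# Sketch: show what an unquoted here-document (<<EOF) would hand to ssh.
ms=/tmp/monstore/
osd_node=osdnode1   # hypothetical host name

# Same quoting as the script: $ms and $osd_node expand locally, \$osd does
# not. Here-documents also skip pathname expansion, so the glob stays
# literal and is expanded by the remote shell.
remote_script=$(cat <<EOF
for osd in /var/lib/ceph/osd/ceph-*; do
    echo "would update \$osd into $ms on $osd_node"
done
EOF
)

echo "$remote_script"
```

Quoting the delimiter (`<<'EOF'`) would instead suppress all local expansion, so `$ms` would then have to be defined on the remote side.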
I modified and tested the script mentioned in the description. Script link: https://bugzilla.redhat.com/show_bug.cgi?id=1973033#c0
1. Even though the MONs are down, the OSD services are still running in the cluster, so executing the "ceph-objectstore-tool" command fails with the following error:
   Mount failed with '(11) Resource temporarily unavailable'
   According to the documentation, this occurs when ceph-objectstore-tool is executed on a running OSD. Reference doc: https://docs.ceph.com/en/latest/man/8/ceph-objectstore-tool/
   It is a valid point that we can add a step to stop all OSDs to avoid this. I will remove the stop and start service steps from the script.
2. rsync is removed and cp is used in the script.
3. I am initializing the osd_nodes variable with all existing OSD nodes. This will be tedious for customers when the OSD count is high.
Modified the script to also populate a keyring file inside $ms, so that it is copied and updated throughout the process.
```bash
## --------------------------------------------------------------------------
## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
## with a trailing / otherwise rsync will not operate properly.
## --------------------------------------------------------------------------
ms=/tmp/monstore/
db=/root/db/
db_slow=/root/db.slow/
mkdir -p $ms $db $db_slow
## --------------------------------------------------------------------------
## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
## the list of OSD nodes in the environment.
## --------------------------------------------------------------------------
osd_nodes="osdnode1 osdnode2 osdnode3..."
for osd_node in $osd_nodes; do
    echo "Operating on $osd_node"
    rsync -avz --delete $ms $osd_node:$ms
    rsync -avz --delete $db $osd_node:$db
    rsync -avz --delete $db_slow $osd_node:$db_slow
    ssh -t $osd_node <<EOF
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
    if [ -e \$osd/keyring ]; then
        cat \$osd/keyring >> $ms/keyring
        echo ' caps mgr = "allow profile osd"' >> $ms/keyring
        echo ' caps mon = "allow profile osd"' >> $ms/keyring
        echo ' caps osd = "allow *"' >> $ms/keyring
EOT
    else
        echo WARNING: \$osd on $osd_node does not have a local keyring.
    fi
done
EOF
    rsync -avz --delete --remove-source-files $osd_node:$ms $ms
    rsync -avz --delete --remove-source-files $osd_node:$db $db
    rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
done
## --------------------------------------------------------------------------
## End of script
## --------------------------------------------------------------------------
```
Fixing a typo (a left-over EOT from multiple experiments). Apologies.
```bash
## --------------------------------------------------------------------------
## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
## with a trailing / otherwise rsync will not operate properly.
## --------------------------------------------------------------------------
ms=/tmp/monstore/
db=/root/db/
db_slow=/root/db.slow/
mkdir -p $ms $db $db_slow
## --------------------------------------------------------------------------
## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
## the list of OSD nodes in the environment.
## --------------------------------------------------------------------------
osd_nodes="osdnode1 osdnode2 osdnode3..."
for osd_node in $osd_nodes; do
    echo "Operating on $osd_node"
    rsync -avz --delete $ms $osd_node:$ms
    rsync -avz --delete $db $osd_node:$db
    rsync -avz --delete $db_slow $osd_node:$db_slow
    ssh -t $osd_node <<EOF
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
    if [ -e \$osd/keyring ]; then
        cat \$osd/keyring >> $ms/keyring
        echo ' caps mgr = "allow profile osd"' >> $ms/keyring
        echo ' caps mon = "allow profile osd"' >> $ms/keyring
        echo ' caps osd = "allow *"' >> $ms/keyring
    else
        echo WARNING: \$osd on $osd_node does not have a local keyring.
    fi
done
EOF
    rsync -avz --delete --remove-source-files $osd_node:$ms $ms
    rsync -avz --delete --remove-source-files $osd_node:$db $db
    rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
done
## --------------------------------------------------------------------------
## End of script
## --------------------------------------------------------------------------
```
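The keyring-collection step inside the here-document can be exercised locally against a fake OSD directory tree. This sketch is hypothetical: a temporary directory stands in for /var/lib/ceph/osd, the key is made up, and only the keyring concatenation is reproduced, not ceph-objectstore-tool.

```bash
#!/usr/bin/env bash
# Sketch: run the keyring-collection logic against a fake OSD tree.
root=$(mktemp -d)          # stands in for /var/lib/ceph/osd (assumption)
trap 'rm -rf "$root"' EXIT
ms=$root/monstore/
mkdir -p "$ms" "$root/ceph-0" "$root/ceph-1"

# Made-up keyring for osd.0 only; ceph-1 deliberately has none.
printf '[osd.0]\n\tkey = FAKEKEY==\n' > "$root/ceph-0/keyring"

for osd in "$root"/ceph-*; do
    if [ -e "$osd/keyring" ]; then
        cat "$osd/keyring" >> "${ms}keyring"
        echo ' caps mgr = "allow profile osd"' >> "${ms}keyring"
        echo ' caps mon = "allow profile osd"' >> "${ms}keyring"
        echo ' caps osd = "allow *"' >> "${ms}keyring"
    else
        echo "WARNING: $osd does not have a local keyring."
    fi
done

cat "${ms}keyring"
```

Running it should print a warning for ceph-1, then the [osd.0] entry followed by the three caps lines that the script appends.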
Verified the corrected script (the version without the left-over EOT); it is working as expected.