Bug 2258542 - ceph-objectstore-tool and ceph-bluestore-tool fail with mount error post upgrade
Summary: ceph-objectstore-tool and ceph-bluestore-tool fail with mount error post upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 7.1
Assignee: Adam King
QA Contact: Harsh Kumar
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
 
Reported: 2024-01-16 03:41 UTC by Harsh Kumar
Modified: 2024-07-18 07:59 UTC
CC List: 10 users

Fixed In Version: ceph-18.2.1-99.el9cp
Doc Type: Bug Fix
Doc Text:
.Using the `--name NODE` flag with the cephadm shell to start a stopped OSD no longer returns the wrong container image
Previously, in some cases, when using the `cephadm shell --name NODE` command, the command would start the container with the wrong version of the tools. This would occur when a user had a newer ceph container image on the host than the one that their OSDs were using. With this fix, Cephadm determines the correct container image for stopped daemons when using the `cephadm shell` command with the `--name` flag. Users no longer have any issues with the `--name` flag, and the command works as expected.
Clone Of:
Environment:
Last Closed: 2024-06-13 14:24:35 UTC
Embargoed:




Links
Red Hat Issue Tracker RHCEPH-8169 (last updated 2024-01-16 03:42:22 UTC)
Red Hat Product Errata RHSA-2024:3925 (last updated 2024-06-13 14:24:41 UTC)

Description Harsh Kumar 2024-01-16 03:41:26 UTC
Description of problem:
ceph-objectstore-tool and ceph-bluestore-tool fail with an I/O error on mount in a brownfield (upgraded) deployment.
Observed when the cluster was upgraded in the following scenarios:
5.3 (16.2.10-247) -> 6.1 (17.2.6-190)
5.3 (16.2.10-247) -> 7.0 GA (18.2.0-131)
6.1 (17.2.6-189) -> 7.0 GA (18.2.0-131)

Version-Release number of selected component (if applicable):
ceph version 18.2.0-131.el9cp (d2f32f94f1c60fec91b161c8a1f200fca2bb8858) reef (stable)

How reproducible:
5/5

Steps to Reproduce:
1. Deploy a RHCS 5.3 or RHCS 6.1 cluster
2. Upgrade the cluster to RHCS 7.0 GA
	https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html/upgrade_guide/upgrade-a-red-hat-ceph-storage-cluster-using-cephadm#upgrading-the-red-hat-ceph-storage-cluster_upgrade
3. After the upgrade completes successfully, choose an OSD at random
4. On the OSD host, stop the OSD service through systemctl (see the note after this list for the full per-daemon unit name)
	# systemctl stop ceph-65b38b76-b148-11ee-9c08-fa163e93f383.service
5. Enter the OSD container's shell through cephadm
	# cephadm shell --name osd.#
6. Execute ceph-objectstore-tool or ceph-bluestore-tool commands
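Note: on a cephadm-managed host, the per-daemon systemd unit typically follows the pattern ceph-<fsid>@osd.<id>.service, so steps 4-6 look roughly like the following on an actual node (the fsid and OSD id below are placeholders taken from this report):

	# systemctl stop ceph-65b38b76-b148-11ee-9c08-fa163e93f383@osd.0.service
	# cephadm shell --name osd.0
	[ceph: root@host /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list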

Actual results:
	[root@ceph-upgrade-cot-3qfewv-node4 ~]# cephadm shell --name osd.0
	Inferring fsid e2f98e88-b255-11ee-8c99-fa163ed2a99b
	Inferring config /var/lib/ceph/e2f98e88-b255-11ee-8c99-fa163ed2a99b/osd.0/config
	Using ceph image with id '024e63562755' and tag '<none>' created on 2024-01-11 23:23:22 +0000 UTC
	registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:1d2572ec0dd321c9f6327b3687b25579eb42e37d9cc607a70ae554cbcc1b970d

	[ceph: root@ceph-upgrade-cot-3qfewv-node4 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list
	Mount failed with '(5) Input/output error'

	[root@ceph-hakumar-fvlbhb-node4 yum.repos.d]# systemctl stop ceph-2ca31110-b118-11ee-a618-fa163e02665f.service
	[root@ceph-hakumar-fvlbhb-node4 yum.repos.d]# cephadm shell --name osd.2
	Inferring fsid 2ca31110-b118-11ee-a618-fa163e02665f
	Inferring config /var/lib/ceph/2ca31110-b118-11ee-a618-fa163e02665f/osd.2/config
	Using ceph image with id '7accf94326fc' and tag '<none>' created on 2024-01-11 06:35:02 +0000 UTC
	registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:48496e4a881e0a4e6dcafd9dadf77fe2f41f43246dc5645af89b27d22dd813d3
	[ceph: root@ceph-hakumar-fvlbhb-node4 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op list
	Mount failed with '(5) Input/output error'
	[ceph: root@ceph-hakumar-fvlbhb-node4 /]# exit
	exit
	[root@ceph-hakumar-fvlbhb-node4 yum.repos.d]# cephadm version
	cephadm version 18.2.0-131.el9cp (d2f32f94f1c60fec91b161c8a1f200fca2bb8858) reef (stable)


	[ceph: root@ceph-hakumar-12u5ze-node4 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-8/ fsck
	2024-01-11T21:37:51.871+0000 7f8a1ddb5540 -1 bluefs _replay 0x426000: stop: unrecognized op 12
	2024-01-11T21:37:51.871+0000 7f8a1ddb5540 -1 bluefs mount failed to replay log: (5) Input/output error
	2024-01-11T21:37:51.871+0000 7f8a1ddb5540 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_bluefs failed bluefs mount: (5) Input/output error
	2024-01-11T21:37:51.871+0000 7f8a1ddb5540 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_db failed to prepare db environment: 
	fsck failed: (5) Input/output error
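	Until a fixed cephadm build is installed, one way to avoid the mismatch should be to pass cephadm's global --image option so the shell is started from the image the OSD was actually deployed with (the image reference below is a placeholder; the real one can be read from the daemon's entry in `cephadm ls` or from its unit.run file):

	# cephadm --image <image-the-osd-was-deployed-with> shell --name osd.0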

Expected results:
	[ceph: root@ceph-hakumar-n69ot0-node3 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-3/ fsck
	fsck success
	[ceph: root@ceph-hakumar-n69ot0-node3 /]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --op list
	["6.1f",{"oid":"notify.0","key":"","snapid":-2,"hash":1126365855,"max":0,"pool":6,"namespace":"","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000100","key":"","snapid":-2,"hash":27468569,"max":0,"pool":5,"namespace":"","max":0}]
	["5.19",{"oid":"lc.9","key":"","snapid":-2,"hash":2339156633,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.19",{"oid":"gc.0","key":"","snapid":-2,"hash":52147801,"max":0,"pool":5,"namespace":"gc","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000093","key":"","snapid":-2,"hash":103627321,"max":0,"pool":5,"namespace":"","max":0}]
	["5.19",{"oid":"lc.15","key":"","snapid":-2,"hash":877642937,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.19",{"oid":"lc.19","key":"","snapid":-2,"hash":4147430329,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000004","key":"","snapid":-2,"hash":3987729273,"max":0,"pool":5,"namespace":"","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000063","key":"","snapid":-2,"hash":2378447737,"max":0,"pool":5,"namespace":"","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000122","key":"","snapid":-2,"hash":2989065465,"max":0,"pool":5,"namespace":"","max":0}]
	["5.19",{"oid":"lc.21","key":"","snapid":-2,"hash":305547001,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.19",{"oid":"obj_delete_at_hint.0000000057","key":"","snapid":-2,"hash":2350856185,"max":0,"pool":5,"namespace":"","max":0}]
	["2.13",{"oid":"mds0_inotable","key":"","snapid":-2,"hash":3092428947,"max":0,"pool":2,"namespace":"","max":0}]
	["2.13",{"oid":"100.00000000","key":"","snapid":-2,"hash":3307625139,"max":0,"pool":2,"namespace":"","max":0}]
	["5.14",{"oid":"reshard.0000000001","key":"","snapid":-2,"hash":3291492404,"max":0,"pool":5,"namespace":"reshard","max":0}]
	["5.14",{"oid":"obj_delete_at_hint.0000000088","key":"","snapid":-2,"hash":2187825332,"max":0,"pool":5,"namespace":"","max":0}]
	["5.14",{"oid":"obj_delete_at_hint.0000000080","key":"","snapid":-2,"hash":2088300980,"max":0,"pool":5,"namespace":"","max":0}]
	["5.14",{"oid":"obj_delete_at_hint.0000000090","key":"","snapid":-2,"hash":1552334260,"max":0,"pool":5,"namespace":"","max":0}]
	["5.14",{"oid":"lc.22","key":"","snapid":-2,"hash":455327348,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.14",{"oid":"obj_delete_at_hint.0000000110","key":"","snapid":-2,"hash":3549983604,"max":0,"pool":5,"namespace":"","max":0}]
	["5.14",{"oid":"lc.17","key":"","snapid":-2,"hash":293272820,"max":0,"pool":5,"namespace":"lc","max":0}]
	["2.d",{"oid":"mds_snaptable","key":"","snapid":-2,"hash":3640815789,"max":0,"pool":2,"namespace":"","max":0}]
	["5.12",{"oid":"obj_delete_at_hint.0000000064","key":"","snapid":-2,"hash":2380537618,"max":0,"pool":5,"namespace":"","max":0}]
	["5.12",{"oid":"obj_delete_at_hint.0000000071","key":"","snapid":-2,"hash":2801038418,"max":0,"pool":5,"namespace":"","max":0}]
	["5.12",{"oid":"obj_delete_at_hint.0000000074","key":"","snapid":-2,"hash":769383634,"max":0,"pool":5,"namespace":"","max":0}]
	["5.12",{"oid":"gc.8","key":"","snapid":-2,"hash":1167531762,"max":0,"pool":5,"namespace":"gc","max":0}]
	["5.17",{"oid":"reshard.0000000000","key":"","snapid":-2,"hash":492705943,"max":0,"pool":5,"namespace":"reshard","max":0}]
	["5.17",{"oid":"obj_delete_at_hint.0000000050","key":"","snapid":-2,"hash":2363050551,"max":0,"pool":5,"namespace":"","max":0}]
	["5.17",{"oid":"obj_delete_at_hint.0000000102","key":"","snapid":-2,"hash":3543557943,"max":0,"pool":5,"namespace":"","max":0}]
	["5.17",{"oid":"lc.1","key":"","snapid":-2,"hash":1269515895,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.7",{"oid":"gc.15","key":"","snapid":-2,"hash":312892935,"max":0,"pool":5,"namespace":"gc","max":0}]
	["5.7",{"oid":"obj_delete_at_hint.0000000086","key":"","snapid":-2,"hash":2649155847,"max":0,"pool":5,"namespace":"","max":0}]
	["5.7",{"oid":"reshard.0000000008","key":"","snapid":-2,"hash":2959979399,"max":0,"pool":5,"namespace":"reshard","max":0}]
	["5.7",{"oid":"lc.28","key":"","snapid":-2,"hash":4206672071,"max":0,"pool":5,"namespace":"lc","max":0}]
	["5.7",{"oid":"gc.29","key":"","snapid":-2,"hash":1154039335,"max":0,"pool":5,"namespace":"gc","max":0}]
	["5.7",{"oid":"gc.20","key":"","snapid":-2,"hash":1458521191,"max":0,"pool":5,"namespace":"gc","max":0}]

Additional info:
Could be similar to https://tracker.ceph.com/issues/15376
Cluster access: ssh -l root 10.0.208.194
pass: passwd
Upgraded from Quincy to Reef
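Per the Doc Text above, the shell for a stopped OSD was started from a different local ceph image than the one the OSD was deployed with, so the tools inside the shell did not match the OSD's on-disk state. A minimal way to check for that mismatch on an affected host, assuming the per-daemon unit.run file that cephadm writes at deploy time is present (it embeds the container image reference; the fsid and daemon name below are the ones from this report):

	# grep -o 'rhceph[^ ]*' /var/lib/ceph/e2f98e88-b255-11ee-8c99-fa163ed2a99b/osd.0/unit.run
	# podman images --digests | grep rhceph

If the image reference recorded in unit.run differs from the image the shell reports it is "Using", the tool/OSD version mismatch described in this bug is present.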

Comment 25 errata-xmlrpc 2024-06-13 14:24:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

