1262480 – osd: hammer: fail to start due to stray pgs after firefly->hammer upgrade

Summary: osd: hammer: fail to start due to stray pgs after firefly->hammer upgrade

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Ceph Storage
Classification:	Red Hat Storage
Component:	RADOS
Sub Component:
Version:	1.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	1.3.0
Assignee:	Ken Dreyer (Red Hat)
QA Contact:	ceph-qe-bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1262485
Depends On:
Blocks:	1262485

Reported:	2015-09-11 20:36 UTC by Vasu Kulkarni
Modified:	2022-02-21 18:35 UTC
CC List:	7 users
Fixed In Version:	ceph-0.94.1-19.el7cp (RHEL), ceph v0.94.1.8 (Ubuntu)
Doc Type:	Bug Fix
Doc Text:	If a user was running a version of Ceph older than v0.94, a Ceph Object Storage Daemon (OSD) restarted before completing a placement group (PG) removal operation, and the user then upgraded to Red Hat Ceph Storage (RHCS) 1.3, the OSD could fail to start when it encountered the remnants of the old placement group. With this update, the OSD ignores the leftover PG and starts successfully.
Clone Of:
Clones:	1262485
Environment:
Last Closed:	2015-10-08 18:59:44 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Ceph Project Bug Tracker	13060	None	None	None	Never
Red Hat Issue Tracker	RHCEPH-3485	None	None	None	2022-02-21 18:35:34 UTC
Red Hat Product Errata	RHBA-2015:1882	normal	SHIPPED_LIVE	ceph bug fix update	2015-10-08 22:59:37 UTC

Description Vasu Kulkarni 2015-09-11 20:36:49 UTC

Description of problem:

Notes from upstream tracker:

On Fri, Sep 11, 2015 at 8:56 PM, Sage Weil <sage> wrote:
On Fri, 11 Sep 2015, ?? wrote:

Thanks, Sage Weil:

1. I deleted some test pools in the past, but that was a long time ago (maybe 2 months ago). I did not delete any pools during the recent upgrade.

2. ceph osd dump: please see the attached file (ceph.osd.dump.log).

3. 'debug osd = 20' and 'debug filestore = 20': please see the attached file (ceph.osd.5.log.tar.gz).

This one is failing on pool 54, which has been deleted. In this case you can work around it by renaming current/54.* out of the way.

4. I installed ceph-test, but the command outputs an error:
ceph-kvstore-tool /ceph/data5/current/db list
Invalid argument: /ceph/data5/current/db: does not exist (create_if_missing is false)

Sorry, I should have said current/omap, not current/db. I'm still curious to see the key dump. I'm not sure why the leveldb key for these pgs is missing...

Yesterday I had a chat with wangrui, and the reason is that the "infos" object (legacy oid) is missing. I'm not sure why it's missing.

Probably

https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908

Oh, I think I see what happened:

- The pg removal was aborted pre-hammer. On pre-hammer, this means that
load_pgs skips it here:
https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L2121
- We upgrade to hammer. We skip this pg (same reason) and don't upgrade it,
but we delete the legacy infos object:
https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
- Now we see this crash...
I think the fix, in hammer, is to bail out of peek_map_epoch if the infos
object isn't present, here:

https://github.com/ceph/ceph/blob/hammer/src/osd/PG.cc#L2867

Probably we should restructure it so we can return a 'fail' value
instead of a magic epoch_t value that means the same thing...
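The failure path described above, and the suggested restructuring that returns an explicit "fail" value instead of a magic epoch_t, can be modeled as a minimal C++17 sketch. Everything in it is hypothetical: FakeStore, peek_map_epoch_sketch and load_pgs_sketch are illustrative names, not Ceph's real types or signatures, and this is not the code merged upstream. As a simplifying assumption, a PG's epoch is readable either from metadata written when the PG was converted during the upgrade or from the legacy infos object; a PG that was skipped during conversion and whose legacy infos object has been deleted yields std::nullopt, and the loader skips it instead of aborting.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using epoch_t = uint32_t;

// Hypothetical, simplified model of one OSD's on-disk PG state.
struct FakeStore {
  // PG collections still present under current/, e.g. "54.1a_head".
  std::set<std::string> pg_collections;
  // Epochs of PGs that were converted to the new format during the upgrade.
  std::map<std::string, epoch_t> converted_epochs;
  // Legacy "infos" object keyed by PG; std::nullopt once it has been deleted.
  std::optional<std::map<std::string, epoch_t>> legacy_infos;
};

// Returns the PG's stored map epoch, or std::nullopt if it cannot be read.
// An explicit "no value" replaces a magic epoch_t that means "failed".
std::optional<epoch_t> peek_map_epoch_sketch(const FakeStore& store,
                                             const std::string& pg) {
  if (auto it = store.converted_epochs.find(pg);
      it != store.converted_epochs.end()) {
    return it->second;  // converted PG: read its own metadata
  }
  if (!store.legacy_infos) {
    return std::nullopt;  // unconverted PG and the infos object is gone: bail out
  }
  if (auto it = store.legacy_infos->find(pg); it != store.legacy_infos->end()) {
    return it->second;  // unconverted PG that still has a legacy key
  }
  return std::nullopt;  // no legacy key for this PG either
}

// Loads every readable PG; unreadable (stray) PGs are recorded in *skipped
// instead of stopping OSD startup.
std::vector<std::string> load_pgs_sketch(const FakeStore& store,
                                         std::vector<std::string>* skipped) {
  std::vector<std::string> loaded;
  for (const std::string& pg : store.pg_collections) {
    if (peek_map_epoch_sketch(store, pg).has_value()) {
      loaded.push_back(pg);
    } else {
      skipped->push_back(pg);  // e.g. leftover of an aborted pre-hammer removal
    }
  }
  return loaded;
}
```

For example, a store holding a converted PG "1.0_head" plus a leftover "54.1a_head" collection, with legacy_infos left as std::nullopt, loads "1.0_head" and reports "54.1a_head" as skipped rather than failing on it. Using std::optional makes the failure case impossible to confuse with a real epoch value.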


Version-Release number of selected component (if applicable):


How reproducible:

It looks like Sage can reproduce this with a modified YAML test.

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Ken Dreyer (Red Hat) 2015-09-18 20:35:52 UTC

The fix that went into upstream's hammer branch: https://github.com/ceph/ceph/pull/5892

Comment 5 Tanay Ganguly 2015-09-23 10:53:12 UTC

Hi Ken,

Based on the comment, I can think of the scenario below. Please correct me if I'm wrong.

1. Bring up the cluster on version 1.2.3.
2. Create a pool with 128 PGs.
3. Fill it up with some data.
4. Try to delete the pool.
5. While step 4 is in progress, shut down the OSD immediately (the OSD must be part of the acting set for a few PGs).

When the OSD is brought back, it will still have information about PGs that have already been deleted from the other OSDs, which leads to an inconsistency after the upgrade.

6. When the OSD is back and the cluster is healthy, start upgrading from 1.2.3 to 1.3.0.


Can you please let me know what other scenarios there might be?

Also, let me know whether the following makes sense, e.g. unmounting the OSD partition while the pool deletion is in progress.

Thanks,
Tanay

Comment 6 Harish NV Rao 2015-09-23 13:56:01 UTC

Ken, can you please also let us know which version we should start the upgrade from? We are planning to use 1.2.3 on RHEL 7.1 and upgrade from there to 1.3.0 async. If this is not the right starting version, please let us know which one is.

Comment 7 Vasu Kulkarni 2015-09-23 16:36:08 UTC

The following test will be run downstream to verify the fix on RHEL 7.1:

https://github.com/ceph/ceph-qa-suite/blob/f0c925e30a1d6fc9db00a220d129f63274cdf94f/suites/rados/singleton-nomsgr/all/11429.yaml

Comment 8 Samuel Just 2015-09-23 17:21:36 UTC

Tanay:

Looking at Sage's changes to 11429.yaml, that looks like the right idea.  You probably need a lot more than 128 pgs, though.  The trick is that when the 'delete pool' command completes, it actually just begins an async pg deletion process.  The key is to kill the osds after the deletion has begun, but before it has completed so that some of the pgs are caught in the intermediate state.  You probably want to wait a bit (10s from 11429.yaml) between running the command to remove the pools and shutting down the osds (you probably want to stop all of them).  I don't think un-mounting the OSD partitions is necessary.

Comment 9 Samuel Just 2015-09-23 17:22:09 UTC

Using 11429.yaml directly would be better, of course!

Comment 11 Federico Lucifredi 2015-09-24 23:58:34 UTC

*** Bug 1262485 has been marked as a duplicate of this bug. ***

Comment 12 Vasu Kulkarni 2015-09-25 00:08:43 UTC

Sorry for the confusion, Federico; I will close this once it gets verified in 1.3.0 async.

Comment 13 Vasu Kulkarni 2015-09-28 19:12:20 UTC

Verified on magna076/magna059 using a 1.2.3 -> 1.3.0 upgrade; partial logs at: http://pastebin.test.redhat.com/315887

Comment 15 errata-xmlrpc 2015-10-08 18:59:44 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1882
