Bug 1213723 - Compensate for pg removal bug from firefly and earlier when upgrading to hammer
Summary: Compensate for pg removal bug from firefly and earlier when upgrading to hammer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.0
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 1.3.0
Assignee: Samuel Just
QA Contact: Harish NV Rao
URL:
Whiteboard:
Duplicates: 1293832
Depends On: 1227358 1230832
Blocks:
 
Reported: 2015-04-21 07:17 UTC by Samuel Just
Modified: 2019-09-12 08:24 UTC
CC List: 9 users

Fixed In Version: ceph-0.94.1-8.el7cp
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned to: 1219179
Environment:
Last Closed: 2015-06-24 15:52:07 UTC
Embargoed:


Attachments


Links
Ceph Project Bug Tracker: 11429
Red Hat Knowledge Base (Solution): 2108461 (last updated 2017-07-30 15:08:08 UTC)
Red Hat Product Errata: RHBA-2015:1183 (normal, SHIPPED_LIVE), Ceph bug fix and enhancement update (last updated 2015-06-24 19:49:46 UTC)

Description Samuel Just 2015-04-21 07:17:20 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.  Set up a v0.80.7 firefly cluster
2.  Create a pool and populate it with many small objects
3.  Remove the pool
4.  Restart the OSDs
5.  Note the current osdmap epoch, then allow the mons to trim past that osdmap
6.  Upgrade to Hammer
7.  Observe some OSDs crashing on missing maps

Actual results:
After the upgrade, some OSDs crash on startup because they reference osdmaps that the mons have already trimmed.

Expected results:
OSDs restart cleanly after the upgrade.


Additional info:
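A rough teuthology-style sketch of the trigger portion (steps 2 through 4); the task names, pool name, and timings here are illustrative rather than an exact job definition:

  # create a pool and fill it with many tiny objects
  - ceph_manager.create_pool:
      args: ['toremove']
  - radosbench:
      clients: [client.0]
      pool: toremove
      create_pool: false
      time: 120
      size: 1
  # delete the pool, give the osds a moment to start removing pgs,
  # then restart them so partially removed pgs are left behind on disk
  - ceph_manager.remove_pool:
      args: ['toremove']
  - sleep:
      duration: 10
  - ceph.restart:
      daemons:
        - osd.0
        - osd.1
        - osd.2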

Comment 1 Ken Dreyer (Red Hat) 2015-04-22 22:58:29 UTC
From the upstream report it's unclear whether there's a patch to fix this. I've asked for clarification there. http://tracker.ceph.com/issues/11429

Comment 2 Ian Colle 2015-04-24 07:33:02 UTC
Sam's working on a patch.

Comment 3 Harish NV Rao 2015-04-28 11:20:17 UTC
QE will be testing this bug fix while upgrading from 1.2.* to 1.3.0.

Comment 4 Ken Dreyer (Red Hat) 2015-05-01 14:20:36 UTC
Hi Sam, any idea on an ETA for a patch for this issue?

Comment 5 Ken Dreyer (Red Hat) 2015-05-01 15:56:38 UTC
From IRC, it sounds like a PR will be submitted for review upstream to master on Monday. Sam's working to test this out in Teuthology today.

(09:53:22 AM) sjust: small change to OSD::load_pgs to skip the offending pgs in the case of that bug
(09:53:24 AM) sjust: very simple

Assuming this goes smoothly, it should be a week or less to land a fix in a build downstream.

Comment 9 Harish NV Rao 2015-06-09 15:21:34 UTC
Ken, can you please confirm whether the test plan we have in comment 3 above is correct? QE cannot test an upgrade from upstream Firefly to Hammer.

Comment 10 Ken Dreyer (Red Hat) 2015-06-09 17:41:09 UTC
The test plan sounds right to me. Technically the Firefly cluster should be v0.80.8, while the steps to reproduce above mention v0.80.7. But that's a minor detail, and you have the general concept correct.

Comment 11 Samuel Just 2015-06-17 15:16:11 UTC
https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton-nomsgr/all/11429.yaml

(08:12:43 AM) sjust: the mon trimming part matters because the bug requires that the osdmap referenced by the pg left behind on the osd is no longer present on the cluster
(08:12:51 AM) sjust: the mons trim old maps based on some config values
(08:13:33 AM) sjust: mostly         mon min osdmap epochs: 3
(08:13:35 AM) sjust: that is
(08:13:51 AM) sjust: the mons keep mon_min_osdmap_epochs old maps around even when the cluster is clean
(08:13:59 AM) sjust: (when the cluster is not clean, they don't trim at all)
(08:14:09 AM) sjust: it defaults to something like 1000 or something
(08:15:00 AM) sjust: to force it to trim, that test I linked sets mon_min_osdmap_epochs to 3 and in the middle loops 100 times setting the min_size on newpool (which exists only for this purpose) to 2 and then back to 1
(08:15:05 AM) sjust: since each of those requires a new map
(08:15:14 AM) sjust: by the end of that, at least 200 maps will have been created
(08:15:49 AM) sjust: which combined with the mon_min_osdmap_epochs config value ensures that the pg which got left on the osd is now referring to a non-existent map
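In teuthology terms, a minimal sketch of the trimming setup, assuming the usual overrides/conf nesting; only the "mon min osdmap epochs: 3" value is taken from the discussion above:

  overrides:
    ceph:
      conf:
        mon:
          # keep only a few old osdmaps so the mons trim aggressively once the cluster is clean
          mon min osdmap epochs: 3

With only 3 epochs retained, the roughly 200 map changes generated by the min_size loop are more than enough to trim past the map that the leftover pg still refers to.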

Comment 12 Samuel Just 2015-06-17 15:16:42 UTC
To clarify, the GitHub link above points to the automated teuthology test for this issue in the rados suite.

Comment 13 Samuel Just 2015-06-17 15:27:20 UTC
(08:23:44 AM) sjust: it looks like I just ran rados bench for 120 seconds with 1 byte objects
(08:23:51 AM) sjust:   - radosbench:
(08:23:51 AM) sjust:       clients: [client.0]
(08:23:51 AM) sjust:       time: 120
(08:23:51 AM) sjust:       size: 1
(08:23:51 AM) sjust:       pool: toremove
(08:23:51 AM) sjust:       create_pool: false
(08:24:24 AM) sjust: then I removed the pool
(08:24:28 AM) sjust:   - ceph_manager.remove_pool:
(08:24:28 AM) sjust:       args: ['toremove']
(08:24:35 AM) sjust: waited 10s for the pool removal to propagate to osds
(08:24:39 AM) sjust:   - sleep:
(08:24:39 AM) sjust:       duration: 10
(08:24:44 AM) sjust: restarted all three osds
(08:24:50 AM) sjust: (to trigger the bug)
(08:24:53 AM) sjust:   - ceph.restart:
(08:24:53 AM) sjust:       daemons:
(08:24:53 AM) sjust:         - osd.0
(08:24:53 AM) sjust:         - osd.1
(08:24:53 AM) sjust:         - osd.2
(08:25:11 AM) sjust: waited 30 more seconds for the cluster to stabilize
(08:25:13 AM) sjust:   - sleep:
(08:25:13 AM) sjust:       duration: 30
(08:25:16 AM) sjust: waited for it to go clean
(08:25:24 AM) sjust:   - ceph_manager.wait_for_clean: null
(08:25:41 AM) sjust: wrote 1 byte objects to some new pool for 60s
(08:25:43 AM) sjust:   - radosbench:
(08:25:43 AM) sjust:       clients: [client.0]
(08:25:43 AM) sjust:       time: 60
(08:25:43 AM) sjust:       size: 1
(08:25:49 AM) sjust: created a new pool
(08:25:51 AM) sjust:   - ceph_manager.create_pool:
(08:25:51 AM) sjust:       args: ['newpool']
(08:26:03 AM) sjust: generated 200 map changes using the new pool
(08:26:05 AM) sjust:   - loop:
(08:26:05 AM) sjust:       count: 100
(08:26:05 AM) sjust:       body:
(08:26:05 AM) sjust:         - ceph_manager.set_pool_property:
(08:26:05 AM) sjust:             args: ['newpool', 'min_size', 2]
(08:26:05 AM) sjust:         - ceph_manager.set_pool_property:
(08:26:05 AM) sjust:             args: ['newpool', 'min_size', 1]
(08:26:24 AM) sjust: then slept for 30s and generated 200 more
(08:26:26 AM) sjust:   - sleep:
(08:26:26 AM) sjust:       duration: 30
(08:26:26 AM) sjust:   - ceph_manager.wait_for_clean: null
(08:26:26 AM) sjust:   - loop:
(08:26:26 AM) sjust:       count: 100
(08:26:26 AM) sjust:       body:
(08:26:26 AM) sjust:         - ceph_manager.set_pool_property:
(08:26:26 AM) sjust:             args: ['newpool', 'min_size', 2]
(08:26:26 AM) sjust:         - ceph_manager.set_pool_property:
(08:26:26 AM) sjust:             args: ['newpool', 'min_size', 1]
(08:26:45 AM) sjust: then it upgrades and runs a bit more of a workload to give the cluster a chance to crash after the upgrade
(08:26:50 AM) sjust: make sense?
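For readability, the YAML fragments quoted above assembled into a single task sequence (indentation reconstructed from the quotes; the linked 11429.yaml is authoritative, and the final upgrade step is only indicated here):

  # fill a throwaway pool with 1-byte objects, delete it, then restart the osds
  - radosbench:
      clients: [client.0]
      time: 120
      size: 1
      pool: toremove
      create_pool: false
  - ceph_manager.remove_pool:
      args: ['toremove']
  - sleep:
      duration: 10
  - ceph.restart:
      daemons:
        - osd.0
        - osd.1
        - osd.2
  - sleep:
      duration: 30
  - ceph_manager.wait_for_clean: null
  # a second short workload, then a new pool used purely to churn osdmaps
  - radosbench:
      clients: [client.0]
      time: 60
      size: 1
  - ceph_manager.create_pool:
      args: ['newpool']
  - loop:
      count: 100
      body:
        - ceph_manager.set_pool_property:
            args: ['newpool', 'min_size', 2]
        - ceph_manager.set_pool_property:
            args: ['newpool', 'min_size', 1]
  - sleep:
      duration: 30
  - ceph_manager.wait_for_clean: null
  - loop:
      count: 100
      body:
        - ceph_manager.set_pool_property:
            args: ['newpool', 'min_size', 2]
        - ceph_manager.set_pool_property:
            args: ['newpool', 'min_size', 1]
  # the job then upgrades to hammer and runs a further workload; see the linked 11429.yaml for the exact tasks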

Comment 14 Hemanth Kumar 2015-06-18 15:10:49 UTC
Verified.

Reproduced the OSD crash on 1.2.3, and after upgrading ran through the same steps as described by Sam; the OSDs did not crash this time.

Comment 16 errata-xmlrpc 2015-06-24 15:52:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1183

Comment 17 Vikhyat Umrao 2015-12-24 07:39:21 UTC
*** Bug 1293832 has been marked as a duplicate of this bug. ***

Comment 18 Vikhyat Umrao 2016-02-25 04:34:35 UTC
*** Bug 1293832 has been marked as a duplicate of this bug. ***

