Description of problem:
After a pool full of small objects is deleted on a Firefly cluster, the OSDs are restarted, and the mons trim past the pool's final osdmap, upgrading to Hammer causes some OSDs to crash on the missing maps.

Version-Release number of selected component (if applicable):
v0.80.7 (Firefly)

How reproducible:

Steps to Reproduce:
1. Set up a v0.80.7 Firefly cluster
2. Create a pool populated with many small objects
3. Remove the pool
4. Restart the OSDs
5. Note the current osdmap and allow the mons to trim past that osdmap
6. Upgrade to Hammer
7. Observe some OSDs crashing on missing maps

Actual results:
Crashing

Expected results:
Not crashing

Additional info:
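A minimal teuthology-style sketch of the trigger (a sketch only; the task fragments and the pool name 'toremove' match the upstream automated test for this issue, quoted in full later in this bug):

  - radosbench:
      clients: [client.0]
      time: 120
      size: 1              # many 1-byte objects
      pool: toremove
      create_pool: false
  - ceph_manager.remove_pool:
      args: ['toremove']
  - sleep:
      duration: 10         # let the removal propagate to the OSDs
  - ceph.restart:
      daemons: [osd.0, osd.1, osd.2]

The mon trimming and the upgrade to Hammer then follow, as in steps 5-6.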
From the upstream report it's unclear whether there's a patch to fix this. I've asked for clarification there. http://tracker.ceph.com/issues/11429
Sam's working on a patch.
QE will be testing this bug fix while upgrading from 1.2.* to 1.3.0.
Hi Sam, any idea on an ETA for a patch for this issue?
From IRC, it sounds like a PR will be submitted for review upstream to master on Monday. Sam's working to test this out in Teuthology today.

(09:53:22 AM) sjust: small change to OSD::load_pgs to skip the offending pgs in the case of that bug
(09:53:24 AM) sjust: very simple

Assuming this goes smoothly, it should be a week or less to land a fix in a build downstream.
Ken, can you please confirm whether the test plan we have in comment 3 above is correct? QE cannot test an upgrade from Firefly to Hammer.
The test plan sounds right to me. Technically the Firefly cluster should be v0.80.8, and comment 1 above mentions v0.80.7. But that's a minor detail, and you have the general concept correct.
https://github.com/ceph/ceph-qa-suite/blob/master/suites/rados/singleton-nomsgr/all/11429.yaml

(08:12:43 AM) sjust: the mon trimming thing is that the bug requires that the map on the pg which is left on the osd is no longer present on the cluster
(08:12:51 AM) sjust: the mons trim old maps based on some config values
(08:13:33 AM) sjust: mostly mon min osdmap epochs: 3
(08:13:35 AM) sjust: that is
(08:13:51 AM) sjust: the mons keep mon_min_osdmap_epochs old maps around even when the cluster is clean
(08:13:59 AM) sjust: (when the cluster is not clean, they don't trim at all)
(08:14:09 AM) sjust: it defaults to something like 1000 or something
(08:15:00 AM) sjust: to force it to trim, that test I linked sets mon_min_osdmap_epochs to 3 and in the middle loops 100 times setting the min_size on newpool (which only exists to do this to) to 2 and then back to 1
(08:15:05 AM) sjust: since each of those requires a new map
(08:15:14 AM) sjust: by the end of that, at least 200 maps will have been created
(08:15:49 AM) sjust: which combined with the mon_min_osdmap_epochs config value ensures that the pg which got left on the osd is now referring to a non-existent map
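For reference, a sketch of the mon override this implies, assuming the standard teuthology 'overrides' layout (the authoritative stanza is in the linked 11429.yaml):

  overrides:
    ceph:
      conf:
        mon:
          # keep only 3 old osdmaps around once the cluster is clean
          # (instead of the ~1000 default), so trimming kicks in quickly
          mon min osdmap epochs: 3

With that in place, every min_size flip on newpool creates a new osdmap epoch, and the mons trim all but the last 3 once the cluster is clean.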
To clarify, the github link above is to the teuthology automated test for this issue in the rados suite.
(08:23:44 AM) sjust: it looks like I just ran rados bench for 120 seconds with 1 byte objects
(08:23:51 AM) sjust: - radosbench:
(08:23:51 AM) sjust:     clients: [client.0]
(08:23:51 AM) sjust:     time: 120
(08:23:51 AM) sjust:     size: 1
(08:23:51 AM) sjust:     pool: toremove
(08:23:51 AM) sjust:     create_pool: false
(08:24:24 AM) sjust: then I removed the pool
(08:24:28 AM) sjust: - ceph_manager.remove_pool:
(08:24:28 AM) sjust:     args: ['toremove']
(08:24:35 AM) sjust: waited 10s for the pool removal to propagate to osds
(08:24:39 AM) sjust: - sleep:
(08:24:39 AM) sjust:     duration: 10
(08:24:44 AM) sjust: restarted all three osds
(08:24:50 AM) sjust: (to trigger the bug)
(08:24:53 AM) sjust: - ceph.restart:
(08:24:53 AM) sjust:     daemons:
(08:24:53 AM) sjust:     - osd.0
(08:24:53 AM) sjust:     - osd.1
(08:24:53 AM) sjust:     - osd.2
(08:25:11 AM) sjust: waited 30 more seconds for the cluster to stabilize
(08:25:13 AM) sjust: - sleep:
(08:25:13 AM) sjust:     duration: 30
(08:25:16 AM) sjust: waited for it to go clean
(08:25:24 AM) sjust: - ceph_manager.wait_for_clean: null
(08:25:41 AM) sjust: wrote 1 byte objects to some new pool for 60s
(08:25:43 AM) sjust: - radosbench:
(08:25:43 AM) sjust:     clients: [client.0]
(08:25:43 AM) sjust:     time: 60
(08:25:43 AM) sjust:     size: 1
(08:25:49 AM) sjust: created a new pool
(08:25:51 AM) sjust: - ceph_manager.create_pool:
(08:25:51 AM) sjust:     args: ['newpool']
(08:26:03 AM) sjust: generated 200 map changes using the new pool
(08:26:05 AM) sjust: - loop:
(08:26:05 AM) sjust:     count: 100
(08:26:05 AM) sjust:     body:
(08:26:05 AM) sjust:     - ceph_manager.set_pool_property:
(08:26:05 AM) sjust:         args: ['newpool', 'min_size', 2]
(08:26:05 AM) sjust:     - ceph_manager.set_pool_property:
(08:26:05 AM) sjust:         args: ['newpool', 'min_size', 1]
(08:26:24 AM) sjust: then slept for 30s and generated 200 more
(08:26:26 AM) sjust: - sleep:
(08:26:26 AM) sjust:     duration: 30
(08:26:26 AM) sjust: - ceph_manager.wait_for_clean: null
(08:26:26 AM) sjust: - loop:
(08:26:26 AM) sjust:     count: 100
(08:26:26 AM) sjust:     body:
(08:26:26 AM) sjust:     - ceph_manager.set_pool_property:
(08:26:26 AM) sjust:         args: ['newpool', 'min_size', 2]
(08:26:26 AM) sjust:     - ceph_manager.set_pool_property:
(08:26:26 AM) sjust:         args: ['newpool', 'min_size', 1]
(08:26:45 AM) sjust: then it upgrades and runs a bit more of a workload to give the cluster a chance to crash after the upgrade
(08:26:50 AM) sjust: make sense?
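For convenience, here is the same pre-upgrade sequence assembled into a single task list (indentation reconstructed from the log above; the 'tasks:' top-level key is an assumption based on the usual teuthology suite layout, and the authoritative version is the linked 11429.yaml):

  tasks:
  - radosbench:
      clients: [client.0]
      time: 120
      size: 1
      pool: toremove
      create_pool: false
  - ceph_manager.remove_pool:
      args: ['toremove']
  - sleep:
      duration: 10
  - ceph.restart:
      daemons: [osd.0, osd.1, osd.2]
  - sleep:
      duration: 30
  - ceph_manager.wait_for_clean: null
  - radosbench:
      clients: [client.0]
      time: 60
      size: 1
  - ceph_manager.create_pool:
      args: ['newpool']
  - loop:
      count: 100
      body:
      - ceph_manager.set_pool_property:
          args: ['newpool', 'min_size', 2]
      - ceph_manager.set_pool_property:
          args: ['newpool', 'min_size', 1]
  - sleep:
      duration: 30
  - ceph_manager.wait_for_clean: null
  - loop:
      count: 100
      body:
      - ceph_manager.set_pool_property:
          args: ['newpool', 'min_size', 2]
      - ceph_manager.set_pool_property:
          args: ['newpool', 'min_size', 1]
  # the upgrade and a post-upgrade workload follow here (not quoted in the log)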
Verified. Reproduced the OSD crash on 1.2.3, then after the upgrade ran the same steps Sam described; the OSDs did not crash this time.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1183
*** Bug 1293832 has been marked as a duplicate of this bug. ***