Description of problem:
When OSDs restart after an upgrade from 1.3.x to 2.0 they can crash almost immediately. This can happen any number of times, but eventually the OSD will start normally.

How reproducible:
It should happen upon upgrade, but I haven't determined why we aren't seeing this in our testing.
To test the fix:
1. Bring up a Jewel cluster using filestore
2. Create a pool with size 1
3. Write a bunch of data to the pool
4. Stop the cluster
5. Find a */current OSD dir which doesn't include a pg_head for a valid PG
6. Create a pg_TEMP dir for that pg (e.g. mkdir ....current/1.5_TEMP; see the sketch after this list)
7. Find any other OSD current directory with a pg_head but no pg_TEMP
8. Rename the pg_head to pg_TEMP (e.g. mv ....current/1.0_head ...current/1.0_TEMP)
9. Start the cluster

The 2 OSDs that were manipulated should NOT crash.
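A minimal shell sketch of steps 6 and 8, assuming the default filestore layout under /var/lib/ceph/osd/ceph-$id/current and the example PG ids 1.5 and 1.0; the OSD numbers here are placeholders:

    # Step 6: on an OSD that has no 1.5_head dir, fabricate an orphan
    # pg_TEMP dir (osd.3 is a placeholder)
    mkdir /var/lib/ceph/osd/ceph-3/current/1.5_TEMP

    # Step 8: on a different OSD that has 1.0_head but no 1.0_TEMP,
    # turn the head dir into a TEMP dir (osd.4 is a placeholder)
    mv /var/lib/ceph/osd/ceph-4/current/1.0_head \
       /var/lib/ceph/osd/ceph-4/current/1.0_TEMP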
We now fully understand the bug in Hammer which triggers this crash in later releases. An OSD must meet all of the following criteria:
1. The OSD has been marked out then back in, or gets pg(s) pushed to it due to a different map change. This creates some pg_TEMP dirs.
2. The pg_num/pgp_num is increased, causing pgs to split, including one with a pg_TEMP.
3. The OSD is NOT restarted prior to the upgrade.

Workarounds:
1. Restart all OSDs before installing the upgrade.
2. With the old OSD stopped, and before starting the upgraded OSD, manually search for and rmdir all pg_TEMP directories without a corresponding pg_head directory (scripting this would be helpful; a sketch follows below).

I would say that this isn't a blocker. The upstream fix is ready to merge, having passed Rados suite testing. To avoid customers running into this, I advise including it in 2.0 if it isn't too late.
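A minimal sketch of a script for workaround 2, assuming the default filestore layout; the OSD path is a placeholder, and it must only run while that OSD is stopped:

    #!/bin/sh
    # Remove any pg_TEMP dir that has no matching pg_head dir.
    # Run only while the OSD is stopped; the path is a placeholder.
    osd_current=/var/lib/ceph/osd/ceph-0/current
    for temp in "$osd_current"/*_TEMP; do
        [ -d "$temp" ] || continue        # skip if no *_TEMP dirs matched
        head="${temp%_TEMP}_head"         # e.g. 1.5_TEMP -> 1.5_head
        if [ ! -d "$head" ]; then
            echo "removing orphan $temp"
            rmdir "$temp"
        fi
    done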
Fix merged to the upstream Jewel branch: https://github.com/ceph/ceph/pull/10561
Accidentally moved to VERIFIED; changing it back to ON_QA.
I followed the steps as mentioned in comment 5, and after restarting the modified OSDs they did not crash. Here are the steps that I followed:
1. I selected a PG, 5.0, which exists on OSDs [3,1,2], and created a pg_head on osd.8 where it does not exist ( /current/5.0_head ).
2. Then I selected another OSD, osd.7, and searched for a pg_head without a pg_TEMP; there were none, all heads had an associated TEMP (see the one-liner below).
3. So I selected pg 6.0_head and deleted its 6.0_TEMP on that OSD, then renamed 6.0_head to 6.0_TEMP. pg 6.0_TEMP was empty before deleting the original dir.
4. I restarted the cluster, did some IOs, and none of the OSDs crashed.

I am moving this bug to VERIFIED.
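For reference, a hedged one-liner for step 2 above that lists pg_head dirs lacking a matching pg_TEMP dir; the OSD path is a placeholder:

    # Print every *_head dir with no sibling *_TEMP dir
    for h in /var/lib/ceph/osd/ceph-7/current/*_head; do
        [ -d "${h%_head}_TEMP" ] || echo "$h"
    done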
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html