Description of problem:
When OSDs restart after upgrade from 1.3.x to 2.0 they can crash almost immediately. This could happen any number of times, but eventually the OSD will start normally.
It should happen upon upgrade, but I haven't determined why we aren't seeing this in our testing.
To test the fix:
1. Bring up a Jewel cluster using filestore
2. Create a pool with size 1
3. Write a bunch of data to the pool
4. Stop the cluster
5. Find a */current OSD directory that does not contain a pg_head for some valid PG
6. Create a pg_TEMP for that PG (e.g. mkdir ....current/1.5_TEMP)
7. Find any other OSD current directory with a pg_head without a pg_TEMP
8. Rename the pg_head to pg_TEMP (e.g. mv ....current/1.0_head ...current/1.0_TEMP)
9. Start cluster
The 2 OSDs that were manipulated should NOT crash
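Steps 7 and 8 above could be sketched roughly as follows. This is only an illustration for a stopped test cluster, not something to run on production data. The helper names (heads_without_temp, rename_head_to_temp) are made up for this example, and the regular expression assumes the filestore directory naming "<pool>.<hex seed>_head" / "<pool>.<hex seed>_TEMP" as in the examples above.

```python
import os
import re

# Assumed filestore naming: "<pool>.<hex seed>_head", e.g. "1.0_head".
HEAD_RE = re.compile(r"^(\d+\.[0-9a-f]+)_head$")


def heads_without_temp(current_dir):
    """Return sorted PG ids that have a pg_head but no pg_TEMP in current_dir."""
    pgids = []
    for name in sorted(os.listdir(current_dir)):
        match = HEAD_RE.match(name)
        if not match:
            continue  # skip meta, omap, and anything else that is not a pg_head
        pgid = match.group(1)
        if not os.path.isdir(os.path.join(current_dir, pgid + "_TEMP")):
            pgids.append(pgid)
    return pgids


def rename_head_to_temp(current_dir, pgid):
    """Rename <pgid>_head to <pgid>_TEMP (step 8); refuse if the TEMP dir exists."""
    src = os.path.join(current_dir, pgid + "_head")
    dst = os.path.join(current_dir, pgid + "_TEMP")
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst
```

With the cluster stopped, calling heads_without_temp() on an OSD's current directory lists the candidates for step 7, and rename_head_to_temp() performs the rename from step 8 on the one you pick.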
We now fully understand the bug in Hammer which triggers this bug in later releases.
To hit the crash, an OSD must meet the following criteria:
1. The OSD has been marked out and then back in, or gets PG(s) pushed to it due to a different map change. This creates some pg_TEMP dirs.
2. The pg_num/pgp_num is increased, causing PGs to split, including one with a pg_TEMP.
3. The OSD is NOT restarted prior to upgrade.
Workarounds:
1. Restart all OSDs before installing the upgrade.
2. With the old OSD stopped, and before starting the upgraded OSD, manually search for and rmdir all pg_TEMP directories that have no corresponding pg_head directory (scripting this would be helpful).
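A rough sketch of how the second workaround might be scripted is below. It is untested against a real cluster and makes some assumptions: the function names (orphaned_temp_dirs, remove_orphans) are invented for this example, the paths passed on the command line are assumed to be filestore */current directories of OSDs that are already stopped, and the directory naming is assumed to be "<pool>.<hex seed>_TEMP" / "<pool>.<hex seed>_head". It defaults to a dry run and uses os.rmdir, which refuses to remove a non-empty directory.

```python
import os
import re
import sys

# Assumed filestore naming: "<pool>.<hex seed>_TEMP", e.g. "1.5_TEMP".
TEMP_RE = re.compile(r"^(\d+\.[0-9a-f]+)_TEMP$")


def orphaned_temp_dirs(current_dir):
    """Return sorted paths of pg_TEMP dirs in current_dir with no matching pg_head."""
    orphans = []
    for name in sorted(os.listdir(current_dir)):
        match = TEMP_RE.match(name)
        if not match:
            continue
        temp_path = os.path.join(current_dir, name)
        head_path = os.path.join(current_dir, match.group(1) + "_head")
        if os.path.isdir(temp_path) and not os.path.isdir(head_path):
            orphans.append(temp_path)
    return orphans


def remove_orphans(current_dir, dry_run=True):
    """List orphaned pg_TEMP dirs; rmdir them only when dry_run is False."""
    handled = []
    for path in orphaned_temp_dirs(current_dir):
        if not dry_run:
            os.rmdir(path)  # raises OSError if the directory is not empty
        handled.append(path)
    return handled


if __name__ == "__main__":
    # Dry run only: print what would be removed for each */current dir given.
    for current in sys.argv[1:]:
        for path in remove_orphans(current, dry_run=True):
            print(path)
```

Run it first as a dry run against each OSD's current directory and review the output; only call remove_orphans(..., dry_run=False) once the list looks right.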
I would say that this isn't a blocker. The upstream fix is ready to merge, having passed Rados suite testing. To avoid customers running into this, I advise including it in 2.0 if it isn't too late.
Fix merged to Upstream Jewel branch: https://github.com/ceph/ceph/pull/10561
Accidentally moved to VERIFIED; changing it back to ON_QA.
I followed the steps mentioned in comment 5, and after restarting the modified OSDs, they did not crash. Here are the steps that I followed:
1. I selected PG 5.0, which exists on OSDs [3,1,2], and created a pg_head for it on osd.8, where it does not exist (/current/5.0_head).
2. Then I selected another OSD, osd.7, and searched for a PG without a pg_TEMP. There were none; every pg_head had an associated pg_TEMP.
3. So I selected 6.0_head and deleted its 6.0_TEMP on that OSD, then renamed 6.0_head to 6.0_TEMP. 6.0_TEMP was empty before I deleted the original dir.
4. I restarted the cluster and did some I/O, and none of the OSDs crashed.
I am moving this bug to the VERIFIED state.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.