Bug 1307177 - after upgrading from 1.2.3 to 1.3.0 Journal file sym link missing and osd is down
Status: CLOSED CURRENTRELEASE
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Disk
Version: 1.2.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 1.3.4
Assigned To: Loic Dachary
QA Contact: ceph-qe-bugs
Reported: 2016-02-12 17:33 EST by Warren
Modified: 2017-07-30 10:58 EDT
Doc Type: Bug Fix
Last Closed: 2017-05-30 12:51:30 EDT
Type: Bug


Attachments
Error (1.61 KB, text/plain)
2016-02-12 17:33 EST, Warren

Description Warren 2016-02-12 17:33:19 EST
Created attachment 1123652
Error

Comment 1 Warren 2016-02-12 17:57:53 EST
Ugh.  I hit CR too early.

Description:
A few Ceph journal partitions were found empty during a fairly large upgrade test (31 OSD hosts, 12 OSDs per host). The upgrade was from 1.2.3 to 1.3, and the problems were not noticed until the 1.3 upgrade was in progress.

On 3 separate OSDs, the journal partition appears to have been cleared before the upgrade. Two of the errors may be due to a combination of operator error and a known bug (tracker issues http://tracker.ceph.com/issues/9665 or http://tracker.ceph.com/issues/10375), but we are not sure what caused the third.

We noticed the problem after upgrading one OSD host: the journal file's symlink was broken, so the OSD did not come up and the Ceph cluster became unhealthy. We did not run into a problem until the upgrade, but it is unclear how long the link had been bad before that point.
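
Since it is unclear how long the link was bad, a check that can be run before an upgrade may help. The following is only a sketch, not something taken from this report: it assumes each OSD data directory lives under /var/lib/ceph/osd and holds a symlink named "journal", and the function name is made up for illustration. It lists the OSD directories whose journal symlink points at a target that no longer exists.

```python
import os


def find_broken_journal_links(osd_root="/var/lib/ceph/osd"):
    """Return (osd_dir_name, link_target) pairs whose journal symlink is dangling.

    Assumption: each OSD data directory under osd_root contains a symlink
    named "journal". Directories without such a symlink are skipped.
    """
    broken = []
    if not os.path.isdir(osd_root):
        return broken
    for name in sorted(os.listdir(osd_root)):
        journal = os.path.join(osd_root, name, "journal")
        if not os.path.islink(journal):
            continue  # not a symlink (or absent): nothing to check here
        target = os.readlink(journal)
        # os.path.exists follows the symlink and is False when it dangles
        if not os.path.exists(journal):
            broken.append((name, target))
    return broken


if __name__ == "__main__":
    for osd, target in find_broken_journal_links():
        print(f"{osd}: journal -> {target} (target missing)")
```

The script only reads the links; it does not touch any partition.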

After noticing that the partitions were unavailable, we fixed the situation by using sgdisk to copy another partition to the clobbered partition, finding the old GUID in the symlink name, and editing the partition's GUID to match the original.
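
The GUID step above can be sketched as follows. The exact commands we ran are not recorded in this report, so treat this as an illustration under assumptions: the symlink target is assumed to end in the partition GUID (for example a name under /dev/disk/by-partuuid), the disk and partition number are placeholders, and both helper names are hypothetical. The code only builds the sgdisk command line (using sgdisk's --partition-guid=partnum:guid option) and does not run it; the partition copy step is not covered.

```python
import os
import re

# Standard 8-4-4-4-12 hex layout of a GPT partition GUID
GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def guid_from_link_target(target):
    """Extract the GUID from a link target whose last path component is the GUID."""
    match = GUID_RE.fullmatch(os.path.basename(target))
    if match is None:
        raise ValueError(f"no GUID in link target: {target!r}")
    return match.group(0).lower()


def sgdisk_set_guid_command(disk, partnum, guid):
    """Build (but do not run) the sgdisk call that sets a partition's unique GUID."""
    return ["sgdisk", f"--partition-guid={partnum}:{guid}", disk]


if __name__ == "__main__":
    # Placeholder values for illustration only
    old_target = "/dev/disk/by-partuuid/12345678-90ab-cdef-1234-567890abcdef"
    guid = guid_from_link_target(old_target)
    print(" ".join(sgdisk_set_guid_command("/dev/sdX", 2, guid)))
```

Review the printed command against the real disk layout before running anything; pointing it at the wrong partition would change that partition's GUID.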
