| Summary: | after upgrading from 1.2.3 to 1.3.0 Journal file sym link missing and osd is down | | |
|---|---|---|---|
| Product: | Red Hat Ceph Storage | Reporter: | Warren <wusui> |
| Component: | Ceph-Disk | Assignee: | Loic Dachary <ldachary> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 1.2.3 | CC: | adeza, ceph-eng-bugs, kdreyer, tmuthami |
| Target Milestone: | rc | | |
| Target Release: | 1.3.4 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-05-30 16:51:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Ugh. I hit CR too early.

Description: A few Ceph journal partitions were empty on a fairly large upgrade test (31 OSD hosts, 12 OSDs per host). The upgrade was from 1.2.3 to 1.3. The problems were not noticed until the 1.3 upgrade was in progress. On 3 separate OSDs, the partition for the journal disk appeared to have been cleared before the upgrade. Two of these errors may well be due to a combination of operator error and a known bug (tracker issues http://tracker.ceph.com/issues/9665 or http://tracker.ceph.com/issues/10375), but we cannot explain the third.

We noticed the problem after upgrading one OSD host: the journal file's symlink was broken, which left the Ceph cluster unhealthy. The OSD did not come up because the journal link was missing. We did not run into a problem until the upgrade, but it is unclear how long the link had been bad before that point.

After noticing that the partitions were unavailable, we fixed the situation by using sgdisk to copy another partition to the clobbered partition, finding the old GUID in the symlink name, and editing the partition's GUID to match the original.
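For anyone checking other hosts for the same condition, here is a minimal sketch of the "find the old GUID in the symlink name" step. It assumes the default ceph-disk layout, where each OSD directory under `/var/lib/ceph/osd` holds a `journal` symlink pointing at `/dev/disk/by-partuuid/<guid>`; that layout is an assumption, not something stated in this report. The helper names `find_broken_journals` and `sgdisk_guid_command` are hypothetical. The second helper only builds an `sgdisk --partition-guid` argument list for review and does not run anything.

```python
import os
import re
from pathlib import Path

# Matches a GPT partition GUID such as 0f3b1a2c-1111-2222-3333-444455556666.
UUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def find_broken_journals(osd_root="/var/lib/ceph/osd"):
    """Return (osd_dir, link_target, partuuid) for each dangling journal symlink.

    osd_root is the assumed default OSD data location; partuuid is None
    when the link target's basename does not look like a GUID.
    """
    broken = []
    root = Path(osd_root)
    if not root.is_dir():
        return broken
    for osd_dir in sorted(root.iterdir()):
        journal = osd_dir / "journal"
        if not journal.is_symlink():
            continue  # no journal link here (or journal is a plain file)
        target = os.readlink(journal)
        # os.path.exists follows the link, so True means the link is healthy.
        if os.path.exists(journal):
            continue
        name = os.path.basename(target)
        partuuid = name.lower() if UUID_RE.match(name) else None
        broken.append((str(osd_dir), target, partuuid))
    return broken


def sgdisk_guid_command(disk, partnum, partuuid):
    """Build (but do not run) an sgdisk call that sets a partition's unique GUID."""
    return ["sgdisk", f"--partition-guid={partnum}:{partuuid}", disk]


if __name__ == "__main__":
    for osd_dir, target, partuuid in find_broken_journals():
        print(f"{osd_dir}: journal -> {target} is dangling (old GUID: {partuuid})")
```

The disk and partition number passed to `sgdisk_guid_command` have to be identified by hand for each affected OSD; the script only recovers the GUID the symlink expects.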
Created attachment 1123652: Error