Bug 1305812
Summary: | osd down during upgrade to 1.3.2 latest build | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | rakesh-gm <rgowdege> |
Component: | Unclassified | Assignee: | ceph-eng-bugs <ceph-eng-bugs> |
Status: | CLOSED NOTABUG | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | | |
Version: | 1.3.2 | CC: | hnallurv, kdreyer, ldachary, rgowdege |
Target Milestone: | rc | | |
Target Release: | 1.3.3 | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2016-02-15 14:31:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
It seems to indicate that during the upgrade the /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18 symlink was broken. This symlink is not created by ceph itself but by the underlying operating system. I don't see how a ceph upgrade could damage it. Are you able to repeat the problem?

Hi Loic, I have tested this upgrade path again. I did not hit this issue and hence could not reproduce it. Closing this as not a bug for now.
Created attachment 1122366: mon and osd logs

My cluster setup: 3 mons, 3 osds, 1 admin.

I initially installed the Jan 16th build of ceph 1.3.2 and updated today to the Feb 5th build of ceph 1.3.2. After the update, one of the OSDs, osd.4 on one node, is down. According to the logs, it was not pointing to the right journal, and hence the service was not starting.

Output of ls -l in /var/lib/ceph/osd/ceph-4/:

    -rw-r--r--   1 root root  490 Feb  5 14:56 activate.monmap
    -rw-r--r--   1 root root    3 Feb  5 14:56 active
    -rw-r--r--   1 root root   37 Feb  5 14:56 ceph_fsid
    drwxr-xr-x 190 root root 8192 Feb  8 09:29 current
    -rw-r--r--   1 root root   37 Feb  5 14:56 fsid
    lrwxrwxrwx   1 root root   58 Feb  5 14:56 journal -> /dev/disk/by-partuuid/37a4cc7f-ec14-4a14-90b6-2c8db3caae18
    -rw-r--r--   1 root root   37 Feb  5 14:56 journal_uuid
    -rw-------   1 root root   56 Feb  5 14:56 keyring
    -rw-r--r--   1 root root   21 Feb  5 14:56 magic
    -rw-r--r--   1 root root    6 Feb  5 14:56 ready
    -rw-r--r--   1 root root    4 Feb  5 14:56 store_version
    -rw-r--r--   1 root root   53 Feb  5 14:56 superblock
    -rw-r--r--   1 root root    0 Feb  5 14:56 upstart

But /dev/disk/by-partuuid/ has no entry for the journal partition:

    ls -l /dev/disk/by-partuuid/
    total 0
    lrwxrwxrwx 1 root root 10 Feb  5 15:00 7f223239-d39b-4401-8794-78a00ee95c68 -> ../../sdb1
    lrwxrwxrwx 1 root root 10 Feb  8 11:03 c4110af4-b22a-43c1-869d-f4e01dc334aa -> ../../sdb2
    lrwxrwxrwx 1 root root 10 Feb  8 11:05 e60c60ea-9076-48e2-8bf9-2683f067051d -> ../../sdd2
    lrwxrwxrwx 1 root root 10 Feb  5 15:00 f588e1f6-ec3f-4dd7-a2df-1f5a7cb6b44f -> ../../sdd1

This is on Ubuntu 14.04. I had kernel 3.16 and then updated to kernel 3.19. The bug was hit on kernel 3.16 and still existed after updating to 3.19.

The osd.4 logs are attached.
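
For anyone who hits the same symptom, a dangling journal symlink of the kind described above can be found with a short check. This is only a minimal sketch, assuming the /var/lib/ceph/osd/ceph-N/journal layout from the listing; the function name find_broken_journals is hypothetical and not part of Ceph.

```python
import os


def find_broken_journals(osd_root="/var/lib/ceph/osd"):
    """Return (journal_path, target) pairs for journal symlinks whose target is missing.

    Hypothetical helper, not part of Ceph; assumes each OSD data directory
    under osd_root may contain a symlink named "journal".
    """
    broken = []
    if not os.path.isdir(osd_root):
        return broken
    for name in sorted(os.listdir(osd_root)):
        journal = os.path.join(osd_root, name, "journal")
        # Only symlinked journals are checked; anything else is skipped.
        if not os.path.islink(journal):
            continue
        target = os.readlink(journal)
        # os.path.exists follows the link, so it is False for a dangling symlink.
        if not os.path.exists(journal):
            broken.append((journal, target))
    return broken


if __name__ == "__main__":
    for journal, target in find_broken_journals():
        print(f"{journal} -> {target} (target missing)")
```

On the node in this report, such a check would be expected to flag the ceph-4 journal, since its by-partuuid target is absent from /dev/disk/by-partuuid/. It only reports the broken link; it does not explain why the operating system failed to create the by-partuuid entry.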